AI bots crawl your site thousands of times. How many visitors do they send back? The crawl-to-refer ratio measures the AI bargain, per engine. Definition, formula, benchmarks.
Last updated: June 2026.
Every site owner watching their logs in 2026 sees the same picture: AI bots everywhere. GPTBot reading the docs, ClaudeBot in the archives, PerplexityBot fetching last week's post. The reflexive question is "should I block them?" and the honest answer is: you cannot decide without a number.
This page defines that number. We call it the crawl-to-refer ratio, and we think it belongs in every analytics dashboard built after 2025.
Crawl-to-refer ratio = AI crawler requests received ÷ human visits referred back by that engine, over the same period.
It measures the bargain each AI engine offers you. A ratio of 50:1 means an engine made 50 crawl requests for every visitor it sent you. The lower the ratio, the better the deal. An engine that crawls heavily and refers nothing is extracting; an engine that crawls lightly and refers steadily is distributing.
The ratio only makes sense per engine, because the deal varies wildly between them, and that variance is exactly the decision-making information you need.
You need two numbers per engine, measured over the same time window (a calendar month works well):
1. Crawl requests: from server or CDN logs, count requests by user agent family. Group them by company: OpenAI = GPTBot + OAI-SearchBot + ChatGPT-User; Anthropic = ClaudeBot + Claude-SearchBot + Claude-User; and so on (the full user agent table is here ↗).
2. Referred visits: from analytics, count sessions by AI referrer (chatgpt.com, perplexity.ai, claude.ai...) per engine ↗.
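The two counts above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production log parser: it assumes you have already extracted user agent strings and referrer URLs as plain strings, the grouping tables contain only the bot names and referrer domains mentioned on this page, and the function names (`engine_for_user_agent`, `engine_for_referrer`, `count_by_engine`) are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# User agent tokens grouped by company. Only tokens named in this article
# are listed; extend the table for a real deployment.
BOT_TOKENS = {
    "OpenAI": ("GPTBot", "OAI-SearchBot", "ChatGPT-User"),
    "Anthropic": ("ClaudeBot", "Claude-SearchBot", "Claude-User"),
    "Perplexity": ("PerplexityBot",),
    "ByteDance": ("Bytespider",),
    "Common Crawl": ("CCBot",),
}

# Referrer hostnames mapped to the engine that sent the visit
# (illustrative mapping, assumed for this sketch).
REFERRER_HOSTS = {
    "chatgpt.com": "OpenAI",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Anthropic",
}


def engine_for_user_agent(user_agent):
    """Return the company whose bot token appears in the user agent, else None."""
    ua = user_agent.lower()
    for engine, tokens in BOT_TOKENS.items():
        if any(token.lower() in ua for token in tokens):
            return engine
    return None


def engine_for_referrer(referrer_url):
    """Return the engine for a referrer URL, else None."""
    host = (urlparse(referrer_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return REFERRER_HOSTS.get(host)


def count_by_engine(values, classify):
    """Count values per engine, skipping anything the classifier does not recognize."""
    counts = Counter()
    for value in values:
        engine = classify(value)
        if engine is not None:
            counts[engine] += 1
    return counts
```

Calling `count_by_engine(user_agents, engine_for_user_agent)` on a month of log lines gives the crawl side, and `count_by_engine(referrers, engine_for_referrer)` on a month of sessions gives the referral side. Substring matching on user agents is a deliberate simplification; it does not verify that a request really came from the named crawler.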
Example, one month:
OpenAI crawls: 14,200 requests | ChatGPT referrals: 310 visits → ratio 46:1
Perplexity crawls: 1,900 requests | Perplexity referrals: 95 visits → ratio 20:1
ByteDance crawls: 22,000 requests | ByteDance referrals: 0 visits → ratio ∞ (pure extraction)
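The arithmetic behind the example can be reproduced in a few lines of Python. This is a sketch, with hypothetical function names (`crawl_to_refer_ratio`, `format_ratio`); the one design choice worth noting is that zero referrals with nonzero crawls is reported as infinity rather than raising a division error.

```python
import math


def crawl_to_refer_ratio(crawl_requests, referred_visits):
    """Crawl requests divided by referred visits over the same period."""
    if referred_visits == 0:
        # No referrals: infinite ratio if the engine crawled at all,
        # undefined if it neither crawled nor referred.
        return math.inf if crawl_requests > 0 else math.nan
    return crawl_requests / referred_visits


def format_ratio(ratio):
    """Render a ratio in the N:1 form used on this page."""
    if math.isnan(ratio):
        return "n/a"
    if math.isinf(ratio):
        return "∞"
    return f"{round(ratio)}:1"


# The example month from above: (crawl requests, referred visits).
month = {
    "OpenAI": (14_200, 310),
    "Perplexity": (1_900, 95),
    "ByteDance": (22_000, 0),
}

report = {
    engine: format_ratio(crawl_to_refer_ratio(crawls, visits))
    for engine, (crawls, visits) in month.items()
}
# report == {"OpenAI": "46:1", "Perplexity": "20:1", "ByteDance": "∞"}
```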
Two honest caveats. First, referrer-based counts undercount real influence (app clicks and post-answer brand searches carry no AI fingerprint), so your true ratios are likely better (lower) than measured. Second, training crawlers pay you back on a delay measured in model versions, not weeks. The ratio is a flashlight, not a courtroom.
Public data is young but directionally consistent, and the numbers are stark. Cloudflare's June 2026 data put the spread in black and white: Anthropic's ClaudeBot crawled about 11,122 pages for every single human visit it referred back (week of May 25 to June 1, 2026), and as high as ~24,000:1 across Q1, while OpenAI's GPTBot sat near 1,276:1. In the same window, bots accounted for 57.4% of web traffic to HTML content, with humans at 42.6%: the web now serves more machines than people, which is the whole reason this ratio matters. SEOmator's GEO data report and our own observations sketch the same hierarchy:
| Engine | Typical behavior | Deal quality |
|---|---|---|
| Perplexity | Light crawl, citation-driven referrals | Best ratio of the majors: citations are its product |
| OpenAI (search + user bots) | Heavy crawl, growing referrals since ChatGPT search and the May 2026 live-link update | Improving fast |
| Google (AI Overviews / Gemini) | Crawl bundled with Googlebot, referrals partly cannibalized from your own organic clicks | Hard to isolate, watch closely |
| Anthropic | Moderate crawl, modest but real referrals via Claude citations | Middling, improving |
| Common Crawl (CCBot) | Bulk crawl, zero direct referrals | Indirect only (feeds many labs) |
| ByteDance (Bytespider) | Very heavy crawl (doubled by May 2026), no referral mechanism | Worst deal on the table |
A site-level rule of thumb until standardized benchmarks exist: under 25:1 for a search-oriented engine is a good deal; over 200:1 with no trend toward improvement means that engine is a cost center, and a robots.txt decision is warranted.
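The rule of thumb can be written down as a small Python check. This is a sketch under assumptions: the 25 and 200 thresholds come from the paragraph above, but the definition of "trend toward improvement" (latest ratio lower than the oldest one in the window), the function name `assess_engine`, and the label "watch" for everything in between are illustrative choices, not an established standard.

```python
def assess_engine(monthly_ratios, good_below=25.0, costly_above=200.0):
    """Classify an engine from its monthly ratios, ordered oldest first.

    Intended for search-oriented engines; the thresholds are rules of
    thumb, not standardized benchmarks.
    """
    latest = monthly_ratios[-1]
    # Assumed definition of improvement: the latest ratio is lower
    # than the oldest ratio in the window.
    improving = len(monthly_ratios) >= 2 and latest < monthly_ratios[0]

    if latest < good_below:
        return "good deal"
    if latest > costly_above and not improving:
        return "cost center"
    return "watch"
```

For example, `assess_engine([46, 30, 20])` returns `"good deal"`, `assess_engine([300, 320, 350])` returns `"cost center"`, and `assess_engine([900, 600, 400])` returns `"watch"` because the ratio is still high but falling.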
Datalenk computes crawl-to-refer ratios per engine automatically, by pairing bot-level data with referral and revenue data in one place ↗. But the metric is bigger than any tool, including ours: if you build analytics, steal it. The web is renegotiating its deal with AI, and you cannot negotiate what you do not measure.
What is a crawl-to-refer ratio? The number of requests an AI engine's crawlers make to your site divided by the human visits that engine refers back, over the same period, per engine. It measures what each AI engine takes versus what it gives.
What is a good crawl-to-refer ratio? Directionally: under 25:1 for search-oriented engines is healthy, over 200:1 with no improvement is extraction. Standardized benchmarks do not exist yet; per-engine trends matter more than absolutes.
Why is my ratio so bad? Three usual causes: search bots blocked in robots.txt while training bots roam free, JavaScript-rendered content AI crawlers cannot read, or content that answers nothing quotable. Access problems first, content problems second.
Does blocking high-ratio bots hurt my AI visibility? Blocking training bots (GPTBot, ClaudeBot, Bytespider, CCBot) does not affect AI search citations. Blocking search and user-fetch bots (OAI-SearchBot, PerplexityBot, ChatGPT-User) absolutely does. Know which type you are blocking ↗.
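One way to check which type you are blocking is to run a draft robots.txt through Python's standard-library parser before deploying it. The sketch below assumes a hypothetical policy that blocks the four training bots named above and allows everything else; `example.com` and the `allowed` helper are placeholders. It only tests how the rules parse, not whether a given crawler actually honors them, and each crawler's own matching logic may differ from Python's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block training crawlers, allow all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Bytespider
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot, url="https://example.com/docs/"):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(bot, url)


# Training bots from the answer above should be blocked,
# search and user-fetch bots should still get through.
blocked = [bot for bot in ("GPTBot", "ClaudeBot", "Bytespider", "CCBot") if not allowed(bot)]
reachable = [bot for bot in ("OAI-SearchBot", "PerplexityBot", "ChatGPT-User") if allowed(bot)]
```

If `reachable` comes back shorter than expected, a rule meant for a training bot is also catching a search or user-fetch bot.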