Crawl-to-Refer Ratio: The Metric That Tells You If AI Is Worth It

Last updated: June 2026.

Every site owner watching their logs in 2026 sees the same picture: AI bots everywhere. GPTBot reading the docs, ClaudeBot in the archives, PerplexityBot fetching last week's post. The reflexive question is "should I block them?" and the honest answer is: you cannot decide without a number.

This page defines that number. We call it the crawl-to-refer ratio, and we think it belongs in every analytics dashboard built after 2025.

Definition

Crawl-to-refer ratio = AI crawler requests received ÷ human visits referred back by that engine, over the same period.

It measures the bargain each AI engine offers you. A ratio of 50:1 means an engine made 50 crawl requests for every visitor it sent you. The lower the ratio, the better the deal. An engine that crawls heavily and refers nothing is extracting; an engine that crawls lightly and refers steadily is distributing.

The ratio only makes sense per engine, because the deal varies wildly between them, and that variance is exactly the decision-making information you need.

How to calculate it

You need two numbers per engine, measured over the same time window (a month works well):

1. Crawl requests: from server or CDN logs, count requests by user agent family. Group them by company: OpenAI = GPTBot + OAI-SearchBot + ChatGPT-User; Anthropic = ClaudeBot + Claude-SearchBot + Claude-User; and so on (the full user agent table is here ↗).

2. Referred visits: from analytics, count sessions by AI referrer (chatgpt.com, perplexity.ai, claude.ai...) per engine ↗.
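The two counting steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function names and the `BOT_FAMILIES` and `REFERRER_HOSTS` tables are hypothetical, the tables only cover the bots and referrers named on this page, and it assumes you have already pulled raw user agent strings from your logs and full referrer URLs from your analytics.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed grouping, taken from the bots named in this article; extend it from
# your own user agent table.
BOT_FAMILIES = {
    "OpenAI": ("GPTBot", "OAI-SearchBot", "ChatGPT-User"),
    "Anthropic": ("ClaudeBot", "Claude-SearchBot", "Claude-User"),
    "Perplexity": ("PerplexityBot",),
    "ByteDance": ("Bytespider",),
}

# Assumed referrer hosts, taken from the examples in this article.
REFERRER_HOSTS = {
    "chatgpt.com": "OpenAI",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Anthropic",
}


def company_for_user_agent(user_agent):
    """Return the company whose bot token appears in the user agent, else None."""
    ua = user_agent.lower()
    for company, tokens in BOT_FAMILIES.items():
        if any(token.lower() in ua for token in tokens):
            return company
    return None


def company_for_referrer(referrer_url):
    """Return the company for a full referrer URL (scheme included), else None."""
    host = (urlparse(referrer_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return REFERRER_HOSTS.get(host)


def count_by_company(values, classify):
    """Count values per company, skipping anything the classifier does not recognize."""
    counts = Counter()
    for value in values:
        company = classify(value)
        if company is not None:
            counts[company] += 1
    return counts
```

Call `count_by_company(user_agents, company_for_user_agent)` for the crawl side and `count_by_company(referrers, company_for_referrer)` for the referral side, using the same time window for both. Substring matching on user agents is easy to spoof, so treat these counts as estimates unless you also verify the source of the requests.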

Example, one month:
OpenAI crawls: 14,200 requests  | ChatGPT referrals: 310 visits → ratio 46:1
Perplexity crawls: 1,900        | Perplexity referrals: 95     → ratio 20:1
ByteDance crawls: 22,000        | referrals: 0                 → ratio ∞ (pure extraction)
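The arithmetic behind the example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `crawl_to_refer_ratio` and `format_ratio` are hypothetical helper names, and treating zero referrals as an infinite ratio is one reasonable convention, matching the "∞" shown for ByteDance.

```python
import math


def crawl_to_refer_ratio(crawl_requests, referred_visits):
    """Crawl requests divided by referred visits over the same period.

    Assumption: zero referrals with at least one crawl counts as an infinite
    ratio; zero crawls and zero referrals is undefined and returns None.
    """
    if referred_visits == 0:
        return math.inf if crawl_requests > 0 else None
    return crawl_requests / referred_visits


def format_ratio(ratio):
    """Render a ratio in the N:1 style used in this article."""
    if ratio is None:
        return "n/a"
    if math.isinf(ratio):
        return "∞"
    return f"{round(ratio)}:1"


# Figures from the one-month example.
example = {
    "OpenAI": (14_200, 310),
    "Perplexity": (1_900, 95),
    "ByteDance": (22_000, 0),
}

for engine, (crawls, referrals) in example.items():
    print(engine, format_ratio(crawl_to_refer_ratio(crawls, referrals)))
```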

Two honest caveats. First, referrer-based counts undercount real influence (app clicks and post-answer brand searches carry no AI fingerprint), so your true ratios are better than measured. Second, training crawlers pay you back on a delay measured in model versions, not weeks. The ratio is a flashlight, not a courtroom.

Benchmarks: the takers and the givers

Public data is young but directionally consistent, and the numbers are stark. Cloudflare's June 2026 data put the spread in black and white: Anthropic's ClaudeBot crawled about 11,122 pages for every single human visit it referred back (week of May 25 to June 1, 2026), and as high as ~24,000:1 across Q1, while OpenAI's GPTBot sat near 1,276:1. In the same window, bots accounted for 57.4% of web traffic to HTML content, with humans at 42.6%: the web now serves more machines than people, which is the whole reason this ratio matters. SEOmator's GEO data report and our own observations sketch the same hierarchy:

Engine	Typical behavior	Deal quality
Perplexity	Light crawl, citation-driven referrals	Best ratio of the majors: citations are its product
OpenAI (search + user bots)	Heavy crawl, growing referrals since ChatGPT search and the May 2026 live-link update	Improving fast
Google (AI Overviews / Gemini)	Crawl bundled with Googlebot, referrals partly cannibalized from your own organic clicks	Hard to isolate, watch closely
Anthropic	Moderate crawl, modest but real referrals via Claude citations	Middling, improving
Common Crawl (CCBot)	Bulk crawl, zero direct referrals	Indirect only (feeds many labs)
ByteDance (Bytespider)	Very heavy crawl (doubled by May 2026), no referral mechanism	Worst deal on the table

A site-level rule of thumb until standardized benchmarks exist: under 25:1 for a search-oriented engine is a good deal; over 200:1 with no trend toward improvement means that engine is a cost center, and a robots.txt decision is warranted.
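The rule of thumb can be written down as a small classifier. This is a sketch, not a standard: the thresholds come from the paragraph above, while the function name, the labels, and the definition of "improving" (the latest ratio is lower than the oldest one in the series) are assumptions you should adapt. The under-25 threshold is stated for search-oriented engines only.

```python
import math

GOOD_DEAL_MAX = 25      # under 25:1 is a good deal for a search-oriented engine
COST_CENTER_MIN = 200   # over 200:1 with no improvement is a cost center


def assess_engine(monthly_ratios):
    """Classify an engine from its monthly ratios, ordered oldest to newest."""
    if not monthly_ratios:
        raise ValueError("need at least one monthly ratio")
    latest = monthly_ratios[-1]
    # Assumed trend test: improving means the newest ratio beats the oldest.
    improving = len(monthly_ratios) >= 2 and latest < monthly_ratios[0]
    if latest < GOOD_DEAL_MAX:
        return "good deal"
    if latest > COST_CENTER_MIN and not improving:
        return "cost center"
    return "watch the trend"
```

With a single month of data the function cannot see a trend, so a high ratio is reported as a cost center until a second data point shows improvement.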

What to do with your ratio

Good ratio (engine refers meaningfully): let everything in, and invest in citable content for that engine; it is buying distribution with your bandwidth, cheaply.
Bad ratio on a search/user engine: usually a visibility problem on your side rather than malice. Typical causes are blocked search bots, client-rendered content the bots cannot read, or pages that are not quotable. Fix access first, content shape second.
Bad ratio on a training-only crawler: a values-and-bandwidth decision. Blocking GPTBot does not touch your ChatGPT search visibility; blocking Bytespider costs you approximately nothing today.
Track the trend, not the snapshot. Engines change their referral behavior in steps (ChatGPT search launch, the May 2026 live links). A taker can become a giver in one product release, which is why this belongs in a dashboard rather than a quarterly log dive.
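If you decide to block training-only crawlers, the robots.txt rules can be generated rather than hand-edited. The sketch below is illustrative: `robots_txt_blocking` is a hypothetical helper, and the bot list is the set of training crawlers this page names, not an exhaustive one. robots.txt is advisory, so it only affects crawlers that choose to honor it.

```python
# Training-only crawlers named in this article; search and user-fetch bots
# (OAI-SearchBot, PerplexityBot, ChatGPT-User) are deliberately left out.
TRAINING_BOTS = ("GPTBot", "ClaudeBot", "Bytespider", "CCBot")


def robots_txt_blocking(user_agents):
    """Build robots.txt groups that disallow the whole site for each user agent."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    return "\n\n".join(groups) + "\n"


print(robots_txt_blocking(TRAINING_BOTS))
```

Keeping the list in code makes it harder to block a search bot by accident, because the distinction between the two bot types lives in one reviewed place.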

Datalenk computes crawl-to-refer ratios per engine automatically, by pairing bot-level data with referral and revenue data in one place ↗. But the metric is bigger than any tool, including ours: if you build analytics, steal it. The web is renegotiating its deal with AI, and you cannot negotiate what you do not measure.

FAQ

What is a crawl-to-refer ratio? The number of requests an AI engine's crawlers make to your site divided by the human visits that engine refers back, over the same period, per engine. It measures what each AI engine takes versus what it gives.

What is a good crawl-to-refer ratio? Directionally: under 25:1 for search-oriented engines is healthy, over 200:1 with no improvement is extraction. Standardized benchmarks do not exist yet; per-engine trends matter more than absolutes.

Why is my ratio so bad? Three usual causes: search bots blocked in robots.txt while training bots roam free, JavaScript-rendered content AI crawlers cannot read, or content that answers nothing quotable. Access problems first, content problems second.

Does blocking high-ratio bots hurt my AI visibility? Blocking training bots (GPTBot, ClaudeBot, Bytespider, CCBot) does not affect AI search citations. Blocking search and user-fetch bots (OAI-SearchBot, PerplexityBot, ChatGPT-User) absolutely does. Know which type you are blocking ↗.
