Hardal · First-Party Analytics · June 2026
AI Visibility Playbook
AI is crawling your site. GA4 won't tell you.
A marketer's playbook for measuring your real AI traffic, with the tools you already run.
Last 24h · 23,951 bot requests · 1 human referred
Server-side crawler hits. Your client-side analytics never sees them.
AI Is Already On Your Site
AI is doing two things to your site right now.
Your Input
It's reading your content.
Crawlers pull your pages to train models and answer queries. Nearly all of them skip your JavaScript entirely.
Your Result
It's sending you visitors.
A growing stream of visitors arrives from AI products, and early data shows they convert better than almost any other channel. Your analytics is blind to nearly all of it.
Two Questions Your Analytics Can't Answer
Build a baseline from what you already have.
GA4's new AI Assistant channel is a start, but it misses every AI crawler and the vast majority of the human visits AI sends you. This playbook gives you a first look at your real AI visibility, using the tools you already have.
Is AI sending you visitors, and are they converting?
A quick look via GA4.
Are AI engines reading your content?
A manual check of your server logs.
The Numbers You're Missing
The numbers are already in.
of HTML web traffic is now automated. The first machine majority.
Cloudflare Radar · Jun 2026
of verified bot traffic is AI crawlers. Roughly 27% counting AI-search.
Cloudflare Radar · May 2026
of AI crawler requests are for training, not a live user query.
Cloudflare · 2026
growth in US retail traffic from AI, converting 42% better than other channels.
Adobe · Q1 2026
The crawling is huge. The traffic is tiny.
ClaudeBot reads 23,951 of your pages for every visitor it sends you. That crawling is server-side, so your analytics never sees it. Only the lone visitor shows up.
Pages crawled for every visitor referred
The gap you're flying blind through
The Bots
Training models devour your content and send zero traffic back.
The Humans
Visitors arrive from AI chats, convert well, then vanish into your Direct traffic.
The Blind Spot
Client-side analytics misses the first and badly undercounts the second.
GA4 AI Assistant Channel
Is AI sending you visitors?
// the easy one. Start client-side, with a channel GA4 already gives you.
See the visitors AI sends you
Since May 2026, GA4 labels AI-assistant clicks for you. No custom filters, no regex hacks. When someone clicks in from a supported AI chatbot, GA4 tags the session with the AI Assistant channel and the ai-assistant medium.
AI visitors convert 42% better than other channels (Adobe), and the volume is growing fast. Set this up now, while the signal is small enough to act on.
GA4, step by step
1. Open Reports → Acquisition → Traffic acquisition and read the AI Assistant row in the Default Channel Group.
2. Add your key events as a column or comparison to see whether those visits convert.
3. For a focused view, build an Exploration filtered to Session medium = ai-assistant with a conversion metric.
The AI Assistant channel and the ai-assistant medium are the stable anchors. Use those.
What GA4 shows, what it hides
What to look for once it's running
- Which pages AI visitors land on first. AI products judged these relevant enough to send people to.
- Which sources convert. ChatGPT traffic can behave nothing like Perplexity.
- Whether AI traffic is growing month over month, and outpacing organic.
- Pages with high AI referral but low conversion. A fixable content or landing problem.
What it cannot see
- ChatGPT and Claude apps send no referrer, so those visits land in Direct.
- Assistants Google doesn't recognize. Only its supported list is counted, so long-tail AI tools stay invisible.
- Human clicks only. Crawlers never appear here.
The GA4 referrer gap
GA4's AI channel shows you a fraction of the real number. GA4 only captures AI visits that arrive with a referrer header intact. In practice, most don't.
of AI-referred traffic lands in Direct. No channel, no source, no conversion data.
Analysis of 446,000 sessions · Loamly, 2026
of what GA4 calls Direct is actually AI-referred traffic.
200-site study · Attrifast, May 2026
GA4's AI Assistant channel is a floor. The real number is materially higher.

Don't wait on Google's list
Match these referrer hostnames in your own channel group to catch custom AI traffic:
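As a starting point, here is a minimal sketch of a hostname pattern you could test locally before pasting it into a custom channel group condition (a partial regex match on session source). The hostnames below are illustrative assumptions, not this playbook's own list and not a complete one, and the names AI_REFERRER_REGEX and is_ai_referrer are hypothetical.

```shell
# Illustrative hostnames only (an assumption, not an official or complete list).
AI_REFERRER_REGEX='(^|\.)(chatgpt\.com|chat\.openai\.com|perplexity\.ai|claude\.ai|gemini\.google\.com|copilot\.microsoft\.com)$'

# is_ai_referrer HOSTNAME: exit status 0 if the hostname matches the pattern.
is_ai_referrer() {
  printf '%s\n' "$1" | grep -Eiq "$AI_REFERRER_REGEX"
}

# Example: check a few hostnames before using the pattern anywhere else.
for host in chatgpt.com www.perplexity.ai example.com; do
  if is_ai_referrer "$host"; then
    echo "$host: AI"
  else
    echo "$host: other"
  fi
done
```

The leading (^|\.) keeps lookalike domains from matching while still allowing subdomains. Review the list against the sources you actually see in your own reports.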
Server-Side
Are crawlers reading your content?
// the part GA4 will never show you. This is your input to the AI engines.
The blind spot
Crawlers don't run JavaScript. Client-side analytics has never recorded a single one. Seeing them means going server-side: your logs, or your log platform.
One important distinction: most AI crawling is training, not live queries, so heavy crawl volume does not mean heavy referral traffic. The two signals are separate, and both matter.
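One way to keep the two signals apart is to tally bot hits by the job each user-agent does. The sketch below assumes a common or combined format access log. The function name crawl_by_job is hypothetical, and the agent-to-job mapping is an assumption based on how the operators describe these agents, so check it against their current documentation.

```shell
# crawl_by_job LOGFILE: tally AI bot hits by job (training, search, user-triggered).
crawl_by_job() {
  awk '
    BEGIN {
      # Assumed agent-to-job mapping; verify against each operator documentation.
      job["gptbot"] = "training"
      job["claudebot"] = "training"
      job["oai-searchbot"] = "search"
      job["claude-searchbot"] = "search"
      job["perplexitybot"] = "search"
      job["chatgpt-user"] = "user"
      job["claude-user"] = "user"
      job["perplexity-user"] = "user"
    }
    {
      line = tolower($0)
      for (ua in job) {
        if (index(line, ua)) { n[job[ua]]++; break }   # count each request once
      }
    }
    END {
      printf "training=%d search=%d user=%d\n", n["training"], n["search"], n["user"]
    }
  ' "$1"
}
```

Usage: crawl_by_job access.log. A large training count next to a near-zero user count is the pattern described above: heavy crawling that does not translate into live queries.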
Server-side · raw logs
See which AI bots crawl you, and the pages they read most.
Bot volume by user-agent · swap in your log path
# Count hits per AI user-agent (case-insensitive), highest first.
for ua in GPTBot OAI-SearchBot ChatGPT-User ClaudeBot Claude-User \
  Claude-SearchBot anthropic-ai PerplexityBot Perplexity-User CCBot \
  Bytespider Amazonbot Applebot Meta-ExternalAgent cohere-ai Diffbot \
  YouBot PetalBot; do
  count=$(grep -aic "$ua" access.log)
  [ "$count" -gt 0 ] && printf "%-22s %s\n" "$ua" "$count"
done | sort -k2 -rn
Which pages one bot crawls most
# Field 7 is the request path in common/combined log format.
grep -ai "GPTBot" access.log | awk '{print $7}' \
  | sort | uniq -c | sort -rn | head -20
For .gz files, use zgrep in place of grep.
Which bots, how often
Heavy GPTBot but no ChatGPT-User means OpenAI is training on you, but users aren't asking about you yet.
Pages worth reading
Your most-crawled URLs are what AI treats as relevant. Your most valuable pages in the AI layer, whether you knew it or not.
Crawl frequency
Run it weekly. Pages climbing the list are gaining AI attention; pages dropping are losing it.
Gaps
Key product or category pages missing from the list aren't in contention to be cited. That's fixable.
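The weekly comparison and the gap check can both be scripted. This is a minimal sketch, assuming one common or combined format log per week and a plain-text file of your key paths, one per line. The function names crawl_trend and uncrawled_pages and the file names in the usage note are hypothetical.

```shell
# crawl_trend BOT LAST_WEEK_LOG THIS_WEEK_LOG
# Prints "change last_week this_week path" per URL, biggest gains first.
crawl_trend() {
  awk -v ua="$1" '
    index(tolower($0), tolower(ua)) {
      if (FILENAME == ARGV[1]) { prev[$7]++ } else { curr[$7]++ }
      seen[$7] = 1
    }
    END {
      for (p in seen)
        printf "%d\t%d\t%d\t%s\n", curr[p] - prev[p], prev[p], curr[p], p
    }
  ' "$2" "$3" | sort -k1,1nr
}

# uncrawled_pages BOT KEY_PAGES_FILE LOGFILE
# Prints each path in KEY_PAGES_FILE that BOT never requested in LOGFILE.
uncrawled_pages() {
  awk -v ua="$1" '
    FILENAME == ARGV[1] {
      if (index(tolower($0), tolower(ua))) crawled[$7] = 1
      next
    }
    NF && !($1 in crawled) { print $1 }
  ' "$3" "$2"
}
```

Usage: crawl_trend GPTBot access.last-week.log access.log, then uncrawled_pages GPTBot key-pages.txt access.log. Paths are compared exactly as logged, so a URL with a query string counts as a different page from the same URL without one.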
Server-side · your log platform
Your analyst will know the syntax for your setup. These are starting points.
sourcetype=access_combined | regex useragent="GPTBot|ClaudeBot|PerplexityBot|ChatGPT-User|OAI-SearchBot|CCBot|Bytespider" | timechart count by useragent
Know which bot is doing what
A bot's user-agent tells you its job. The three jobs are different:
Other general crawlers / search
Google-Extended and Applebot-Extended are robots.txt control tokens, not user-agents that show up in your logs. Don't grep for them. This list changes often.
Before you trust the count
User-agents can be spoofed. Any server can claim to be GPTBot or PerplexityBot. Without cross-referencing the providers' published IP ranges, a raw count overstates real crawler activity and inflates your read on AI visibility. Most commonly spoofed in the wild: Bytespider and PerplexityBot.
“Can we validate these bot hits against the published IP ranges from OpenAI, Anthropic, and Perplexity, so the count reflects real crawlers and not spoofed agents?”
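The check behind that question can be sketched in a few lines. This sketch assumes you have already saved a provider's published ranges as a plain file with one IPv4 CIDR per line (the file name gptbot_cidrs.txt in the usage note is hypothetical), and that the first field of each log line is the real client IP, which may not hold behind a proxy or CDN. It handles IPv4 only, so IPv6 hits are counted as unverified. The function name count_verified is also hypothetical.

```shell
# count_verified BOT CIDR_FILE LOGFILE
# Splits hits claiming to be BOT into those inside the listed ranges and those outside.
count_verified() {
  awk -v ua="$1" '
    # Convert a dotted-quad IPv4 address to a number; -1 if it is not IPv4.
    function ip2int(ip,   p) {
      if (split(ip, p, ".") != 4) return -1
      return p[1] * 16777216 + p[2] * 65536 + p[3] * 256 + p[4]
    }
    # First file: store each CIDR as a block size and a block index.
    FILENAME == ARGV[1] {
      if (split($1, c, "/") == 2 && ip2int(c[1]) >= 0) {
        n++
        size[n] = 2 ^ (32 - c[2])
        block[n] = int(ip2int(c[1]) / size[n])
      }
      next
    }
    # Second file: log lines that mention the user-agent (case-insensitive).
    index(tolower($0), tolower(ua)) {
      ip = ip2int($1)
      ok = 0
      if (ip >= 0) {
        for (i = 1; i <= n; i++) {
          if (int(ip / size[i]) == block[i]) { ok = 1; break }
        }
      }
      if (ok) { verified++ } else { unverified++ }
    }
    END { printf "verified=%d unverified=%d\n", verified, unverified }
  ' "$2" "$3"
}
```

Usage: count_verified GPTBot gptbot_cidrs.txt access.log. A high unverified share suggests the raw user-agent count is inflated by spoofed agents, or that the range file is out of date.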
Stop Doing This By Hand
You have seen what it takes to do this yourself.
- A different query for every log platform.
- A user-agent list that breaks every time a new bot ships.
- IP ranges to reverse-check so spoofed bots don't inflate your numbers.
- Two reports, one for humans and one for bots, that never quite talk to each other.
Hardal does all of it for you, automatically, in one place.
Where this is heading
GA4 learned to count visitors from AI assistants. That's the floor, not the ceiling. AI agents that browse, compare, and act on someone's behalf already exist. Early projects are live. The mainstream wave is still building.
They don't run JavaScript. They don't leave a referrer. Client-side tools aren't built to catch them. Measuring the agentic web means measuring it server-side. We're not waiting for the problem to get worse.
Hardal AI Visibility
One report. Bots and humans. Always current.
Add Hardal once. Your AI traffic, crawlers and visitors, lands in one report that updates continuously as new bots ship.

Without Hardal
- Per-platform log queries
- Manual UA list upkeep
- Separate bot and human reports
- Spoofed bots inflate the count
With Hardal
- One instrumentation, all sources
- Kept current automatically
- One unified AI traffic view
- UA + IP intelligence, validated
Request Early Access
See your own AI traffic.
AI Visibility is in early access. Run Hardal on your site. See your crawl traffic, the visitors AI sends you, and how both trend, in one report, updated automatically.
You have the playbook, so you are already ahead. Early-access requests from playbook readers go to the front of the queue. 🌭