Hardal AI Visibility

Hardal · First-Party Analytics · June 2026

AI Visibility Playbook

AI is crawling your site. GA4 won't tell you.

A marketer's playbook for measuring your real AI traffic, with the tools you already run.

~/var/log/access.log  (live)

200  GET  /pricing                    ClaudeBot/1.0  Training
200  GET  /blog/server-side-tracking  GPTBot/1.2     Training
200  GET  /docs/install               PerplexityBot  Search index
200  GET  /product/analytics          Bytespider     Training
304  GET  /customers                  OAI-SearchBot  Search index
200  GET  /                           ChatGPT-User   User fetch

last 24h · 23,951 bot requests · 1 human referred

Server-side crawler hits. Your client-side analytics never sees them.

AI Is Already On Your Site

AI is doing two things to your site right now.

Your Input

It's reading your content.

Crawlers pull your pages to train models and answer queries. Nearly all of them skip your JavaScript entirely.

Your Result

It's sending you visitors.

A growing stream of visitors arrives from AI products, and early data shows they convert better than almost any other channel. Your analytics is blind to nearly all of it.

Two Questions Your Analytics Can't Answer

Build a baseline from what you already have.

GA4's new AI Assistant channel is a start, but it misses every AI crawler and the vast majority of human traffic. This playbook gives you a first look at your real AI visibility, using the tools you already have.

01

Is AI sending you visitors, and are they converting?

A quick look via GA4.

02

Are AI engines reading your content?

A manual check of your server logs.

The Numbers You're Missing

The numbers are already in.

50%+

of HTML web traffic is now automated. The first machine majority.

Cloudflare Radar · Jun 2026

0.0%

of verified bot traffic is AI crawlers. Roughly 27% counting AI-search.

Cloudflare Radar · May 2026

~Half

of AI crawler requests are for training, not a live user query.

Cloudflare · 2026

+0%

growth in US retail traffic from AI, converting 42% better than other channels.

Adobe · Q1 2026

The crawling is huge. The traffic is tiny.

Across the web, ClaudeBot crawls 23,951 pages for every visitor it refers. That crawling happens server-side, so your analytics never sees it. Only the occasional referred visitor shows up.

Pages crawled for every visitor referred

ClaudeBot       23,951 : 1
GPTBot           1,276 : 1
Perplexity         111 : 1
Google Search      4.9 : 1

Google Search sends about one visitor for every 5 pages it crawls. AI crawlers take anywhere from about a hundred to tens of thousands. (SEOmator Q1 2026, reading Cloudflare Radar data through mid-March. Ratios move fast.)
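You can get a rough version of this ratio for your own site from a single access log. The sketch below is minimal and assumes the Apache/nginx combined log format, where splitting a line on double quotes puts the referrer in field 4 and the user-agent in field 6. crawl_ratio is a hypothetical helper name. It counts lines whose user-agent contains the bot name and lines whose referrer contains the AI product's hostname. The ChatGPT and Claude apps send no referrer, so the referral count is a floor and the ratio is an upper bound.

```bash
# crawl_ratio BOT_UA REFERRER_HOST LOGFILE  (hypothetical helper)
# Assumes combined log format: split on '"', $4 = referrer, $6 = user-agent.
# Prints "<crawls> <referrals> <ratio>:1", or "no-referrals" if none matched.
crawl_ratio() {
  local ua=$1 host=$2 log=$3
  awk -F'"' -v ua="$ua" -v host="$host" '
    index(tolower($6), tolower(ua))   { crawls++ }   # bot hit
    index(tolower($4), tolower(host)) { refs++ }     # human arriving from the AI product
    END {
      if (refs > 0) printf "%d %d %.0f:1\n", crawls, refs, crawls / refs
      else          printf "%d %d no-referrals\n", crawls, refs
    }' "$log"
}

# Usage: crawl_ratio ClaudeBot claude.ai access.log
```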

The gap you're flying blind through

The Bots

Training crawlers devour your content and send zero traffic back.

The Humans

Visitors arrive from AI chats, convert well, then vanish into your Direct traffic.

The Blind Spot

Client-side analytics misses the first and badly undercounts the second.

CHAPTER 01

GA4 AI Assistant Channel

Is AI sending you visitors?

// the easy one. Start client-side, with a channel GA4 already gives you.

See the visitors AI sends you

Since May 2026, GA4 labels AI-assistant clicks for you. No custom filters, no regex hacks. When someone clicks in from a supported AI chatbot, GA4 tags:

CHANNEL GROUP: AI Assistant
MEDIUM: ai-assistant
CAMPAIGN: (ai-assistant)

Adobe found that AI-referred US retail traffic converts 42% better than other channels, and the volume is growing fast. Set this up now, while the signal is still small and easy to act on.

GA4, step by step

  1. Open Reports → Acquisition → Traffic acquisition and read the AI Assistant row in the Default Channel Group.
  2. Add your key events as a column or comparison to see whether those visits convert.
  3. For a focused view, build an Exploration filtered to Session medium = ai-assistant with a conversion metric.

GA4 menu labels change. The dimension values (medium ai-assistant, campaign (ai-assistant)) are the stable anchors. Use those.
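If you would rather pull the step 3 numbers from a script, the GA4 Data API's runReport method accepts a JSON request body. The sketch below is minimal and assumes the ai-assistant medium value described above and a 28-day window. ga4_ai_report_body is a hypothetical helper. PROPERTY_ID and TOKEN are placeholders for your own GA4 property ID and an access token with Analytics read access. Grouping by landingPage also answers "which pages AI visitors land on first".

```bash
# ga4_ai_report_body (hypothetical helper): prints a runReport request body
# for sessions and key events by landing page, where Session medium = ai-assistant.
ga4_ai_report_body() {
  cat <<'JSON'
{
  "dateRanges": [{ "startDate": "28daysAgo", "endDate": "today" }],
  "dimensions": [{ "name": "landingPage" }],
  "metrics": [{ "name": "sessions" }, { "name": "keyEvents" }],
  "dimensionFilter": {
    "filter": {
      "fieldName": "sessionMedium",
      "stringFilter": { "matchType": "EXACT", "value": "ai-assistant" }
    }
  }
}
JSON
}

# Usage (PROPERTY_ID and TOKEN are placeholders you supply):
#   curl -s -X POST \
#     "https://analyticsdata.googleapis.com/v1beta/properties/$PROPERTY_ID:runReport" \
#     -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
#     -d "$(ga4_ai_report_body)"
```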

What GA4 shows, what it hides

What to look for once it's running

  • Which pages AI visitors land on first. AI products judged these relevant enough to send people to.
  • Which sources convert. ChatGPT traffic can behave nothing like Perplexity.
  • Whether AI traffic is growing month over month, and outpacing organic.
  • Pages with high AI referral but low conversion. A fixable content or landing problem.

What it cannot see

  • ChatGPT and Claude apps send no referrer, so those visits land in Direct.
  • It only covers the assistants Google recognizes, so long-tail AI tools stay invisible.
  • Human clicks only. Crawlers never appear here.

The GA4 referrer gap

GA4's AI channel shows you a fraction of the real number. GA4 only captures AI visits that arrive with a referrer header intact. In practice, most don't.

Most

AI-referred traffic lands in Direct. No AI channel, no source, no conversion credit.

Analysis of 446,000 sessions · Loamly, 2026

0%

of what GA4 calls Direct is actually AI-referred traffic.

200-site study · Attrifast, May 2026

GA4's AI Assistant channel is a floor. The real number is materially higher.

Screenshot: GA4 Traffic acquisition filtered to chatgpt, showing session counts across the AI Assistant, Referral, Unassigned, and Organic Search channels.

Don't wait on Google's list

Match these referrer hostnames in a custom channel group to catch AI traffic that Google's built-in list misses:

chatgpt.com · chat.openai.com · perplexity.ai · gemini.google.com · claude.ai · copilot.microsoft.com · deepseek.com · grok.com · you.com · poe.com · phind.com · meta.ai
Bing and DuckDuckGo blend AI answers with regular search results, so you cannot isolate AI-only traffic from them by hostname. A custom channel group re-buckets referral data GA4 already holds. It does not surface new traffic.
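Typing twelve hostnames into a channel-group condition invites typos. The sketch below builds one regex from the list above. It escapes the dots and allows an optional subdomain prefix, so the pattern behaves the same whether your condition does full or partial matching. AI_REFERRERS, ai_referrer_regex, and is_ai_referrer are hypothetical names. Use is_ai_referrer to spot-check sources from your referral report before you paste the pattern into GA4.

```bash
# Hostnames from the list above; extend as new AI tools appear.
AI_REFERRERS='chatgpt.com chat.openai.com perplexity.ai gemini.google.com claude.ai copilot.microsoft.com deepseek.com grok.com you.com poe.com phind.com meta.ai'

# ai_referrer_regex (hypothetical helper): prints one anchored pattern,
# e.g. ^(.*\.)?(chatgpt\.com|chat\.openai\.com|...)$
ai_referrer_regex() {
  local alts
  # Unquoted on purpose: split the list into one hostname per line,
  # escape the dots, then join with "|".
  alts=$(printf '%s\n' $AI_REFERRERS | sed 's/\./\\./g' | paste -sd'|' -)
  printf '^(.*\\.)?(%s)$\n' "$alts"
}

# is_ai_referrer HOST (hypothetical helper): succeeds if HOST is a listed
# AI tool or one of its subdomains.
is_ai_referrer() {
  printf '%s\n' "$1" | grep -Eq "$(ai_referrer_regex)"
}
```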

CHAPTER 02

Server-Side

Are crawlers reading your content?

// the part GA4 will never show you. This is your input to the AI engines.

The blind spot

Most AI crawlers don't run JavaScript, so client-side analytics never records them. Seeing them means going server-side: your logs, or your log platform.

One important distinction: most AI crawling is training, not live queries, so heavy crawl volume does not mean heavy referral traffic. The two signals are separate, and both matter.

Server-side · raw logs

See which AI bots crawl you, and the pages they read most.

Bot volume by user-agent · swap in your log path

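# Count requests per AI user-agent (case-insensitive), busiest first.
LOG=access.log   # swap in your log path
for ua in GPTBot OAI-SearchBot ChatGPT-User ClaudeBot Claude-User \
  Claude-SearchBot anthropic-ai PerplexityBot Perplexity-User CCBot \
  Bytespider Amazonbot Applebot Meta-ExternalAgent cohere-ai Diffbot \
  YouBot PetalBot; do
  # grep -c counts matching lines; the name can match anywhere in the line
  count=$(grep -aic -- "$ua" "$LOG")
  [ "$count" -gt 0 ] && printf "%-22s %s\n" "$ua" "$count"
done | sort -k2 -rn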
bash

Which pages one bot crawls most

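# Top 20 paths one bot requests ($7 is the request path
# in the Apache/nginx combined log format).
grep -ai "GPTBot" access.log | awk '{print $7}' \
| sort | uniq -c | sort -rn | head -20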
bash
For rotated .gz files, use zgrep in place of grep.

Which bots, how often

Heavy GPTBot traffic with no ChatGPT-User hits likely means OpenAI is training on your content, but users aren't asking ChatGPT about you yet.
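To run this check for any vendor, compare the hit count of its training crawler with the hit count of its user-fetch agent. The sketch below is minimal. train_vs_user is a hypothetical helper. The counts are plain case-insensitive line matches, so they still include spoofed agents.

```bash
# train_vs_user TRAIN_UA USER_UA LOGFILE  (hypothetical helper)
# Prints both counts and flags "training only" when the user-fetch agent never appears.
train_vs_user() {
  local train_ua=$1 user_ua=$2 log=$3 t u
  # grep -c prints 0 but exits 1 on no match; "|| true" keeps set -e scripts alive
  t=$(grep -aic -- "$train_ua" "$log" || true)
  u=$(grep -aic -- "$user_ua" "$log" || true)
  printf '%s=%s %s=%s' "$train_ua" "$t" "$user_ua" "$u"
  if [ "$t" -gt 0 ] && [ "$u" -eq 0 ]; then
    printf ' (training only, no user fetches yet)\n'
  else
    printf '\n'
  fi
}

# Usage: train_vs_user GPTBot ChatGPT-User access.log
#        train_vs_user ClaudeBot Claude-User access.log
```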

Pages worth reading

Your most-crawled URLs are the pages AI treats as relevant. Whether you knew it or not, they are your most valuable pages in the AI layer.

Crawl frequency

Run it weekly. Pages climbing the list are gaining AI attention; pages dropping are losing it.

Gaps

Key product or category pages missing from the list aren't in contention to be cited. That's fixable.
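You can make the weekly comparison mechanical. Save each week's output of the most-crawled-pages pipeline to a file, then diff the two files. The sketch below is minimal. compare_weeks and the two weekly files are hypothetical. Each file holds the "count path" lines that uniq -c produces. A page that drops off this week's list appears with a negative delta, which is how gaps surface.

```bash
# compare_weeks LAST_WEEK THIS_WEEK  (hypothetical helper)
# Both files hold "count path" lines from the uniq -c pipeline.
# Prints "delta path", biggest climbers first; new pages count up from 0.
compare_weeks() {
  awk 'FILENAME == ARGV[1] { prev[$2] = $1; next }          # last week
       { print $1 - prev[$2], $2; seen[$2] = 1 }           # this week
       END { for (p in prev) if (!(p in seen)) print -prev[p], p }' \
    "$1" "$2" | sort -rn
}

# Usage: compare_weeks week22.txt week23.txt | head
```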

Server-side · your log platform

Your analyst will know the syntax for your setup. These are starting points.

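Splunk, for example:
sourcetype=access_combined | regex useragent="GPTBot|ClaudeBot|PerplexityBot|ChatGPT-User|OAI-SearchBot|CCBot|Bytespider" | rex field=useragent "(?<bot>GPTBot|ClaudeBot|PerplexityBot|ChatGPT-User|OAI-SearchBot|CCBot|Bytespider)" | timechart count by bot
On Cloudflare, Vercel, or Fastly, apply the same user-agent pattern to their streamed logs.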

Know which bot is doing what

A bot's user-agent tells you its job, and there are three distinct jobs:

user-agents.registry

User-agent         Operator       Job
GPTBot             OpenAI         Training
ClaudeBot          Anthropic      Training
Bytespider         ByteDance      Training
CCBot              Common Crawl   Training
OAI-SearchBot      OpenAI         Search index
Claude-SearchBot   Anthropic      Search index
PerplexityBot      Perplexity     Search index
YouBot             You.com        Search index
ChatGPT-User       OpenAI         User fetch
Claude-User        Anthropic      User fetch
Perplexity-User    Perplexity     User fetch

Other general crawlers / search

Amazonbot · Applebot · Meta-ExternalAgent · cohere-ai · Diffbot · PetalBot · anthropic-ai

Google-Extended and Applebot-Extended are robots.txt control tokens, not user-agents. Don't grep for them. This list changes often.
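To tag each hit with its job, map user-agent substrings to the three categories in the registry table. The sketch below is minimal and covers only the agents listed there. bot_job is a hypothetical helper. Its patterns are case-sensitive globs. It checks search and user-fetch agents before training crawlers, and anything unlisted falls through to "Other".

```bash
# bot_job USER_AGENT  (hypothetical helper): prints the job from the registry table.
bot_job() {
  case $1 in
    *OAI-SearchBot*|*Claude-SearchBot*|*PerplexityBot*|*YouBot*) echo "Search index" ;;
    *ChatGPT-User*|*Claude-User*|*Perplexity-User*)             echo "User fetch" ;;
    *GPTBot*|*ClaudeBot*|*Bytespider*|*CCBot*)                   echo "Training" ;;
    *)                                                           echo "Other" ;;
  esac
}

# Usage: tally jobs across a combined-format log (user-agent is field 6 on '"').
#   awk -F'"' '{print $6}' access.log |
#     while IFS= read -r ua; do bot_job "$ua"; done | sort | uniq -c
```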

Before you trust the count

User-agents can be spoofed. Any server can claim to be GPTBot or PerplexityBot. Without cross-referencing the providers' published IP ranges, a raw count overstates real crawler activity and inflates your read on AI visibility. Most commonly spoofed in the wild: Bytespider and PerplexityBot.

“Can we validate these bot hits against the published IP ranges from OpenAI, Anthropic, and Perplexity, so the count reflects real crawlers and not spoofed agents?”
A question to raise with your team
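Here is what that validation involves: check each IP that claims to be a crawler against the CIDR ranges the provider publishes. The sketch below is minimal and IPv4 only. ip_in_cidr, ip_in_ranges, and the ranges file are hypothetical. You build the ranges file yourself from the provider's published list, one CIDR per line. Blank lines, comments, and IPv6 entries in the file are skipped.

```bash
# ip_to_int A.B.C.D: dotted-quad IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # unquoted on purpose: split on "." into $1..$4
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# ip_in_cidr IP NET/BITS  (hypothetical helper): succeeds if IP is inside the range.
ip_in_cidr() {
  local net=${2%/*} bits=${2#*/} mask
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# ip_in_ranges IP FILE  (hypothetical helper): succeeds if IP matches any CIDR in FILE.
ip_in_ranges() {
  local cidr
  while read -r cidr; do
    case $cidr in ''|\#*|*:*) continue ;; esac   # skip blanks, comments, IPv6
    ip_in_cidr "$1" "$cidr" && return 0
  done < "$2"
  return 1
}

# Usage: flag GPTBot hits from IPs outside the ranges you saved
#   grep -a GPTBot access.log | awk '{print $1}' | sort -u |
#     while read -r ip; do ip_in_ranges "$ip" gptbot-ranges.txt || echo "unverified: $ip"; done
```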

Stop Doing This By Hand

You have seen what it takes to do this yourself.

  • A different query for every log platform.
  • A user-agent list that breaks every time a new bot ships.
  • IP ranges to reverse-check so spoofed bots don't inflate your numbers.
  • Two reports, one for humans and one for bots, that never quite talk to each other.

Hardal does all of it for you, automatically, in one place.

Where this is heading

GA4 learned to count visitors from AI assistants. That's the floor, not the ceiling. AI agents that browse, compare, and act on someone's behalf already exist. Early projects are live. The mainstream wave is still building.

Many of them don't run JavaScript or leave a referrer. Client-side tools aren't built to catch them. Measuring the agentic web means measuring it server-side. We're not waiting for the problem to get worse.

Hardal AI Visibility

One report. Bots and humans. Always current. Add Hardal once. Your AI traffic, crawlers and visitors, lands in one report that updates continuously as new bots ship.

Screenshot: Hardal Agent Breakdown, showing the top AI agents visiting your site with traffic share and visit counts.

Without Hardal

  • Per-platform log queries
  • Manual UA list upkeep
  • Separate bot and human reports
  • Spoofed bots inflate the count

With Hardal

  • One instrumentation, all sources
  • Kept current automatically
  • One unified AI traffic view
  • UA + IP intelligence, validated

Request Early Access

See your own AI traffic.

AI Visibility is in early access. Run Hardal on your site. See your crawl traffic, the visitors AI sends you, and how both trend, in one report, updated automatically.

No spam. We'll only contact you about early access.

You have the playbook, so you are already ahead. Early-access requests from playbook readers go to the front of the queue. 🌭