robots.txt for AI crawlers: GPTBot, ClaudeBot & more
AI answer engines crawl the web with named bots: GPTBot and OAI-SearchBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and others. Google adds Google-Extended, a robots.txt token that controls AI training and grounding rather than a separate crawler. The major operators state that they honor robots.txt. To be eligible for citation in AI answers, you must not block these user-agents; to opt out of AI training or grounding, you can disallow them specifically. The most common mistake is blocking them (or your sitemap) unintentionally and then wondering why you're never cited.
The AI crawlers you should know
Each engine uses one or more named user-agents. Knowing them lets you make a deliberate choice per engine instead of an accidental blanket block.
- GPTBot — OpenAI's training/crawl bot.
- OAI-SearchBot — OpenAI's bot for surfacing sites in ChatGPT search results.
- ChatGPT-User — fetches a page when a user asks ChatGPT about it live.
- ClaudeBot — Anthropic's crawler (also Claude-Web / anthropic-ai historically).
- PerplexityBot — Perplexity's indexing crawler.
- Google-Extended — Google's token to control AI training/grounding (separate from Googlebot).
Allow them explicitly to be citable
If you want AI answer engines to cite you, allow these bots. A simple, permissive robots.txt that allows all user-agents already covers them, but being explicit documents your intent and avoids surprises if you later tighten rules.
Critically, don't block them by accident. A disallow rule meant for one path, an overly broad pattern, or a wildcard left over from a migration can quietly remove you from every AI answer. After any robots.txt change, verify it.
- Allow GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended to be eligible for AI citations.
- Keep a `User-agent: *` baseline with `Allow: /` unless you have a reason not to.
- Always reference your sitemap with a `Sitemap:` line.
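The checklist above can be sketched as a small self-check. This is an illustration rather than an official template: `example.com` and the sitemap URL are placeholders, and the file is parsed with Python's standard `urllib.robotparser` (3.8+ for `site_maps()`), which uses simple prefix matching and may not mirror every crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical permissive robots.txt. example.com and the sitemap URL are
# placeholders; swap in your own domain.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

AI_TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Map each AI token to whether it may fetch the home page.
allowed = {token: parser.can_fetch(token, "https://example.com/") for token in AI_TOKENS}

# site_maps() returns None when no Sitemap: line exists, so fall back to [].
sitemaps = parser.site_maps() or []

print(allowed)
print(sitemaps)
```

The explicit group is redundant with the `User-agent: *` baseline, which is the point: if the baseline is tightened later, the named bots keep their own rule.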
Or opt out deliberately
If you don't want your content used for AI training or answers, disallow the specific bots: `User-agent: GPTBot` followed by `Disallow: /`, and the same for each of the others. Note that Google-Extended is a separate token from Googlebot, so blocking it opts you out of Google's AI training and grounding without affecting crawling or ranking in normal search. You can keep classic search while opting out of AI.
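As a rough sketch of that opt-out, the snippet below generates one `Disallow: /` group per bot and then confirms that Googlebot is still allowed. `build_opt_out` is a hypothetical helper written for this example, not part of any library, and the check relies on Python's standard `urllib.robotparser`, whose matching may differ in detail from each crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

def build_opt_out(tokens):
    """Return robots.txt text that blocks each token and allows everyone else.

    Hypothetical helper for illustration only.
    """
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    groups.append("User-agent: *\nAllow: /")
    # A blank line between groups keeps each rule set separate.
    return "\n\n".join(groups) + "\n"

opt_out_text = build_opt_out(
    ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]
)

opt_out = RobotFileParser()
opt_out.parse(opt_out_text.splitlines())

print(opt_out_text)
print("GPTBot allowed:", opt_out.can_fetch("GPTBot", "/"))
print("Googlebot allowed:", opt_out.can_fetch("Googlebot", "/"))
```

Adjust the token list to opt out of only some engines; any bot not named falls through to the `User-agent: *` group.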
seocheck reads your robots.txt as part of its audit, so you can confirm AI crawlers aren't blocked — or that your opt-out is actually in effect.
Check your own page
seocheck scores your on-page SEO, GEO (AI answer engines), and sitemap health in seconds — free, no account.
FAQ
- If I block GPTBot, do I lose normal Google ranking?
- No. GPTBot is OpenAI's bot. Google's regular crawler is Googlebot, and its AI-specific control is Google-Extended. Blocking GPTBot only affects OpenAI; blocking Google-Extended only affects Google's AI training and grounding, not standard ranking.
- Do all AI crawlers actually respect robots.txt?
- The operators of the major, named ones (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) publicly state that they honor robots.txt. Some smaller or non-compliant scrapers may not. For those, robots.txt is advisory only, and network-level blocking is the stronger control.
- How do I check my robots.txt is set up right?
- Fetch yoursite.com/robots.txt and confirm the AI bots aren't disallowed and your sitemap is referenced. seocheck checks robots.txt and sitemap discovery automatically when you run an audit.
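For a manual version of that check, here is a minimal sketch that takes the text of a robots.txt file and reports which AI tokens are blocked and which sitemaps are declared. `audit_robots` and `AI_BOTS` are names invented for this example, and the sample input imitates a blanket block left over from a staging site. It uses Python's standard `urllib.robotparser` (3.8+ for `site_maps()`), which does prefix matching only, so treat the result as a first pass rather than a guarantee of how each crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this article. Extend the tuple as needed.
AI_BOTS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
)

def audit_robots(robots_txt: str, path: str = "/") -> dict:
    """Report blocked AI tokens and declared sitemaps for robots.txt text.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]
    # site_maps() returns None when there is no Sitemap: line.
    return {"blocked": blocked, "sitemaps": parser.site_maps() or []}

# Example input: a blanket block of the kind a staging site might leave behind.
leftover = "User-agent: *\nDisallow: /\n"
report = audit_robots(leftover)

print(report["blocked"])   # every token in AI_BOTS is blocked
print(report["sitemaps"])  # no Sitemap: line was found
```

To audit a live site, paste the contents of your own robots.txt into `audit_robots`, or let the parser fetch the file itself with `RobotFileParser.set_url()` and `read()`.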