Fix: robots.txt Missing AI Bot Directives
AI training crawlers like GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity AI), and Bytespider (ByteDance) operate independently of Google. A robots.txt that only contains User-agent: * and Googlebot rules never mentions these crawlers by name. This guide covers both blocking and allowing them.
The Problem
Many sites want either to allow AI crawlers (for GEO — Generative Engine Optimisation, getting cited in AI answers) or to block them (for content protection). A robots.txt without explicit AI crawler directives has ambiguous intent: some AI crawlers fall back to the User-agent: * rules, while others only act on an entry naming them explicitly. Missing directives mean inconsistent handling.
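You can see the fallback behaviour with Python's standard-library urllib.robotparser, which — like most compliant parsers — applies the User-agent: * group to any agent that has no entry of its own. A minimal sketch (the robots.txt content below is hypothetical):

```python
import urllib.robotparser

# A hypothetical robots.txt with only a wildcard group -- no explicit AI bot entries.
robots_lines = """User-agent: *
Disallow: /private/
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_lines)

# A spec-compliant parser falls back to the * group for agents it has
# no entry for, so AI bots inherit the same rules here.
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, "blocked from /private/:",
          not rp.can_fetch(bot, "https://example.com/private/page"))
```

This only tells you what a well-behaved parser would conclude; crawlers that ignore the wildcard group (or robots.txt entirely) are exactly why explicit per-bot directives remove the ambiguity.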
The Fix
```
User-agent: *
Allow: /

# Allow AI crawlers explicitly for GEO (AI search citation)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```
Use Allow: / for AI crawlers you want crawling your site for AI search visibility, or Disallow: / to block them from collecting training data. GPTBot is OpenAI's crawler, ClaudeBot is Anthropic's, PerplexityBot is Perplexity AI's, and Bytespider is ByteDance's. Google-Extended is not a separate crawler but a control token: it tells Google whether your content may be used for its AI products, independently of Googlebot's normal search crawling.
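If your goal is content protection instead, the same entries flip to Disallow. A blocking variant of the file above (keeping regular search crawling untouched):

```
User-agent: *
Allow: /

# Block AI training crawlers while leaving normal search bots alone
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
```

Note that robots.txt is advisory: well-behaved crawlers honour it, but it is not an enforcement mechanism.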
Validate your robots.txt live — fetch any URL and get a corrected file in one click.
Open robots.txt Validator →