AI Crawler Directory — Every Bot, What It Does, How to Allow It
Complete directory of every known AI crawler: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and more. Includes what each crawler does, which platform it serves, and recommended robots.txt configuration.
There are currently 12 known AI crawlers operated by OpenAI, Anthropic, Google, Perplexity, Meta, Apple, Amazon, ByteDance, and Common Crawl. Each crawler serves a different purpose — from training AI models to powering real-time search. Blocking the wrong crawler can remove your site from AI search results entirely. This directory covers every crawler, what it does, and whether you should allow it.
Complete AI crawler directory
| Crawler | Platform | Purpose | robots.txt name | Recommended |
|---|---|---|---|---|
| GPTBot | OpenAI | Training data collection for GPT models | GPTBot | Allow (with caution) |
| OAI-SearchBot | OpenAI | Real-time search indexing for ChatGPT search | OAI-SearchBot | Allow |
| ChatGPT-User | OpenAI | Fetches pages when users share URLs in ChatGPT or use custom GPTs | ChatGPT-User | Allow |
| ClaudeBot | Anthropic | Training data and web retrieval for Claude | ClaudeBot | Allow |
| PerplexityBot | Perplexity | Indexing for Perplexity AI search | PerplexityBot | Allow |
| Google-Extended | Google | Training data for Gemini and AI Overviews | Google-Extended | Allow |
| GoogleOther | Google | Additional crawling for Google AI features and research | GoogleOther | Allow |
| CCBot | Common Crawl | Open training data used by many AI models | CCBot | Optional |
| Bytespider | ByteDance / TikTok | Training data for ByteDance AI models | Bytespider | Optional |
| Amazonbot | Amazon | Indexing for Alexa answers and Rufus shopping AI | Amazonbot | Allow (ecommerce) |
| FacebookBot | Meta | Crawling for Meta AI features and link previews | FacebookBot | Allow |
| Applebot-Extended | Apple | Training data for Siri and Apple Intelligence features | Applebot-Extended | Allow |
Understanding the three types of AI crawler
Training crawlers
These crawlers collect content to train AI models. Blocking them prevents your content from being included in future model training but does not affect current AI search results. Examples: GPTBot, Google-Extended, CCBot, Bytespider.
Search indexing crawlers
These crawlers index your content for real-time AI search. Blocking them removes your site from that platform's AI search results entirely. Examples: OAI-SearchBot, PerplexityBot, Amazonbot.
User-triggered crawlers
These crawlers fetch a page only when a user explicitly requests it (e.g., by sharing a URL in ChatGPT). Blocking them prevents the AI from reading pages that users share. Example: ChatGPT-User.
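The three categories map naturally onto a small lookup table. The sketch below is illustrative only: it uses just the crawlers named as examples in this section, and the names `CRAWLER_TYPES` and `build_robots_txt` are hypothetical helpers, not part of any library.

```python
# Crawler categories, taken from the examples named in this section.
CRAWLER_TYPES = {
    "training": ["GPTBot", "Google-Extended", "CCBot", "Bytespider"],
    "search": ["OAI-SearchBot", "PerplexityBot", "Amazonbot"],
    "user_triggered": ["ChatGPT-User"],
}


def build_robots_txt(blocked_types):
    """Return robots.txt text that disallows the given categories and allows the rest."""
    unknown = set(blocked_types) - set(CRAWLER_TYPES)
    if unknown:
        raise ValueError(f"unknown crawler types: {sorted(unknown)}")
    lines = []
    for crawler_type, agents in CRAWLER_TYPES.items():
        rule = "Disallow: /" if crawler_type in blocked_types else "Allow: /"
        for agent in agents:
            lines.append(f"User-agent: {agent}")
            lines.append(rule)
            lines.append("")  # blank line between groups for readability
    return "\n".join(lines).rstrip() + "\n"
```

Calling `build_robots_txt({"training"})` blocks every crawler in the training category, including Google-Extended. Option 2 in the next section keeps Google-Extended allowed, so remove it from the mapping if you want to match that configuration.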
Recommended robots.txt configuration
The two configurations below cover the common cases. Option 1 allows every AI crawler for maximum visibility. Option 2 keeps the AI search crawlers allowed (so your content still appears in AI search results) while blocking training-only crawlers if you prefer not to contribute to model training.
Option 1: Allow all AI crawlers (recommended)
# AI Search Crawlers — Allow All
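# This ensures maximum AI search visibility
# CCBot and Bytespider are not listed: unlisted crawlers fall back to your
# "User-agent: *" rules, or are allowed if you have none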
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: GoogleOther
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: FacebookBot
Allow: /
User-agent: Applebot-Extended
Allow: /
Option 2: Allow search crawlers, block training-only crawlers
# AI Search Crawlers — Allow search, block training
# Maintains AI search visibility while limiting training use
# ALLOW — these power AI search results
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: FacebookBot
Allow: /
# BLOCK — these are training-only
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
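Before deploying rules like these, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard `urllib.robotparser`. It uses an abbreviated copy of the Option 2 rules, and `allowed_agents` and the `example.com` URL are hypothetical names chosen for illustration. Real crawlers may match user-agent tokens slightly differently from Python's parser, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the Option 2 rules (a subset of the groups, for brevity).
OPTION_2 = """\
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/some-page"):
    """Map each user-agent token to whether the rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    OPTION_2,
    ["OAI-SearchBot", "PerplexityBot", "GPTBot", "CCBot", "Bytespider", "ClaudeBot"],
)
# ClaudeBot has no group in this file and there is no "User-agent: *" group,
# so the parser treats it as allowed.
```

Here `result` maps the two search crawlers and ClaudeBot to `True` and the three blocked training crawlers to `False`.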
Key decisions and trade-offs
Should you block GPTBot?
GPTBot collects training data for future GPT models. Blocking it will not remove you from current ChatGPT search results, which depend on OAI-SearchBot. However, allowing GPTBot means your content may influence how future GPT models understand your industry and brand, which could benefit long-term AI visibility. Most AI search agencies recommend allowing GPTBot.
Does blocking Google-Extended affect Google Search?
No. Google-Extended is separate from Googlebot. Blocking Google-Extended prevents your content from being used to train Gemini and AI Overviews, but it does not affect your Google Search rankings. The cost is potential visibility in Gemini responses.
What about CCBot and Common Crawl?
Common Crawl is an open dataset used to train many AI models. Blocking CCBot prevents your content from appearing in the Common Crawl dataset. The trade-off: many smaller AI models and research projects use Common Crawl data, so blocking it reduces your content's reach across the broader AI ecosystem.
How to check which AI crawlers are visiting your site
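Check your server access logs for the user-agent strings listed in the directory above. Most hosting platforms and CDNs (Cloudflare, Vercel, Netlify) provide bot traffic reports that can identify AI crawler visits. Google Search Console only reports on Google's own crawlers, not third-party AI crawlers. Note also that Google-Extended is a robots.txt control token rather than a separate user agent: Google crawls with its existing user agents, so you will not see Google-Extended by name in your logs.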
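If you have raw access logs, a simple substring count is often enough for a first look. The sketch below assumes each log line contains the full user-agent string. The function name `count_ai_crawler_hits` and the sample lines are made up for illustration, and the user-agent strings in the sample are not copied from real crawler traffic.

```python
from collections import Counter

# User-agent substrings to look for, taken from the directory table.
# Assumption: Google-Extended and Applebot-Extended are left out because they
# act as robots.txt control tokens and are not expected to appear as request
# user agents.
AI_CRAWLER_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "GoogleOther", "CCBot", "Bytespider", "Amazonbot", "FacebookBot",
]


def count_ai_crawler_hits(log_lines):
    """Count log lines per AI crawler using case-insensitive substring matching."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # credit each request to at most one crawler
    return hits


# Made-up sample lines in a combined-log-like layout (not real traffic).
SAMPLE_LOG = [
    '203.0.113.5 - - [07/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleAgent (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [07/Apr/2026:10:00:02 +0000] "GET /pricing HTTP/1.1" 200 734 "-" "ExampleAgent (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [07/Apr/2026:10:01:10 +0000] "GET /blog HTTP/1.1" 200 901 "-" "ExampleAgent (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [07/Apr/2026:10:02:30 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0"',
]

sample_hits = count_ai_crawler_hits(SAMPLE_LOG)
```

To run this against a real file, pass an open file object instead of `SAMPLE_LOG`, since iterating a file yields one line at a time. User-agent strings can be spoofed, so treat these counts as an estimate rather than verified crawler traffic.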
For more on configuring your site for AI crawlers, see our guide on robots.txt configuration for AI search.
Frequently asked questions
How many AI crawlers are there?
There are currently 12 known AI crawlers from major platforms: three from OpenAI (GPTBot, OAI-SearchBot, ChatGPT-User), two from Google (Google-Extended, GoogleOther), and one each from Anthropic, Perplexity, Amazon, Meta, Apple, ByteDance, and Common Crawl. New crawlers are announced periodically as AI platforms expand.
Will blocking AI crawlers hurt my SEO?
Blocking AI crawlers does not directly affect traditional SEO rankings. Google has confirmed that blocking Google-Extended does not impact Google Search rankings. However, blocking AI search crawlers (like OAI-SearchBot or PerplexityBot) will prevent your content from appearing in those AI platforms' search results — which is an increasingly important traffic source.
Should I allow all AI crawlers?
For maximum AI search visibility, yes. Allowing all 12 crawlers ensures your content can appear across ChatGPT, Gemini, Perplexity, Claude, and other AI platforms. The only reason to block specific crawlers is if you have concerns about your content being used for AI model training — in which case, block training crawlers (GPTBot, CCBot, Bytespider) while allowing search crawlers (OAI-SearchBot, PerplexityBot).
Oliver Mackman
AI Search Analyst, SEOCompare
Oliver leads SEOCompare's editorial and comparison research. With over a decade in digital marketing, he oversees agency evaluation, tool testing, and AI search data analysis.
Last reviewed: 7 April 2026