There are currently 12 known AI crawlers operated by OpenAI, Anthropic, Google, Perplexity, Meta, Apple, Amazon, ByteDance, and Common Crawl. Each crawler serves a different purpose — from training AI models to powering real-time search. Blocking the wrong crawler can remove your site from AI search results entirely. This directory covers every crawler, what it does, and whether you should allow it.

Complete AI crawler directory

Crawler	Platform	Purpose	robots.txt name	Recommended
GPTBot	OpenAI	Training data collection for GPT models	`GPTBot`	Allow (with caution)
OAI-SearchBot	OpenAI	Real-time search indexing for ChatGPT search	`OAI-SearchBot`	Allow
ChatGPT-User	OpenAI	Fetches pages when users share URLs in ChatGPT or use custom GPTs	`ChatGPT-User`	Allow
ClaudeBot	Anthropic	Training data and web retrieval for Claude	`ClaudeBot`	Allow
PerplexityBot	Perplexity	Indexing for Perplexity AI search	`PerplexityBot`	Allow
Google-Extended	Google	Control token for use of content in Gemini training and grounding (not a separate crawler)	`Google-Extended`	Allow
GoogleOther	Google	Additional crawling for Google AI features and research	`GoogleOther`	Allow
CCBot	Common Crawl	Open training data used by many AI models	`CCBot`	Optional
Bytespider	ByteDance / TikTok	Training data for ByteDance AI models	`Bytespider`	Optional
Amazonbot	Amazon	Indexing for Alexa answers and Rufus shopping AI	`Amazonbot`	Allow (ecommerce)
FacebookBot	Meta	Crawling for Meta AI features and link previews	`FacebookBot`	Allow
Applebot-Extended	Apple	Training data for Siri and Apple Intelligence features	`Applebot-Extended`	Allow

Understanding the three types of AI crawler

Training crawlers

These crawlers collect content to train AI models. Blocking them prevents your content from being included in future model training but does not affect current AI search results. Examples: GPTBot, Google-Extended, CCBot, Bytespider.

Search indexing crawlers

These crawlers index your content for real-time AI search. Blocking them removes your site from that platform's AI search results entirely. Examples: OAI-SearchBot, PerplexityBot, Amazonbot.

User-triggered crawlers

These crawlers fetch pages when a user specifically requests it (e.g., sharing a URL in ChatGPT). Blocking them prevents the AI from reading pages users share. Example: ChatGPT-User.
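
As a rough illustration of how the three categories can drive a robots.txt policy, the sketch below encodes the example crawlers named above as data and generates one rule group per crawler. The names `CRAWLER_TYPES` and `build_robots_txt` are hypothetical, and the category assignments use only the examples given in this section. Blocking the whole "training" category here also blocks Google-Extended, which Option 2 later in this guide chooses to allow.

```python
# Category assignments use only the examples named in the text above.
CRAWLER_TYPES = {
    "GPTBot": "training",
    "Google-Extended": "training",
    "CCBot": "training",
    "Bytespider": "training",
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "Amazonbot": "search",
    "ChatGPT-User": "user-triggered",
}


def build_robots_txt(blocked_types: set[str]) -> str:
    """Emit one User-agent group per crawler, disallowing the blocked categories."""
    groups = []
    for agent, kind in CRAWLER_TYPES.items():
        rule = "Disallow: /" if kind in blocked_types else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Block training crawlers, keep search and user-triggered crawlers.
    print(build_robots_txt({"training"}))
```

Passing an empty set produces an allow-all file for the listed crawlers.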

Recommended robots.txt configuration

Both configurations below allow the AI search crawlers, so your content can appear in AI search results. Option 2 additionally blocks training-only crawlers, for sites that prefer not to contribute to model training.

Option 1: Allow all AI crawlers (recommended)

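# AI Search Crawlers: Allow All
# This ensures maximum AI search visibility
# CCBot and Bytespider (marked Optional in the directory) are not listed here,
# so they fall back to your site's default rules; add groups for them if needed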

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: Applebot-Extended
Allow: /

Option 2: Allow search crawlers, block training-only crawlers

# AI Search Crawlers — Allow search, block training
# Maintains AI search visibility while limiting training use

# ALLOW — these power AI search results
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Amazonbot
Allow: /

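# Google-Extended governs training and grounding use rather than search indexing;
# it is allowed here to keep content eligible for Gemini responses
User-agent: Google-Extended
Allow: /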

User-agent: FacebookBot
Allow: /

# BLOCK — these are training-only
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
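
Before deploying a configuration like Option 2, it can help to check that each token resolves the way you expect. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `check_access`, the abbreviated rule set, and the `example.com` URL are illustrative assumptions. Real crawlers use their own parsers, so treat the result as a sanity check rather than a guarantee of crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of Option 2: two allowed search crawlers, two blocked training crawlers.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return whether each user-agent token may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # https://example.com/ is a placeholder; substitute a URL from your own site.
    agents = ["OAI-SearchBot", "PerplexityBot", "GPTBot", "CCBot"]
    results = check_access(ROBOTS_TXT, agents, "https://example.com/blog/post")
    for agent, allowed in results.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A token with no matching group and no `User-agent: *` group is treated as allowed by this parser, which is why crawlers left out of the file are not blocked.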

Key decisions and trade-offs

Should you block GPTBot?

GPTBot collects training data for future GPT models. Blocking it will not remove you from current ChatGPT search results, which depend on OAI-SearchBot. However, allowing GPTBot means your content may influence how future GPT models understand your industry and brand, which could benefit long-term AI visibility. For that reason, allowing GPTBot is a common recommendation among AI search agencies.

Does blocking Google-Extended affect Google Search?

No. Google-Extended is separate from Googlebot. Blocking Google-Extended prevents your content from being used for Gemini training and grounding but does not affect Google Search crawling or rankings. AI Overviews are part of Google Search and follow Googlebot rules, so Google-Extended does not control them. The trade-off is that you lose potential visibility in Gemini responses.

What about CCBot and Common Crawl?

Common Crawl is an open dataset used to train many AI models. Blocking CCBot keeps your content out of future Common Crawl snapshots; data already collected stays in earlier snapshots. The trade-off is that many smaller AI models and research projects use Common Crawl data, so blocking it reduces your content's reach across the broader AI ecosystem.

How to check which AI crawlers are visiting your site

Check your server access logs for the user-agent strings listed in the directory above. Most hosting platforms and CDNs (Cloudflare, Vercel, Netlify) provide bot traffic reports that can identify AI crawler visits. If you use Google Search Console, note that its crawl reports cover only Google's own crawlers, not third-party AI crawlers. Google-Extended will not appear as a separate user agent, because it is a robots.txt control token rather than a crawler.
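
For sites with direct access to raw logs, the log check can be sketched in a few lines of Python. The function name `count_ai_crawler_hits` and the sample log lines are made up for illustration, and the matching is a plain case-insensitive substring test on each line. It assumes Google-Extended and Applebot-Extended never appear in request headers, since they act as robots.txt control tokens.

```python
from collections import Counter
from typing import Iterable

# Tokens from the directory that are expected to appear in request user-agent headers.
# Google-Extended and Applebot-Extended are omitted on the assumption that they are
# robots.txt control tokens rather than request user agents.
AI_CRAWLER_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "GoogleOther", "CCBot", "Bytespider", "Amazonbot", "FacebookBot",
]


def count_ai_crawler_hits(log_lines: Iterable[str]) -> Counter:
    """Count log lines whose text contains a known AI crawler token."""
    counts: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # count each request at most once
    return counts


if __name__ == "__main__":
    # Made-up log lines in a common access-log layout, for illustration only.
    sample_lines = [
        '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /pricing HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
        '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    ]
    for token, hits in count_ai_crawler_hits(sample_lines).most_common():
        print(f"{token}: {hits}")
```

User-agent strings can be spoofed, so counts from this kind of matching show claimed crawler traffic, not verified crawler traffic.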

For more on configuring your site for AI crawlers, see our guide on robots.txt configuration for AI search.

Frequently asked questions

How many AI crawlers are there?

There are currently 12 known AI crawlers from major platforms: three from OpenAI (GPTBot, OAI-SearchBot, ChatGPT-User), two from Google (Google-Extended, GoogleOther), and one each from Anthropic, Perplexity, Amazon, Meta, Apple, ByteDance, and Common Crawl. New crawlers are announced periodically as AI platforms expand.

Will blocking AI crawlers hurt my SEO?

Blocking AI crawlers does not directly affect traditional SEO rankings. Google has confirmed that blocking Google-Extended does not impact Google Search rankings. However, blocking AI search crawlers (like OAI-SearchBot or PerplexityBot) will prevent your content from appearing in those AI platforms' search results, and AI search is an increasingly important traffic source.

Should I allow all AI crawlers?

For maximum AI search visibility, yes. Allowing all 12 crawlers ensures your content can appear across ChatGPT, Gemini, Perplexity, Claude, and other AI platforms. The only reason to block specific crawlers is if you have concerns about your content being used for AI model training — in which case, block training crawlers (GPTBot, CCBot, Bytespider) while allowing search crawlers (OAI-SearchBot, PerplexityBot).
