AI Crawlers Database

A comprehensive reference of 38 AI bots that crawl the web for training data, search indexing, and real-time content retrieval.

38 crawlers · 25 recommended · 23 companies

Key Crawlers

The essential bots most sites should consider allowing

KEY

GPTBot

OpenAI

Training data for OpenAI models (ChatGPT)

KEY

Google-Extended

Google

Gemini (formerly Bard) training

KEY

ClaudeBot

Anthropic

Claude web access & citations

KEY

PerplexityBot

Perplexity

AI search engine indexing

KEY

xAI-Grok-Bot

xAI

Grok training & citations

KEY

Applebot-Extended

Apple

Apple Intelligence features

KEY

Bravebot

Brave

Brave Search and Leo AI

KEY

CCBot

Common Crawl

Open dataset used by many LLMs
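
For sites that decide to allow the key crawlers above, the robots.txt groups can be generated rather than typed by hand. The following is a minimal sketch, not GetCited's implementation: the function name `build_robots_txt` and the `/private/` rule for all other agents are assumptions made for illustration. The standard-library `urllib.robotparser` is used only as a sanity check on the generated text.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens of the eight key crawlers listed above.
KEY_CRAWLERS = [
    "GPTBot", "Google-Extended", "ClaudeBot", "PerplexityBot",
    "xAI-Grok-Bot", "Applebot-Extended", "Bravebot", "CCBot",
]


def build_robots_txt(allowed):
    """Return robots.txt text with one explicit Allow group per token."""
    lines = []
    for token in allowed:
        lines += [f"User-agent: {token}", "Allow: /", ""]
    # Hypothetical site policy for this sketch: all other agents
    # are kept out of /private/.
    lines += ["User-agent: *", "Disallow: /private/"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(KEY_CRAWLERS)

# Parse the generated text to confirm the rules behave as intended.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "/private/report.html"))
print(parser.can_fetch("SomeOtherBot", "/private/report.html"))
```

A named group takes precedence over the `*` group for the crawler it names, so listing a token explicitly is what exempts it from the catch-all rule.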

Full Database

All 38 crawlers organized by function

💬 Chat Assistants 9 crawlers · 9 recommended
REC

ChatGPT-User

OpenAI
ChatGPT-User

ChatGPT browse mode (real-time)

View docs →
REC

OAI-SearchBot

OpenAI
OAI-SearchBot

ChatGPT Search feature

View docs →
REC

Gemini-Deep-Research

Google
Gemini-Deep-Research

Gemini deep research feature

View docs →
REC

Google-NotebookLM

Google
Google-NotebookLM

NotebookLM source fetching

View docs →
REC

ClaudeBot

Anthropic
ClaudeBot

Claude web access & citations

View docs →
REC

Claude-Web

Anthropic
Claude-Web

Claude web features

View docs →
REC

Claude-User

Anthropic
Claude-User

User-triggered page fetches for Claude

View docs →
REC

Claude-SearchBot

Anthropic
Claude-SearchBot

Claude search result quality

View docs →
REC

Perplexity-User

Perplexity
Perplexity-User

User-triggered real-time fetches

View docs →
🔍 AI Search 7 crawlers · 7 recommended
REC

PerplexityBot

Perplexity
PerplexityBot

AI search engine indexing

View docs →
REC

Bravebot

Brave
Bravebot

Brave Search and Leo AI

View docs →
REC

DuckAssistBot

DuckDuckGo
DuckAssistBot

DuckDuckGo AI Assist

View docs →
REC

YouBot

You.com
YouBot

AI search engine

View docs →
REC

PhindBot

Phind
PhindBot

Phind AI search for developers

View docs →
REC

ExaBot

Exa
ExaBot

Exa semantic search indexing

View docs →
REC

AndiBot

Andi
AndiBot

Andi conversational search

View docs →
🧠 Model Training 19 crawlers · 8 recommended
REC

GPTBot

OpenAI
GPTBot

Training data for OpenAI models (ChatGPT)

View docs →
REC

Google-Extended

Google
Google-Extended

Gemini (formerly Bard) training

View docs →

GoogleOther

Google
GoogleOther

Google AI research and development

View docs →

Google-CloudVertexBot

Google
Google-CloudVertexBot

Vertex AI platform

View docs →
REC

anthropic-ai

Anthropic
anthropic-ai

Claude training

View docs →
REC

xAI-Grok-Bot

xAI
xAI-Grok-Bot

Grok training & citations

View docs →
REC

Applebot

Apple
Applebot

Siri and Spotlight search

View docs →
REC

Applebot-Extended

Apple
Applebot-Extended

Apple Intelligence features

View docs →

Meta-ExternalAgent

Meta
Meta-ExternalAgent

Meta AI training

View docs →

FacebookBot

Meta
FacebookBot

Meta AI features

View docs →
REC

Amazonbot

Amazon
Amazonbot

Alexa and Amazon AI

View docs →
REC

CCBot

Common Crawl
CCBot

Open dataset used by many LLMs

View docs →

Bytespider

ByteDance
Bytespider

TikTok/Doubao AI training

No public docs

cohere-ai

Cohere
cohere-ai

Enterprise AI training

View docs →

Deepseek

DeepSeek
Deepseek

DeepSeek AI training

View docs →

DeepseekBot

DeepSeek
DeepseekBot

DeepSeek web crawling

View docs →

MistralAI-User

Mistral
MistralAI-User

Mistral AI (European)

View docs →

Groq-Bot

Groq
Groq-Bot

Groq AI inference platform

View docs →

ImagesiftBot

Imagesift
ImagesiftBot

Image AI training

No public docs
📦 Other 3 crawlers · 1 recommended

amazon-kendra

Amazon
amazon-kendra

Amazon Kendra enterprise search

View docs →

Diffbot

Diffbot
Diffbot

Knowledge graph construction

View docs →
REC

LinkedInBot

LinkedIn
LinkedInBot

LinkedIn link previews

View docs →
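
The listing above can also be held as plain data, which makes the per-category counts easy to recompute and the category filter easy to reproduce. This is a rough sketch: the tuple layout, the short category keys (`chat`, `search`, `training`, `other`), and the helper names are assumptions for illustration, while the tokens, companies, and recommendation flags are transcribed from the entries above.

```python
from collections import Counter

# (user-agent token, company, category, recommended)
CRAWLERS = [
    ("ChatGPT-User", "OpenAI", "chat", True),
    ("OAI-SearchBot", "OpenAI", "chat", True),
    ("Gemini-Deep-Research", "Google", "chat", True),
    ("Google-NotebookLM", "Google", "chat", True),
    ("ClaudeBot", "Anthropic", "chat", True),
    ("Claude-Web", "Anthropic", "chat", True),
    ("Claude-User", "Anthropic", "chat", True),
    ("Claude-SearchBot", "Anthropic", "chat", True),
    ("Perplexity-User", "Perplexity", "chat", True),
    ("PerplexityBot", "Perplexity", "search", True),
    ("Bravebot", "Brave", "search", True),
    ("DuckAssistBot", "DuckDuckGo", "search", True),
    ("YouBot", "You.com", "search", True),
    ("PhindBot", "Phind", "search", True),
    ("ExaBot", "Exa", "search", True),
    ("AndiBot", "Andi", "search", True),
    ("GPTBot", "OpenAI", "training", True),
    ("Google-Extended", "Google", "training", True),
    ("GoogleOther", "Google", "training", False),
    ("Google-CloudVertexBot", "Google", "training", False),
    ("anthropic-ai", "Anthropic", "training", True),
    ("xAI-Grok-Bot", "xAI", "training", True),
    ("Applebot", "Apple", "training", True),
    ("Applebot-Extended", "Apple", "training", True),
    ("Meta-ExternalAgent", "Meta", "training", False),
    ("FacebookBot", "Meta", "training", False),
    ("Amazonbot", "Amazon", "training", True),
    ("CCBot", "Common Crawl", "training", True),
    ("Bytespider", "ByteDance", "training", False),
    ("cohere-ai", "Cohere", "training", False),
    ("Deepseek", "DeepSeek", "training", False),
    ("DeepseekBot", "DeepSeek", "training", False),
    ("MistralAI-User", "Mistral", "training", False),
    ("Groq-Bot", "Groq", "training", False),
    ("ImagesiftBot", "Imagesift", "training", False),
    ("amazon-kendra", "Amazon", "other", False),
    ("Diffbot", "Diffbot", "other", False),
    ("LinkedInBot", "LinkedIn", "other", True),
]


def summarize(crawlers):
    """Return (total, recommended, distinct companies, per-category counts)."""
    per_category = Counter(category for _, _, category, _ in crawlers)
    recommended = sum(1 for *_, rec in crawlers if rec)
    companies = len({company for _, company, _, _ in crawlers})
    return len(crawlers), recommended, companies, dict(per_category)


def tokens_in(category, crawlers=CRAWLERS, recommended_only=False):
    """Return the user-agent tokens in one category, optionally only recommended ones."""
    return [
        token
        for token, _, cat, rec in crawlers
        if cat == category and (rec or not recommended_only)
    ]


total, recommended, companies, per_category = summarize(CRAWLERS)
print(total, recommended, companies, per_category)
print(tokens_in("other", recommended_only=True))
```

The totals computed this way agree with the figures stated on the page: 38 crawlers, 25 recommended, 23 companies.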
⚠️

A Note on Agentic Browsers

Agentic AI browsers such as ChatGPT Operator, Google Project Mariner, and Anthropic Computer Use send standard Chrome user-agent strings, which makes them indistinguishable from regular browser traffic.

Because robots.txt rules are matched against a crawler's user-agent token, these agents cannot be blocked that way. GetCited keeps you informed about the evolving AI crawler landscape, even when control isn't possible.
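
The limitation can be seen in a small sketch of user-agent matching. The function `match_crawler`, the token subset, and both header strings below are written for illustration only; real header values vary by product and version.

```python
# Illustrative subset of user-agent tokens from the database above.
KNOWN_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def match_crawler(user_agent, tokens=KNOWN_TOKENS):
    """Return the first known token found in the User-Agent header, else None."""
    ua = user_agent.lower()
    for token in tokens:
        if token.lower() in ua:
            return token
    return None


# A crawler that declares itself carries its token in the header.
declared_bot = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.1; +https://openai.com/gptbot)"
)
# An agentic browser sending a standard Chrome string carries no token at all.
chrome_like = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(match_crawler(declared_bot))
print(match_crawler(chrome_like))
```

The second call finds nothing to match, which is the same position a robots.txt rule or a server-side user-agent filter is in when the visitor presents an ordinary Chrome string.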

Manage Your AI Crawler Access

GetCited is a free WordPress plugin that lets you control which AI bots can crawl your site.

Database last updated: December 29, 2025