A comprehensive reference of 38 AI bots that crawl the web for training data, search indexing, and real-time content retrieval.
The essential bots most sites should consider allowing
ChatGPT training & web browsing
Gemini/Bard training
Claude web access & citations
AI search engine indexing
Grok training & citations
Apple Intelligence features
Brave Search and Leo AI
Open dataset used by many LLMs
All 38 crawlers organized by function
TikTok/Doubao AI training
No public docsImage AI training
No public docsNew AI browsers like ChatGPT Operator, Google Project Mariner, and Anthropic Computer Use use standard Chrome user-agent strings, making them indistinguishable from regular browser traffic.
They cannot be blocked via robots.txt. GetCited keeps you informed about the evolving AI crawler landscape, even when control isn't possible.
GetCited is a free WordPress plugin that lets you control which AI bots can crawl your site.
Coming Soon Learn MoreDatabase last updated: December 29, 2025