Configure which AI crawlers can access your site, download a correct robots.txt, and get a starter llms.txt scaffold — all in your browser, instantly.
Free · no signup · nothing leaves your browser
robots.txt has been around since 1994. For most of that time, the crawlers site owners cared about were search engines such as Google and Bing. Then, between 2022 and 2024, a new class of crawler showed up, and most sites either ignored them or blocked them by accident.
The consequence: a site that blocks OAI-SearchBot or ChatGPT-User can rank #1 on Google and still never be cited in a ChatGPT answer. The gap between "Google presence" and "AI search presence" is often just two lines in robots.txt. This tool closes that gap in 60 seconds.
| Crawler | If allowed | If blocked |
|---|---|---|
| OAI-SearchBot / ChatGPT-User | ChatGPT can cite you in live answers | ChatGPT cannot cite you, regardless of content quality |
| GPTBot | Your content may appear in future ChatGPT training data | Excluded from OpenAI training (does NOT affect live citations) |
| PerplexityBot | Eligible to appear in Perplexity answers | Excluded from Perplexity's search index, so unlikely to be surfaced or cited |
| Google-Extended | Content may be used to train and ground Gemini models (Gemini Apps, Vertex AI) | Opted out of that use (does NOT affect Google Search or AI Overviews) |
| ClaudeBot / anthropic-ai | Content may influence Claude's knowledge | Excluded from Anthropic training |
| Bingbot | Indexed in Bing, a search source that ChatGPT search and other AI tools draw on | Invisible to Bing and to AI products that rely on Bing's index |
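To see which of these crawlers an existing robots.txt lets through, you can run it through Python's standard-library `urllib.robotparser`. This is a minimal sketch, not what this tool runs internally: the crawler list mirrors the table above, and `crawler_access` plus the example URL are hypothetical names. Note that `robotparser` applies the first matching rule rather than RFC 9309's longest-match precedence, so treat the result as a sanity check for simple files.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the table above (assumed current; verify against vendor docs).
AI_CRAWLERS = [
    "OAI-SearchBot",
    "ChatGPT-User",
    "GPTBot",
    "PerplexityBot",
    "Google-Extended",
    "ClaudeBot",
    "anthropic-ai",
    "Bingbot",
]


def crawler_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {crawler token: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


if __name__ == "__main__":
    robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for bot, allowed in crawler_access(robots).items():
        print(f"{bot:16} {'allowed' if allowed else 'BLOCKED'}")
```

With that sample file, only GPTBot is reported as blocked: the other tokens fall through to the `*` group, which allows everything.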
There are two fundamentally different things AI crawlers do, and confusing them leads to bad decisions.
Training crawlers (GPTBot, ClaudeBot, CCBot) collect data to bake into a model's weights. Blocking them means your content is less likely to appear in a model's base knowledge — the things it "knows" without looking anything up.
Retrieval / answer bots (OAI-SearchBot, ChatGPT-User, PerplexityBot) fetch pages in real time to build the cited answer a user sees right now. Blocking these is what actually makes you invisible — and it's the far more consequential move for most sites.
The "Answer engines only" stance in this tool blocks training bots while keeping retrieval bots fully open. It's a reasonable middle ground if you want citation visibility but prefer to opt out of training datasets. The "Welcome all AI" default is what most sites should choose — the upside of broad AI visibility almost always outweighs the training opt-out concern.
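Here is a rough sketch of how the two stances could translate into robots.txt groups. The bot groupings follow the training/retrieval split described above. The stance identifiers `welcome_all` and `answer_engines_only` and the function `build_robots_txt` are hypothetical and not necessarily this tool's exact output.

```python
# Groupings taken from the training vs. retrieval split described above.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def build_robots_txt(stance: str) -> str:
    """Return robots.txt text for a hypothetical 'welcome_all' or 'answer_engines_only' stance."""
    if stance not in ("welcome_all", "answer_engines_only"):
        raise ValueError(f"unknown stance: {stance!r}")
    blocked = TRAINING_BOTS if stance == "answer_engines_only" else []
    groups = []
    # Training crawlers get a full-site Disallow only in the answer-engines-only stance.
    for bot in blocked:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    # A crawler follows the group that names it, so these explicit Allow groups
    # are redundant with "*" but make the intent visible to anyone reading the file.
    for bot in RETRIEVAL_BOTS:
        groups.append(f"User-agent: {bot}\nAllow: /")
    # Everyone else (Googlebot, Bingbot, ...) keeps default access.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("answer_engines_only"))
```

Keeping every group as a plain full-site `Allow: /` or `Disallow: /` makes the file easy to audit, and it avoids the differences between parsers over how they rank partial path rules.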
llms.txt is a lightweight proposed convention. It is not a formal standard you're compelled to follow, but a growing number of well-structured sites use it to help AI models represent them accurately.
The format: a `# Title` at the top, a `>` blockquote description, then `##` sections with Markdown links and short descriptions. You place it at yoursite.com/llms.txt. Support varies by AI system: some tools may fetch it while crawling or summarizing a site, and others ignore it. It costs almost nothing to add and gives any AI tool that does read it a clear map of what you want it to know about you.
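The layout described above is simple enough to render from a few fields. This sketch assumes a `- [name](url): description` line for each link, which is the form the llms.txt proposal uses. The `Link` class, `build_llms_txt`, and the example company are hypothetical and only illustrate the layout.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


def build_llms_txt(title: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render '# Title', a '> summary' blockquote, then one '## Section' link list per section."""
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for link in links:
            item = f"- [{link.title}]({link.url})"
            if link.description:
                item += f": {link.description}"  # optional short note after the link
            lines.append(item)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co makes hypothetical widgets for small teams.",
        {"Docs": [Link("Quickstart", "https://example.com/docs/quickstart", "Set up in five minutes")]},
    ))
```

Because `sections` is an ordinary dict, you can add About, Products, or Team sections in the order you want them to appear.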
The scaffold this tool generates is a starting point — you should add more sections (About, Products, Team, Key statistics) and flesh out the descriptions with entity-dense, answer-first copy.
Unlike robots.txt, which controls crawler access, llms.txt is meant to help a model understand and represent your content accurately once it can reach it.

This tool opens the door. LumenGEO, the SaaS I built, shows you whether ChatGPT and Perplexity are actually citing you right now, what competitors are getting cited for, and where the gaps are.