Free AI robots.txt + llms.txt Generator — Let AI Search Crawl You

Why this matters

Most sites are silently invisible to AI search

robots.txt has been around since 1994. For most of the three decades since, the crawlers site owners cared about were search engines such as Google and Bing. Then, between 2022 and 2024, a new class of crawler showed up, and most sites either ignored them or blocked them by accident.

The consequence: a site that blocks OAI-SearchBot or ChatGPT-User can rank #1 on Google and still never appear in a ChatGPT answer. The gap between "Google presence" and "AI search presence" is often just two lines in robots.txt. This tool removes that gap in 60 seconds.

Crawler	If allowed	If blocked
OAI-SearchBot / ChatGPT-User	ChatGPT can cite you in live answers	ChatGPT cannot cite you, regardless of content quality
GPTBot	Your content may appear in future ChatGPT training data	Excluded from OpenAI training (does NOT affect live citations)
PerplexityBot	Eligible to appear in Perplexity answers	Invisible to Perplexity entirely
Google-Extended	Content may be used to train and ground Gemini Apps and Vertex AI models	Opted out of that use (does NOT affect Google Search or AI Overviews)
ClaudeBot / anthropic-ai	Content may influence Claude's knowledge	Excluded from Anthropic training
Bingbot	Indexed in Bing — prerequisite for ChatGPT citations	Invisible to Bing and therefore to ChatGPT web search
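
To see where an existing robots.txt lands on this table, you can run it through Python's standard-library parser. The sketch below is illustrative only: the function name audit_robots, the SAMPLE file, and the example.com URL are assumptions, and urllib.robotparser's matching rules may differ in small ways from how each vendor's crawler actually interprets the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the table above.
AI_CRAWLERS = [
    "OAI-SearchBot", "ChatGPT-User", "GPTBot", "PerplexityBot",
    "Google-Extended", "ClaudeBot", "anthropic-ai", "Bingbot",
]


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {crawler: allowed?} for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Hypothetical robots.txt that blocks one retrieval bot by accident.
SAMPLE = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot, allowed in audit_robots(SAMPLE).items():
        print(f"{bot:16} {'allowed' if allowed else 'BLOCKED'}")
```

With the sample file, only ChatGPT-User is reported as blocked, which is exactly the kind of two-line gap described above.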

Training vs. retrieval

The distinction most people miss

There are two fundamentally different things AI crawlers do, and confusing them leads to bad decisions.

Training crawlers (GPTBot, ClaudeBot, CCBot) collect data to bake into a model's weights. Blocking them means your content is less likely to appear in a model's base knowledge — the things it "knows" without looking anything up.

Retrieval / answer bots (OAI-SearchBot, ChatGPT-User, PerplexityBot) fetch pages in real time to build the cited answer a user sees right now. Blocking these is what actually makes you invisible — and it's the far more consequential move for most sites.

The "Answer engines only" stance in this tool blocks training bots while keeping retrieval bots fully open. It's a reasonable middle ground if you want citation visibility but prefer to opt out of training datasets. The "Welcome all AI" default is what most sites should choose — the upside of broad AI visibility almost always outweighs the training opt-out concern.
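
The two stances can be sketched as a small generator. This is not the tool's actual code (which runs as JavaScript in the browser); the stance identifiers, function names, and exact output layout are assumptions chosen for illustration. The bot lists come from the paragraphs above.

```python
from urllib.robotparser import RobotFileParser

# Bot groupings as described in this section.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def build_robots(stance: str) -> str:
    """Build robots.txt text for 'welcome_all' or 'answer_engines_only'.

    Both stance names are hypothetical labels for this sketch.
    """
    if stance not in ("welcome_all", "answer_engines_only"):
        raise ValueError(f"unknown stance: {stance}")
    # Retrieval bots stay open under either stance.
    blocks = [f"User-agent: {bot}\nAllow: /" for bot in RETRIEVAL_BOTS]
    # Training bots are the only group whose rule changes.
    rule = "Disallow: /" if stance == "answer_engines_only" else "Allow: /"
    blocks += [f"User-agent: {bot}\n{rule}" for bot in TRAINING_BOTS]
    # Everyone else (including Googlebot and Bingbot) keeps default access.
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt: str, bot: str, url: str = "https://example.com/") -> bool:
    """Check a generated file with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


if __name__ == "__main__":
    print(build_robots("answer_engines_only"))
```

Giving each bot its own block keeps the file easy to audit: changing a stance only flips the rule line under the training bots, and nothing else moves.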

What is llms.txt?

A plain-text hint sheet for language models

llms.txt is a lightweight convention — not a standard, not a spec you're compelled to follow, but a fast-growing signal that well-structured sites use to help AI models represent them accurately.

The format: a # Title at the top, a > description blockquote, then ## Sections with Markdown links and short descriptions. You place it at yoursite.com/llms.txt. Some AI systems may read it during crawling or request it when summarizing a site, though support varies by vendor and is not guaranteed. It costs almost nothing to add and gives any tool that does read it a clear map of what you want it to know about you.

The scaffold this tool generates is a starting point — you should add more sections (About, Products, Team, Key statistics) and flesh out the descriptions with entity-dense, answer-first copy.
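
A minimal sketch of that scaffold, assuming each link is written as a Markdown bullet in the form "- [name](url): note". The function name build_llms_txt and the "Example Co" site details are hypothetical, and the tool's real output may be laid out differently.

```python
def build_llms_txt(title: str, description: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble llms.txt text: # title, > summary, then ## sections of links.

    `sections` maps a heading to (link text, URL, short description) tuples.
    """
    lines = [f"# {title}", "", f"> {description}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical site details, for illustration only.
    print(build_llms_txt(
        "Example Co",
        "Example Co makes invoicing software for freelancers.",
        {
            "Products": [
                ("Pricing", "https://example.com/pricing", "Plans and limits"),
            ],
            "About": [
                ("Team", "https://example.com/team", "Who builds the product"),
            ],
        },
    ))
```

Keeping the sections in a plain dictionary makes it easy to add the extra ones suggested above (About, Products, Team, Key statistics) without touching the formatting logic.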

Common questions

FAQ

What is llms.txt?

llms.txt is a plain-text file you place at the root of your site (yoursite.com/llms.txt) that tells AI language models what your site is about and which pages matter most. It follows a lightweight Markdown convention: a # title, a > blockquote summary, then ## sections listing key pages as links with short descriptions. Unlike robots.txt (which controls crawler access), llms.txt helps the model understand and represent your content accurately once it has been crawled.

Should I block GPTBot?

Only if you have a specific reason to opt out of OpenAI's training data — and even then, understand the trade-off: blocking GPTBot does not block OAI-SearchBot or ChatGPT-User, which are the bots responsible for ChatGPT's live web-search citations. If you block all OpenAI bots indiscriminately, you make yourself invisible to ChatGPT's citations without actually preventing older model training runs from having captured your content. The "Answer engines only" stance in this tool handles the distinction correctly.

Will this make ChatGPT cite my site?

Correctly configuring robots.txt removes the most common silent blocker — a surprising number of sites block ChatGPT-User or OAI-SearchBot and then wonder why they never appear in answers. But crawler access is eligibility, not a guarantee. To earn citations, you also need to be indexed by Bing (ChatGPT reads Bing's index, not Google's), write answer-first passages, and publish original data that gives the model a specific reason to name you.

Does this tool store my domain or site content?

No. Everything happens entirely in your browser. No data is sent to any server. The generated robots.txt and llms.txt are assembled in JavaScript on your device and never leave it — you can verify this by running it while offline.

Access is step one. Citations are the goal.