Free Tool

AI robots.txt + llms.txt Generator

Configure which AI crawlers can access your site, download a correct robots.txt, and get a starter llms.txt scaffold — all in your browser, instantly.

Free · no signup · nothing leaves your browser

Step 1: Your site details

Step 2: Set your stance

Choose a preset — or fine-tune the toggles below
Search engines
Googlebot: Google's standard web crawler; indexes pages for Google Search
Bingbot: Microsoft Bing's crawler; important for AI search too, because ChatGPT's web search has drawn heavily on Bing's index
Applebot: Apple's web crawler; feeds Siri, Spotlight, and Safari search suggestions
AI answer engines / retrieval
OAI-SearchBot: Fetches pages for ChatGPT's live web search answers; this is the citation bot
ChatGPT-User: Visits a page on a user's behalf when someone asks ChatGPT (or a custom GPT) a question that needs it
PerplexityBot: Perplexity AI's indexing crawler; builds the pool of citable sources
Perplexity-User: Fetches pages live on behalf of a Perplexity user's question
Google-Extended: Not a separate crawler but a robots.txt token that controls whether content Googlebot crawls may be used to train and ground Gemini models, including through Vertex AI; it does not affect Google Search or AI Overviews
AI training crawlers
GPTBot: Collects content that may be used to train OpenAI's models; does NOT control ChatGPT's live web-search citations
ClaudeBot: Anthropic's crawler; collects content that may be used to train Claude
anthropic-ai: Older Anthropic user-agent token; treat it the same as ClaudeBot
Claude-Web: Older Anthropic user-agent token associated with Claude's web access; commonly listed alongside ClaudeBot
CCBot: Common Crawl's crawler; its large open dataset is used by many AI labs for training
Bytespider: ByteDance's crawler; widely reported to collect AI training data
Amazonbot: Amazon's crawler; feeds Alexa, Rufus, and Amazon's AI product features
Applebot-Extended: Not a separate crawler but a robots.txt token that controls whether content Applebot crawls may be used to train Apple's foundation models
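Each toggle above boils down to one decision per crawler: allow it or disallow it. A minimal Python sketch of how such toggles could become robots.txt text, assuming a simple name-to-boolean mapping; the function `build_robots_txt`, its `sitemap_url` argument, and the catch-all group are hypothetical, not this tool's actual code:

```python
from typing import Optional

# Hypothetical sketch: turn per-crawler toggles into robots.txt text.
# The function name, the catch-all "allow everyone else" group, and the
# optional Sitemap line are illustrative assumptions, not this tool's code.


def build_robots_txt(stances: dict[str, bool], sitemap_url: Optional[str] = None) -> str:
    """stances maps a user-agent token to True (allow) or False (block)."""
    blocked = [agent for agent, allow in stances.items() if not allow]
    allowed = [agent for agent, allow in stances.items() if allow]

    groups = []
    if blocked:
        # Several User-agent lines can share one set of rules (a single group).
        groups.append("\n".join(f"User-agent: {a}" for a in blocked) + "\nDisallow: /")
    if allowed:
        groups.append("\n".join(f"User-agent: {a}" for a in allowed) + "\nAllow: /")
    # Catch-all group for any crawler not named above.
    groups.append("User-agent: *\nAllow: /")

    text = "\n\n".join(groups)
    if sitemap_url:
        text += f"\n\nSitemap: {sitemap_url}"
    return text + "\n"


print(build_robots_txt({"GPTBot": False, "CCBot": False, "OAI-SearchBot": True}))
```

Stacking several `User-agent` lines above one `Disallow: /` is valid under RFC 9309. One caution: a crawler follows only the group that names it, so any site-wide `Disallow` rules kept under `User-agent: *` would also need repeating inside the named groups.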
Why this matters

Most sites are silently invisible to AI search

robots.txt has been around since 1994. For most of its history, the crawlers site owners cared about were search engines such as Google and Bing. Then, between 2022 and 2024, a new class of AI crawler showed up, and most sites either ignored them or blocked them by accident.

The consequence: a site that blocks OAI-SearchBot or ChatGPT-User can rank #1 on Google and still never appear in a ChatGPT answer. The gap between "Google presence" and "AI search presence" is often just two lines in robots.txt, and this tool closes it in about 60 seconds.

Crawler | If allowed | If blocked
OAI-SearchBot / ChatGPT-User | ChatGPT can fetch and cite you in live answers | ChatGPT cannot fetch your pages to cite them, regardless of content quality
GPTBot | Your content may appear in future OpenAI training data | Excluded from OpenAI training (does NOT affect live citations)
PerplexityBot | Eligible to appear in Perplexity answers | Excluded from Perplexity's index of citable sources
Google-Extended | Content may be used to train and ground Gemini models and Vertex AI | Opted out of Gemini training and grounding (Google Search and AI Overviews are unaffected)
ClaudeBot / anthropic-ai | Content may influence Claude's knowledge | Excluded from Anthropic training
Bingbot | Indexed in Bing, a key source for ChatGPT web search | Invisible to Bing, and therefore largely to ChatGPT web search
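To check whether an existing robots.txt contains the silent blocker described above, Python's standard-library `urllib.robotparser` can evaluate it. This sketch assumes the bots the table ties to citations are the ones worth auditing; `blocked_retrieval_bots` and the example URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Bots whose access the table above ties to AI citations. Treating exactly
# these four as the ones to audit is an assumption for this sketch.
RETRIEVAL_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Bingbot"]


def blocked_retrieval_bots(robots_txt: str, page_url: str) -> list[str]:
    """Hypothetical helper: which retrieval bots may NOT fetch page_url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in RETRIEVAL_BOTS if not parser.can_fetch(bot, page_url)]


# A common accident: a blanket "block OpenAI" group that also catches the citation bots.
example = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_retrieval_bots(example, "https://example.com/pricing"))
# ['OAI-SearchBot', 'ChatGPT-User']
```

`RobotFileParser` is simpler than RFC 9309: it applies the first matching rule in file order rather than the most specific one, and matches user-agent names by substring. Treat its answer as a quick sanity check, not a definitive verdict.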
Training vs. retrieval

The distinction most people miss

There are two fundamentally different things AI crawlers do, and confusing them leads to bad decisions.

Training crawlers (GPTBot, ClaudeBot, CCBot) collect data to bake into a model's weights. Blocking them means your content is less likely to appear in a model's base knowledge — the things it "knows" without looking anything up.

Retrieval / answer bots (OAI-SearchBot, ChatGPT-User, PerplexityBot) fetch pages in real time to build the cited answer a user sees right now. Blocking these is what actually makes you invisible — and it's the far more consequential move for most sites.

The "Answer engines only" stance in this tool blocks training bots while keeping retrieval bots fully open. It's a reasonable middle ground if you want citation visibility but prefer to opt out of training datasets. The "Welcome all AI" default is what most sites should choose — the upside of broad AI visibility almost always outweighs the training opt-out concern.
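In robots.txt terms, an "Answer engines only" stance could look like the output of this sketch. The bot lists mirror this page's own "AI training crawlers" and "AI answer engines / retrieval" groupings; exactly which bots the tool's preset blocks is an assumption, and `answer_engines_only` is a hypothetical name:

```python
from urllib.robotparser import RobotFileParser

# Groupings follow this page's crawler lists; which bots the tool's real
# preset blocks is an assumption made for illustration.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "Claude-Web", "CCBot",
                 "Bytespider", "Amazonbot", "Applebot-Extended"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Perplexity-User"]


def answer_engines_only() -> str:
    """Hypothetical preset: disallow training crawlers, allow retrieval bots."""
    block = "\n".join(f"User-agent: {bot}" for bot in TRAINING_BOTS) + "\nDisallow: /"
    allow = "\n".join(f"User-agent: {bot}" for bot in RETRIEVAL_BOTS) + "\nAllow: /"
    # Everyone else (e.g. Googlebot, Bingbot) falls through to the catch-all group.
    return f"{block}\n\n{allow}\n\nUser-agent: *\nAllow: /\n"


robots = answer_engines_only()
parser = RobotFileParser()
parser.parse(robots.splitlines())
for bot in ["GPTBot", "CCBot", "OAI-SearchBot", "PerplexityBot"]:
    print(bot, parser.can_fetch(bot, "https://example.com/blog/post"))
# GPTBot False
# CCBot False
# OAI-SearchBot True
# PerplexityBot True
```

Google-Extended is left out because this page files it under answer engines; add it to the blocked list if you also want to opt out of Gemini training.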

Read the full ChatGPT citation guide →

What is llms.txt?

A plain-text hint sheet for language models

llms.txt is a lightweight convention, proposed by Jeremy Howard of Answer.AI in 2024. It is not a formal standard or a spec you're compelled to follow, but a growing number of well-structured sites use it to help AI models represent them accurately.

The format: a # Title at the top, a > description blockquote, then ## Sections with Markdown links and short descriptions. You place it at yoursite.com/llms.txt. Support is uneven and largely unconfirmed by the major AI providers, so treat it as a low-cost hint rather than a guaranteed signal. It costs almost nothing to add and gives any AI tool that does read it a clear map of what you want it to know about you.
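A scaffold in that format takes only a few string operations to assemble. This sketch assumes sections are passed as (link text, URL, description) tuples; `build_llms_txt`, the company name, and the URLs are placeholders:

```python
# Hypothetical llms.txt scaffold builder following the layout described above:
# "# Title", a "> summary" blockquote, then "## Section" link lists.
# Each section holds (link text, URL, short description) tuples.


def build_llms_txt(title: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for text, url, description in links:
            lines.append(f"- [{text}]({url}): {description}")
    return "\n".join(lines) + "\n"


print(build_llms_txt(
    "Example Co",
    "Example Co makes invoicing software for freelancers.",
    {"Key pages": [("Pricing", "https://example.com/pricing", "Plans and what each includes")]},
))
```

The printed file starts with `# Example Co`, then the blockquote, then a `## Key pages` section containing one `- [Pricing](...)` link line.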

The scaffold this tool generates is a starting point — you should add more sections (About, Products, Team, Key statistics) and flesh out the descriptions with entity-dense, answer-first copy.
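After you start editing the scaffold by hand, a quick structural check helps keep it in the expected shape. This hypothetical `lint_llms_txt` function warns about a missing title, summary, or link list. It is stricter than the llms.txt proposal itself, which requires only the `#` title, and it is not an official validator:

```python
import re

# Hypothetical lint pass for a hand-edited llms.txt. It checks the scaffold
# layout described on this page and is stricter than the llms.txt proposal,
# which only requires the "# Title" line. Not an official validator.
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)")


def lint_llms_txt(text: str) -> list[str]:
    warnings = []
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        warnings.append("first line should be a '# Title' heading")
    if not any(line.startswith("> ") for line in lines):
        warnings.append("no '> ' summary blockquote")

    # Count "- [text](url)" items under each "## Section" heading.
    link_counts: dict[str, int] = {}
    current = None
    for line in lines:
        if line.startswith("## "):
            current = line[3:].strip()
            link_counts[current] = 0
        elif current is not None and LINK_ITEM.match(line):
            link_counts[current] += 1

    if not link_counts:
        warnings.append("no '## ' sections")
    for name, count in link_counts.items():
        if count == 0:
            warnings.append(f"section '{name}' has no '- [text](url)' links")
    return warnings


print(lint_llms_txt("# Example Co\n\n## Key pages\n"))
# ["no '> ' summary blockquote", "section 'Key pages' has no '- [text](url)' links"]
```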

Common questions

FAQ

What is llms.txt?
llms.txt is a plain-text file you place at the root of your site (yoursite.com/llms.txt) that tells AI language models what your site is about and which pages matter most. It follows a lightweight Markdown convention: a # title, a > blockquote summary, then ## sections listing key pages as links with short descriptions. Unlike robots.txt, which tells crawlers what they may access, llms.txt is meant to help a model understand and represent your content accurately once it has been crawled.
Should I block GPTBot?
Only if you have a specific reason to opt out of OpenAI's training data, and even then, understand the trade-off: blocking GPTBot does not block OAI-SearchBot or ChatGPT-User, the bots responsible for ChatGPT's live web-search citations. If you block all OpenAI bots indiscriminately, you become invisible to ChatGPT's citations, and you still can't remove content that earlier training runs have already captured. The "Answer engines only" stance in this tool handles the distinction correctly.
Will this make ChatGPT cite my site?
Correctly configuring robots.txt removes the most common silent blocker: a surprising number of sites block ChatGPT-User or OAI-SearchBot and then wonder why they never appear in answers. But crawler access is eligibility, not a guarantee. To earn citations, you also need to be indexed by Bing (ChatGPT's web search has leaned heavily on Bing's index rather than Google's), write answer-first passages, and publish original data that gives the model a specific reason to name you.
Does this tool store my domain or site content?
No. Everything happens entirely in your browser. No data is sent to any server. The generated robots.txt and llms.txt are assembled in JavaScript on your device and never leave it — you can verify this by running it while offline.
Go deeper

Access is step one. Citations are the goal.

This tool opens the door. LumenGEO — the SaaS I built — shows you whether ChatGPT and Perplexity are actually citing you right now, what competitors are getting cited for, and where the gaps are.