AI crawlers are constantly visiting your site. OpenAI’s GPTBot, Anthropic’s ClaudeBot, Google’s extended crawlers, Perplexity’s bot: they’re all collecting content. The problem is that when they arrive, they have no structural way to understand what your site is about, what the content is meant for, or what you’d want them to prioritize when synthesizing answers from it.
Key Takeaways
- llms.txt is a Markdown file at the root of your domain that gives AI language models structured context about your site: what it is, who operates it, what sections it contains, and which content is most important.
- It’s a convention proposed by Jeremy Howard and the Answer.AI community, not an enforced technical standard. Adoption varies across AI systems, but the implementation cost is low and adoption is growing.
- robots.txt and llms.txt are complementary: robots.txt controls access, llms.txt provides context. Both can and should coexist.
- llms-full.txt is an optional extended version that includes full page content rather than just links (most useful for smaller sites or documentation sites).
- The most common mistakes are wrong file location (not at root), vague descriptions, and treating it as a set-it-and-forget-it file. Keep it updated and make the descriptions specific.
robots.txt tells crawlers which pages they can and can’t access. That’s access control, not context. There’s no standard mechanism in robots.txt to say “this site is a medical practice, the blog posts are educational, the service pages describe what we actually do, and the author is a licensed physician.” That context is what AI systems need to accurately represent your content in generated responses.
llms.txt was proposed as a solution to this problem. It’s a simple Markdown file at the root of your website that gives AI language models structured, human-readable context about your site: what it is, who it’s for, what sections exist, and what the content in each section is meant to communicate. It won’t stop a crawler that ignores it, and it’s not a technical standard enforced by browsers or servers. But it’s a well-structured convention that costs almost nothing to implement and communicates intent in a format AI systems are designed to understand.
The Problem That Created llms.txt
To understand why llms.txt exists, you need to understand the gap it’s filling.
Traditional search crawlers (e.g., Googlebot, Bingbot) have decades of established conventions for communicating with website owners. robots.txt has been a standard since 1994. Sitemaps, structured data, meta tags, canonical tags: the web built up an extensive vocabulary for telling search engines what to do with your content.
AI crawlers arrived into this environment but operate differently. They’re not primarily looking for ranking signals; they’re extracting content to include in training datasets or to use for generating real-time answers. The questions they’re trying to answer about your site are fundamentally different: What is this site about? What kind of content does it contain? Is it authoritative on this topic? What specific sections should be prioritized for answering queries about this domain?
None of the existing conventions answer those questions directly. A well-structured sitemap tells crawlers which pages exist and when they were last updated. Structured data tells them what specific entities appear on a page. But there’s no file at the root of a site that says “this is a professional services firm, these pages explain what we do, these are the credentials of our practitioners, and this blog section contains educational content.”
Jeremy Howard and collaborators at Answer.AI proposed the llms.txt specification in 2024 to fill this gap. The spec is maintained at llmstxt.org, which documents the current format and provides implementation guidance.
robots.txt vs. llms.txt: Different Tools for Different Problems
These two files are complementary, not redundant. Understanding what each one does prevents a common confusion about when to use which.
robots.txt is access control. It tells crawlers which pages they’re allowed to visit and which they should skip. It uses a simple allow/disallow syntax and is widely supported across all crawlers that comply with the Robots Exclusion Protocol. If you add Disallow: /private/ to your robots.txt, compliant crawlers won’t access pages in that directory.
llms.txt is context provision. It tells AI language models what your site is about, what its sections contain, and how to interpret the content they find. It doesn’t block anything; it communicates meaning and structure to systems that have already accessed your site.
A practical way to think about the difference: robots.txt is the fence. llms.txt is the signage on the property that explains what kind of place it is.
Both files can and should coexist. You might use robots.txt to block AI training crawlers from staging environments or proprietary internal tools while using llms.txt to give context about the public content you do want AI systems to use accurately. They’re solving adjacent but distinct problems.
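To make the access-control half concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the /staging/ and /private/ paths, and the example.com URLs are assumptions for illustration; only the GPTBot user agent comes from the discussion above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep GPTBot out of /staging/,
# and keep every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /staging/

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_allowed("GPTBot", "https://example.com/staging/draft") returns False, while the same call for a /blog/ URL returns True. One detail worth knowing: a crawler follows only the group that matches it, so in this sketch GPTBot is not bound by the /private/ rule in the wildcard group. If you want both rules to apply to a named bot, repeat them in its group.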
One important note: robots.txt has near-universal crawler support because it’s a decades-old standard with strong industry adoption. llms.txt is newer and has variable support across AI systems. I’ll cover the current adoption state further down.
What llms.txt Looks Like: Format and Syntax
llms.txt is a plain text file written in Markdown. It lives at the root of your domain (served at yourdomain.com/llms.txt) and should be served with a text/plain content type. The format is defined in the spec at llmstxt.org, and while Markdown lets you do many things, the spec defines a specific structure.
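The structural elements (per the spec, only the H1 is strictly required; the others are optional but are what make the file useful):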
H1 header: The site name. Just the name, no description here.
Blockquote: A brief description of the site (what it is and who it serves). This is the first context signal an AI model reads.
H2 sections: Named content sections, each containing a list of URLs with optional per-URL descriptions. Sections organize your content by type or purpose.
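Optional section: An H2 section named Optional (## Optional) has a special meaning in the spec: the URLs listed under it can be skipped when a shorter context is needed, so it marks content that’s lower priority for AI processing. It’s useful for large archives, promotional content, or pages you want indexed but not prioritized.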
Individual URLs appear as Markdown links: - [Page Title](URL): Brief description of what this page contains.
Here’s a complete, annotated example for a hypothetical digital marketing consulting firm:
```markdown
# Meridian Digital Consulting

> Meridian Digital Consulting is a digital marketing firm specializing in SEO,
> AI search visibility, and content strategy for mid-size B2B companies.
> Founded in 2018 and based in Austin, Texas. All services are delivered by
> named practitioners with documented credentials.

## Core Services

These pages describe the firm's service offerings and methodology.

- [SEO Audit Services](https://meridianconsulting.com/services/seo-audit):
  Technical and content SEO audits for B2B company websites.
- [AI Visibility Auditing](https://meridianconsulting.com/services/ai-visibility-audit):
  Page-level diagnostic audits for AI search engine visibility signals.
- [Content Strategy](https://meridianconsulting.com/services/content-strategy):
  Editorial strategy and content architecture for search visibility.

## Team

Named practitioners with credentials and areas of expertise.

- [Michael Torres, Principal Consultant](https://meridianconsulting.com/team/michael-torres):
  15 years in technical SEO and search strategy. Certified Google Analytics Professional.
- [About the Firm](https://meridianconsulting.com/about):
  Company background, client types, and methodology overview.

## Educational Resources

Blog posts and guides on AI search, SEO, and content visibility.

- [Why Top-Ranked Pages Miss AI Results](https://meridianconsulting.com/blog/ranked-pages-invisible-ai):
  Explains how AI search visibility differs from traditional SEO rankings.
- [AEO vs GEO vs SEO](https://meridianconsulting.com/blog/aeo-geo-seo-explained):
  Definitions and distinctions between the three optimization disciplines.
- [How to Audit for AI Visibility](https://meridianconsulting.com/blog/ai-visibility-audit-guide):
  Framework for evaluating page-level AI search signals.

## Optional

These pages provide supplementary context but are lower priority for AI synthesis.

- [Client Case Studies](https://meridianconsulting.com/case-studies):
  Project summaries and results from past client engagements.
- [Speaking and Events](https://meridianconsulting.com/speaking):
  Conference appearances and speaking topics.
```
A few things to notice in that example:
The blockquote description does real work. It establishes entity signals (named firm, location, founding date, specialization, staffing structure) in a few sentences. An AI model reading this file has a clear picture of what kind of organization this is before it reads a single page of content.
The section headers reflect the content’s purpose, not just its type. “Core Services” and “Team” differ from “Educational Resources” in meaningful ways; the AI model can distinguish the first two as authoritative primary content and the third as educational material.
Per-URL descriptions are short but specific. “Technical and content SEO audits for B2B company websites” is more useful to an AI model than just the page title. The description tells the model what question this URL answers.
The Optional section is honest about priority. Large sites can include substantial content under Optional without implying that it should be weighted the same as core service pages.
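To show how little machinery it takes to read this structure, here is a rough Python sketch of a parser for the elements described above: the H1 name, the blockquote summary, and the H2 sections with their links. It is an illustration only, not the reference parser from the spec. The class and function names are hypothetical, and it assumes a description may wrap onto the line after its link, as in the Meridian example.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](URL)" with an optional ": description" on the same line.
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$")


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


@dataclass
class LlmsTxt:
    name: str = ""
    summary: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, blockquote, and H2 link sections of an llms.txt file."""
    doc = LlmsTxt()
    current = None      # name of the H2 section we are inside, if any
    last_link = None    # most recent link, so wrapped descriptions can attach to it
    summary_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            last_link = None
            continue
        if line.startswith("# ") and not doc.name:
            doc.name = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
            last_link = None
        elif line.startswith(">") and current is None:
            summary_lines.append(line.lstrip(">").strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                last_link = Link(match["title"], match["url"], (match["desc"] or "").strip())
                doc.sections[current].append(last_link)
            elif last_link is not None:
                # Continuation of a description wrapped onto the next line.
                last_link.description = f"{last_link.description} {line}".strip()
            # Free-text paragraphs directly under a heading are ignored here.
    doc.summary = " ".join(summary_lines)
    return doc
```

Run against the example file, parse_llms_txt(text).sections would have four keys (Core Services, Team, Educational Resources, Optional), each holding Link objects with their descriptions. Keeping Optional as an ordinary section leaves the decision to skip it to whatever consumes the result.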
llms.txt vs. llms-full.txt
The specification defines two related files:
llms.txt is the lightweight index. It contains the site overview and a curated list of important URLs with brief descriptions. It’s the file AI systems are expected to fetch first for context. For most sites, this is the primary file to implement.
llms-full.txt is an optional extended version that includes the full Markdown-formatted content of the pages listed in llms.txt, rather than just the links. The idea is that AI systems with large context windows could fetch llms-full.txt and get the complete site content in a single file, rather than crawling individual pages.
In practice, llms-full.txt is most useful for smaller sites where the complete content fits within a reasonable context window, or for documentation sites where comprehensive coverage matters. For a large marketing website with hundreds of pages, llms-full.txt becomes unwieldy.
For most sites getting started, implement llms.txt first. Get the structure right and the key URLs documented. llms-full.txt is an enhancement once the basic file is working.
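If you do get to llms-full.txt, assembling it can be as simple as concatenating page content under headings. The sketch below is one possible layout, not a format mandated by the spec: the "Source:" line, the function names, and the default token budget are all assumptions, and the four-characters-per-token figure is only a rough rule of thumb for English text.

```python
def build_llms_full(name: str, summary: str, pages: list[tuple[str, str, str]]) -> str:
    """Assemble an llms-full.txt body from (title, url, markdown_body) tuples."""
    parts = [f"# {name}", "", f"> {summary}", ""]
    for title, url, body in pages:
        # One H2 per page, followed by its source URL and full Markdown content.
        parts += [f"## {title}", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts).rstrip() + "\n"


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough size estimate; real tokenizers will give different counts."""
    return int(len(text) / chars_per_token)


def fits_budget(text: str, token_budget: int = 100_000) -> bool:
    """Check the assembled file against a hypothetical context-window budget."""
    return rough_token_estimate(text) <= token_budget
```

A check like fits_budget(build_llms_full(...)) gives a quick signal for the point made above: if the assembled file is far over any plausible budget, the site is probably too large for llms-full.txt to be practical.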
Current Adoption: Who Actually Honors llms.txt
llms.txt is a convention, not an enforced standard. There’s no protocol that requires AI crawlers to fetch and respect the file the way web servers enforce content type requirements. Whether a given AI system uses your llms.txt depends on whether the system’s developers have built in support for it.
As of mid-2025, adoption is real but not universal. Several AI companies have stated support for the spec or have implemented reading it as part of their crawling behavior. The spec’s proponents have been active in promoting adoption, and early documentation of which systems honor it is available at llmstxt.org.
The practical question for implementers is: does the absence of universal adoption make it not worth doing? The answer is no, for two reasons.
First, the implementation cost is low. A well-structured llms.txt file takes two to four hours to write well for a typical site, and the payoff is asymmetric with that cost: if an AI system ignores the file, you’ve lost only those few hours; if it reads the file, it gets clearer context about your site.
Second, adoption is growing rather than contracting. This is an emerging convention with real institutional backing. Early adoption is low-cost and positions you well as more AI systems formalize their support.
Where llms.txt Fits in Your AI Visibility Stack
llms.txt provides site-level context. Structured data markup provides page-level context. They work at different layers and serve complementary functions.
A well-implemented llms.txt tells an AI crawler what your site is before it reads any individual page. It establishes the site’s purpose, the organization behind it, the type of content it contains, and which sections are most important. This reduces the interpretive work the AI has to do when it processes individual pages.
Schema markup on individual pages tells the AI what each page is: its content type, who authored it, which entity it covers, and what questions it answers. Together, these two layers give AI systems a structured picture of your site at the macro and page levels. When either layer is missing, pages that rank well in traditional search can still go unmentioned in AI-generated responses.
An AI visibility audit that ignores llms.txt is missing part of the picture, just as one that ignores schema is incomplete. They address different surfaces but both matter for the same ultimate goal: making your content easy for AI systems to understand and cite accurately.
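For the page-level layer, a small Python sketch of generating schema.org JSON-LD for one article may help. The property names (headline, url, author, publisher) are standard schema.org Article properties, but the helper names are hypothetical, and pairing a specific author with a specific post from the Meridian example is purely an assumption for illustration.

```python
import json


def article_jsonld(headline: str, url: str, author_name: str, publisher_name: str) -> str:
    """Build schema.org Article markup as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": publisher_name},
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script tag that goes in the page's HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'
```

The output of as_script_tag(article_jsonld(...)) is what you would embed in the page itself, while llms.txt stays a single file at the site root. That split mirrors the two layers described above.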
Common Mistakes
Wrong file location. llms.txt must be served from the root of your domain — yourdomain.com/llms.txt. Not a subdirectory, not a subfolder in your CMS. If your site is at www.example.com, the file goes at www.example.com/llms.txt. A file at www.example.com/docs/llms.txt won’t be found by crawlers looking for it in the standard location.
Incorrect content type. The file should be served as text/plain, not text/html or application/octet-stream. Most web servers will serve .txt files with the correct content type automatically, but if you’re using a CMS that rewrites file handling, verify this.
Vague or empty descriptions. A file whose blockquote says “We’re a digital marketing company” and whose service URLs have no descriptions is technically valid but nearly useless. The value of llms.txt comes from the specificity of the context it provides. Named practitioners, defined service areas, and specific content descriptions are what the file is for.
Treating it like robots.txt. Some implementations try to use llms.txt to signal content exclusions, similar to robots.txt disallow directives. That’s not what the format is designed for. If you want to block AI crawlers from specific pages, use robots.txt for access control. Use llms.txt for context provision about the pages you do want AI systems to understand.
Setting it and forgetting it. llms.txt should reflect your current site structure. If you launch new service pages, add practitioners, or publish a significant amount of new content in a category, update the file. An llms.txt that describes your 2023 site in 2026 provides stale context.
Including too many URLs. The file is a curated guide, not a full sitemap. If your site has 500 pages, listing all 500 in llms.txt defeats the purpose. Curate: include the pages that matter most for representing who you are and what you do, and use the Optional section for supplementary content.
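Several of these mistakes can be caught mechanically. Below is a minimal lint sketch in Python. The thresholds (50 links, four-word descriptions) and the function name are arbitrary assumptions rather than anything the spec defines, so treat the warnings as prompts for a human review, not as rules.

```python
import re

# Matches "- [Title](URL)" with an optional ": description" on the same line.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def lint_llms_txt(text: str, max_links: int = 50, min_desc_words: int = 4) -> list[str]:
    """Return human-readable warnings for common llms.txt mistakes."""
    warnings = []
    lines = [ln.strip() for ln in text.splitlines()]
    nonblank = [ln for ln in lines if ln]

    if not nonblank or not nonblank[0].startswith("# "):
        warnings.append("missing H1 site name on the first line")
    if not any(ln.startswith(">") for ln in nonblank):
        warnings.append("missing blockquote summary")

    links = 0
    for i, ln in enumerate(lines):
        match = LINK_RE.match(ln)
        if not match:
            continue
        links += 1
        desc = (match.group(3) or "").strip()
        # Accept a description wrapped onto the following line.
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not desc and nxt and not nxt.startswith(("-", "#", ">")):
            desc = nxt
        if len(desc.split()) < min_desc_words:
            warnings.append(f"vague or missing description for {match.group(2)}")

    if links > max_links:
        warnings.append(f"{links} links listed; consider curating (limit {max_links})")
    if re.search(r"^\s*(Disallow|Allow|User-agent)\s*:", text, re.I | re.M):
        warnings.append("robots.txt-style directive found; use robots.txt for access control")
    return warnings
```

An empty list from lint_llms_txt(text) means none of these particular checks fired. It says nothing about whether the descriptions are accurate or current, which still needs a person who knows the site.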
Frequently Asked Questions
What is llms.txt and what does it do?
llms.txt is a Markdown-formatted text file placed at the root of a website that gives AI language models structured context about the site: what it is, who operates it, what content it contains, and how the content should be understood. It was proposed by Jeremy Howard and collaborators at Answer.AI in 2024 to fill a gap that existing web standards (like robots.txt and sitemaps) don’t address: communicating meaning and context to AI crawlers, not just access control.
How is llms.txt different from robots.txt?
robots.txt is access control: it tells crawlers which pages they’re allowed to visit using allow/disallow directives. llms.txt is context provision: it tells AI language models what your site is about and how to interpret the content they find. They’re complementary: robots.txt controls the fence, llms.txt explains what’s inside. A site with well-implemented robots.txt but no llms.txt is giving AI crawlers access without context. Both files can coexist at the domain root.
Where does llms.txt need to be placed on my website?
At the root of your domain, served at yourdomain.com/llms.txt. Not in a subdirectory, not inside a CMS folder. The file should be served as text/plain. Most web servers handle this automatically for .txt files, but verify the content type if you’re using a CMS that rewrites file handling. An llms.txt file placed anywhere other than the root won’t be found by crawlers looking for it in the standard location.
Do AI systems actually read and use llms.txt?
Adoption is real but variable. Several AI companies have stated support for the spec or built it into their crawling behavior; others haven’t publicly documented their implementation status. Because adoption is growing rather than declining, and the implementation cost is low, the asymmetry favors implementing it even without guaranteed universal support. Systems that ignore it lose nothing; systems that read it get useful context for representing your content accurately.
What should I include in my llms.txt file?
A site description in a blockquote (two to four sentences identifying who operates the site, what it does, and any relevant credentials), H2 sections for your main content categories, curated lists of important URLs with brief one-sentence descriptions per URL, and an Optional section for supplementary content. Be specific: named practitioners, defined service areas, and concrete content descriptions are what make the file useful. A vague description provides almost no benefit over not having the file at all.
What is llms-full.txt and should I implement it?
llms-full.txt is an optional extended version of llms.txt that includes the full Markdown-formatted content of your key pages, rather than just links to them. The idea is that AI systems with large context windows can read the full content in a single fetch rather than crawling individual pages. It’s most useful for small sites or documentation sites where complete coverage matters. For larger marketing or service websites, start with llms.txt and implement llms-full.txt only if there’s a specific use case that justifies the maintenance overhead.
How often should I update my llms.txt file?
Update it whenever you make significant changes to your site structure: launching new service pages, adding team members, creating major new content sections. llms.txt describes your current site, not your site at some past point in time. A stale file provides outdated context, which can lead to AI systems representing your site inaccurately. Add an llms.txt review to your site maintenance checklist; quarterly is usually enough for most sites unless you’re publishing new content categories frequently.
Does having llms.txt guarantee I’ll be cited in AI search responses?
No. llms.txt improves the context AI systems have about your site, but citation in AI-generated responses depends on many signals: technical accessibility, content structure, schema markup, entity clarity, and domain authority. llms.txt is one part of a broader AI visibility picture, not a standalone solution. Think of it as removing a specific obstacle (ambiguity about what your site is) rather than as a direct path to citation.
Implementing llms.txt: The Practical Steps
For most sites, implementation is straightforward:
- Draft the site description. Write two to four sentences that a stranger would need to understand what this site is and who operates it. Include the organization name, what it does, who the key practitioners are, and any relevant geographic or regulatory context.
- Identify your priority sections. For most sites, this is three to five categories: core services or products, team or about pages, primary educational or resource content. Resist the urge to map your entire site architecture — focus on the pages that most directly represent the site’s authority and purpose.
- For each priority page in your sections, write a one-sentence description that explains what question that page answers. “Explains the difference between SEO and AI search optimization for non-technical business owners” is more useful than “Blog post about SEO.”
- Assign supplementary content to Optional. Archive blog posts, promotional pages, and lower-priority resources can be listed here without implying they’re as important as your core content.
- Place the file at /llms.txt on your server. Verify it’s publicly accessible at yourdomain.com/llms.txt and serving as text/plain.
- Test it. Fetch the file in a browser to confirm it’s accessible. Check the content type with a curl request or browser developer tools.
- Add it to your ongoing site maintenance checklist. Review it when you add major new content categories or change your service offerings.
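The placement and content-type checks in the steps above can also be scripted. Here is a minimal sketch using only Python's standard library; the function names and the User-Agent string are hypothetical, and the text/plain expectation simply follows the guidance in this article.

```python
from urllib.error import HTTPError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def evaluate(url: str, status: int, content_type: str) -> list[str]:
    """Compare a response against the placement and content-type guidance."""
    problems = []
    if urlparse(url).path != "/llms.txt":
        problems.append("file is not at the domain root (/llms.txt)")
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"served as {media_type or 'unknown'}, expected text/plain")
    return problems


def check_llms_txt(url: str, timeout: float = 10.0) -> list[str]:
    """Fetch the URL and return a list of problems (empty if none were found)."""
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            # geturl() reflects the final URL after any redirects.
            return evaluate(
                response.geturl(),
                response.status,
                response.headers.get("Content-Type", ""),
            )
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; report those instead of crashing.
        return evaluate(url, err.code, err.headers.get("Content-Type", ""))
```

Calling check_llms_txt("https://yourdomain.com/llms.txt") with your own domain returns an empty list when the file is at the root, responds with 200, and is served as text/plain. Splitting the logic into evaluate keeps the rules testable without a network connection.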