Most page audits are built for traditional search. Technical SEO auditors check crawlability, page speed, canonical tags, and internal linking structure. Content auditors check keyword targeting, topical depth, and word count. Both are useful disciplines with well-established frameworks.
Neither one tells you whether a page will appear in AI-generated responses.
Key Takeaways
- AI visibility audits evaluate five signal categories that traditional SEO audits don’t capture: technical accessibility for AI crawlers, content structure for extraction efficiency, structured data for context declaration, entity clarity for authorship and organization trust, and content quality for specific, attributable answers.
- Each category can produce a page that ranks well in traditional search but gets skipped by AI synthesis systems. Failing any one of them creates a gap between ranking position and AI citation.
- Technical accessibility is the baseline. If AI crawlers can’t reach the page, nothing else matters. Check for AI-specific crawler blocks in robots.txt and meta robots tags before evaluating other signals.
- Entity clarity is the category that surprises practitioners most. Generic corporate copy that reads as credible to human readers often reads as ambiguous to AI engines. Named authors, specific credentials, and verifiable organizational identity are what AI citation trust is built on.
- Manual audits work for single pages but become inconsistent at scale. Systematic tooling closes the coverage and consistency gap, while practitioner interpretation determines what findings mean for a specific client’s priorities.
AI search engines (Google AI Overviews, Google AI Mode, Perplexity, ChatGPT Search) evaluate pages for a different set of signals than traditional ranking algorithms. A page that passes every traditional SEO check can still be completely absent from AI-generated answers because it fails the tests AI synthesis systems actually run: Can they extract clean answers from it? Can they verify who wrote it? Can they establish what kind of content it is without guessing?
An AI visibility audit uses a different lens. This article walks through the five signal categories that drive AI search visibility, what to look for in each, and what problems look like in practice. This isn’t a comprehensive checklist of every possible signal (the space is still evolving), but these five categories cover the gap between “ranks well” and “gets cited” for most pages in most industries.
Before You Start: What You’re Actually Measuring
A traditional SEO audit asks: does this page have the attributes that ranking algorithms reward?
An AI visibility audit asks: does this page give AI synthesis systems what they need to use it as source material in a generated response?
Those are different questions. The first is about pleasing a ranking algorithm. The second is about enabling extraction, attribution, and trust. A page can win the first and fail the second.
The pages that get cited in AI-generated responses share certain characteristics: they answer questions directly, they declare who wrote them, they use structured markup that removes interpretive guesswork, and they make specific, attributable claims. An AI visibility audit is a systematic check for those characteristics.
Signal Category 1: Technical Accessibility
Technical accessibility for AI search means AI crawlers can reach, render, and index the page without friction. It’s the baseline requirement — everything else is irrelevant if the page can’t be crawled.
This category overlaps with traditional technical SEO but adds AI-specific considerations.
What to check
Whether AI-specific crawlers are blocked in robots.txt. This is the most common AI-specific technical issue. A robots.txt file written to keep SEO crawlers out of staging environments or internal tools might also be blocking AI search crawlers. That is rarely intentional: broad rules written before the AI user agents existed now apply to them as well. The major AI crawlers have defined user-agent strings: GPTBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot for Perplexity, and Google’s AI-specific tokens such as Google-Extended. Each needs an explicit allow rule (or the absence of a disallow rule) to access the page. Google’s documentation on crawler management provides current user-agent strings for Google’s crawlers; the other vendors publish their own.
Whether meta robots tags block AI crawlers. A <meta name="robots" content="noindex"> tag or an HTTP X-Robots-Tag: noindex header prevents indexing by all crawlers, including AI search systems. If a page has content you want in AI-generated responses, it needs to be indexable.
Whether JavaScript rendering is required to see the main content. AI crawlers vary in their ability to execute JavaScript. If your page’s primary content is loaded client-side via JavaScript and the static HTML is near-empty, some AI crawlers may see a blank page. This is less common than it was, but it’s worth checking for content-heavy pages built on modern JavaScript frameworks.
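The robots.txt check above can be sketched with Python’s standard library. This is a minimal illustration, not a complete crawler audit: the user-agent list, the sample robots.txt, and the example.com URL are all assumptions for demonstration, and the current tokens should be confirmed against each vendor’s documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI user-agent tokens; verify against vendor documentation.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_crawlers(robots_txt: str, page_url: str) -> list[str]:
    """Return the AI user agents that robots.txt disallows for page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_USER_AGENTS if not parser.can_fetch(ua, page_url)]


# Hypothetical robots.txt: one explicit AI block plus a staging rule.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /staging/
"""

print(blocked_ai_crawlers(SAMPLE_ROBOTS, "https://example.com/services/"))
```

With this sample file, only GPTBot is reported as blocked for the services page, while a URL under /staging/ would be reported as blocked for every agent in the list.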
What a problem looks like
A client’s highest-traffic landing page is blocked by an overly broad Disallow: / rule from a staging robots.txt that was copied to production. Or a WordPress site has had the “Discourage search engines” setting accidentally enabled for months. Or a key service page relies entirely on a JavaScript-rendered content block that AI crawlers can’t see.
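The noindex check from this section can also be scripted. The sketch below assumes you already have the page’s HTML and, optionally, the value of its X-Robots-Tag header as strings; the class and function names are hypothetical. It ignores user-agent-scoped header values (such as a directive prefixed with a specific bot name), so treat it as a first pass.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindexed(html: str, x_robots_tag: str = "") -> bool:
    """True if the meta robots tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {part.strip().lower() for part in x_robots_tag.split(",")}
    return "noindex" in parser.directives or "noindex" in header


page = '<head><meta name="robots" content="noindex, follow"></head>'
print(is_noindexed(page))
```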
Signal Category 2: Content Structure
Content structure for AI search means the page’s information is organized for efficient extraction; specifically, that answers appear near the start of sections rather than requiring the AI to read the entire page to find them.
This is the category most closely connected to AEO (Answer Engine Optimization) and is where pages built for traditional keyword ranking most often fail the AI extraction test.
What to check
Whether key information leads each section. Look at the H2 and H3 sections on the page. Does each one open with the answer to the implied question of that section, then expand with context? Or does each section build toward the answer over several paragraphs? AI synthesis systems extract from the clearest path to the relevant information. A section that answers its question in the first two sentences is usable. A section that answers its question in the fifth paragraph may be skipped in favor of a page that’s more direct.
Whether an FAQ or Q&A pattern exists. FAQ sections are among the most efficient structures for AI extraction because they pre-format content as question-answer pairs. If a page covers topics that users are likely to query directly, a FAQ section makes that content explicitly extraction-ready.
Whether headers match real questions. Headers that mirror how users actually phrase questions (“How does fee-only financial planning work?” rather than “Financial Planning Overview”) signal to AI systems what question a section addresses. That’s useful for extraction targeting.
Whether content length is matched to the topic. A 2,500-word page on a topic that can be covered in 800 words has filler. AI systems assessing content quality penalize pages where the information-to-word-count ratio is low. Topical depth matters; word count for its own sake doesn’t.
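Two of these checks lend themselves to a rough automated pass: whether a heading reads like a question, and whether the section opens with a long paragraph. The sketch below is a heuristic only. The Section type, the list of question openers, and the 60-word threshold are assumptions, and a long opening paragraph is a crude proxy for a buried answer that still needs a human read.

```python
from dataclasses import dataclass, field

# Assumed list of words that typically open a question-style heading.
QUESTION_OPENERS = {"how", "what", "why", "when", "where", "who", "which",
                    "can", "does", "do", "is", "are", "should"}


@dataclass
class Section:
    heading: str
    paragraphs: list[str] = field(default_factory=list)


def structure_flags(section: Section, max_lead_words: int = 60) -> list[str]:
    """Return rough warnings for one H2/H3 section (heuristic thresholds)."""
    flags: list[str] = []
    words = section.heading.lower().split()
    looks_like_question = section.heading.rstrip().endswith("?") or (
        bool(words) and words[0] in QUESTION_OPENERS
    )
    if not looks_like_question:
        flags.append("heading is not phrased as a question")
    if not section.paragraphs:
        flags.append("section has no body text")
    elif len(section.paragraphs[0].split()) > max_lead_words:
        flags.append("long opening paragraph; the answer may be buried")
    return flags


print(structure_flags(Section("Financial Planning Overview", ["word " * 80])))
```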
What a problem looks like
An article that opens with three paragraphs of context-setting before addressing the reader’s actual question. A service page that covers ten related services at shallow depth rather than addressing any of them specifically. A how-to guide that has a clear process buried after an extended explanation of why the topic matters.
Signal Category 3: Structured Data
Structured data for AI search means schema markup that provides AI engines with explicit, machine-readable context about what the page is, who produced it, and what type of content it contains, eliminating inference that would otherwise introduce uncertainty.
Most practitioners understand schema in terms of Google’s rich results: the star ratings, FAQ dropdowns, and recipe cards that appear in traditional search. For AI search, the value is different. It’s about what the AI engine can establish with certainty rather than guess.
What to check
Whether relevant JSON-LD schema is present. The most important types for AI search visibility are Article (for editorial content), FAQPage (for pages with FAQ sections), Organization (for the publisher/company), and Person (for individual authors). If none of these are present, the AI engine is inferring content type and authorship from the HTML structure alone.
Whether the schema type matches the actual page. A blog post marked up as WebPage instead of Article is technically valid but loses the authorship and temporal fields that Article schema provides. A professional services page marked as a generic WebPage when it should be ProfessionalService or LocalBusiness is an opportunity missed. Use the most specific applicable type.
Whether schema accurately reflects what the page says. This is easy to miss: schema added years ago that no longer matches the current page content. A dateModified field from 2022 on a page updated monthly. An author field naming someone who no longer works there. Schema that contradicts the visible page creates conflicting signals that AI engines resolve by trusting neither.
Whether Organization and Person schema are in place sitewide. These two types build entity identity: who produced this content and why they’re credible. They’re foundational for AI citation, and they need to be consistent across the site, not just present on one page.
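The first of these checks, whether relevant JSON-LD is present, can be approximated by pulling every JSON-LD block out of the page source and listing the declared types. This is a minimal sketch using only the standard library; the parser class, helper names, and the sample markup are hypothetical, and it does not validate the schema against the visible page content.

```python
import json
from html.parser import HTMLParser

# The four types this article calls out as most important.
EXPECTED_TYPES = {"Article", "FAQPage", "Organization", "Person"}


class JsonLdParser(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def collect_types(node, found: set[str]) -> None:
    """Walk parsed JSON-LD and record every @type value, including nested ones."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)


def schema_types(html: str) -> set[str]:
    parser = JsonLdParser()
    parser.feed(html)
    found: set[str] = set()
    for block in parser.blocks:
        try:
            collect_types(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # malformed markup; worth reporting in a real audit
    return found


sample = ('<script type="application/ld+json">{"@type": "Article", '
          '"author": {"@type": "Person", "name": "A. Author"}}</script>')
print(sorted(EXPECTED_TYPES - schema_types(sample)))  # types still missing
```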
What a problem looks like
A WordPress site where the only schema is auto-generated, generic WebPage markup from an unconfigured plugin. A company blog where no posts have Author schema, so the author is either absent from the structured data or listed only as the company name. A service page with schema from the original site launch three years ago that hasn’t been updated since.
Signal Category 4: Entity Clarity
Entity clarity is the degree to which a page makes explicit what it is about and who produced it, in terms that AI models can cross-reference and trust rather than infer or assume.
This category is where pages built with generic marketing copy most consistently fail AI search, regardless of how well they rank in traditional search. The problem isn’t the quality of the writing; it’s the lack of identifiable, verifiable entities.
What to check
Whether the author is explicitly named with credentials. “Written by our team” or a byline that just says a first name without any context is entity-poor. A named author with a job title, organizational affiliation, and a link to a profile page with verifiable credentials is entity-rich. AI engines use author identity as a trust signal. What matters is not just whether an author is named, but whether the named author has a verifiable existence outside of this page.
Whether the organization is explicitly identified and consistent. The company name should appear in a consistent form across the page: in the content, in the footer, in the schema, and in the meta tags. If the schema says “Acme Corp,” and the page footer says “Acme Corporation,” and the content says “we at Acme,” the AI has three slightly different entity strings to reconcile. Consistency matters for entity resolution.
Whether sameAs links connect to authoritative external profiles. These include LinkedIn company pages, professional association memberships, regulatory registrations, and any other external reference that an AI can use to verify the identity claim. A financial advisor page that references the advisor’s SEC registration number or CFP credential gives the AI something to cross-check. A generic financial advisor page with no verifiable external references offers no such anchor.
Whether the page’s topic focus is narrow enough for clear entity mapping. A page that covers ten loosely related topics is harder for AI systems to map to a specific entity or question type. The clearer the topic scope, the easier the extraction and attribution.
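The organization-name consistency check is easy to make mechanical once the strings have been collected. The sketch below assumes you have already gathered the name as it appears in each location (by hand or by scraping); the function name and the sample values are hypothetical, reusing the Acme example from above.

```python
from collections import defaultdict


def name_variants(occurrences: dict[str, str]) -> dict[str, list[str]]:
    """Group page locations by the exact organization string found there."""
    groups: dict[str, list[str]] = defaultdict(list)
    for location, name in occurrences.items():
        # Collapse stray whitespace only; spelling differences are kept.
        groups[" ".join(name.split())].append(location)
    return dict(groups)


# Hypothetical strings collected from one page.
found = {
    "schema": "Acme Corp",
    "footer": "Acme Corporation",
    "meta": "Acme Corp",
}

variants = name_variants(found)
if len(variants) > 1:
    print("Inconsistent organization name:", variants)
```

More than one key in the result means the page presents the organization under more than one string, which is the reconciliation problem described above.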
What a problem looks like
A healthcare practice page where the physician’s name appears once in a photo caption but isn’t mentioned in the body content, linked to a bio, or connected to any professional credentials. A law firm page where every attorney profile says they have “extensive experience” but no specific cases, bar admissions, or practice areas are named. A B2B SaaS product page where the company description is four vague sentences about “transforming businesses” with no named executives, no founding date, and no specific product claims.
Signal Category 5: Content Quality for AI Extraction
Content quality for AI extraction means the content answers real questions with specific, attributable language, not vague generalities that sound comprehensive but give AI systems nothing concrete to extract.
This is distinct from content quality in the traditional SEO sense. Traditional content quality metrics (depth, freshness, topical authority, engagement) still matter, but AI extraction adds a different dimension: can the AI pull a specific, accurate answer from this page and attribute it confidently?
What to check
Whether answers are specific rather than generic. “Many financial advisors charge fees based on your situation” is extractable but useless; it doesn’t answer anything. “Fee-only financial advisors typically charge between 0.5% and 1% of assets under management annually, according to the 2024 Kitces Advisor Compensation Study” is extractable and useful (not an actual study, by the way, but Kitces does produce content good for citations like this). AI engines are more likely to cite content that makes specific, attributable claims than content that hedges every statement.
Whether key terms are defined clearly on first use. AI synthesis systems need to build coherent answers from multiple sources. Pages that define their terms explicitly (e.g., “A fiduciary advisor, meaning one who is legally required to act in your best interest…”) give AI systems accurate language to work with. Pages that assume reader familiarity with technical terms force the AI to either skip the definition or infer one.
Whether claims can be traced to verifiable sources. Named sources, dated studies, and specific data points are signals that an AI engine can evaluate for reliability. Content that makes claims without any source trail is harder to cite with confidence. This doesn’t mean every sentence needs a footnote, but factual claims benefit from attribution.
Whether the content addresses the question behind the keyword. A page targeting “retirement planning Austin” might cover retirement planning broadly across 2,000 words without ever directly answering “how do I know if I’m on track for retirement?” which is the actual question most users asking that query want answered. AI search surfaces the answer to the user’s real question. Pages that answer the keyword but not the intent get passed over.
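As a very rough first pass on the specific-versus-generic check, sentences can be sorted by whether they contain a number or an explicit attribution versus only vague quantifiers. This is an assumed heuristic for triage, not how any AI engine scores content: the word lists are hypothetical, and it cannot tell whether a cited source is real or accurate.

```python
import re

# Hypothetical markers; tune the lists for the content being audited.
VAGUE = re.compile(r"\b(many|some|various|several|a number of)\b", re.I)
SPECIFIC = re.compile(r"\d|\baccording to\b", re.I)


def classify_sentence(sentence: str) -> str:
    """Label a sentence 'specific', 'vague', or 'neutral' by surface markers."""
    if SPECIFIC.search(sentence):
        return "specific"
    if VAGUE.search(sentence):
        return "vague"
    return "neutral"


# The two example sentences from this section. The article itself notes
# that the cited study is illustrative, not a real publication.
generic = "Many financial advisors charge fees based on your situation"
attributed = ("Fee-only financial advisors typically charge between 0.5% and 1% "
              "of assets under management annually, according to the 2024 "
              "Kitces Advisor Compensation Study")

print(classify_sentence(generic), classify_sentence(attributed))
```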
What a problem looks like
A blog post that is comprehensive and well-written but answers its central question in one sentence surrounded by 1,800 words of context. A product comparison page that describes every product at similar length without ever making a clear recommendation. A guide that addresses “the importance of” topics at length without providing the practical “how” that users need.
Manual vs. Systematic Audits
Going through these five categories manually for a single page is entirely possible. For a practitioner who does this regularly, a focused manual review of one page takes 30 to 60 minutes. You check the robots.txt and meta tags, read the content structure, inspect the schema markup in the page source, assess the entity signals, and evaluate whether the content answers real questions specifically.
The challenge is scale and consistency. A marketing site with 50 important pages can’t be audited this way without significant time investment. Manual checklists also drift; when you’re reviewing your eighth page, you’re less thorough than when you reviewed the first. And catching schema issues that require comparing the markup against the visible content, or checking whether AI-specific crawlers are blocked in robots.txt, is tedious to do manually across many pages.
This is exactly the problem a tool like AI Visibility Analyst is built to solve; it runs through these signal categories systematically for a given page and surfaces findings with traceable evidence, so you’re not relying on a manual checklist that’s easy to rush through. For a consultant auditing multiple client pages, or an in-house team that wants consistent evaluation across their most important pages, systematic tooling closes the consistency and coverage gaps that manual review introduces.
That said, systematic tooling and practitioner judgment aren’t substitutes for each other. A tool surfaces findings; a practitioner decides what those findings mean for a specific client’s business goals. Both are necessary for a useful audit output.
Frequently Asked Questions
How do I audit a page for AI visibility?
An AI visibility audit evaluates five signal categories: technical accessibility (can AI crawlers reach the page?), content structure (is information organized for efficient extraction?), structured data (does schema markup declare content type and authorship?), entity clarity (does the page identify who wrote it and who produced it, in verifiable terms?), and content quality for AI extraction (does the content answer real questions with specific, attributable language?). Evaluate each category on the specific page, identify which signals are absent or weak, and prioritize fixes by severity. Manual audits work for individual pages; systematic tools are more consistent across larger sets.
What signals affect AI search visibility most?
Across the pages I’ve evaluated, the most consistent gaps are entity signals (missing or generic author attribution), structured data (absent or generic schema markup), and content structure (answers buried in prose rather than leading sections). These three categories account for most of the gap between high-ranking pages and AI-cited pages. Technical accessibility issues (AI crawlers blocked by robots.txt) are less common but critical when they occur, because no amount of signal optimization helps a page that AI systems can’t reach.
Why does a high-ranking page sometimes not appear in AI search results?
AI search systems evaluate pages for extraction efficiency and trustworthiness, not just ranking authority. A page can rank well because of backlink authority and keyword relevance while failing AI visibility because its answers are buried in narrative prose, its authorship is unclear, or its schema markup is absent or generic. AI synthesis systems need content that’s easy to extract from and attribute confidently; those requirements aren’t captured by traditional SEO metrics like domain authority or keyword ranking.
How do I check if AI crawlers can access my pages?
Review your robots.txt file for rules that might block AI-specific crawler user-agents: GPTBot (OpenAI), ClaudeBot (Anthropic), and Perplexity’s crawler each have defined user-agent strings. Check whether your robots.txt has broad Disallow rules that apply to all crawlers. Also check for <meta name="robots" content="noindex"> tags on pages you want AI systems to index. Google’s Search Central documentation lists current crawler user-agent strings for reference.
What does entity clarity mean in an AI visibility audit?
Entity clarity is the degree to which a page explicitly identifies who wrote it and what organization produced it in terms that AI models can cross-reference. A page with a named author, a verifiable professional credential, an organizational affiliation, and sameAs links to external profiles has high entity clarity. A page with a generic “Our team” attribution or an anonymous byline has low entity clarity. AI citation systems are more confident attributing content from clearly identified entities than from ambiguous or unverifiable sources.
How often should I audit pages for AI visibility?
Quarterly is a reasonable cadence for priority pages: frequent enough to catch changes in AI search behavior and spot new gaps as the platforms evolve. After any significant page update (new content, schema changes, author changes), re-audit to confirm the signals are still intact. An initial baseline audit of your most important pages makes sense before establishing a cadence; you need to know where you stand before you can track whether you’re improving.
Is an AI visibility audit the same as a traditional SEO audit?
No. They overlap in the technical accessibility layer (both check crawlability) and in content quality (both care about depth and relevance), but AI visibility audits specifically evaluate structured data completeness, entity clarity, and content structure for AI extraction, signals that traditional SEO audits typically don’t cover. A page that passes a comprehensive technical SEO audit can still score poorly on AI visibility. Both audits are useful; neither makes the other redundant.
Putting It Together
An AI visibility audit is a page-level diagnostic across these five categories. Not every finding is equally urgent; a page with strong entity signals and good content structure but missing FAQPage schema has a different priority profile than a page where AI crawlers are blocked entirely.
The goal of the analysis isn’t a perfect score on every signal. It’s a clear picture of where the gap is between how well a page performs in traditional search and how visible it is to AI search engines, and a prioritized set of improvements that close the most important gaps first.
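One way to turn audit findings into that prioritized set is to attach a severity to each finding and sort. The sketch below is illustrative only: the Finding type, the severity labels, and their weights are assumptions, chosen so that a blocked crawler outranks everything else, as the technical accessibility section argues.

```python
from dataclasses import dataclass

# Hypothetical severity weights; higher means fix sooner.
SEVERITY = {"blocker": 3, "major": 2, "minor": 1}


@dataclass
class Finding:
    category: str
    issue: str
    severity: str


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings from most to least severe, keeping ties in input order."""
    return sorted(findings, key=lambda f: SEVERITY[f.severity], reverse=True)


findings = [
    Finding("structured data", "missing FAQPage schema", "minor"),
    Finding("technical accessibility", "AI crawlers blocked in robots.txt", "blocker"),
    Finding("entity clarity", "no named author", "major"),
]

for finding in prioritize(findings):
    print(f"[{finding.severity}] {finding.category}: {finding.issue}")
```

How a given issue maps to a severity is the practitioner’s call; the code only keeps the ordering consistent from page to page.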
Most pages I’ve evaluated that have strong traditional SEO but poor AI visibility fail on two or three signals consistently: entity clarity (generic author attribution or none), structured data (missing or generic), and content structure (answers buried in narrative prose). Fixing those three categories moves the needle more than optimizing the others in isolation.