AI Visibility Metrics: 6 KPIs, Formulas & Benchmarks

AI visibility metrics are the numbers that tell you whether ChatGPT, Gemini, Perplexity, Copilot and Google's AI Overviews mention, rank and recommend your brand when buyers ask for solutions. Six matter: mention rate, AI share of voice, average position, sentiment, citation rate and prompt coverage.

This guide defines each one, gives you the formula, and — the part most guides skip — shows what a good number actually looks like, using benchmark ranges from 200+ B2B SaaS brands tracked daily on MaxAEO across eight AI platforms. By the end you'll know how to measure all six, which one to fix first, and how to report them to a CMO who still thinks in rankings and impressions.

What Are AI Visibility Metrics?

AI visibility metrics measure how often, how prominently and how favorably AI assistants and answer engines include your brand in their generated answers. Instead of tracking where a URL ranks in a list of links, they track whether your brand appears inside the answer itself — and what the AI says when it does.

The shift matters because AI answers compress the buyer journey. When someone asks ChatGPT "what's the best contract management software for mid-market legal teams," the model returns a shortlist of three to seven brands. There is no page two. If you're not in that answer, you were never considered — and no rank tracker, impression report or click metric will tell you it happened.

That's the measurement gap AI search monitoring exists to close. Answer engine optimization (AEO) and generative engine optimization (GEO) are the disciplines for improving these numbers; the six metrics below are how you know whether any of it is working.

Why Classic SEO Metrics Can't Measure AI Search

Rankings, impressions and clicks fail in AI search for three structural reasons:

There is no stable rank. The same prompt, asked twice, produces different answers. AirOps' analysis found only 30% of brands stay visible from one AI answer to the next, and just 20% remain visible across five consecutive runs. A single-snapshot "we're in ChatGPT!" screenshot is noise, not data.
There is no impression data. OpenAI, Anthropic and Perplexity publish no Search Console equivalent. You can't pull a report of how often you appeared; you have to sample answers yourself, repeatedly, and compute the rates.
Most value never produces a click. A buyer who sees your brand recommended in three AI conversations may type your URL directly a week later. Referral traffic understates AI's influence, which is why answer-side metrics — not just analytics — are the primary measure.

The practical consequence: AI visibility is measured the way pollsters measure opinion — repeated sampling of a fixed prompt set, aggregated over time. Every metric below assumes that methodology.
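To make the polling analogy concrete, here is a minimal sketch of how sample size affects the uncertainty around a measured rate. It uses the Wilson score interval, a standard statistical formula and not something the benchmarks in this guide were computed with. It also assumes every sampled answer is an independent draw, which repeated runs of the same prompt only approximate, so treat the result as a rough guide.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a sampled proportion."""
    if n <= 0:
        raise ValueError("need at least one sampled answer")
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Same observed rate (15%), very different certainty:
one_run = wilson_interval(3, 20)      # a single pass over 20 prompts
one_week = wilson_interval(21, 140)   # seven daily passes over the same 20 prompts
print(one_run, one_week)
```

The single run leaves a much wider interval than the week of runs, which is the statistical reason a one-off screenshot says so little.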

The 6 AI Visibility Metrics That Matter

Here is the framework at a glance. Your category will vary, but these ranges hold across most competitive software niches.

Metric	Question it answers	Formula	Median (B2B SaaS)	Strong
Mention rate	How often does AI name us?	mentions ÷ total answers sampled	14%	45%+
AI share of voice	How big is our slice vs. competitors?	your mentions ÷ all brand mentions	6%	20%+
Average position	Where do we land on the shortlist?	mean list position when mentioned	4.2	≤2.5
Sentiment	How does AI describe us?	% positive / neutral / negative	18% positive	30%+ positive, <5% negative
Citation rate	Does AI link to us as a source?	answers citing your domain ÷ answers sampled	9%	25%+
Prompt coverage	How much of the buyer-question map are we on?	prompts where you appear ≥1× per week ÷ prompts tracked	31%	60%+

Benchmark source: MaxAEO tracking of 200+ B2B SaaS brands across eight platforms (ChatGPT, Gemini, Perplexity, Claude, Copilot, Grok, Google AI Overviews, AI Mode), daily prompt runs, Q1–Q2 2026. Medians are computed on unbranded prompts; "strong" = top quartile of tracked brands.

[Figure: benchmark table of six AI visibility metrics across ChatGPT, Gemini and Perplexity]

1. Mention Rate: How Often AI Talks About You

Mention rate is the percentage of sampled AI answers that name your brand at all, whether linked or unlinked, recommended or merely listed. It is the foundation metric: share of voice, average position and sentiment are all computed from the answers where a mention exists.

Formula: mention rate = answers naming your brand ÷ total answers sampled × 100.

What the data shows: across MaxAEO's tracked brands, the median mention rate on unbranded category prompts ("best X for Y") is 14%. Category leaders sit at 55–75%. New entrants typically start below 10%. Platforms differ sharply — in our tracking, Perplexity names roughly 1.6× more brands per answer than ChatGPT because its answers lean on list-style sources, so a 25% mention rate on Perplexity and 25% on ChatGPT are not equally hard to earn.

That's why you track brand mentions in ChatGPT separately from every other platform, never blended into one vanity number. A blended rate hides the platform where you're invisible.
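The formula, kept separate per platform, can be sketched in a few lines. The record layout and the brand names (Acme, Globex, Initech) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical log: one record per sampled answer.
answers = [
    {"platform": "ChatGPT", "brands": ["Acme", "Globex"]},
    {"platform": "ChatGPT", "brands": ["Globex"]},
    {"platform": "Perplexity", "brands": ["Globex", "Initech", "Acme"]},
    {"platform": "Perplexity", "brands": ["Acme"]},
]

def mention_rate_by_platform(answers, brand):
    """Return {platform: % of sampled answers that name the brand}."""
    sampled = defaultdict(int)
    named = defaultdict(int)
    for answer in answers:
        sampled[answer["platform"]] += 1
        if brand in answer["brands"]:
            named[answer["platform"]] += 1
    return {p: 100 * named[p] / sampled[p] for p in sampled}

rates = mention_rate_by_platform(answers, "Acme")
print(rates)
```

A real pipeline would also need to normalize brand spellings before matching; that step is left out here.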

2. AI Share of Voice: Your Slice of the Conversation

AI share of voice (SoV) is your brand's percentage of all brand mentions across the answers sampled for your category. Mention rate tells you how often you appear; SoV tells you how much of the conversation you own relative to competitors fighting for the same shortlists.

Formula: AI SoV = your mentions ÷ total mentions of all brands × 100 — with a position-weighted variant that counts a #1 placement more than a #7.
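As a rough sketch, both variants can share one function. The weighting used here, 1 ÷ position, is an assumption made for illustration; the article does not state which weighting scheme MaxAEO applies. Brand names are hypothetical.

```python
def share_of_voice(answers, brand, weighted=False):
    """answers: one ordered list of brand names per sampled answer.

    With weighted=True each mention counts 1/position (an assumed scheme),
    so a #1 placement counts seven times as much as a #7.
    """
    mine = 0.0
    total = 0.0
    for brands in answers:
        for position, name in enumerate(brands, start=1):
            weight = 1.0 / position if weighted else 1.0
            total += weight
            if name == brand:
                mine += weight
    return 100 * mine / total if total else 0.0

answers = [
    ["Acme", "Globex", "Initech"],
    ["Globex", "Acme"],
    ["Globex", "Initech"],
]
plain = share_of_voice(answers, "Acme")                  # 2 of 7 mentions
weighted = share_of_voice(answers, "Acme", weighted=True)
print(round(plain, 1), round(weighted, 1))
```

In this toy data the weighted share comes out higher than the plain share because Acme's mentions sit near the top of the lists.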

In MaxAEO data, the typical B2B SaaS category surfaces 8–14 distinct brands across a week of sampling. Category leaders hold 25–40% position-weighted SoV; the median tracked brand holds about 6%. Because AI share of voice is zero-sum, it's the best executive metric: it can fall even while your mention rate rises, which tells you competitors are gaining faster.

The full math, weighting options and scoring tiers are covered in our guide to how to calculate AI share of voice and what a good score looks like.

3. Average Position: Where You Land on the Shortlist

Average position is the mean slot your brand occupies in list-style AI answers, counting only the answers where you appear. Unlike the other five metrics, lower is better: position 1 means you're the first name the AI gives a buyer.

Formula: average position = sum of your list positions ÷ answers where you appear.
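A minimal sketch of that formula, assuming each answer has already been parsed into an ordered list of brand names (the names are hypothetical):

```python
def average_position(answers, brand):
    """Mean 1-based list position, counting only answers that name the brand."""
    positions = [brands.index(brand) + 1 for brands in answers if brand in brands]
    return sum(positions) / len(positions) if positions else None

answers = [
    ["Acme", "Globex", "Initech"],   # Acme at position 1
    ["Globex", "Acme"],              # Acme at position 2
    ["Globex", "Initech"],           # Acme absent, so this answer is ignored
]
acme_position = average_position(answers, "Acme")
print(acme_position)
```

Returning None for a brand that never appears keeps "no data" distinct from a genuinely poor position.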

Position matters because AI shortlists get truncated in use. Buyers frequently follow up with "compare the top two" or "which one is best for a small team" — and our tracking shows brands in positions 1–3 are carried into those follow-up answers roughly twice as often as brands in positions 4+. A mention at position 7 is real but fragile.

Benchmarks: leaders hold an average position of 1.5–2.5. The median tracked brand sits at 4.2. Watch the trend more than the level — a slide from 2.8 to 4.5 over six weeks usually precedes a mention-rate decline, because models reshuffle the bottom of lists before dropping a brand entirely.

4. Sentiment: How AI Describes You When It Does

Sentiment measures the tone of the language AI uses around your brand — classified positive, neutral or negative per mention, reported as a percentage split. A brand can have excellent mention numbers and still lose deals if every answer appends "however, users report a steep learning curve and slow support."

The realistic distribution surprises most teams: in MaxAEO tracking, the median split is 78% neutral, 18% positive, 4% negative. AI models hedge by default, so neutral dominance is normal. MaxAEO classifies each mention with an LLM grader spot-checked by humans weekly; if you classify manually, score the sentences around your brand name, not the whole answer.

Two signals are worth alerting on: negative share crossing 10% — almost always traceable to a specific source the models keep citing, like a review-site thread or a critical comparison post — and a recurring negative phrase appearing across platforms, which means it's baked into multiple sources.
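Once each mention carries a label, the split and the first alert are simple counting. This sketch assumes the labels already exist (from an LLM grader or manual review) and uses the 10% negative-share threshold mentioned above; the label names are illustrative.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")

def sentiment_split(labels):
    """Return the percentage share of each sentiment label across mentions."""
    if not labels:
        return {label: 0.0 for label in LABELS}
    counts = Counter(labels)
    return {label: 100 * counts[label] / len(labels) for label in LABELS}

# 50 hypothetical mentions that reproduce the median split quoted above.
labels = ["neutral"] * 39 + ["positive"] * 9 + ["negative"] * 2
split = sentiment_split(labels)
needs_alert = split["negative"] > 10
print(split, needs_alert)
```

The second signal, a negative phrase recurring across platforms, requires comparing the mention text itself and is not covered by this sketch.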

This is where LLM brand tracking crosses into AI reputation management: sentiment is the metric that tells comms and PR teams what the models believe, and which sources taught them.

5. Citation Rate: Whether AI Links to You as a Source

Citation rate is the percentage of sampled answers that cite your domain as a source, regardless of whether your brand is named in the answer text. Mentions and citations are different events: you can be recommended without being cited, and cited without being named. Mentions drive consideration; AI citations drive referral clicks and compound future visibility.

Formula: citation rate = answers citing your domain ÷ total answers sampled × 100.
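Here is one way to sketch the calculation, assuming each sampled answer is logged as a list of cited URLs. Counting subdomains (such as a docs subdomain) as your own domain is a design choice made for this example, and example.com and reviews.test are placeholder domains.

```python
from urllib.parse import urlparse

def cites_domain(urls, domain):
    """True if any cited URL is on the domain or one of its subdomains."""
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False

def citation_rate(answers, domain):
    """answers: one list of cited URLs per sampled answer."""
    if not answers:
        return 0.0
    cited = sum(1 for urls in answers if cites_domain(urls, domain))
    return 100 * cited / len(answers)

answers = [
    ["https://www.example.com/pricing", "https://reviews.test/acme"],
    ["https://reviews.test/top-10"],
    [],                                    # answer with no citations
    ["https://docs.example.com/api"],
]
rate = citation_rate(answers, "example.com")
print(rate)
```

Matching on the parsed hostname rather than a substring avoids counting look-alike domains such as notexample.com.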

Benchmarks: the median own-domain citation rate on commercial prompts is 9%; strong performers reach 25%+. The uncomfortable finding is that, in MaxAEO's tracking, roughly two-thirds of citations on commercial prompts point to third-party content: review platforms, comparison posts, Reddit threads and industry roundups, not vendor sites. AirOps' research aligns, finding ~85% of brand mentions originate from third-party pages.

The implication: you raise citation rate partly by improving your own pages and largely by earning presence on the pages AI already trusts. We break down exactly which ones in our guide to the source types ChatGPT, Perplexity and Gemini cite most.

6. Prompt Coverage: How Much of the Map You're On

Prompt coverage is the percentage of your tracked prompt set where your brand appears at least once per week. It answers a different question than mention rate. Mention rate averages across all sampled answers; coverage tells you which buyer questions you exist for at all — and where you're structurally absent.

Formula: prompt coverage = prompts where you appeared ≥1× this week ÷ prompts tracked × 100.

Because AI answers are volatile, coverage runs higher than per-answer mention rate: a brand with a 14% mention rate typically shows ~31% weekly coverage in our data, appearing intermittently on prompts it doesn't own. The diagnostic gold is the gap list: prompts where coverage is 0% for weeks are clusters where no source the models trust connects your brand to that use case. That's a content and PR target list, ranked for you.
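Coverage and the gap list fall out of the same weekly tally. A minimal sketch, assuming you already have a count of brand appearances per prompt for the week (the prompts and counts below are made up):

```python
def weekly_coverage(appearances, tracked_prompts):
    """appearances: {prompt: answers this week that named the brand}.

    Returns (coverage %, list of prompts with zero appearances).
    """
    if not tracked_prompts:
        return 0.0, []
    gaps = [p for p in tracked_prompts if appearances.get(p, 0) == 0]
    covered = len(tracked_prompts) - len(gaps)
    return 100 * covered / len(tracked_prompts), gaps

tracked = [
    "best PM tool for agencies",
    "best PM tool for startups",
    "Acme vs Globex",
    "PM tool with time tracking",
]
appearances = {"best PM tool for startups": 3, "Acme vs Globex": 1}

coverage, gaps = weekly_coverage(appearances, tracked)
print(coverage, gaps)
```

Keeping the gap list per week makes it easy to spot prompts that stay at zero for several weeks in a row.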

Coverage is only as honest as the prompt set behind it — 120 prompts spanning category, comparison, use-case and persona queries beats 20 generic ones. Our guide to building an AI prompt set that mirrors what buyers actually ask covers the method.

How AI Visibility Benchmarks Differ by Platform

The same brand authority earns different numbers on every platform, because each engine builds answers from a different index at a different speed. What we observe across the tracked base:

Platform	What we observe	What it means for your numbers
ChatGPT	Slowest to reflect new or updated content — first meaningful gains at 60–90 days; fewer brands named per answer	Hardest mention rate to move; judge it on quarterly trends, not weekly ones
Perplexity	Citation-led; names ~1.6× more brands per answer than ChatGPT and picks up updated sources within days	Your highest raw numbers and fastest feedback loop — use it to test which fixes work
Google AI Overviews / AI Mode	Grounded in Google's live index	Moves when Google recrawls you and your sources; classic SEO hygiene pays double here
Copilot	Grounded in Bing's index	Often the easiest early win — submit updated URLs via Bing Webmaster Tools and IndexNow
Gemini	Most conservative about adding new brands in our tracking	Expect the flattest line in your first quarter; don't read it as failure

Claude and Grok behave most like ChatGPT in our data. The takeaway: set platform-specific expectations before you start, or the fast platforms will look like luck and the slow ones like failure.

How to Measure AI Visibility in 6 Steps

You can compute all six metrics with the same sampling pipeline:

1. Build a prompt set of 50–150 buyer questions across four types: category ("best AI visibility tool for agencies"), comparison, use-case and problem prompts. This set is your survey instrument: keep it stable so trends are real.
2. Run every prompt on every platform that matters to your buyers. ChatGPT, Gemini, Perplexity, Claude, Copilot, Grok, AI Overviews and Google AI Mode cover most B2B journeys.
3. Sample repeatedly, not once. Daily runs are the standard because answers reshuffle constantly; our analysis of how often AI answers change across eight platforms found week-old snapshots misstate current visibility for a large share of prompts.
4. Log four facts per answer: brands named (and order), sentiment of your mention, sources cited, and your domain's presence among them.
5. Compute the six metrics weekly, per platform. Weekly aggregation smooths daily volatility into trendable numbers; per-platform segmentation preserves the differences that tell you where to act.
6. Trend them against actions. Tag the dates you shipped comparison pages, earned a review-site update or landed a roundup placement, and watch which metrics move in the following 2–6 weeks.
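The logging and weekly aggregation steps can be sketched as a small record type plus a grouping function. The field names are hypothetical, and grouping by ISO week is one reasonable choice for "weekly", not a convention the article specifies.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class AnswerLog:
    """One sampled answer; field names are illustrative."""
    run_date: date
    platform: str
    prompt: str
    brands: list[str]                                  # brands named, in answer order
    sentiment: Optional[str] = None                    # tone of our mention, None if absent
    sources: list[str] = field(default_factory=list)   # cited URLs
    own_domain_cited: bool = False

def weekly_mention_rate(logs, brand):
    """Return {(iso_year, iso_week, platform): mention rate %}."""
    buckets = defaultdict(lambda: [0, 0])   # key -> [named, sampled]
    for log in logs:
        iso = log.run_date.isocalendar()
        key = (iso[0], iso[1], log.platform)
        buckets[key][1] += 1
        if brand in log.brands:
            buckets[key][0] += 1
    return {key: 100 * named / sampled for key, (named, sampled) in buckets.items()}

prompt = "best PM tool for agencies"
logs = [
    AnswerLog(date(2026, 3, 2), "ChatGPT", prompt, ["Globex", "Acme"]),
    AnswerLog(date(2026, 3, 3), "ChatGPT", prompt, ["Globex"]),
    AnswerLog(date(2026, 3, 3), "Perplexity", prompt, ["Acme", "Globex", "Initech"]),
]
rates = weekly_mention_rate(logs, "Acme")
print(rates)
```

The other five metrics can be computed from the same records by swapping the per-bucket tally.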

Manually, this is feasible as a monthly 10–15 prompt spot-check, which is genuinely better than nothing. At daily, eight-platform, 120-prompt scale (≈960 answers a day), it's a job for an AI visibility tool; that throughput is exactly what MaxAEO automates, including the per-answer logging in step 4.

A Worked Example: One Brand, Six Numbers, Eight Weeks

Here's an anonymized case from MaxAEO's tracking: a Series B project-management SaaS, 120 prompts, eight platforms, daily runs.

Baseline (week 1): prompt coverage 22%, mention rate 11%, AI share of voice 4.8%, average position 5.1, sentiment 83/12/5 (neutral/positive/negative), citation rate 6%. The gap list showed 0% coverage on the entire "for agencies" prompt cluster — 18 prompts, all owned by two competitors. Citation logs showed answers leaned on three third-party pages: a stale review-platform profile and two comparison posts that omitted the brand.

Actions (weeks 2–5): published an agencies use-case page and an honest competitor-comparison page, refreshed the review-platform profile, pitched inclusion in both comparison posts (one updated), and fixed crawl blocks on their docs subdomain.

Result (week 8): coverage 41%, mention rate 19%, SoV 8.9%, average position 3.7, citation rate 13%, negative sentiment down to 3%.
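For reporting, it can help to express the same before-and-after figures as point changes and relative changes. The sketch below uses only the baseline and week-8 numbers quoted above; the dictionary keys are illustrative.

```python
baseline = {"coverage": 22, "mention_rate": 11, "share_of_voice": 4.8,
            "avg_position": 5.1, "citation_rate": 6, "negative_sentiment": 5}
week_8 = {"coverage": 41, "mention_rate": 19, "share_of_voice": 8.9,
          "avg_position": 3.7, "citation_rate": 13, "negative_sentiment": 3}

def deltas(before, after):
    """Return {metric: (point change, relative change in %)}, rounded to 1 decimal."""
    out = {}
    for name, start in before.items():
        change = after[name] - start
        out[name] = (round(change, 1), round(100 * change / start, 1))
    return out

changes = deltas(baseline, week_8)
for name, (points, pct) in changes.items():
    print(f"{name}: {points:+} ({pct:+}%)")
```

Note that for average position and negative sentiment a negative change is the improvement.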

[Figure: MaxAEO dashboard showing mention rate and share of voice trending up over eight weeks]

The platform sequencing is the repeatable lesson, and it matches the platform table above: Perplexity moved first (weeks 2–3, citing the updated comparison post almost immediately), Copilot and AI Overviews followed as Bing and Google recrawled, ChatGPT shifted meaningfully only from week 6, and Gemini barely moved by week 8. ChatGPT's slow response is consistent with the 60–90 days AirOps reports for initial visibility gains, so set that window as the expectation when you defend this budget line.

Which Metric Should You Fix First?

Fix coverage first, then mention rate, then position, then sentiment, then citation rate. Each metric only becomes meaningful once the ones before it are in reasonable shape:

Coverage near zero on key clusters? Fix existence first. Create the use-case and comparison content that connects your brand to those prompts, and get onto the third-party sources models already cite there. Nothing else matters until the models know you belong in the answer.
Covered but mention rate low? You're a marginal pick — models know you but don't default to you. Deepen the evidence: more corroborating sources, consistent positioning language across your site, review profiles and PR, so every source tells the same story.
Mentioned but position 4+? Strengthen differentiation signals. Models put brands earlier when sources describe them as the specific answer for the prompt's context ("best for agencies"), not a generic alternative.
Visible but sentiment souring? Trace the negative phrasing to its source documents and fix it there — this is reputation work, not content volume.
All healthy but citation rate flat? Make your pages quotable: clear definitions, original data, scannable structure — the things that get a domain cited rather than just paraphrased.

This sequencing is also the honest answer to "how do we get recommended by ChatGPT": there's no single switch, but coverage → mentions → position is the order in which recommendations are actually earned.

How to Report AI Visibility Metrics to Executives

The fastest way to fund this work is to translate the six numbers into language an SEO-literate CMO already trusts:

AI visibility metric	Closest classic equivalent	One-line framing for the deck
Mention rate	Impressions	"How often AI shows us to buyers"
AI share of voice	Share of voice / visibility index	"Our slice of AI recommendations vs. competitors"
Average position	Average ranking	"Where we sit on AI's shortlist"
Sentiment	Brand sentiment tracking	"What AI tells buyers about us"
Citation rate	Referring domains	"Whether AI treats our site as a source"
Prompt coverage	Keyword coverage	"The share of buyer questions we exist for"

Three rules keep the dashboard credible:

Lead with share of voice and coverage. Executives grasp a zero-sum share and "% of buyer questions we appear for" instantly; position and sentiment belong in the diagnostic appendix.
Pair answer-side metrics with a demand proxy. AI-influenced buyers rarely click — they search your brand or type the URL days later. Track branded search volume and direct signups in the weeks after coverage gains, and add a "How did you hear about us?" field to signup; in our experience that free-text field is where ChatGPT-sourced pipeline first becomes visible.
Defend the lag with leading indicators. Show Perplexity and Copilot moving in weeks 2–4 as evidence the same gains will reach ChatGPT by weeks 8–12. It turns "nothing's happening on ChatGPT" into a timeline, not a verdict.

Common Measurement Mistakes

Six errors we see repeatedly in teams' first quarter of tracking:

Blending platforms into one score. A 20% blended mention rate can mean 40% on Perplexity and 4% on ChatGPT — two completely different problems.
Trusting single-run snapshots. Given that most brands flicker in and out of consecutive answers, anything less than a week of sampling is anecdote.
Tracking branded prompts only. "Is [YourBrand] good?" measures reassurance, not discovery. Unbranded prompts are where deals start.
Chasing SoV before coverage. Share of voice on the 30% of prompts you appear for ignores the 70% where you don't exist.
Editing the prompt set mid-quarter. Every prompt change resets your baselines. Version the set, change it at quarter boundaries, and re-baseline when you do.
Reporting levels instead of trends. A 14% mention rate is neither good nor bad in isolation; 11% → 19% over eight weeks, tied to shipped actions, is a defensible result.
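The blending mistake is easy to reproduce with arithmetic. In this sketch the sample counts are invented so that the per-platform rates match the 40% and 4% in the first item above; with unequal sample sizes they blend to exactly 20%.

```python
# Hypothetical (answers naming the brand, answers sampled) per platform.
samples = {"Perplexity": (80, 200), "ChatGPT": (10, 250)}

per_platform = {p: 100 * named / total for p, (named, total) in samples.items()}
blended = (100 * sum(named for named, _ in samples.values())
           / sum(total for _, total in samples.values()))

print(per_platform, blended)
```

The blended figure also shifts whenever the sampling mix changes, even if nothing about your visibility on either platform did.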

Frequently Asked Questions

How often should you measure AI visibility?

Daily sampling, weekly reporting. AI answers reshuffle constantly — only about 30% of brands persist from one answer to the next per AirOps' data — so daily runs are needed to separate signal from noise, while weekly aggregates are stable enough to trend and report.

How many prompts do you need for reliable AI visibility metrics?

50 is a workable floor; 100–150 is the sweet spot for a B2B category. You want enough prompts per cluster (5–10) that coverage gaps are clearly structural rather than random, without ballooning the set so large that nobody acts on the gap list.

What's a good mention rate in ChatGPT?

On unbranded category prompts, the median B2B SaaS brand in MaxAEO's tracking lands around 14%, top-quartile brands exceed 45%, and category leaders reach 55–75%. Calibrate against your named competitors on the same prompt set rather than a global average — and expect your Perplexity rate to run higher than ChatGPT's at the same level of authority.

What is a good AI share of voice?

In MaxAEO's B2B SaaS tracking, the median brand holds about 6% of category mentions; 20%+ is strong, and category leaders hold 25–40% position-weighted. Because share of voice is zero-sum, the practical benchmark is the named competitor you lose deals to — "good" means more than them, trending up.

What's the difference between a mention and a citation?

A mention is your brand named in the answer text; a citation is your domain linked as a source. They move independently: mentions build consideration even with zero clicks, citations drive referral traffic and reinforce future answers. Track both — a mentioned-but-never-cited brand depends entirely on how third parties describe it.

Which AI platforms should you track first?

Start where buyers actually ask: ChatGPT (largest assistant usage), Google AI Overviews and AI Mode (largest search reach), Perplexity (research-heavy evaluation) and Copilot (Microsoft-centric enterprises). Add Claude, Gemini and Grok once those are instrumented — in our tracking they add nuance, not a new strategy.

Can you track AI visibility without a tool?

Yes, at small scale: run 10–15 core prompts on two or three platforms monthly, log mentions, positions and cited sources in a spreadsheet. It will catch big shifts. What it can't do is daily volatility smoothing, eight-platform coverage or competitor SoV — the parts that make the numbers defensible in a budget review. Tool scoring conventions vary too; Semrush's AI toolkit, for instance, normalizes visibility to a 0–100 index, so always confirm definitions before comparing numbers across tools.

Six numbers, one discipline: mention rate, share of voice, average position, sentiment, citation rate and prompt coverage — sampled daily, reported weekly per platform, tied to the actions you shipped. Teams that track AI visibility metrics this way stop guessing whether AI recommends them and start engineering it. MaxAEO runs this exact measurement loop across eight AI platforms and tells you which fix to ship next.

This article was created with AI assistance and reviewed by a human editor.