If you only have 30 seconds: measuring AI visibility comes down to four core tasks. Pick a probe set of buyer-stage queries, run them across all 10 AI engines, score how often your brand appears as a named recommendation, and tag every Ghost Intent. The full guide below adds variance testing, competitor benchmarking, and trend tracking, for eight steps in total. Manual measurement works for one-time spot checks. For continuous measurement, Ninar AI automates the entire flow with a free tier you can use without a credit card.
Why Measurement Matters
You can't improve what you don't measure. Every AI visibility program starts with a baseline: where does your brand stand right now across the engines that matter? Without that baseline, every subsequent action is a guess.
The challenge is that AI visibility doesn't measure like SEO. There's no rank position to track. There's no clear “impression count.” Buyers ask AI engines questions and the AI returns named recommendations — either you're in the answer or you aren't. Measurement requires a structured approach.
This guide walks through the exact methodology, with the specific prompts, scoring framework, and tools you can use today.
Step 1: Build Your Probe Set
A probe set is the list of queries you'll test. The right probe set covers the entire buyer journey for your category, using language that real buyers use.
Cover All Eight Buyer-Journey Intents
Build at least one probe per intent:
- Pricing: “How much does [your category] cost?”
- Recommendation: “What's the best [your category] for [persona]?”
- Comparison: “Should I use [category leader] or [alternative category leader]?”
- Top Tools: “Top 10 [your category] in 2026”
- How-To: “How do I [solve problem your category addresses]?”
- Use Case: “What's the best tool for [specific scenario]?”
- Trust: “Is [your brand] reliable?”
- Local: “Best [your category] in [city]”
Add Multiple Variations Per Intent
Buyers ask the same question many ways. For each intent, run 2-3 phrasing variations to get a more reliable signal. The variations matter because AI engines often interpret subtly different phrasings differently.
Use Real Buyer Language
Skip industry jargon. Probe sets should sound like a buyer typing into ChatGPT, not like marketing copy. “Best AI visibility platform for a small marketing team” is a real query. “Optimal Generative Engine Optimization solution for SMB marketing teams” is not.
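The probe-set construction described in this step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed format: the template wording beyond the examples above, the `build_probe_set` helper, and the sample values ("CRM software", "Austin") are all assumptions made for the example.

```python
# Hypothetical templates: one entry per buyer-journey intent, each with one
# or more phrasing variations. Only three intents are shown for brevity.
TEMPLATES = {
    "Pricing": [
        "How much does {category} cost?",
        "What is a typical price for {category}?",
    ],
    "Recommendation": [
        "What's the best {category} for {persona}?",
        "Which {category} would you recommend for {persona}?",
    ],
    "Local": [
        "Best {category} in {city}",
    ],
}


def build_probe_set(templates, **values):
    """Return (intent, query) pairs with the placeholders filled in."""
    probes = []
    for intent, phrasings in templates.items():
        for phrasing in phrasings:
            # str.format ignores keyword arguments a template does not use.
            probes.append((intent, phrasing.format(**values)))
    return probes


probes = build_probe_set(
    TEMPLATES,
    category="CRM software",
    persona="a small marketing team",
    city="Austin",
)
for intent, query in probes:
    print(f"{intent}: {query}")
```

Keeping the intent label next to each query pays off later, because per-intent scoring and Ghost Intent tagging both need to group results by intent.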
Step 2: Run Probes Across All 10 AI Engines
This is where manual measurement gets painful. The 10 engines that matter in 2026:
- ChatGPT (chat.openai.com)
- Claude (claude.ai)
- Gemini (gemini.google.com)
- Perplexity (perplexity.ai)
- Grok (inside X / grok.x.ai)
- DeepSeek (chat.deepseek.com)
- Microsoft Copilot (copilot.microsoft.com)
- Meta AI (inside WhatsApp / Instagram)
- Google AI Overviews (top of Google Search results)
- Google AI Mode (Google Search → AI Mode toggle)
For a probe set of 8 intents with 3 variations each (24 probes), running all 10 engines manually is 240 individual queries. Most teams give up before finishing.
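The arithmetic behind that number is easy to check by enumerating the work items. The sketch below uses the intent and engine names from this guide; the `work_items` structure is just an illustrative way to list every query a manual run would need.

```python
from itertools import product

INTENTS = ["Pricing", "Recommendation", "Comparison", "Top Tools",
           "How-To", "Use Case", "Trust", "Local"]
ENGINES = ["ChatGPT", "Claude", "Gemini", "Perplexity", "Grok", "DeepSeek",
           "Microsoft Copilot", "Meta AI", "Google AI Overviews",
           "Google AI Mode"]
VARIATIONS_PER_INTENT = 3

# One work item per (intent, variation index, engine) combination.
work_items = list(product(INTENTS, range(VARIATIONS_PER_INTENT), ENGINES))
print(len(work_items))  # 8 * 3 * 10 = 240

# Repeating every query for variance testing multiplies the load again.
VARIANCE_RUNS = 5
print(len(work_items) * VARIANCE_RUNS)  # 240 * 5 = 1200
```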
Ninar AI runs all 240 queries concurrently in a single scan and writes the responses to its database for analysis. The same workflow that would take a person two days takes the platform a few minutes.
Step 3: Score Each Probe
For every probe response, capture three things:
Was Your Brand Mentioned?
The most basic signal. Did the AI engine name your brand at all? If yes, mark a 1. If no, mark a 0.
How Was It Mentioned?
Distinguish three mention types because they're worth very different amounts:
- Direct recommendation (highest value) — AI explicitly recommended your brand as the answer (“You should try X”)
- Implicit mention (medium value) — your brand was named as one option among several
- Passing mention (lowest value) — your brand appeared in context but wasn't recommended
What Sources Were Cited?
For engines that show citations (Perplexity, AI Overviews, Gemini), capture which domains the AI cited when answering. This becomes critical for diagnosis — the same domains that show up for competitor recommendations are the ones you need to land in.
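One way to keep these three signals together is a small record per probe response. The sketch below is an assumption about how you might structure your own spreadsheet or script, not a description of any tool's schema; the `ProbeResult` and `MentionType` names and the example URLs are hypothetical. It needs Python 3.9 or later.

```python
from dataclasses import dataclass, field
from enum import Enum
from urllib.parse import urlparse


class MentionType(Enum):
    NONE = "none"
    PASSING = "passing"
    IMPLICIT = "implicit"
    DIRECT = "direct"


@dataclass
class ProbeResult:
    engine: str
    intent: str
    query: str
    mention_type: MentionType
    cited_urls: list[str] = field(default_factory=list)

    @property
    def mentioned(self) -> int:
        """1 if the brand was named at all, else 0."""
        return 0 if self.mention_type is MentionType.NONE else 1

    @property
    def cited_domains(self) -> set[str]:
        """Unique hostnames from the cited URLs, without a leading 'www.'."""
        domains = set()
        for url in self.cited_urls:
            host = urlparse(url).hostname or ""
            if host:
                domains.add(host.removeprefix("www."))
        return domains


result = ProbeResult(
    engine="Perplexity",
    intent="Recommendation",
    query="What's the best CRM software for a small marketing team?",
    mention_type=MentionType.IMPLICIT,
    cited_urls=[
        "https://www.example.com/best-crm",
        "https://reviews.example.org/crm?page=2",
    ],
)
print(result.mentioned, sorted(result.cited_domains))
```

Deciding the mention type still takes human judgment (or a separate classifier); the record only stores the decision so it can be scored consistently.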
Step 4: Calculate Your Visibility Score
A reasonable formula for a single-engine visibility score (0-100):
- Direct recommendation: +5 points per probe
- Implicit mention: +2 points per probe
- Passing mention: +1 point per probe
- No mention: 0 points
Sum the points across all probes for a given engine, divide by the maximum possible score (5 points times the number of probes), and multiply by 100. Repeat per engine, then average the per-engine scores for a unified score.
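As a worked example, the point values above translate into the following sketch. The function names and the sample mention lists are illustrative; the maximum possible score is taken to be 5 points for every probe.

```python
POINTS = {"direct": 5, "implicit": 2, "passing": 1, "none": 0}
MAX_POINTS = max(POINTS.values())  # 5, the value of a direct recommendation


def engine_score(mention_types):
    """Score one engine on a 0-100 scale from its per-probe mention types."""
    if not mention_types:
        return 0.0
    earned = sum(POINTS[m] for m in mention_types)
    return 100 * earned / (MAX_POINTS * len(mention_types))


def aggregate_score(scores_by_engine):
    """Unified score: the plain average of the per-engine scores."""
    return sum(scores_by_engine.values()) / len(scores_by_engine)


scores = {
    # 5 + 2 + 0 + 1 = 8 of a possible 20 points
    "ChatGPT": engine_score(["direct", "implicit", "none", "passing"]),
    # 0 + 0 + 2 + 0 = 2 of a possible 20 points
    "Perplexity": engine_score(["none", "none", "implicit", "none"]),
}
print(scores)                   # {'ChatGPT': 40.0, 'Perplexity': 10.0}
print(aggregate_score(scores))  # 25.0
```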
For more advanced scoring, weight Decision-stage probes (Pricing, Recommendation, Comparison, Top Tools) higher than Awareness-stage probes (How-To, Trust). Ninar AI uses an intent-weighted scoring model with Decision intents at 1.2x and Awareness intents at 0.85x weight.
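A minimal sketch of intent weighting follows, under two loud assumptions. First, the guide gives weights only for Decision and Awareness intents, so Use Case and Local are assigned a neutral 1.0 here. Second, how the weights are combined is not specified above; this version weights both the earned points and the maximum, which keeps the result on a 0-100 scale. It is not a description of Ninar AI's internal model.

```python
POINTS = {"direct": 5, "implicit": 2, "passing": 1, "none": 0}

INTENT_WEIGHTS = {
    # Decision-stage intents (1.2x, from the text above)
    "Pricing": 1.2, "Recommendation": 1.2, "Comparison": 1.2, "Top Tools": 1.2,
    # Awareness-stage intents (0.85x, from the text above)
    "How-To": 0.85, "Trust": 0.85,
    # Not assigned a stage in the text: neutral weight is an assumption
    "Use Case": 1.0, "Local": 1.0,
}


def weighted_engine_score(results):
    """results: list of (intent, mention_type) pairs for one engine."""
    if not results:
        return 0.0
    earned = sum(INTENT_WEIGHTS[i] * POINTS[m] for i, m in results)
    maximum = sum(INTENT_WEIGHTS[i] * POINTS["direct"] for i, _ in results)
    return 100 * earned / maximum


# A direct recommendation on Pricing and nothing on How-To scores 50
# unweighted, but about 58.5 once the Decision-stage win counts for more.
score = weighted_engine_score([("Pricing", "direct"), ("How-To", "none")])
print(round(score, 1))
```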
Step 5: Tag Every Ghost Intent
For each intent category, look at the per-intent score. If a specific intent scores zero or near-zero across multiple engines but your brand appears in awareness queries, that's a Ghost Intent — AI knows you exist but won't recommend you for that specific buyer query.
Ghost Intents are the most actionable diagnostic output. They tell you exactly which content gaps to fill first.
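The tagging rule can be sketched as a filter over per-intent, per-engine scores. The thresholds here are assumptions chosen for illustration: "near-zero" is taken as 5 points or less on the 0-100 scale, "multiple engines" as at least two, and "appears in awareness queries" as any How-To or Trust score above the near-zero threshold.

```python
AWARENESS_INTENTS = {"How-To", "Trust"}


def find_ghost_intents(scores, near_zero=5.0, min_engines=2):
    """scores: {intent: {engine: score on a 0-100 scale}}.

    Returns the non-awareness intents that score near zero on at least
    `min_engines` engines while the brand does show up on awareness queries.
    """
    brand_is_known = any(
        score > near_zero
        for intent in AWARENESS_INTENTS
        for score in scores.get(intent, {}).values()
    )
    if not brand_is_known:
        return []  # an invisible brand has a different problem than Ghost Intents

    ghosts = []
    for intent, per_engine in scores.items():
        if intent in AWARENESS_INTENTS:
            continue
        low_engines = [e for e, s in per_engine.items() if s <= near_zero]
        if len(low_engines) >= min_engines:
            ghosts.append(intent)
    return sorted(ghosts)


scores = {
    "How-To": {"ChatGPT": 60, "Gemini": 40},
    "Pricing": {"ChatGPT": 0, "Gemini": 0},
    "Comparison": {"ChatGPT": 40, "Gemini": 0},
}
print(find_ghost_intents(scores))  # ['Pricing']
```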
Step 6: Run Variance Tests
AI responses are stochastic. The same probe run twice can return different answers. To make sure your scores are reliable, run variance tests: run the same probe 3-5 times against the same engine and check whether the brand mention pattern is stable.
Three stability bands:
- Anchored — brand appears consistently across runs (high confidence in score)
- Established — brand appears in most runs but not all (medium confidence)
- Emerging — brand appears in some runs (low confidence; the score is volatile)
Ninar AI runs variance tests automatically and surfaces the stability band per probe.
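For manual variance tests, the banding can be sketched as below. The exact cutoffs are assumptions: "consistently" is read as every run, "most runs" as more than half, and "some runs" as at least one. The "Absent" label for zero appearances is added for completeness and is not one of the three bands above.

```python
def stability_band(appearances):
    """appearances: one boolean per repeated run of the same probe on one engine."""
    runs = len(appearances)
    if runs == 0:
        raise ValueError("need at least one run")
    hits = sum(1 for seen in appearances if seen)
    if hits == runs:
        return "Anchored"
    if hits > runs / 2:
        return "Established"
    if hits > 0:
        return "Emerging"
    return "Absent"  # hypothetical label: the brand never appeared


print(stability_band([True, True, True, True, True]))     # Anchored
print(stability_band([True, True, True, False, True]))    # Established
print(stability_band([True, False, False, False, True]))  # Emerging
```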
Step 7: Compare Against Competitors
Visibility scores are most useful in relative context. Run the same probe set and score the responses for your competitors' brand names as well as your own. If the same Decision-stage prompts surface your competitor 80% of the time and your brand 5%, you have specific evidence of where you're losing.
Manual competitor benchmarking is tedious. Ninar AI auto-detects competitors mentioned in your probes and scores them in parallel, giving you a competitive map without separate scans.
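If you are benchmarking by hand, a rough mention-rate comparison over saved responses looks like the sketch below. It only checks whether a brand name appears in the text, so it does not distinguish a direct recommendation from a passing mention. The brand names and responses are made up for the example.

```python
import re


def mention_rates(responses, brands):
    """Fraction of responses that name each brand (case-insensitive, whole words)."""
    rates = {}
    for brand in brands:
        pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
        hits = sum(1 for text in responses if pattern.search(text))
        rates[brand] = hits / len(responses) if responses else 0.0
    return rates


# Hypothetical responses to the same Decision-stage probes.
responses = [
    "For small teams, Globex is a solid pick.",
    "Popular options include Globex and Acme CRM.",
    "Globex and Initech are often compared.",
    "It depends on your budget.",
]
print(mention_rates(responses, ["Acme CRM", "Globex"]))
# {'Acme CRM': 0.25, 'Globex': 0.75}
```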
Step 8: Track Over Time
One scan is a baseline. Real measurement is the trend. Set up regular scans — weekly for active programs, monthly for maintenance — and track:
- Aggregate visibility score (trend up or down)
- Per-engine scores (which engines are gaining vs losing)
- Per-intent scores (which buyer stages are improving vs declining)
- Sentiment Velocity (how fast sentiment is changing about your brand)
- Source citation patterns (which domains have started or stopped citing you)
Sentiment Velocity is particularly valuable as a leading indicator. A brand whose sentiment is decelerating in AI answers will see its visibility score decline 30-60 days later. Catching the deceleration early lets you intervene before scores drop.
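This guide does not give a formula for Sentiment Velocity, so the sketch below is one plausible reading rather than Ninar AI's definition: velocity is the change in a sentiment score per day between consecutive scans, and deceleration means the latest velocity is lower than the one before it. The sentiment values (assumed to be on a -1 to 1 scale) and the scan dates are invented for the example.

```python
from datetime import date


def velocities(series):
    """series: list of (date, value) pairs sorted by date.

    Returns (date, change per day) for each consecutive pair of scans.
    """
    out = []
    for (d0, v0), (d1, v1) in zip(series, series[1:]):
        days = (d1 - d0).days
        if days <= 0:
            raise ValueError("scan dates must be strictly increasing")
        out.append((d1, (v1 - v0) / days))
    return out


def is_decelerating(series):
    """True if the most recent velocity is lower than the previous one."""
    v = [value for _, value in velocities(series)]
    return len(v) >= 2 and v[-1] < v[-2]


# Hypothetical weekly sentiment readings.
sentiment = [
    (date(2026, 1, 5), 0.20),
    (date(2026, 1, 12), 0.34),
    (date(2026, 1, 19), 0.41),
    (date(2026, 1, 26), 0.41),
]
for day, v in velocities(sentiment):
    print(day, round(v, 3))
print(is_decelerating(sentiment))
```

The same `velocities` helper works for the aggregate, per-engine, and per-intent visibility scores, since each is just a dated series of numbers.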
Common Measurement Mistakes
Mistake 1: Scanning Only One or Two Engines
Single-engine scores give a falsely confident picture. The same brand can score 78 on ChatGPT and 23 on Perplexity. Always measure across all 10 engines.
Mistake 2: Using Brand-Specific Probes Only
Asking AI “Tell me about [your brand]” tests awareness, not recommendation. The probes that matter are category-level queries that don't name your brand — those reveal whether AI surfaces you spontaneously.
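A quick sanity check on a probe set is to separate the probes that name your brand from the ones that do not. The sketch below uses simple substring matching; the brand name and probes are hypothetical.

```python
def split_probes(probes, brand_names):
    """Split probes into (category_level, brand_specific) lists."""
    lowered = [name.lower() for name in brand_names]
    category_level, brand_specific = [], []
    for probe in probes:
        if any(name in probe.lower() for name in lowered):
            brand_specific.append(probe)
        else:
            category_level.append(probe)
    return category_level, brand_specific


probes = [
    "Tell me about Acme CRM",
    "What's the best CRM software for a small marketing team?",
    "Is Acme CRM reliable?",
    "Top 10 CRM software in 2026",
]
category_level, brand_specific = split_probes(probes, ["Acme CRM"])
print(len(category_level), "category-level,", len(brand_specific), "brand-specific")
```

Brand-specific probes such as the Trust intent still have a place in the set; the point of the check is to confirm that most of the set consists of category-level queries.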
Mistake 3: Running Probes Once Without Variance Testing
AI responses are stochastic. A single probe run isn't statistically reliable. Always run multiple iterations to establish a stability band.
Mistake 4: Ignoring Buyer-Journey Intent
A high aggregate score can hide critical Ghost Intents. A brand can win Awareness queries (How-To, Trust) and lose every Decision query (Pricing, Recommendation, Comparison). Per-intent measurement is non-negotiable.
Mistake 5: Treating Visibility as a Snapshot, Not a Trend
One scan tells you where you are today. Three scans over three months tell you whether you're improving. The trend matters more than the baseline.
How to Automate the Entire Workflow
Manual measurement at the depth this guide describes — 10 engines, 8 intents, 3 variations, 5 variance runs, weekly cadence — is unsustainable for any team. Ninar AI automates the entire workflow:
- Pre-built probe libraries per industry (you don't have to write probes from scratch)
- Concurrent scanning across all 10 AI engines
- Automatic scoring with intent weighting
- Ghost Intent diagnosis with severity tagging
- Variance testing with stability bands
- Auto-detected competitor benchmarking
- Sentiment Velocity tracking over time
- Source citation graphs per engine
- Scheduled rescans (weekly, biweekly, monthly)
- Automated alerts when scores drift, new competitors appear, or new sources start citing
The free tier covers 2 engines and is enough for a baseline measurement. The Pro tier ($79/month) covers 4 engines and includes the diagnostic suite. The Scale ($299) and Enterprise ($599) tiers cover all 10 engines.
Frequently Asked Questions
How long does a manual AI visibility measurement take?
For a probe set of 24 probes (8 intents x 3 variations) across 10 engines, manual measurement takes 1-2 days of focused work, with another day for analysis. For weekly cadence, manual measurement is impractical for any team. Ninar AI runs the same workflow in minutes.
Can I use ChatGPT to measure my own AI visibility?
You can run individual probes against ChatGPT to spot-check, but you can't get a complete picture from a single engine. AI visibility requires multi-engine measurement because each engine weights signals differently.
What's the simplest measurement to start with?
Run 5 category-level probes (no brand name in the query) across ChatGPT and Gemini. Note whether your brand appears in the answers. This 10-minute exercise gives you enough signal to know if you have an AI visibility problem worth investigating further.
How do I know if my measurement is statistically reliable?
Run each probe 3-5 times. If your brand appears in most runs, the signal is reliable. If it appears in only 1-2 runs out of 5, your visibility is volatile and the score is less trustworthy. Ninar AI automates this with the Anchored / Established / Emerging stability bands.
Should I measure across multiple geographies?
Yes, if you operate in multiple markets. AI engines respond differently in different geographies because grounding signals (citations, search results, regional sources) differ by region. Run city-specific probes for every market you serve. Ninar AI's local module handles 500+ cities natively.
How often should I rescan?
For active AI visibility programs: weekly. For maintenance mode: monthly. For rapid-iteration content programs: daily. Ninar AI offers all three frequencies.
Run your first measurement in 60 seconds. Ninar AI's free tier gives you a baseline AI visibility scan across two engines — no credit card required. Start your free scan →