# Who Signed Your Image? A Content Credentials (C2PA) Checker

Scan a page or single image URL for C2PA / Content Credentials manifests. Find out which of your images carry a provenance chain and which don't.

Author: J.A. Watte
Published: April 20, 2026
Source: https://jwatte.com/blog/blog-tool-content-credentials/

---

**TL;DR.** AI search = discovery (classic SEO buys the seat) + retrieval (passage-level chunking buys the citation). AI crawlers do not execute JS — every critical claim must live in the server-rendered HTML.

The **[Content Credentials](/tools/content-credentials/)** checker is the audit you reach for when you already suspect a problem in this dimension and need a fast, copy-pasteable fix list. It reuses the same chrome as every other jwatte.com tool (deep links from the mega analyzers, AI-prompt export, CSV/PDF/HTML download), but the checks it runs are narrow and specific to the dimension described above.

> Scan a page or a single image URL for C2PA / Content Credentials manifests. Tells you which images carry a provenance chain (so AI-vs-human origin is verifiable) and which don't.
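To make "carries a manifest" concrete, here is a minimal sketch of a presence check for JPEG files. It assumes the C2PA convention of embedding the manifest store as JUMBF boxes inside JPEG APP11 segments, and it only looks for the `jumb` box type and the `c2pa` label. The function name `jpeg_has_c2pa_marker` is hypothetical and this is not how the tool itself is implemented; it is a heuristic that says nothing about whether the signature or the provenance chain is valid.

```python
import struct

APP11 = 0xEB  # JPEG APP11 marker, where C2PA places its JUMBF boxes
SOS = 0xDA    # start of scan: compressed image data follows
EOI = 0xD9    # end of image


def jpeg_has_c2pa_marker(data: bytes) -> bool:
    """Heuristic: True if an APP11 segment mentions both 'jumb' and 'c2pa'.

    Detects that a manifest store appears to be present. Does NOT validate
    signatures, hashes, or the provenance chain.
    """
    if not data.startswith(b"\xff\xd8"):  # no SOI marker, so not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # lost sync with the segment structure
        marker = data[pos + 1]
        if marker in (SOS, EOI):
            return False  # reached image data without finding a manifest
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == APP11 and b"jumb" in payload and b"c2pa" in payload:
            return True
        pos += 2 + length
    return False
```

Usage would look like `jpeg_has_c2pa_marker(open("photo.jpg", "rb").read())`. A `True` result only means a manifest is probably embedded; confirming who signed it still requires a full C2PA validator.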

## Why this dimension matters

AI search runs in two stages: DISCOVERY (the LLM queries a classic search engine to get ~20 candidate URLs) and RETRIEVAL (it fetches those pages, chunks them into ~150-token passages, and cites whichever chunk best matches the query). Classic SEO buys the seat; paragraph-level structure buys the citation. AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) do NOT execute JavaScript — every critical claim must be in the server-rendered HTML.
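One way to test the "server-rendered HTML" requirement is to look at the raw response body the way a non-JS fetcher would. The sketch below uses only the Python standard library; the class `VisibleText` and the function `claim_in_raw_html` are hypothetical names, and the matching is a plain case-insensitive substring test, so treat it as a rough smoke test rather than a model of any particular crawler.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data)


def claim_in_raw_html(html: str, claim: str) -> bool:
    """True if `claim` appears in the text of the HTML as served (no JS run)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace on both sides so line breaks do not cause misses.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return " ".join(claim.split()).lower() in text
```

Feed it the body you get from a plain HTTP fetch (for example `curl` output), not the DOM from your browser's inspector, since the inspector shows the page after JavaScript has run.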

## Common failure patterns

- **SPA shell with empty `<div id="root">`**: React / Vue / Angular apps that render only on the client look completely empty to AI crawlers. The fix is SSR (Next.js `getServerSideProps`, Nuxt `asyncData`, SvelteKit `load`) or prerendering / static export for content-heavy pages.
- **Missing `llms.txt` at the site root**: `llms.txt` is a proposed standard for pointing AI crawlers at your canonical content. Absence is not catastrophic, but presence gives retrievers a curated entry point. Pair it with `llms-full.txt` for full-content mirroring.
- **Blocking AI crawlers in robots.txt without a strategy**: blocking GPTBot while allowing Googlebot is a choice; blocking all AI crawlers by default without knowing whether your audience queries ChatGPT / Claude / Perplexity is a cost. Decide deliberately; most content businesses benefit from allowing retrieval crawlers while blocking training crawlers.
- **Paragraphs over 300 words**: each `<p>` is a retrieval unit for the chunker. Target 40–150 words per paragraph. Thinner paragraphs fail to match an answer; thicker ones get split mid-thought and lose coherence at citation time.
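The paragraph-length pattern is the easiest of these to check mechanically. The following is a minimal sketch, assuming whitespace-separated words are a good enough proxy for length and that each `<p>` element is one retrieval unit, as described above. `ParagraphCollector` and `audit_paragraphs` are hypothetical names, and the 40 and 150 defaults are the targets from this post.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element from raw HTML."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # None while outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()  # an unclosed <p> ends where the next one starts
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def flush(self):
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._buf = None


def audit_paragraphs(html, low=40, high=150):
    """Return (word_count, verdict) per paragraph: 'thin', 'ok', or 'long'."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing <p> that was never closed
    report = []
    for text in parser.paragraphs:
        words = len(text.split())
        verdict = "thin" if words < low else "long" if words > high else "ok"
        report.append((words, verdict))
    return report
```

Short paragraphs such as captions or one-line calls to action will show up as "thin"; that is expected, and only the paragraphs meant to answer a query need to land in the "ok" band.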

## How to fix it at the source

Start with `llms.txt` + `llms-full.txt` at the site root. Audit your robots.txt stance per bot deliberately. Restructure long paragraphs into 40–150-word chunks that each contain a complete claim + evidence pair. Track LLM referral visits via a custom Referrer segment (chatgpt.com, perplexity.ai, claude.ai, gemini.google.com, copilot.microsoft.com) — that is the canonical AEO KPI.
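If your analytics platform cannot build that segment for you, the classification is simple enough to do on exported referrer data. This is a minimal sketch, assuming referrers arrive as full URLs with a scheme; `LLM_REFERRER_HOSTS` and `llm_referrer` are hypothetical names, and the host list is just the five domains named above.

```python
from typing import Optional
from urllib.parse import urlsplit

# The five referrer hosts listed in this post.
LLM_REFERRER_HOSTS = (
    "chatgpt.com",
    "perplexity.ai",
    "claude.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
)


def llm_referrer(referrer: str) -> Optional[str]:
    """Return the matching LLM host for a referrer URL, or None.

    Subdomains count as matches (www.perplexity.ai -> perplexity.ai).
    A referrer without a scheme has no parsable hostname and returns None.
    """
    host = (urlsplit(referrer).hostname or "").lower()
    for known in LLM_REFERRER_HOSTS:
        if host == known or host.endswith("." + known):
            return known
    return None
```

Counting the non-`None` results per day gives a first approximation of the referral KPI. Visits where the referrer header was stripped will not be captured, so read the number as a lower bound.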

## Thresholds that matter

| Signal | Target |
|---|---|
| Paragraph length (retrieval unit) | 40–150 words. Thinner fails to answer; thicker gets split mid-thought. |
| JSON-LD blocks | 2+ per page (site-wide Org + page-specific type). |
| llms.txt byte size | < 50 KB for fast ingestion; `llms-full.txt` can be larger (1–2 MB). |
| robots.txt per-bot directive | Explicit for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Bytespider, Applebot-Extended. |
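The last row of the table can be checked with a few lines of parsing. The sketch below assumes "explicit" means the bot is named in its own `User-agent` line rather than only being covered by `*`; `AI_BOTS` and `bots_without_explicit_group` are hypothetical names. It does not evaluate what the rules allow or block, only whether a stance was stated.

```python
# The bots named in the thresholds table above.
AI_BOTS = [
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "CCBot",
    "Bytespider",
    "Applebot-Extended",
]


def bots_without_explicit_group(robots_txt, bots=AI_BOTS):
    """Return the bots that are never named in a User-agent line."""
    named = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            named.add(value.strip().lower())  # user-agent names are case-insensitive
    return [bot for bot in bots if bot.lower() not in named]
```

An empty result means every bot in the list has its own group. Any name that comes back is falling through to your `*` rules, which may or may not be what you intended.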

## Example fix

_llms.txt starter at site root:_

```markdown
# Your Business

> One-sentence "what this site is" — used by LLM retrievers as the authoritative site description.

## Core content

- [About](https://yoursite.com/about): who you are, why you do this
- [Products](https://yoursite.com/products): catalog with stable URLs
- [Documentation](https://yoursite.com/docs): technical references

## Policies

- [Privacy](https://yoursite.com/privacy)
- [Terms](https://yoursite.com/terms)
- [AI-crawler policy](https://yoursite.com/ai.txt)

## Optional — full content mirror

- [llms-full.txt](https://yoursite.com/llms-full.txt): full canonical content for long-form retrieval
```
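Before publishing a file like the starter above, a quick structural lint can catch the obvious mistakes. This is a minimal sketch, assuming the shape used in the starter (an H1 title, a blockquote summary, H2 sections, and `- [name](url)` link items with optional `: notes`) plus the 50 KB target from the thresholds table. `check_llms_txt` is a hypothetical helper, not the llms.txt Quality Scorer's actual rule set.

```python
import re

# "- [name](https://...)" with an optional ": notes" suffix.
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\(https?://[^)\s]+\)(: .+)?$")
MAX_BYTES = 50 * 1024  # the "< 50 KB" target from the thresholds table


def check_llms_txt(text):
    """Return a list of human-readable problems; empty means the lint passed."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 title")
    if not any(ln.startswith("> ") for ln in lines):
        problems.append("missing blockquote summary")
    if not any(ln.startswith("## ") for ln in lines):
        problems.append("no H2 sections")
    for ln in lines:
        if ln.startswith("- ") and not LINK_ITEM.match(ln):
            problems.append("malformed link item: " + ln)
    if len(text.encode("utf-8")) >= MAX_BYTES:
        problems.append("file is 50 KB or larger")
    return problems
```

The link pattern is deliberately strict about absolute `http(s)` URLs, which matches the starter; relax it if your file uses relative paths.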

## When to run the audit

- After a major site change — redesign, CMS migration, DNS change, hosting platform swap.
- Quarterly as part of routine technical hygiene; the checks are cheap to run repeatedly.
- Before an investor / client review, a PCI scan, a SOC 2 audit, or an accessibility-compliance review.
- When a downstream metric drops (rankings, conversion, AI citations) and you need to rule out this dimension as the cause.

## Reading the output

Every finding is severity-classified. The playbook is the same across tools:

- **Critical / red** — same-week fixes. These block the primary signal and cascade into downstream dimensions.
- **Warning / amber** — same-month fixes. Drag the score, usually don't block.
- **Info / blue** — context only. Often what a PR reviewer would flag but that doesn't block merge.
- **Pass / green** — confirmation. Keep the control in place.

Every audit also emits an "AI fix prompt" — paste into ChatGPT / Claude / Gemini for exact copy-paste code patches tied to your specific stack.

## Related tools in this family

- **[Mega AEO Analyzer](/tools/mega-aeo-analyzer/)** — the AEO orchestrator — 10 dimensions (citation, attribution, retrievability, freshness, tokenizer, prompt-injection, fair-use).
- **[AI Posture Audit](/tools/ai-posture-audit/)** — cross-references robots.txt, ai.txt, meta robots, X-Robots-Tag per bot — flags disagreements.
- **[llms.txt Quality Scorer](/tools/llms-txt-quality-scorer/)** — audits llms.txt structure against the llmstxt.org spec.
- **[AI Crawler Access Auditor](/tools/ai-crawler-access-auditor/)** — simulates each major AI bot's crawl permissions on your site.
- **[RAG Readiness Audit](/tools/rag-readiness-audit/)** — tests how cleanly your pages chunk for enterprise RAG pipelines.

## Fact-check notes and sources

- llmstxt.org: [llms.txt proposed standard](https://llmstxt.org/)
- OpenAI: [GPTBot documentation](https://platform.openai.com/docs/gptbot)
- Anthropic: [ClaudeBot documentation](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)
- Perplexity: [PerplexityBot](https://docs.perplexity.ai/guides/bots)
- Google: [Google-Extended opt-out](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)

*This post is informational and not a substitute for professional consulting. Mentions of third-party platforms in the tool itself are nominative fair use. No affiliation is implied.*


