Retrieval Freshness Signal Audit

Retrievers can't tell a 2022 article from a 2026 article by reading the content. Both might say "the current state of the art is..." The difference is in the signals outside the prose.

Retrievers check six signals. Most SMB content emits one or two. Pages that emit all six win citations for time-sensitive queries.

The six freshness signals

1. schema.org dateModified. Present in JSON-LD, in ISO 8601 format (for example, 2026-04-23). The single strongest retrieval freshness signal. Claude with web tools and Gemini Grounding weight it heavily; Perplexity leans more on text signals.

2. "As of [year]" or "Updated [date]" in body. Natural-language freshness markers near the top of the article. Survives markdown-to-HTML pipelines; visible to readers too. Good for both human trust and retrieval.

3. Visible updated timestamp. A visible "Updated April 23, 2026" stamp near the H1 or byline. Rendered into the HTML, parseable by retrievers, reassuring to readers who scan.

4. HTTP Last-Modified header. Server-emitted header. Retrievers that respect HTTP caching semantics check it. Not all retrievers do, but the ones that do weight it strongly.

5. Current-year mention in body. Mentioning "2026" somewhere in the first 500 words. Proxy signal — if the content talks about the current year, it's presumably fresh.

6. Versioning / revision language. Phrases like "updated edition," "v2," "revised 2026," "refreshed" signal ongoing maintenance rather than one-and-done publishing.
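
Signals 1, 3, and 4 can all be derived from a single modification timestamp. A minimal Python sketch, assuming an Article schema type and a hypothetical helper named freshness_markup (neither comes from the audit tool itself):

```python
import json
from datetime import datetime, timezone
from email.utils import format_datetime


def freshness_markup(headline: str, modified: datetime) -> dict[str, str]:
    """Build the schema, visible, and HTTP forms of one modification time.

    Hypothetical helper for illustration; `modified` must be timezone-aware UTC.
    """
    json_ld = {
        "@context": "https://schema.org",
        "@type": "Article",  # assumed type; use whatever fits the page
        "headline": headline,
        "dateModified": modified.isoformat(),  # ISO 8601
    }
    stamp = f"Updated {modified:%B} {modified.day}, {modified.year}"
    return {
        # Signal 1: JSON-LD block for the page <head>.
        "json_ld": '<script type="application/ld+json">'
        + json.dumps(json_ld)
        + "</script>",
        # Signal 3: visible stamp to render near the H1 or byline.
        "visible": f'<time datetime="{modified.date().isoformat()}">{stamp}</time>',
        # Signal 4: value for the HTTP Last-Modified response header.
        "last_modified": format_datetime(modified, usegmt=True),
    }


markup = freshness_markup(
    "Retrieval Freshness Signal Audit",
    datetime(2026, 4, 23, 9, 0, tzinfo=timezone.utc),
)
print(markup["visible"])
print(markup["last_modified"])
```

Generating all three from one value keeps the schema date, the visible stamp, and the header from drifting apart.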

What the Retrieval Freshness Signal Audit does

Paste a URL. The tool:

Fetches the page and its HTTP headers via the fetch proxy.
Checks each of the six signals independently.
Scores the page 0-100 on overall freshness signaling.
Lists per-signal status with specific fix guidance.
Emits an AI prompt that writes the exact HTML / schema / server-config patches.
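
The check-and-score steps above can be sketched roughly as follows. This is not the tool's actual implementation: the regular expressions, the signal key names, and the equal weighting of the six signals are all assumptions made for illustration, and the sketch works on HTML and headers that have already been fetched.

```python
import json
import re


def check_signals(html: str, headers: dict[str, str], year: int) -> dict[str, bool]:
    """Check the six freshness signals on already-fetched HTML and headers."""
    # Signal 1: any top-level JSON-LD object that carries a dateModified key.
    has_schema_date = False
    for block in re.findall(
        r"<script[^>]*application/ld\+json[^>]*>(.*?)</script>", html, re.S | re.I
    ):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        if any(isinstance(i, dict) and "dateModified" in i for i in items):
            has_schema_date = True

    # Visible text only: drop script/style blocks, then strip remaining tags.
    body = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", body)
    first_500 = " ".join(text.split()[:500])

    return {
        "schema_date_modified": has_schema_date,
        "as_of_marker": bool(
            re.search(r"\b(?:as of|updated)\b[^.]{0,30}\b20\d{2}\b", text, re.I)
        ),
        "visible_timestamp": bool(
            re.search(r"\bupdated\s+[a-z]+\s+\d{1,2},\s+\d{4}", text, re.I)
        ),
        "last_modified_header": "last-modified" in {h.lower() for h in headers},
        "current_year_mention": str(year) in first_500,
        "revision_language": bool(
            re.search(r"\b(?:updated edition|revised|refreshed|v\d+)\b", text, re.I)
        ),
    }


def freshness_score(results: dict[str, bool]) -> int:
    """Score 0-100, assuming (for illustration) that all signals weigh the same."""
    return round(100 * sum(results.values()) / len(results))


sample_html = """<html><head>
<script type="application/ld+json">{"@type": "Article", "dateModified": "2026-04-23"}</script>
</head><body><h1>Guide</h1><p>Updated April 23, 2026</p>
<p>As of 2026, the approach below still holds.</p></body></html>"""

results = check_signals(sample_html, {"Content-Type": "text/html"}, year=2026)
print(freshness_score(results), results)
```

The sample page passes four of the six checks. It has no Last-Modified header and no revision language, so those two would be the items to fix.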

Reading the score

80-100: strong. Retrievers pick this page up as fresh for time-sensitive queries. Protect it: don't remove the signals when redesigning.

50-79: typical. Fixing the remaining signals is usually a 30-minute job that lifts retrieval eligibility.

Below 50: retrieval-hostile. The page is essentially invisible to freshness-weighted queries. Retrievers see it as potentially stale and route around it.
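
As a small sketch, the bands map to labels like this. The function name is hypothetical, and treating a score of exactly 80 as strong is an assumption:

```python
def score_band(score: int) -> str:
    """Map a 0-100 freshness score to the band names used above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "strong"
    if score >= 50:
        return "typical"
    return "retrieval-hostile"


print(score_band(67))
```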

The "fresh signals but not fresh content" anti-pattern

Bumping dateModified without actually updating the content is detectable. Claude, Gemini, and retrieval-based evaluators appear to use content-delta heuristics: if a page claims dateModified 2026-04-23 but the text is identical to the 2022 archived version, the claim is likely to be ignored.

The rule: freshness signals require actual content updates to be credible. Update the signals AND add/modify 2-3 paragraphs. Add a "What's new in 2026:" section. Update a stats table with current numbers. The signals document real updates, not cosmetic ones.
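
One way to self-check before bumping a date is to compare the new text against the previous version. The sketch below uses Python's difflib and is only an illustration of a content-delta check. It is not how any retriever is known to work, and the 5% threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def changed_fraction(old_text: str, new_text: str) -> float:
    """Return the share of word-level content that differs (0.0 = identical)."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()


def date_bump_is_credible(old_text: str, new_text: str, min_change: float = 0.05) -> bool:
    """True if enough text changed to justify a new dateModified (assumed threshold)."""
    return changed_fraction(old_text, new_text) >= min_change


old = "The current state of the art is model A."
new = old + " What's new in 2026: model B ships with a larger context window."

print(date_bump_is_credible(old, old))  # same text, new date
print(date_bump_is_credible(old, new))  # added a "What's new" section
```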

The per-retriever weighting

Gemini Grounding: dateModified in schema + visible updated timestamp. Both required for strong ranking.

Perplexity: current-year mention + visible timestamp. Less schema-dependent (Perplexity's parser is more text-heavy).

Claude with web tools: dateModified + HTTP Last-Modified. Most rigorous about technical signals.

ChatGPT Browsing: visible date + current-year mentions. Like Perplexity, more text-driven.

Passing all six gets you on every retriever's preferred-source list. Passing just a subset limits you to specific retrievers.
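
The per-retriever pairs above can be written as a lookup table to see which retrievers a page currently satisfies. The signal key names are hypothetical labels, and the pairs simply restate the observational list above rather than any documented retriever behavior.

```python
# Signals each retriever is described above as weighting most heavily.
RETRIEVER_SIGNALS: dict[str, set[str]] = {
    "Gemini Grounding": {"schema_date_modified", "visible_timestamp"},
    "Perplexity": {"current_year_mention", "visible_timestamp"},
    "Claude with web tools": {"schema_date_modified", "last_modified_header"},
    "ChatGPT Browsing": {"visible_timestamp", "current_year_mention"},
}


def covered_retrievers(present: set[str]) -> list[str]:
    """List retrievers whose preferred signals are all present on the page."""
    return [name for name, needed in RETRIEVER_SIGNALS.items() if needed <= present]


# A page with only text-level signals satisfies the two text-driven retrievers.
print(covered_retrievers({"visible_timestamp", "current_year_mention"}))
```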

The 30-day freshness-signal upgrade

Week 1: Audit top 20 pages. Catalog which signals are missing per page.

Week 2: Implement dateModified in schema sitewide (via build pipeline or CMS hook). Deploy.

Week 3: Add visible "Updated [Month Day, Year]" elements near H1 on top pages. Configure server to emit Last-Modified header.

Week 4: For each of the top 20, add one "As of [year]" paragraph and one current-year mention in the first 500 words.

At day 30, the top 20 pages should score 90 or higher on freshness signaling. Retrievers typically start treating them as preferred sources for time-sensitive queries within 2-4 weeks.

The evergreen-content maintenance cadence

For content you plan to keep fresh perpetually:

Re-audit every 6 months.
On every material update: bump dateModified, update visible timestamp, add a "What changed" note, update the first paragraph to reference the current year.
On every 12-month anniversary: substantive revision (not just a date bump). Rewrite intro. Update stats. Refresh examples.

News / once-published content: don't game the signals. Leave the original publish date. The retrieval model will treat it as dated and that's correct.

Fact-check notes and sources

schema.org dateModified: schema.org/dateModified
HTTP Last-Modified header: RFC 9110 Section 8.8.2
Retriever freshness weighting: observational across Claude, Gemini, Perplexity, ChatGPT browsing (2024-2026)

This post is informational, not AEO-consulting advice. Mentions of OpenAI, Anthropic, Google, Perplexity are nominative fair use. No affiliation is implied.
