I was auditing a research-heavy blog that cites dozens of sources per post. The JSON-LD was correct. The bibliography at the bottom of each post was clean. Perplexity was citing the site once in a while. Claude was not citing it at all.
I opened view-source on one of the posts and noticed that every quoted passage was marked up like this:
<div class="quote">
<p>"The majority of deployments we observed failed within 30 days." — Smith Research, 2024</p>
</div>
No <blockquote>. No <cite>. No cite attribute. A div with a class and an em dash. The content was there. The semantics were not.
We changed the markup and the pattern of AI citations changed with it.
What the Two Elements Actually Do
HTML has two dedicated elements for attribution. Both predate HTML 4, and the cite attribute has been in the spec since HTML 4.
<cite> marks the title of a referenced work: a book, an article, a research paper, a website name. The content of the <cite> element is the name of the source. It is not meant to hold the author's name or the URL, only the name of the work being cited.
The data comes from <cite>Smith Research Quarterly</cite>.
<blockquote> marks a block-level quoted passage, and its cite attribute holds the URL of the source:
<blockquote cite="https://example.com/smith-research-2024-report">
<p>The majority of deployments we observed failed within 30 days.</p>
<footer>— <cite>Smith Research Quarterly</cite>, 2024</footer>
</blockquote>
The cite attribute is the machine-readable URL. The <cite> element inside the footer is the human-readable source name. Together they form a complete, parseable attribution.
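To make "parseable" concrete, here is a minimal sketch of how a parser might read that attribution, using only Python's standard html.parser. The class and function names are hypothetical, and this is an illustration of what the markup makes possible, not a description of how any particular engine works.

```python
from html.parser import HTMLParser


class BlockquoteAttributions(HTMLParser):
    """Collect the cite URL and nested <cite> title of each <blockquote>."""

    def __init__(self):
        super().__init__()
        self.results = []      # one dict per blockquote found
        self._current = None   # the blockquote being read, if any
        self._in_cite = False  # True while inside a nested <cite>

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self._current = {"url": dict(attrs).get("cite"), "work": ""}
        elif tag == "cite" and self._current is not None:
            self._in_cite = True

    def handle_data(self, data):
        if self._in_cite and self._current is not None:
            self._current["work"] += data

    def handle_endtag(self, tag):
        if tag == "cite":
            self._in_cite = False
        elif tag == "blockquote" and self._current is not None:
            self._current["work"] = self._current["work"].strip()
            self.results.append(self._current)
            self._current = None
            self._in_cite = False


def extract_attributions(markup):
    parser = BlockquoteAttributions()
    parser.feed(markup)
    parser.close()
    return parser.results


sample = """
<blockquote cite="https://example.com/smith-research-2024-report">
<p>The majority of deployments we observed failed within 30 days.</p>
<footer><cite>Smith Research Quarterly</cite>, 2024</footer>
</blockquote>
"""
print(extract_attributions(sample))
```

Run against the div-with-a-class version of the same quote, the function returns an empty list: the text is there, but nothing marks it as an attribution.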
Why AI Engines Treat This as a Source-Quality Signal
When an AI citation engine retrieves a page and has to decide whether the content is well-sourced or not, one of the heuristics it uses is the density and quality of outbound attributions. Pages that cite their sources correctly tend to be more reliable than pages that do not.
But the engine has to be able to find those citations. A quote inside a div with a class is effectively invisible to source-quality heuristics. A <blockquote> with a cite URL and a nested <cite> element tells the engine three things at once: this is quoted material (not original to the page), it comes from a specific source (name in <cite>), and that source is at a specific URL (cite attribute).
That is the difference between a page that looks like it has sources and a page that is provably citing specific works. AI engines weight the second one higher, and they pass that weight through: pages that cite quality sources correctly tend to be cited themselves more often.
Raw JSON-LD vs Visible Semantic Tagging
People sometimes ask why this matters if the citations are already in JSON-LD as a citation array on the Article schema. The answer: JSON-LD and HTML semantics serve different purposes.
JSON-LD is what search engines and schema-aware parsers read to build knowledge graphs. It is out-of-band metadata. It is correct to ship.
HTML semantics are what in-band crawlers, readability parsers, and the content-extraction layer of AI retrieval read. They operate on the rendered DOM. When Perplexity pulls a paragraph of your article as a quoted citation, it cares about whether the paragraph itself is marked up as quoted-from-another-source or as your original writing. JSON-LD tells it what sources you reference; HTML semantics tell it which sentences in the page are quoted versus original.
You want both. They corroborate each other.
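One way to check that the two layers actually corroborate each other is to compare the URLs in the JSON-LD citation array with the cite attributes in the markup. The sketch below assumes the JSON-LD is a single Article object whose citation entries are either URL strings or objects with a url field; the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class CitationSources(HTMLParser):
    """Gather source URLs from JSON-LD and from cite attributes on quotes."""

    def __init__(self):
        super().__init__()
        self.jsonld_urls = set()
        self.markup_urls = set()
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
        elif tag in ("blockquote", "q") and attrs.get("cite"):
            self.markup_urls.add(attrs["cite"])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self._read_jsonld("".join(self._buffer))

    def _read_jsonld(self, text):
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            return  # malformed JSON-LD: nothing to compare
        if not isinstance(doc, dict):
            return
        citations = doc.get("citation", [])
        if not isinstance(citations, list):
            citations = [citations]
        for entry in citations:
            # A citation entry may be a plain string or an object with a url.
            url = entry.get("url") if isinstance(entry, dict) else entry
            if isinstance(url, str) and url.startswith("http"):
                self.jsonld_urls.add(url)


def compare_sources(markup):
    parser = CitationSources()
    parser.feed(markup)
    parser.close()
    return {
        "both": sorted(parser.jsonld_urls & parser.markup_urls),
        "jsonld_only": sorted(parser.jsonld_urls - parser.markup_urls),
        "markup_only": sorted(parser.markup_urls - parser.jsonld_urls),
    }


page = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "citation": [
   {"@type": "Report", "url": "https://example.com/smith-research-2024-report"},
   {"@type": "Report", "url": "https://example.com/other-report"}
 ]}
</script>
<blockquote cite="https://example.com/smith-research-2024-report">
<p>The majority of deployments we observed failed within 30 days.</p>
</blockquote>
"""
print(compare_sources(page))
```

A URL that appears only in JSON-LD is not necessarily a problem, since a post can reference a work without quoting it. A URL that appears only in the markup usually means the citation array is incomplete.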
A Realistic Before-and-After
Before:
<p>According to Smith Research, "the majority of deployments we observed failed within 30 days." This matches what I have seen in my own audits.</p>
After:
<p>Smith Research found the same pattern I have seen in my own audits:</p>
<blockquote cite="https://smith-research.example/reports/2024-deployments">
<p>The majority of deployments we observed failed within 30 days.</p>
<footer>— <cite>Smith Research Quarterly</cite>, 2024</footer>
</blockquote>
<p>My audits show roughly the same 30-day threshold.</p>
Same information, different machine-readable value. The second version explicitly marks what is quoted, what is original, where the quote came from, and what work it is from. An AI extracting a citation has an unambiguous structure to read.
When to Use <cite> Without <blockquote>
Not every mention of a source is a block quote. Sometimes you reference a work inline without quoting it:
The methodology I use is adapted from <cite>The Pragmatic Programmer</cite>.
That is a correct use of <cite> standalone. No <blockquote>, no cite attribute — this is a mention, not a quotation.
If you want to link to the source (for a website or online article), wrap the <cite> in an anchor:
Data from <a href="https://example.com/report"><cite>Annual Deployment Report</cite></a>.
The <cite> still marks the name of the work. The anchor handles the link.
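The split between the two jobs is easy to see from the parsing side. This is a small sketch, again with Python's standard html.parser and hypothetical names, that lists every cited work on a page along with the link of an enclosing anchor when there is one.

```python
from html.parser import HTMLParser


class CitedWorks(HTMLParser):
    """List each <cite> title with the href of an enclosing <a>, if any."""

    def __init__(self):
        super().__init__()
        self.works = []        # (title, href or None) pairs
        self._hrefs = []       # hrefs of the currently open <a> elements
        self._title = None     # text buffer while inside a <cite>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._hrefs.append(dict(attrs).get("href"))
        elif tag == "cite":
            self._title = []

    def handle_data(self, data):
        if self._title is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            self._hrefs.pop()
        elif tag == "cite" and self._title is not None:
            href = self._hrefs[-1] if self._hrefs else None
            self.works.append(("".join(self._title).strip(), href))
            self._title = None


def list_cited_works(markup):
    parser = CitedWorks()
    parser.feed(markup)
    parser.close()
    return parser.works


sample = (
    "<p>The methodology I use is adapted from "
    "<cite>The Pragmatic Programmer</cite>.</p>"
    '<p>Data from <a href="https://example.com/report">'
    "<cite>Annual Deployment Report</cite></a>.</p>"
)
print(list_cited_works(sample))
```

The first work comes back with no link and the second with its URL, which matches the intent of the markup: a mention in one case, a linked mention in the other.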
The Inline Quote Case: <q>
For short inline quotations, there is a third element, <q>, which is the inline cousin of <blockquote>:
The report concluded that <q cite="https://example.com/report">most deployments failed within 30 days</q>.
Browsers render <q> with automatic quotation marks, so you do not type them yourself. The cite attribute works the same way as on <blockquote>.
I use <q> less often than <blockquote> because most of my quoted passages are block-level, but for a quote embedded in a sentence, <q cite="URL"> is correct and machine-readable.
Why "According to [Source]" Text Is Not a Substitute
The phrase "according to Smith Research" in a paragraph is not reliably parseable. It is natural language, it varies in form ("per", "as reported by", "via", "the team at X told me"), and there is no reliable way for an AI to extract the citation target from it alone. Engines have gotten better at this kind of extraction, but the accuracy is lower than reading a cite attribute or a <cite> element.
If you write "according to Smith Research" in your prose, that is fine — the prose is for humans. But pair it with the actual semantic markup. The phrase and the markup corroborate each other, and the engine picks up both.
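A toy example shows how brittle phrase matching is. The pattern below is deliberately naive and purely illustrative; real extraction is far more sophisticated, but it faces the same variation in phrasing, and the phrase by itself never contains the URL.

```python
import re

# Hypothetical pattern that knows exactly one phrasing: "according to <Name>".
ACCORDING_TO = re.compile(r"[Aa]ccording to ([A-Z]\w*(?: [A-Z]\w*)*)")


def guess_source(sentence):
    """Return the capitalized name after 'according to', or None."""
    match = ACCORDING_TO.search(sentence)
    return match.group(1) if match else None


sentences = [
    'According to Smith Research, "the majority of deployments failed."',
    "Per Smith Research, most deployments failed.",
    "As reported by Smith Research, most deployments failed.",
    "The team at Smith Research told me most deployments failed.",
]

for sentence in sentences:
    print(guess_source(sentence), "<-", sentence)
```

Only the first sentence yields a source name. Adding more phrasings to the pattern helps, but the list of variants never ends, while a cite attribute reads the same way every time.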
Where the Analyzer Flags This
The site audit at /tools/mega-analyzer/ scans rendered HTML for quoted-content patterns. If it detects paragraphs that look like quotations (indented div patterns, em-dash attributions, quotation-mark-wrapped sentences over a certain length) but finds no <blockquote>, <q>, or <cite> elements on the same page, it flags the mismatch as a semantic-markup gap.
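The check described above can be approximated in a few lines. This is a rough sketch of the idea, not the analyzer's actual implementation: the length threshold, the regular expressions, and all names are assumptions chosen for illustration, and it does not cover the indented-div pattern.

```python
import re
from html.parser import HTMLParser

MIN_QUOTE_CHARS = 40  # assumed threshold; the real cutoff is not stated

# A quoted run (straight or curly quotes) of at least MIN_QUOTE_CHARS characters.
QUOTED_RUN = re.compile(
    r'["\u201c]([^"\u201c\u201d]{%d,})["\u201d]' % MIN_QUOTE_CHARS
)
# An em dash followed by a capitalized name, as in a trailing attribution.
DASH_ATTRIBUTION = re.compile(r"\u2014\s*[A-Z]")


class QuoteAudit(HTMLParser):
    """Count semantic quote elements and collect visible text."""

    SEMANTIC = {"blockquote", "q", "cite"}

    def __init__(self):
        super().__init__()
        self.semantic_count = 0
        self.text_parts = []
        self._skip = False  # True inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
        elif tag in self.SEMANTIC:
            self.semantic_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def audit(markup):
    parser = QuoteAudit()
    parser.feed(markup)
    parser.close()
    text = " ".join(parser.text_parts)
    patterns = len(QUOTED_RUN.findall(text)) + len(DASH_ATTRIBUTION.findall(text))
    return {
        "quote_like_patterns": patterns,
        "semantic_elements": parser.semantic_count,
        # Flag only when the page looks like it quotes but marks nothing up.
        "flag": patterns > 0 and parser.semantic_count == 0,
    }


before = (
    '<div class="quote"><p>"The majority of deployments we observed '
    'failed within 30 days." \u2014 Smith Research, 2024</p></div>'
)
after = (
    '<blockquote cite="https://example.com/smith-research-2024-report">'
    "<p>The majority of deployments we observed failed within 30 days.</p>"
    "<footer>\u2014 <cite>Smith Research Quarterly</cite>, 2024</footer>"
    "</blockquote>"
)
print(audit(before))
print(audit(after))
```

The div version is flagged because it has quote-like text and no semantic elements. The blockquote version still contains an em-dash attribution, but the presence of <blockquote> and <cite> clears the flag.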
The fix is to rewrite the quoted sections using the proper elements. On a content-heavy blog this is an afternoon of find-and-replace. On a fresh site it is a template pattern you adopt once and never think about again.
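For the find-and-replace route, a script along these lines can do the mechanical part. It is a sketch that assumes the legacy markup follows exactly the div.quote pattern shown at the top of this post. The old markup never recorded work titles or URLs, so the SOURCES lookup is a hypothetical table you would fill in by hand.

```python
import html
import re

# Matches one legacy quote: a div.quote holding a single paragraph of the form
# "quoted text" EM-DASH Source Name, YEAR. Anything else is left untouched.
LEGACY_QUOTE = re.compile(
    r'<div class="quote">\s*<p>["\u201c](?P<text>.+?)["\u201d]\s*\u2014\s*'
    r'(?P<source>[^,<]+),\s*(?P<year>\d{4})</p>\s*</div>',
    re.DOTALL,
)

# Hypothetical lookup: attribution text -> (title of the work, source URL).
SOURCES = {
    "Smith Research": (
        "Smith Research Quarterly",
        "https://example.com/smith-research-2024-report",
    ),
}


def upgrade_quotes(page):
    """Rewrite known legacy quotes as <blockquote cite> with a nested <cite>."""

    def replace(match):
        known = SOURCES.get(match.group("source").strip())
        if known is None:
            return match.group(0)  # unknown source: leave for manual review
        work, url = known
        return (
            f'<blockquote cite="{html.escape(url)}">\n'
            f'<p>{match.group("text")}</p>\n'
            f'<footer>\u2014 <cite>{html.escape(work)}</cite>, '
            f'{match.group("year")}</footer>\n'
            "</blockquote>"
        )

    return LEGACY_QUOTE.sub(replace, page)


legacy = (
    '<div class="quote">\n'
    '<p>"The majority of deployments we observed failed within 30 days."'
    " \u2014 Smith Research, 2024</p>\n"
    "</div>"
)
print(upgrade_quotes(legacy))
```

Quotes whose source is missing from the lookup pass through unchanged, so the script is safe to run repeatedly while you fill in the table. Regex rewriting of HTML only works because the legacy pattern is so uniform; for anything messier, a real HTML parser is the safer tool.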
The Short Version
- <cite> marks the name of a referenced work. Nothing else.
- <blockquote cite="URL"> marks a block-level quotation and its source URL.
- <q cite="URL"> is the inline-quotation equivalent.
- Divs with class names do not carry the same signal. AI engines read the semantic elements.
- JSON-LD and HTML semantics both matter; they corroborate, they do not replace each other.