Nine AI Mode entity-binding bugs that pass every schema validator

May 14, 2026

Editorial note. The publication date shown above may be in the future. That is intentional. Posts on this site are scheduled against an editorial calendar that aligns with product releases, book launches, and platform-signal timing; the datePublished reflects the date the post is slated to go public, which is also the date indexers and syndication partners should treat as canonical. If you are reading this before that date you were early — welcome.

I started this post after fixing a real-estate site that lost its Google AI Mode citation for the operator's own first and last name. The broader fix lives in the AI Mode entity anchors post. What started as two bugs grew to nine as I worked through twelve sites in May 2026: my own author site, the realtor site, a self-storage client, a property-management site, six book sites in my publishing network, an aggregator app, and an AI publication.

All nine bugs pass schema.org validation. All nine pass Google's Rich Results Test. None of them show up in any audit checklist I have read. Each one is enough on its own to drop a page from AI Mode citation for its own operator's name.

The bugs split into four groups: bindings (1, 6), graph dilution (2, 5, 7, 8), encoding (3, 4), and identifier consistency (9).

Bug 1: mainEntityOfPage as a bare string

The schema spec says mainEntityOfPage can be a string URL or an object with @id. Most tutorials show the string form because it is shorter:

{
  "@type": "Person",
  "@id": "https://example.com/#author",
  "mainEntityOfPage": "https://example.com/about/"
}

Looks fine. Validates clean. Indexes without trouble. AI Mode does not bind to it.

When mainEntityOfPage is a string, Google reads it as a hint: "the canonical bio for this entity probably lives at that URL." When it is an @id reference object, Google reads it as a binding: "the canonical bio is the entity with that exact @id." If the target @id is also a ProfilePage block on that page, the binding is bidirectional, and AI Mode follows the loop.

The object form:

{
  "@type": "Person",
  "@id": "https://example.com/#author",
  "mainEntityOfPage": {"@id": "https://example.com/about/#profilepage"}
}

Paired with a matching block on the /about/ page:

{
  "@type": "ProfilePage",
  "@id": "https://example.com/about/#profilepage",
  "mainEntity": {"@id": "https://example.com/#author"}
}

Person says "I am profiled at this anchor." ProfilePage says "I describe this person." AI Mode can navigate either direction.

The string form is not broken. It works for indexing. It works for the Knowledge Graph. What it does not do is give AI Mode a strong "this is the canonical profile" signal, which matters for queries where the person's name is the entire query and AI Mode has to pick one URL to cite.
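One way to confirm the loop closes is to check both directions in code. This is a minimal sketch, not part of any Google tooling: hasBidirectionalBinding is a hypothetical helper name, and it assumes you have already parsed the JSON-LD nodes from both pages into one flat array.

```javascript
// Hypothetical helper: true only when the entity and its page node
// point at each other by @id (the object form described above).
function hasBidirectionalBinding(nodes, entityId) {
  const byId = new Map(nodes.filter(n => n['@id']).map(n => [n['@id'], n]));
  const entity = byId.get(entityId);
  if (!entity) return false;

  const mep = entity.mainEntityOfPage;
  // A bare string is the "hint" form, not an @id binding.
  if (!mep || typeof mep !== 'object' || !mep['@id']) return false;

  const page = byId.get(mep['@id']);
  if (!page || !page.mainEntity || typeof page.mainEntity !== 'object') return false;
  return page.mainEntity['@id'] === entityId;
}

const person = {
  '@type': 'Person',
  '@id': 'https://example.com/#author',
  mainEntityOfPage: { '@id': 'https://example.com/about/#profilepage' }
};
const profilePage = {
  '@type': 'ProfilePage',
  '@id': 'https://example.com/about/#profilepage',
  mainEntity: { '@id': 'https://example.com/#author' }
};
// Same Person, but with the string form of mainEntityOfPage.
const stringForm = { ...person, mainEntityOfPage: 'https://example.com/about/' };

console.log(hasBidirectionalBinding([person, profilePage], 'https://example.com/#author'));     // true
console.log(hasBidirectionalBinding([stringForm, profilePage], 'https://example.com/#author')); // false
```

The check only proves the two @id values match. It says nothing about how any search system weighs them.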

Bug 2: multiple anonymous WebPage blocks competing with the ProfilePage

This one cost my own author site its AI Mode citation the day after I shipped the fix for Bug 1.

The setup. The base template was emitting THREE schema blocks describing the same URL on the /about/ page:

<script type="application/ld+json">
{
  "@type": "WebPage",
  "name": "About J.A. Watte",
  "speakable": { ... }
}
</script>

<script type="application/ld+json">
{
  "@type": "WebPage",
  "name": "About J.A. Watte",
  "isPartOf": {"@id": "https://example.com/#website"},
  "publisher": {"@id": "https://example.com/#organization"}
}
</script>

<script type="application/ld+json">
{
  "@type": "ProfilePage",
  "@id": "https://example.com/about/#profilepage",
  "mainEntity": {"@id": "https://example.com/#author"}
}
</script>

The Person block had mainEntityOfPage correctly pointing at the ProfilePage @id. Bug 1 was already fixed. But sitting alongside the ProfilePage were TWO anonymous WebPage blocks describing the same URL with no @id of their own.

Google's entity reconciler sees three nodes describing this URL. The Person binding points at one of them. The other two compete. There is no rule for which one wins. AI Mode loses confidence that the ProfilePage is the canonical page node for this URL, and the binding weakens.

The fix is to emit exactly one page-type node per URL, with an explicit @id. On / and /about/ that is the ProfilePage. On every other URL it is a single WebPage. Speakable, isPartOf, publisher, dateModified, and breadcrumb all live inside that one node.

{
  "@type": "ProfilePage",
  "@id": "https://example.com/about/#profilepage",
  "name": "J.A. Watte — Author profile",
  "mainEntity": {"@id": "https://example.com/#author"},
  "isPartOf": {"@id": "https://example.com/#website"},
  "publisher": {"@id": "https://example.com/#organization"},
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["h1", "h2"]
  },
  "breadcrumb": {"@id": "https://example.com/about/#breadcrumb"},
  "dateModified": "2026-05-14"
}

Drop xpath from speakable while you are in there. Google's validator mangles standard XPath 1.0 patterns and the cssSelector form is sufficient.

This bug propagates faster than the others because templates copy. If you have multi-locale base templates (base.njk, base-es.njk, base-zh.njk), every locale carries the same bug. If a child layout (blog-post.njk, article.njk) emits its own Article or BlogPosting block, that block ALSO competes with the WebPage from the base layout unless the base skips its own emission for that layout.
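A quick way to catch this in a build step is to count the page-type nodes per rendered URL. The sketch below is an assumption-laden illustration: auditPageNodes is a hypothetical helper, the type list mirrors the types discussed in this post, and it expects the already-parsed top-level JSON-LD blocks for one page.

```javascript
// Types treated as page-type nodes (assumption: extend for your own site).
const PAGE_NODE_TYPES = new Set([
  'WebPage', 'ProfilePage', 'AboutPage', 'CollectionPage',
  'Article', 'BlogPosting', 'NewsArticle', 'ItemPage'
]);

// Hypothetical helper: blocks is an array of parsed JSON-LD blocks from one URL.
function auditPageNodes(blocks) {
  // Unwrap top-level arrays and @graph containers into a flat node list.
  const nodes = blocks.flat().flatMap(b => (Array.isArray(b['@graph']) ? b['@graph'] : [b]));
  const pageNodes = nodes.filter(n =>
    [].concat(n['@type'] || []).some(t => PAGE_NODE_TYPES.has(t))
  );
  return {
    total: pageNodes.length,
    anonymous: pageNodes.filter(n => !n['@id']).length,
    ok: pageNodes.length === 1 && Boolean(pageNodes[0]['@id'])
  };
}

const before = [
  { '@type': 'WebPage', name: 'About J.A. Watte' },
  { '@type': 'WebPage', name: 'About J.A. Watte', isPartOf: { '@id': 'https://example.com/#website' } },
  { '@type': 'ProfilePage', '@id': 'https://example.com/about/#profilepage' }
];
const after = [before[2]];

console.log(auditPageNodes(before)); // { total: 3, anonymous: 2, ok: false }
console.log(auditPageNodes(after));  // { total: 1, anonymous: 0, ok: true }
```

Running it against every locale and child layout output, not just the base template, is what surfaces the copies.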

Bug 3: Nunjucks autoescape on JSON-LD URLs

This one never trips a parser. The JSON is valid. The URL is structurally well-formed. It just does not resolve.

You store a URL with query parameters in site.json:

{
  "profiles": {
    "narDirectory": "https://directories.apps.realtor/memberDetail/?personId=4837748&officeStreetCountry=US&memberLastName=Watte"
  }
}

You emit it in a Nunjucks template inside a JSON-LD sameAs array:

"sameAs": [
  "{{ site.profiles.narDirectory }}"
]

Looks fine. Builds clean. Here is what actually lands in the rendered page:

<script type="application/ld+json">
{
  "sameAs": [
    "https://directories.apps.realtor/memberDetail/?personId=4837748&amp;officeStreetCountry=US&amp;memberLastName=Watte"
  ]
}
</script>

The & got escaped to &amp;. That is correct behavior for HTML autoescape. But inside a <script type="application/ld+json"> block, HTML entity references are not decoded, so the URL carries the literal substring &amp; between the query parameters. The URL does not resolve. The KG crawler cannot follow it. The sameAs anchor is wasted.

This is invisible everywhere except where it counts. Schema.org validates the JSON. Google's Rich Results Test does not flag it.

Fix in Nunjucks:

"sameAs": [
  "{{ site.profiles.narDirectory | safe }}"
]

The | safe filter tells Nunjucks not to autoescape. Inside HTML attributes you do not need this because browsers decode entities in attributes. Inside JSON-LD you do, because <script> blocks do not.
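To see why nothing flags the escaped URL, here is a small sketch you can run in Node or a browser console. It uses only the standard JSON and URL APIs. The rendered string is the autoescaped output from the example above.

```javascript
// What the autoescaped template leaves inside the <script> block.
const rendered = '{"sameAs": ["https://directories.apps.realtor/memberDetail/?personId=4837748&amp;officeStreetCountry=US&amp;memberLastName=Watte"]}';

const parsed = JSON.parse(rendered);    // parses: the JSON is valid
const url = new URL(parsed.sameAs[0]);  // constructs: the URL is well-formed
const keys = [...url.searchParams.keys()];

// The parameter names now carry the "amp;" debris.
console.log(keys.join(', '));
// personId, amp;officeStreetCountry, amp;memberLastName

console.log(url.searchParams.get('officeStreetCountry')); // null
```

Every layer accepts the value. Only the receiving server notices that the parameters it expects are missing.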

Bug 4: Nunjucks autoescape on every other JSON-LD string field

Bug 3 is the narrow version. The general rule covers every string field inside a JSON-LD block, not just URLs.

Your blog landing page has a pageTitle in frontmatter:

pageTitle: "Blog — Wealth, Business & Real Estate"

You emit it in a BreadcrumbList:

"name": "{{ pageTitle }}",

Renders as:

"name": "Blog — Wealth, Business &amp; Real Estate"

Same root cause. The & got HTML-encoded by Nunjucks autoescape and JSON.parse does not decode HTML entities. The literal substring &amp; ends up in the parsed name. Apostrophes get the same treatment: don't becomes don&#39;t. Quotes become &quot;.

Knowledge Graph name displays read these fields. AI Mode citation snippets quote them. Any system that uses the JSON-LD value verbatim ships the encoded entity to the user.

Fix using the | dump | safe pattern:

"name": {{ pageTitle | dump | safe }},

| dump runs JSON.stringify(value) and produces a quoted, JSON-escaped string. | safe keeps Nunjucks from re-escaping the quotes. No surrounding "..." needed because dump includes the quotes.

Apply this pattern to every string field interpolated from site.*, frontmatter, or collection data: name, description, headline, breadcrumb item names, ItemList entries, BlogPosting headline. Hardcoded literals are fine. URL fields specifically still need | safe per Bug 3.
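The difference between the two patterns is easy to reproduce without a template engine. In this sketch, htmlEscape is a hypothetical stand-in for HTML autoescaping (assumed to cover the five characters & < > " '), and JSON.stringify plays the part of | dump. It is an illustration of the encoding, not Nunjucks itself.

```javascript
// Stand-in for HTML autoescape (assumption: escapes & < > " ').
function htmlEscape(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[&<>"']/g, ch => map[ch]);
}

// Sample value chosen to contain all three troublemakers.
const pageTitle = 'Business & "Real" Estate: don\'t guess';

// Pattern 1: "name": "{{ pageTitle }}" with autoescape on.
const escaped = JSON.parse(`{"name": "${htmlEscape(pageTitle)}"}`).name;

// Pattern 2: "name": {{ pageTitle | dump | safe }} (JSON-escaped, not HTML-escaped).
const dumped = JSON.parse(`{"name": ${JSON.stringify(pageTitle)}}`).name;

console.log(escaped === pageTitle); // false: the parsed name contains HTML entities
console.log(dumped === pageTitle);  // true: the value round-trips exactly
```

One caveat worth knowing: JSON.stringify does not escape a literal </script> inside a value, so untrusted input still needs its own handling.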

Equivalent in other engines:

Jinja2: {{ var | tojson | safe }} for the JSON-string pattern, {{ url | safe }} for URLs
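Handlebars: {{{value}}} (triple-mustache bypasses HTML escape entirely; it does not JSON-escape, so the value must already be safe inside a JSON string)
Liquid: {{ value | json }} for the JSON-string pattern (Jekyll names this filter jsonify)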
Eleventy with Markdown: only Nunjucks blocks inside the template are affected, not the Markdown body

Bug 5: Article paired with a sibling WebPage block

This one turned up on a self-storage client's comparison pages and turned out to be a common hand-coded pattern.

The setup. Four comparison pages each emitted two blocks for the same URL:

<script type="application/ld+json">
{
  "@type": "Article",
  "headline": "Best Self-Storage in Twin Falls 2026",
  "mainEntityOfPage": {"@id": "https://example.com/comparison/#webpage"}
}
</script>

<script type="application/ld+json">
{
  "@type": "WebPage",
  "@id": "https://example.com/comparison/#webpage",
  "speakable": { ... },
  "isPartOf": {"@id": "https://example.com/#website"},
  "publisher": {"@id": "https://example.com/#organization"},
  "breadcrumb": {"@id": "https://example.com/comparison/#breadcrumb"}
}
</script>

Both have @id. The Article references the WebPage via mainEntityOfPage. Schema.org technically allows this. It still counts as two page-type nodes describing the same URL, and Bug 2's dilution logic applies.

The fix is to fold the WebPage fields into the Article and make the Article itself the page-type node:

{
  "@type": "Article",
  "@id": "https://example.com/comparison/#article",
  "headline": "Best Self-Storage in Twin Falls 2026",
  "mainEntityOfPage": "https://example.com/comparison/",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["h1", "h2"]
  },
  "isPartOf": {"@id": "https://example.com/#website"},
  "publisher": {"@id": "https://example.com/#organization"},
  "breadcrumb": {"@id": "https://example.com/comparison/#breadcrumb"}
}

Article.mainEntityOfPage becomes a bare URL string. That is correct here because the Article IS the page-type node. The object-form rule from Bug 1 applies to entities whose canonical page lives ELSEWHERE (a Person whose bio is on a separate page). When the block itself is the page-type, the URL string is the right form.
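If the pages are generated by a script, the fold can be done mechanically. The following is a rough sketch under stated assumptions: foldWebPageIntoArticle is a hypothetical helper, the #article fragment is just the convention used in this post, and Article fields are assumed to win when both nodes define the same property.

```javascript
// Hypothetical helper: merge a sibling WebPage into the Article so that
// exactly one page-type node describes the URL.
function foldWebPageIntoArticle(article, webPage, pageUrl) {
  // Drop the WebPage's own identity, keep its other fields.
  const { '@type': _type, '@id': _id, ...pageFields } = webPage;
  return {
    ...pageFields,
    ...article,                                   // Article fields win on conflict
    '@id': article['@id'] || `${pageUrl}#article`,
    mainEntityOfPage: pageUrl                     // bare URL: the Article is the page node
  };
}

const article = {
  '@type': 'Article',
  headline: 'Best Self-Storage in Twin Falls 2026',
  mainEntityOfPage: { '@id': 'https://example.com/comparison/#webpage' }
};
const webPage = {
  '@type': 'WebPage',
  '@id': 'https://example.com/comparison/#webpage',
  isPartOf: { '@id': 'https://example.com/#website' },
  publisher: { '@id': 'https://example.com/#organization' },
  breadcrumb: { '@id': 'https://example.com/comparison/#breadcrumb' }
};

const folded = foldWebPageIntoArticle(article, webPage, 'https://example.com/comparison/');
console.log(JSON.stringify(folded, null, 2));
```

After folding, the old WebPage block has to be removed from the template, otherwise the page still emits two nodes.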

Bug 6: inline page-type object inside mainEntityOfPage

Same client. Different files. Different shape of the same dilution bug.

A Python patcher script had been writing this on every blog post:

{
  "@type": "BlogPosting",
  "headline": "Five things to know before renting storage",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.com/blog/post/",
    "dateModified": "2026-04-30"
  }
}

The inline {"@type":"WebPage", ...} value of mainEntityOfPage looks like a back-reference. But JSON-LD parsers treat any nested object with @type as its own graph node. So that inline WebPage becomes a phantom anonymous (or @id-bearing) WebPage node sibling to the BlogPosting. Two page-type nodes for one URL again.

There are two valid shapes for mainEntityOfPage, the bare URL string and the @id reference object (shown here in two usages), and one broken one:

// Valid: bare URL string (use when the containing block IS the page-type node)
"mainEntityOfPage": "https://example.com/blog/post/"

// Valid: object @id reference to an existing block elsewhere on the page
"mainEntityOfPage": {"@id": "https://example.com/about/#profilepage"}

// Valid for entities whose page-type lives ELSEWHERE: full @id ref
"mainEntityOfPage": {"@id": "https://example.com/blog/post/#blogposting"}

// Broken: inline typed object — creates a phantom node
"mainEntityOfPage": {"@type": "WebPage", "@id": "...", "dateModified": "..."}

The fix on the BlogPosting case was to use the bare URL string form and pull dateModified up onto the BlogPosting itself:

{
  "@type": "BlogPosting",
  "@id": "https://example.com/blog/post/#blogposting",
  "headline": "Five things to know before renting storage",
  "mainEntityOfPage": "https://example.com/blog/post/",
  "dateModified": "2026-04-30"
}
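For a patcher script, the repair can be expressed as a small transform. This is a sketch only: flattenInlineMainEntityOfPage is a hypothetical helper, and it assumes the inline object's @id is the page URL, as it was in the case above. It does not add an @id to the containing node.

```javascript
// Hypothetical helper: replace an inline typed mainEntityOfPage with a bare
// URL string and hoist its extra fields (e.g. dateModified) onto the parent.
function flattenInlineMainEntityOfPage(node) {
  const mep = node.mainEntityOfPage;
  // Strings and plain {"@id": ...} references are already valid shapes.
  if (!mep || typeof mep !== 'object' || Array.isArray(mep) || !mep['@type']) return node;

  const { '@type': _type, '@id': pageUrl, ...extras } = mep;
  if (!pageUrl) return node; // nothing to point at; leave for manual review

  // Fields already on the parent win over hoisted ones.
  return { ...extras, ...node, mainEntityOfPage: pageUrl };
}

const post = {
  '@type': 'BlogPosting',
  headline: 'Five things to know before renting storage',
  mainEntityOfPage: {
    '@type': 'WebPage',
    '@id': 'https://example.com/blog/post/',
    dateModified: '2026-04-30'
  }
};

const fixedPost = flattenInlineMainEntityOfPage(post);
console.log(fixedPost.mainEntityOfPage); // https://example.com/blog/post/
console.log(fixedPost.dateModified);     // 2026-04-30
```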

Bug 7: anonymous author/publisher Organization inside Article and NewsArticle

This is the pattern that hit an AI publication site. The server-rendered NewsArticle template was emitting this on every published article:

{
  "@type": "NewsArticle",
  "headline": "...",
  "author": {
    "@type": "Organization",
    "name": "Apprised News",
    "url": "https://apprised.news"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Apprised News",
    "url": "https://apprised.news",
    "logo": {"@type": "ImageObject", "url": "..."}
  }
}

Both author and publisher are inline Organization objects with no @id. They become phantom anonymous Organization nodes — siblings to the site's canonical #organization block on the homepage. The Knowledge Graph reconciler sees two extra Org entities per article, none of them tying back to the canonical site Organization.

Across an aggregator with a hundred articles, that is two hundred phantom Org nodes weakening the same site's entity unification.

The fix is to reference the canonical Organization by @id:

{
  "@type": "NewsArticle",
  "@id": "https://example.com/article-url/#newsarticle",
  "headline": "...",
  "author": {"@id": "https://example.com/#organization"},
  "publisher": {"@id": "https://example.com/#organization"}
}

The canonical Organization data (name, url, logo, sameAs, all of it) flows in via JSON-LD's @id merge behavior. The Article does not need to re-state any of it. If the author is a specific person, point at their canonical Person @id instead, and let that Person block carry the full data once.

This is the most common bug on server-rendered Article templates where the template author treated each article as a self-contained schema document. JSON-LD is a graph, not a document. Reference the canonical, do not inline a duplicate.
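In a server-rendered template the swap can be applied just before serialization. The sketch below is illustrative: referenceCanonicalOrg is a hypothetical helper, and it assumes every anonymous Organization in author or publisher is the site's own organization. A guest or partner organization would need different handling.

```javascript
// Hypothetical helper: swap anonymous inline Organization objects in
// author/publisher for a reference to the canonical @id.
function referenceCanonicalOrg(article, orgId) {
  const isAnonOrg = v =>
    Boolean(v) && typeof v === 'object' && !Array.isArray(v) && !v['@id'] &&
    [].concat(v['@type'] || []).some(t => /Organization$/.test(t));
  const fix = v => (isAnonOrg(v) ? { '@id': orgId } : v);

  const out = { ...article };
  for (const field of ['author', 'publisher']) {
    if (field in out) {
      out[field] = Array.isArray(out[field]) ? out[field].map(fix) : fix(out[field]);
    }
  }
  return out;
}

const newsArticle = {
  '@type': 'NewsArticle',
  headline: 'Example headline',
  author: { '@type': 'Organization', name: 'Apprised News', url: 'https://apprised.news' },
  publisher: { '@type': 'Organization', name: 'Apprised News', url: 'https://apprised.news' }
};

const linked = referenceCanonicalOrg(newsArticle, 'https://example.com/#organization');
console.log(JSON.stringify(linked.author));    // {"@id":"https://example.com/#organization"}
console.log(JSON.stringify(linked.publisher)); // {"@id":"https://example.com/#organization"}
```

Person authors and anything that already carries an @id pass through unchanged.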

Bug 8: the aggregator-site equivalent of the ProfilePage pattern

Personal-brand sites bind to a Person via a ProfilePage on /about/. Aggregator and publication sites bind to an Organization (or NewsMediaOrganization) via the homepage and /about/. The structural rules are the same. The anchor type is different.

I tried to apply the ProfilePage fix to a news aggregator and realized the canonical entity is not a Person at all. The site is an Organization. The bio is at /about/. The Organization's mainEntityOfPage should reference an AboutPage (or WebPage) on that bio URL, and the AboutPage should reference back via mainEntity.

On the homepage:

{
  "@type": "Organization",
  "@id": "https://example.com/#organization",
  "name": "Aggregator Name",
  "mainEntityOfPage": {"@id": "https://example.com/#webpage"}
}

{
  "@type": "WebPage",
  "@id": "https://example.com/#webpage",
  "mainEntity": {"@id": "https://example.com/#organization"}
}

On the about page:

{
  "@type": "AboutPage",
  "@id": "https://example.com/about/#aboutpage",
  "mainEntity": {"@id": "https://example.com/#organization"}
}

Organization.mainEntityOfPage can move between the homepage WebPage and the about-page AboutPage depending on which page the operator considers the canonical bio. For most aggregators with a thin homepage the AboutPage is the right anchor.

The rules from Bugs 1, 2, 5, 6, 7 all transfer. Single page-type node per URL. Object form for mainEntityOfPage. No inline typed objects. No anonymous Organizations inside Article/NewsArticle author/publisher fields. The only difference is the anchor type — Organization, not Person; AboutPage or WebPage, not ProfilePage.

Bug 9: @id slash-before-fragment inconsistency

This one shipped silently on three of the book sites in my publishing network. The base template was emitting per-site Organization @id without a trailing slash:

"@id": "{{ site.url }}#organization"

site.url was "https://thew2trap.com" with no trailing slash, so the rendered @id was https://thew2trap.com#organization.

The homepage @graph on the same site was emitting:

"@id": "{{ site.url }}/#organization"

Which renders as https://thew2trap.com/#organization. With the slash.

https://thew2trap.com#organization and https://thew2trap.com/#organization look identical to humans. To Google's entity reconciler they are different @id values pointing at different entities. Two Organization nodes per site, fragmented.

The fix is to pick one form everywhere. The convention I shipped across the publishing network is slash-before-fragment:

"@id": "{{ site.url }}/#organization"

With site.url having no trailing slash in site.json, and every interpolation appending / before #. Same rule for cross-site references to the shared Person @id: it is hardcoded as https://jwatte.com/#ja-watte, slash-before-fragment, everywhere it appears across all seven sites.

Schema validators do not care because both URLs are valid URIs. Schema.org does not care because both are valid identifiers per the spec. Google's entity reconciler cares because it merges nodes by exact @id string match.
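The standard URL parser happens to serialize an empty path as /, which makes it a convenient way to spot the two spellings of one identifier. The sketch below leans on that behavior. findInconsistentIds is a hypothetical helper, and the input is assumed to be every @id string collected from the rendered site. Note that URL normalization also lowercases the host, so treat the result as a report, not an automatic rewrite.

```javascript
// Hypothetical helper: group @id strings that differ only in ways the URL
// parser normalizes away (such as the missing slash before the fragment).
function findInconsistentIds(ids) {
  const groups = new Map();
  for (const id of ids) {
    const key = new URL(id).href; // "https://host#x" serializes as "https://host/#x"
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(id);
  }
  // Keep only identifiers that were written more than one way.
  return [...groups.values()].filter(s => s.size > 1).map(s => [...s]);
}

const collectedIds = [
  'https://thew2trap.com#organization',
  'https://thew2trap.com/#organization',
  'https://jwatte.com/#ja-watte'
];

console.log(new URL('https://thew2trap.com#organization').href);
// https://thew2trap.com/#organization

console.log(JSON.stringify(findInconsistentIds(collectedIds)));
// [["https://thew2trap.com#organization","https://thew2trap.com/#organization"]]
```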

How to check whether you have any of these

I added detectors for all nine bugs to the mega analyzer. Each one surfaces in the Schema tab's Entity anchors section with the specific fix copy. The AI Eligibility composite card's "Entity anchors (KG)" cell treats every red flag as a structural block: if any detector is red, the cell stays red regardless of how many other signals you have.

If you would rather check by hand, open the live page, paste this into the browser console, and let it walk every <script type="application/ld+json"> block:

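const blocks = [...document.querySelectorAll('script[type="application/ld+json"]')];
let pageTypeCount = 0;      // page-type nodes found on this URL (Bugs 2, 5)
let stringEntityHits = 0;   // HTML entities left inside JSON-LD strings (Bugs 3, 4)
let stringMep = 0;          // mainEntityOfPage given as a bare string (Bug 1)
let inlinePageTypeMep = 0;  // inline typed object inside mainEntityOfPage (Bug 6)
let anonAuthorPub = 0;      // author/publisher Organization with no @id (Bug 7)
let slashlessId = 0;        // @id with no slash before the fragment (Bug 9)

// Types counted as page-type nodes, and the subset checked for author/publisher.
const PAGE_TYPES = /^(WebPage|ProfilePage|AboutPage|CollectionPage|Article|BlogPosting|NewsArticle|ItemPage)$/;
const ARTICLE_TYPES = /^(Article|BlogPosting|NewsArticle)$/;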

function walk(n, fn) {
  if (!n || typeof n !== 'object') return;
  if (Array.isArray(n)) return n.forEach(x => walk(x, fn));
  fn(n);
  for (const k of Object.keys(n)) if (n[k] && typeof n[k] === 'object') walk(n[k], fn);
}

function checkStrings(n) {
  if (typeof n === 'string') {
    stringEntityHits += (n.match(/&amp;|&#39;|&quot;/g) || []).length;
  } else if (n && typeof n === 'object') {
    Object.values(n).forEach(checkStrings);
  }
}

blocks.forEach(b => {
  try {
    const data = JSON.parse(b.textContent);
    checkStrings(data);
    walk(data, n => {
      const t = Array.isArray(n['@type']) ? n['@type'][0] : n['@type'];
      if (PAGE_TYPES.test(t || '')) pageTypeCount++;
      if (typeof n.mainEntityOfPage === 'string') stringMep++;
      if (n.mainEntityOfPage && typeof n.mainEntityOfPage === 'object' && n.mainEntityOfPage['@type']) {
        const v = Array.isArray(n.mainEntityOfPage['@type']) ? n.mainEntityOfPage['@type'][0] : n.mainEntityOfPage['@type'];
        if (PAGE_TYPES.test(v || '')) inlinePageTypeMep++;
      }
      if (ARTICLE_TYPES.test(t || '')) {
        ['author', 'publisher'].forEach(f => {
          const v = n[f];
          const ck = vv => {
            if (vv && typeof vv === 'object' && !Array.isArray(vv)) {
              const vt = Array.isArray(vv['@type']) ? vv['@type'][0] : vv['@type'];
              if (/Organization|NewsMediaOrganization/.test(vt || '') && !vv['@id']) anonAuthorPub++;
            }
          };
          if (Array.isArray(v)) v.forEach(ck); else ck(v);
        });
      }
      if (n['@id'] && /^https?:\/\/[^\/#]+#[a-zA-Z]/.test(n['@id'])) slashlessId++;
    });
  } catch (e) {
    console.log('parse error in block', b);
  }
});

console.log({pageTypeCount, stringEntityHits, stringMep, inlinePageTypeMep, anonAuthorPub, slashlessId});

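Anything more than zero is worth fixing, with two exceptions. pageTypeCount should be exactly one, not zero. And a nonzero stringMep is fine when the block carrying the string is itself the page-type node, as in the fixes for Bugs 5 and 6.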

What about the validators?

Schema validators answer one question: does this JSON-LD parse and do the required fields exist for the rich-result type. That is a useful question. It is not the same question AI search is asking.

AI search is asking: can I bind this name to a specific URL with confidence, and is this URL the canonical home of that entity. The signals it reads for that question are mostly the same fields, but the form of the values, the count of competing nodes, and the consistency of identifiers matter in ways validators do not check.

The general lesson. When you make a schema change that looks safe to a validator, also: count the page-type nodes describing the same URL, grep the rendered HTML for HTML-encoded entities inside JSON-LD blocks, check that mainEntityOfPage is the right shape for the containing entity (object @id ref when the page-type lives elsewhere, bare URL string when this block IS the page-type, never an inline typed object), and check that every @id across the site uses one consistent slash-before-fragment form. All of those checks live below the line that validators draw.

If you build small business websites for clients on a budget, these are the kinds of details that separate "the schema validates" from "the AI cites them by name." The systems and templates I lay out in The $20 Dollar Agency include patterns for all nine, plus the audit habits that catch this category of bug before deploy.
