Last month I ran an audit on a real estate site with 38 listing pages and roughly 400 photos. The sitemap was clean. The robots.txt was clean. Perplexity was citing the site about half the time for "houses for sale in [city]." But none of the photos were being cited: not by Perplexity Images, not by Google Images, and not by the multimodal side of Gemini.
The fix was a second sitemap: /sitemap-image.xml. Built, deployed, referenced from the sitemap index. Two weeks later the image citations started showing up.
What sitemap-image.xml Actually Is
The standard XML sitemap at /sitemap.xml uses the http://www.sitemaps.org/schemas/sitemap/0.9 namespace and lists page URLs. That is the whole job.
The image sitemap extension adds a second namespace, http://www.google.com/schemas/sitemap-image/1.1, and lets you group one or more <image:image> entries under each <url>. The shape is per-page, not per-image: a single <url> entry says "this page exists, and here are the images on it."
That per-page grouping is the important part. It is what tells a multimodal retriever "the photo of the kitchen renovation on example.com/kitchen-remodel is on this page, with this caption, at this URL." Without it, the retriever has a pile of <img> tags and no reliable map between images and the pages they belong to.
Why Multimodal AI Cares
Perplexity Images, Gemini's multimodal mode, Kagi's image search, and Google Images all ingest image sitemaps when they are available. They do not require them — an image-rich page without a sitemap will still get some images indexed — but the sitemap gives the retriever three things it would otherwise have to guess:
- Which images are canonical. A page may have 40 <img> tags including icons, decorative SVGs, social buttons, and one hero photo. The sitemap lists only the ones that matter.
- Which page owns the image. When the same photo appears on five pages via a shared component, the sitemap tells the retriever which URL to cite.
- That the image is intended to be indexed. Hotlinked images, ad pixels, and tracker gifs do not appear in your sitemap. That absence is itself a signal.
For AI answer engines that return image thumbnails alongside cited text, those three signals are the difference between "sometimes surfaces your photo" and "names your page as the image source."
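The "which page owns the image" point can be made concrete. Here is a minimal Python sketch, assuming a plain mapping of page URLs to image URLs listed in priority order; the function name assign_owner_pages and the first-page-wins rule are my illustration, not something the sitemap protocol prescribes:

```python
def assign_owner_pages(pages):
    """Keep each image only under the first page that lists it.

    `pages` maps page URL -> list of image URLs, in priority order
    (dicts preserve insertion order in Python 3.7+).
    """
    seen = set()   # image URLs that already have an owner page
    owned = {}     # page URL -> images that page will claim in the sitemap
    for page_url, image_urls in pages.items():
        kept = []
        for image_url in image_urls:
            if image_url not in seen:
                seen.add(image_url)
                kept.append(image_url)
        if kept:   # pages left with no images drop out of the image sitemap
            owned[page_url] = kept
    return owned
```

Put the page you want cited first in the mapping, and shared photos get listed under it only.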
The Structure I Use
A minimal working sitemap-image.xml:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/kitchen-remodel</loc>
    <image:image>
      <image:loc>https://example.com/images/kitchen-before.webp</image:loc>
      <image:title>Kitchen before renovation</image:title>
      <image:caption>Original 1978 kitchen with oak cabinets.</image:caption>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/kitchen-after.webp</image:loc>
      <image:title>Kitchen after renovation</image:title>
      <image:caption>Quartz counters, white shaker cabinets, open shelving.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://example.com/bathroom-remodel</loc>
    <image:image>
      <image:loc>https://example.com/images/bathroom-after.webp</image:loc>
      <image:title>Primary bathroom remodel</image:title>
    </image:image>
  </url>
</urlset>
Each <url> is one page. Each <image:image> inside it is one photo on that page, and <image:loc> is the only required child. Title and caption are optional, and Google deprecated both tags in 2022, but I always fill them in: those strings are what other multimodal retrievers can associate with the image when they build a citation card.
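If the per-page image list lives in a script rather than a template, the same structure can be generated programmatically. This is a minimal Python sketch using only the standard library; the function name build_image_sitemap and the input shape (page URL mapped to a list of dicts with "loc" and optional "title" and "caption") are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Return sitemap XML for {page_url: [{"loc", "title"?, "caption"?}, ...]}."""
    # Register prefixes so the output uses a default namespace plus "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, images in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for img in images:
            node = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(node, f"{{{IMAGE_NS}}}loc").text = img["loc"]
            # Optional fields are emitted only when present and non-empty.
            for field in ("title", "caption"):
                if img.get(field):
                    ET.SubElement(node, f"{{{IMAGE_NS}}}{field}").text = img[field]

    body = ET.tostring(urlset, encoding="unicode")  # escapes & and < in text
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_image_sitemap({
        "https://example.com/bathroom-remodel": [
            {"loc": "https://example.com/images/bathroom-after.webp",
             "title": "Primary bathroom remodel"},
        ],
    }))
```

ElementTree writes the document on one line without indentation. That is still valid XML, so I would only pretty-print it for human review.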
The Sitemap Index Entry
Do not replace your existing sitemap. Add a sitemap index if you do not already have one, and list both files:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-image.xml</loc>
  </sitemap>
</sitemapindex>
Then point robots.txt at the index:
Sitemap: https://example.com/sitemap-index.xml
Crawlers that follow the index will discover both. Crawlers that do not understand sitemap indexes will still find /sitemap.xml at the conventional path.
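A quick way to confirm the wiring after a deploy is to read the Sitemap line out of robots.txt and then list what the index points at. This sketch works on text you have already downloaded, so it makes no network calls; both function names are hypothetical helpers, not part of any tool mentioned here:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls_from_robots(robots_txt):
    """Return the URLs named by 'Sitemap:' lines (field name is case-insensitive)."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the "https://" in the value survives.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def child_sitemaps(index_xml):
    """Return every <loc> listed in a sitemap index document (str or bytes)."""
    if isinstance(index_xml, str):
        index_xml = index_xml.encode("utf-8")
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]
```

If child_sitemaps does not return both /sitemap.xml and /sitemap-image.xml, the index is the first place to look.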
When to Skip It
I do not ship sitemap-image.xml on every site. The rule I use:
- Fewer than 5 content images on the site: skip. Not worth the build step.
- Images are purely decorative (icons, backgrounds, UI chrome): skip. There is nothing for a retriever to cite.
- Single-page portfolio with a gallery: skip. The page itself is the citation target. An image sitemap adds nothing.
- Anything else with real content photos, such as product images, recipe shots, property listings, project galleries, or author headshots: ship it.
The analyzer at /tools/mega-analyzer/ flags this automatically. If it counts 5 or more content images on a site and finds no image sitemap, it raises it as a missing multimodal signal. If the site has fewer than 5 images, the check is skipped.
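The threshold rule is simple enough to state as code. This is only a sketch of the rule as described above, not the analyzer's actual implementation; the function name and the three return labels are assumptions:

```python
def image_sitemap_check(content_image_count, has_image_sitemap, threshold=5):
    """Return 'skipped', 'ok', or 'missing' for one site.

    'skipped' -> fewer than `threshold` content images, so the check does not run
    'ok'      -> enough images and an image sitemap was found
    'missing' -> enough images but no image sitemap
    """
    if content_image_count < threshold:
        return "skipped"
    return "ok" if has_image_sitemap else "missing"
```

With the default threshold, a site with 4 content images is skipped, and a site with 5 and no image sitemap is flagged as missing.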
How I Build It in Eleventy
Every site I run is either Eleventy or plain HTML, so this is the pattern that works for me. Keep a data file listing per-page images, or parse them out of rendered HTML at build time. Then a single .njk template emits the XML:
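---
permalink: /sitemap-image.xml
eleventyExcludeFromCollections: true
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
{%- for page in imagePages %}{# imagePages: hypothetical _data file, [{ url, images: [{ src }] }], absolute URLs #}
  <url>
    <loc>{{ page.url }}</loc>
    {%- for img in page.images %}
    <image:image><image:loc>{{ img.src }}</image:loc></image:image>
    {%- endfor %}
  </url>
{%- endfor %}
</urlset>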
The same pattern works in Hugo, Jekyll, Astro, or anything else that can iterate collections and emit a file.
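For the "parse them out of rendered HTML" route, the hard part is deciding which <img> tags count as content. Here is a minimal Python sketch with the standard-library HTML parser. The filter is an assumption of mine, not a rule from any spec: it drops SVGs and any image with an empty or missing alt attribute, on the theory that decorative images are usually marked that way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ContentImageCollector(HTMLParser):
    """Collect absolute URLs of <img> tags that look like content photos."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""   # valueless attributes arrive as None
        alt = attr.get("alt") or ""
        # Assumed heuristic: skip SVGs and images without meaningful alt text.
        if not src or src.lower().endswith(".svg") or not alt.strip():
            return
        absolute = urljoin(self.page_url, src)  # resolve relative paths
        if absolute not in self.images:         # de-duplicate, keep order
            self.images.append(absolute)


def content_images(page_url, html):
    """Return the content image URLs found in one rendered page."""
    collector = ContentImageCollector(page_url)
    collector.feed(html)
    return collector.images
```

Run it over each built page and the result is the per-page image list the template needs. Adjust the filter to match how your own markup marks decorative images.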
The Netlify Content-Type
XML sitemaps work without a Content-Type header because browsers sniff them, but I set one anyway so every crawler sees the right MIME:
# netlify.toml
[[headers]]
  for = "/sitemap-image.xml"
  [headers.values]
    Content-Type = "application/xml; charset=utf-8"
    Cache-Control = "public, max-age=3600"
A one-hour cache is a good default. Netlify invalidates its CDN cache on every deploy anyway.
The Short Version
- sitemap-image.xml is a second sitemap that groups images under their parent pages.
- Multimodal retrievers (Perplexity Images, Gemini, Google Images) use it to map images to the pages that own them.
- Structure: <url> per page, <image:image> per photo, with optional title and caption.
- Register it in a sitemap index alongside the regular sitemap.xml.
- Skip it if the site has fewer than 5 content images. Otherwise it is a one-time build and a real multimodal signal.