
Cross-Reference GSC and Bing Exports Against Your Sitemap — Without Touching OAuth


Google Search Console has the ground truth on your indexing. Their Coverage report tells you every URL Google has discovered and what status it's in: Indexed, Discovered – currently not indexed, Crawled – currently not indexed, Duplicate without user-selected canonical, Alternate page with proper canonical tag, Page with redirect, Not found (404), and more.

The browser-side audit tools on jwatte.com can probe one URL at a time and find hygiene issues (canonical-to-404, soft 404s, SPA shells, missing noindex on error pages). What they can't do is tell you which specific URLs Google is currently refusing to index and why — that information lives in your Search Console account and requires OAuth to access programmatically.

Until now, closing that gap meant choosing between two paths: (A) spend 2-3 weeks building an OAuth integration and a backend to handle tokens, breaking the "browser-only, no infra" principle the rest of the tools follow, or (B) have the user export CSVs from Search Console by hand and import them into a tool that does the cross-reference.

The new Search Console + Bing Importer takes path B. You export, the tool reads, the tool shows you exactly what's not indexed and why.

The export workflow

Google Search Console:

  1. Go to search.google.com/search-console and select your property.
  2. Click Indexing → Pages.
  3. Click Export in the top-right of the page (the icon next to the help menu).
  4. Choose Download CSV.
  5. The export contains two relevant tables: a summary table and a per-URL table with status. The importer auto-detects which is which.

For impression and click data (separate report):

  1. Click Performance → Search results.
  2. Set date range to the past 90 days.
  3. Click Pages below the chart to group by URL.
  4. Click Export → Download CSV.

Bing Webmaster Tools (optional, but valuable for AI-search discovery):

  1. Go to bing.com/webmasters and select your site.
  2. Click Site Scan in the left nav.
  3. Wait for a scan to complete (or trigger a fresh one).
  4. Export the results as CSV from the scan results page.

Or:

  1. Click URL Inspection → Search for URLs.
  2. Export the per-URL inspection results.

The importer auto-detects both formats.

The cross-reference

With the CSVs loaded and a sitemap URL supplied, the tool does four things:

1. Fetches your sitemap and extracts the canonical URL list.

The sitemap is the authoritative "these are the URLs I want indexed" declaration. Every URL in this list is supposed to be discoverable, crawlable, and indexed.
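That extraction can be sketched in a few lines of standard-library Python. This is an illustration, not the tool's actual code; the only assumption is the standard sitemaps.org XML namespace.

```python
import urllib.request
import xml.etree.ElementTree as ET

# The sitemaps.org schema namespace used by virtually all sitemap generators.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_bytes: bytes) -> set[str]:
    """Return the set of URLs declared in a sitemap's <loc> elements."""
    tree = ET.fromstring(xml_bytes)
    return {loc.text.strip() for loc in tree.iter(f"{SITEMAP_NS}loc") if loc.text}

def fetch_sitemap_urls(sitemap_url: str) -> set[str]:
    """Fetch a sitemap over HTTP and extract its URL list."""
    with urllib.request.urlopen(sitemap_url) as resp:
        return parse_sitemap(resp.read())
```

A sitemap index file (pointing at child sitemaps) would need one extra level of recursion, omitted here for brevity.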

2. Parses the GSC CSV into per-URL records.

Each URL gets a status field: "Indexed", "Discovered – currently not indexed", etc. The tool classifies each as indexed, not-indexed, or other.
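The classification step looks roughly like this. The column names ("URL", "Status") and the exact status strings are assumptions about the export format, not guaranteed; the real importer auto-detects variants.

```python
import csv
import io

# GSC status strings that mean "not indexed" (note the en-dashes GSC uses).
NOT_INDEXED_STATUSES = {
    "Discovered – currently not indexed",
    "Crawled – currently not indexed",
    "Duplicate without user-selected canonical",
    "Alternate page with proper canonical tag",
    "Page with redirect",
    "Not found (404)",
    "Soft 404",
    "Server error (5xx)",
    "Blocked by robots.txt",
    "Excluded by 'noindex' tag",
}

def classify(status: str) -> str:
    """Bucket a GSC status string into indexed / not-indexed / other."""
    if status == "Indexed":
        return "indexed"
    if status in NOT_INDEXED_STATUSES:
        return "not-indexed"
    return "other"

def parse_gsc_csv(text: str) -> dict[str, str]:
    """Map each URL in the per-URL CSV to its classification."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["URL"]: classify(row["Status"]) for row in reader if row.get("URL")}
```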

3. Parses the Bing CSV (if supplied) similarly.

Bing's status codes are slightly different but the same classification logic applies. HTTP code 200 + "Indexed: Yes" → indexed. HTTP code 404 → not-indexed with a specific reason.
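A sketch of that mapping, assuming an HTTP-code column and an "Indexed" yes/no column in the Bing export (the column layout is an assumption; the importer's auto-detection handles the real variants):

```python
def classify_bing(http_code: str, indexed: str) -> tuple[str, str]:
    """Return (classification, reason) for one Bing export row."""
    if http_code == "200" and indexed.lower() == "yes":
        return ("indexed", "")
    if http_code == "404":
        return ("not-indexed", "Not found (404)")
    if http_code.startswith("5"):
        return ("not-indexed", f"Server error ({http_code})")
    if indexed.lower() == "no":
        return ("not-indexed", f"HTTP {http_code}, not indexed")
    return ("other", f"HTTP {http_code}")
```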

4. Produces three lists and a fix prompt.

  • URLs in sitemap but flagged not-indexed by GSC. The most actionable list. These are pages you want indexed that Google is actively refusing or deferring. Each carries its GSC reason.
  • URLs in sitemap but missing from the GSC CSV. Either freshly published (Google hasn't discovered yet), or the CSV was filtered on export. These need an IndexNow ping or a GSC URL Inspection → Request Indexing.
  • URLs in GSC but not in sitemap. Stale URLs Google remembers that you've forgotten. Either add them to sitemap (if real) or 410 them (if dead).
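Once both sides are parsed, the three lists reduce to plain set operations. A minimal sketch, assuming the sitemap is a set of URLs and the GSC data is a URL-to-classification map as in the parsing step above:

```python
def cross_reference(
    sitemap: set[str], gsc: dict[str, str]
) -> tuple[list[str], list[str], list[str]]:
    """Produce the three action lists from a sitemap and GSC classifications."""
    gsc_urls = set(gsc)
    # In sitemap, present in GSC, but flagged not-indexed: the actionable list.
    not_indexed = sorted(u for u in sitemap & gsc_urls if gsc[u] == "not-indexed")
    # In sitemap but absent from the GSC export: fresh or stuck.
    missing = sorted(sitemap - gsc_urls)
    # Known to GSC but absent from the sitemap: stale or forgotten.
    stale = sorted(gsc_urls - sitemap)
    return not_indexed, missing, stale
```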

The fix prompt assembles the three lists into a structured brief for Claude or ChatGPT, with the specific reason per URL.

What the GSC status codes actually mean

Understanding the status is half the battle. The most common not-indexed reasons:

Discovered – currently not indexed. Google knows the URL exists but chose not to crawl it yet. Usually a quality signal: thin content, insufficient internal links, low domain authority. Increase internal-link depth; add more content; build topical authority. Requesting indexing via URL Inspection helps.

Crawled – currently not indexed. Google crawled the page and decided not to index it. Quality signal with higher confidence than "Discovered". The page had its chance and was rejected. Usually thin content, duplicate content, low E-E-A-T, or lack of unique value. Restructure the page before re-requesting.

Duplicate without user-selected canonical. Google found multiple pages with the same content and picked a different one as canonical. Set an explicit <link rel="canonical"> on the URL you want indexed pointing to itself. Check internal links — are they pointing at the right URL with trailing slashes, correct domain, etc.?

Alternate page with proper canonical tag. Google found the URL and determined it's an alternate of some other canonical URL. Often correct (you genuinely set the canonical) but sometimes wrong due to the canonical-bleed pattern this entire tool suite has been systematically fixing. Run the Mega Analyzer's Indexing Hygiene tab on the flagged URL; it'll probe for the bleed.

Page with redirect. The URL 301s or 302s. Google indexed the destination instead. Verify the redirect is intentional and the destination is what you want indexed. If not, remove the redirect.

Not found (404). Google tried, got a 404. If the URL is in your sitemap, either remove it from sitemap or un-404 it.

Soft 404. The URL returned HTTP 200 but the page content looks like an error page. Fix by serving a real 404 status code.

Server error (5xx). Intermittent 500s. Check server logs; fix whatever's crashing.

Blocked by robots.txt. Self-explanatory. Remove the Disallow rule if you want the URL indexed.

Excluded by 'noindex' tag. Meta robots or X-Robots-Tag says noindex. Remove the directive if you want the URL indexed.

What to do with the three lists

Not-indexed URLs from sitemap: group by reason. The most common reasons — "Discovered", "Crawled", "Duplicate", "Alternate page" — each have specific fixes. Work in batches: all "Discovered" URLs get the same treatment (improve internal linking + request indexing). All "Alternate page" URLs get the Mega Analyzer probe treatment.

Sitemap URLs missing from GSC data: these are either fresh (normal) or stuck (problematic). Submit the top 5-10 via URL Inspection → Request Indexing. Ping IndexNow. If they're still missing after two weeks, check whether they're internally linked — an orphan URL in sitemap with no internal links gets deprioritized.
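An IndexNow ping is just an HTTP POST per the public IndexNow spec: a JSON body with your host, your key, and the URL batch. The host and key values below are placeholders; the key must match a text file you host at the key location on your own domain.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_payload(host: str, key: str, urls: list[str]) -> bytes:
    """JSON body per the IndexNow spec: host, key, urlList."""
    return json.dumps({"host": host, "key": key, "urlList": urls}).encode("utf-8")

def ping_indexnow(host: str, key: str, urls: list[str]) -> int:
    """Submit a URL batch to the shared IndexNow endpoint; returns HTTP status."""
    req = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=build_indexnow_payload(host, key, urls),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 or 202 means the batch was accepted
```

Bing consumes IndexNow directly; Google does not, which is why the Request Indexing step in Search Console remains manual.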

GSC URLs not in sitemap: review for real content vs. stale. Real content you forgot → add to sitemap. Stale content → 410 Gone (not 404) so Google knows to drop it. Orphan URLs Google discovered via external backlinks but that don't exist on your site can be removed via GSC's Removal tool.

Why this is a browser-side tool and not OAuth

The browser-only stance matches every other tool on jwatte.com. The cost: you have to export by hand. The benefit: no infrastructure to manage, no tokens to leak, no backend to maintain.

For SMB operators running audits once a month, the export step takes 90 seconds. For an agency running audits on dozens of client sites, it takes longer — but it scales linearly (one export per site), and the data stays on the operator's machine. No cross-contamination, no token-management headache, no compliance risk from SaaS tools holding client GSC credentials.

Path A (OAuth integration) is on the roadmap (see the relevant discussion in the bundling strategy), but it's Phase 2 work, deferred until the CSV workflow proves its value.

What this catches that single-URL audits miss

A single-URL audit run on a random page on your site can only tell you about that page. It can't tell you:

  • Your sitemap has 400 URLs but GSC says 87 are "Discovered – currently not indexed." That's a 22% indexing-refusal rate, concentrated somewhere.
  • You have 30 "Alternate page with proper canonical tag" hits for URLs that shouldn't be alternates. They all share the same template bug — probably canonical-bleed.
  • 15 URLs are 404-flagged by GSC but still in your sitemap. That's a sitemap-hygiene failure separate from any single-page bug.
  • Google knows about 50 URLs that aren't in your sitemap. Either you forgot half your content or there's crawl leak from external sources.

These are patterns that only surface when you cross-reference the full catalog against the indexing data. One-URL-at-a-time tooling can't see them.

Export your GSC Coverage CSV today. Run the importer. The gap between your sitemap and your indexing reality is almost always bigger than you expect.
