An automated audit lands in your inbox. Sixty-three "critical" accessibility errors. A scary fail on color contrast. Three trackers your cookie banner supposedly missed. Most small business owners I talk to either ignore the report (because they don't understand it) or pay the auditor's preferred remediation vendor (because they have no way to second-guess the finding).
There's a third option. Reproduce the finding yourself. If you can pull the exact page the auditor pulled, run the same check, and get the same result, that's a real bug worth fixing. If you can't reproduce it, the tool was probably wrong, and the polite move is to write back with evidence so they can update their rule.
This post is the practical guide to doing that without getting yourself in trouble. Headless browser scraping is fine in 2026, but only if you do it in a way that respects the site, the law, and the people running both.
Why automated audits get false positives in the first place
Every automated checker has a coverage ceiling, even when the marketing pages don't lead with it. If you treat a tool's output as ground truth, you'll fix things that aren't broken and miss things that are.
A few well-documented patterns.
axe-core's own docs are honest about this. Deque's published position is that axe finds roughly 57% of WCAG issues automatically and flags the rest as "incomplete" (needs human review) or out of scope (Deque axe-core README, Deque blog post on WCAG 2.2 support). That's not a knock. axe is the most accurate scanner I've used. But "57% automatable" implies a real ceiling and a real surface where it might be wrong.
WAVE scans hidden elements by design. WAVE flags elements hidden via CSS or aria-hidden because hidden elements often surface later through a dropdown or modal. The side effect is a steady stream of false positives on decorative or off-screen content (WAVE help docs). I've seen it flag carousel slides that aren't visible yet, then flag them again after they rotate.
Lighthouse's accessibility score is a sample, not an audit. Google's own docs note that Lighthouse runs a subset of axe-core's rules, that scores don't map directly to WCAG conformance, and that automated tooling generally catches around 30% of WCAG issues (Chrome for Developers, Lighthouse accessibility scoring, GoogleChrome/lighthouse issue 9507). CI green-light, not a compliance certificate.
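You can see the "sample" part for yourself by running Lighthouse programmatically and counting the audits in the accessibility category. A sketch, assuming the lighthouse and chrome-launcher npm packages and a Node ESM context; the URL is a stand-in:
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';
// Score plus the number of audits that actually ran, which is the honest
// way to read a Lighthouse accessibility number.
const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
const { lhr } = await lighthouse('https://example-vendor.com/', {
  port: chrome.port,
  onlyCategories: ['accessibility'],
});
console.log('score:', lhr.categories.accessibility.score);
console.log('audits run:', lhr.categories.accessibility.auditRefs.length);
await chrome.kill();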
Pa11y inherits its runner's limits. Pa11y runs HTML CodeSniffer by default and can swap in axe-core as an alternate runner (pa11y/pa11y on GitHub), so any false-positive class in either bubbles up. The same page gives different counts from pa11y --runner htmlcs vs pa11y --runner axe because the rule sets disagree.
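Pa11y's Node API makes the runner disagreement easy to demonstrate. A minimal sketch, run sequentially to keep the load polite; the URL is a stand-in:
import pa11y from 'pa11y';
// Same URL, two rule sets. The diff in counts is where to start reading.
const url = 'https://example-vendor.com/';
const htmlcs = await pa11y(url, { runners: ['htmlcs'] });
const axe = await pa11y(url, { runners: ['axe'] });
console.log(`htmlcs: ${htmlcs.issues.length} issues, axe: ${axe.issues.length}`);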
Cookie and tracker scanners disagree more than you'd expect. Different blocklists tag different scripts as "trackers." A request to a font CDN can register as third-party data exfil on one tool and a normal CDN fetch on another. Two scanners can produce a 30% difference in tracker count without either being wrong; they're answering slightly different questions.
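Before arguing about whose blocklist is right, capture the raw list of hosts the blocklists are classifying. A sketch that drops into the Puppeteer scaffold later in this post; page and SITE come from there, and the listener has to be attached before page.goto():
// Collect every third-party host the page contacts while loading.
const thirdParty = new Set();
page.on('request', req => {
  const host = new URL(req.url()).host;
  if (host !== new URL(SITE).host) thirdParty.add(host);
});
// After the navigation settles:
console.log([...thirdParty].sort().join('\n'));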
Every scanner has a coverage ceiling and a documented false-positive class. None of this makes the tools bad. It makes them tools, not oracles. You still have to check.
Asking permission before you scrape someone else's site
If the site is yours, skip this section. If you're auditing a vendor, a competitor, or a third party pulled into a finding, read carefully.
The current US legal floor for scraping public web data is set by two cases. Van Buren v. United States, 593 U.S. 374 (2021) narrowed the Computer Fraud and Abuse Act so that "exceeds authorized access" only covers users accessing parts of a system that are off-limits, not users who access permitted data with bad motives (Van Buren opinion at supremecourt.gov, Cornell LII case page). hiQ Labs, Inc. v. LinkedIn Corp. (9th Cir. 2022) applied that narrowing to scraping public web pages, holding that scraping data from publicly accessible parts of a site that don't require an account does not violate the CFAA's "without authorization" prong (Ninth Circuit opinion PDF, Wikipedia summary of the procedural history).
Two caveats. First, hiQ eventually settled, agreeing to a permanent injunction against further scraping, once the fight shifted from the CFAA to LinkedIn's breach-of-contract claim over its User Agreement. CFAA off the table; ToS-based contract liability very much on it. Second, both cases are US-only; UK GDPR, EU GDPR, and several state privacy laws layer on top.
The summary for a small-business operator: scraping publicly readable data is generally not a federal crime in the Ninth Circuit's view. Violating a site's ToS can still be a contract problem, and scraping anything behind a login or paywall is a different conversation. Asking permission first is cheap insurance.
A real email I send before any third-party scrape:
Subject: Quick heads-up. Auditing a finding on your site
Hi [name or webmaster@],
I'm Josh Watte at jwatte.com. An automated audit flagged
[specific finding] at [specific URL] and I want to verify
before propagating or pushing back.
Plan:
- Puppeteer script identifying as
"JWatteSiteAudit/1.0 (+https://jwatte.com/contact)"
- One fetch every 3 seconds, max 12 pages
- Source IP: my home connection, Mountain time
- Captured: rendered HTML + computed styles only, no PII
- Retention: 30 days, then deleted
- I'll honor robots.txt and any 429 you send
Reply "no" and I'll drop it. Reply "send me the data"
and I'll mail you the JSON when I'm done.
Josh
I get a "fine, go ahead" most of the time. The few who say no usually have a reason (fragile staging, a recent migration, a security review in progress) and now I know not to make their week worse. Either way, the site owner knows who I am if anything strange shows up in their logs. The closest thing to a written norm here is RFC 9309, the Robots Exclusion Protocol, which turned robots.txt into an IETF spec in September 2022. RFC 9309 doesn't dictate manners; it codifies what a polite crawler is expected to read and obey.
A Puppeteer setup that respects the site
Here is the actual scaffold I use. It identifies itself, honors robots.txt, rate limits, and runs one tab at a time. Nothing fancy.
import puppeteer from 'puppeteer';
import robotsParser from 'robots-parser';
const UA = 'JWatteSiteAudit/1.0 (+https://jwatte.com/contact)';
const SITE = 'https://example-vendor.com';
const PATHS = ['/', '/pricing', '/about', '/contact'];
async function fetchRobots(origin) {
const res = await fetch(`${origin}/robots.txt`, {
headers: { 'User-Agent': UA }
});
const text = res.ok ? await res.text() : '';
return robotsParser(`${origin}/robots.txt`, text);
}
async function sleep(ms) {
return new Promise(r => setTimeout(r, ms));
}
async function audit() {
const robots = await fetchRobots(SITE);
const browser = await puppeteer.launch({ headless: true }); // the 'new' string is deprecated in current Puppeteer
for (const path of PATHS) {
const url = new URL(path, SITE).toString();
if (!robots.isAllowed(url, UA)) {
console.log(`Skipping ${url} per robots.txt`);
continue;
}
const page = await browser.newPage();
await page.setUserAgent(UA);
await page.setViewport({ width: 1280, height: 800 });
const resp = await page.goto(url, {
waitUntil: 'networkidle2',
timeout: 30000
});
if (!resp || resp.status() === 429 || resp.status() === 503) {
console.log('Backing off');
await page.close();
await sleep(60000);
continue;
}
const html = await page.content();
// do whatever check you came to do here
await page.close();
await sleep(3000);
}
await browser.close();
}
audit().catch(console.error);
Four things this code does that matter.
It identifies itself in the User-Agent (page.setUserAgent in the Puppeteer docs). The default headless Chrome string includes "HeadlessChrome" and is a fingerprint many sites rate-limit. Setting your own UA labels you correctly and stops you looking like generic automation.
It reads robots.txt and skips disallowed paths (RFC 9309 is the canonical reference for what's allowed in the file).
It rate limits to one request every three seconds. There's no universal rule, but 1-3 seconds is the floor most academic crawlers use and the band most sites tolerate without alerting. If a site returns 429 or 503, back off hard.
It runs one tab at a time. Parallel tabs to the same origin are how you get banned.
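The fixed 60-second sleep in the scaffold is the bluntest form of backing off. If the server sends a Retry-After header with its 429, honoring it is more polite. A sketch; retryAfterMs is my name for the helper, not anything from Puppeteer:
// Wait as long as the server asks via Retry-After, else back off
// exponentially per consecutive 429/503, capped at 15 minutes.
// Takes the `resp` object from the scaffold's page.goto() call.
function retryAfterMs(resp, attempt) {
  const header = resp.headers()['retry-after']; // Puppeteer lower-cases header names
  const seconds = Number(header); // ignores the HTTP-date form of Retry-After for brevity
  if (Number.isFinite(seconds) && seconds > 0) return seconds * 1000;
  return Math.min(60000 * 2 ** attempt, 15 * 60 * 1000);
}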
What the code does not do is hide. I don't recommend puppeteer-extra-plugin-stealth for this use case (repo at github.com/berstend/puppeteer-extra). The stealth plugin is a great tool when you have explicit permission to scrape a site and you're working around fingerprinting that fights the legitimate use case. It's the wrong tool when you've just emailed the site owner asking to identify yourself in their logs. Pick one or the other.
Reproducing an auditor's finding, a worked example
A real one from last month. An automated scan flagged a contrast violation: "color contrast 3.8:1 on .cta-button, fails WCAG 2.1 SC 1.4.3 AA." The site owner forwarded it asking whether to fix it.
// Same Puppeteer scaffold as above; `page` is an open tab on the flagged site.
await page.goto('https://example.com/');
const styles = await page.evaluate(() => {
const el = document.querySelector('.cta-button');
if (!el) return null;
const cs = getComputedStyle(el);
return {
color: cs.color,
background: cs.backgroundColor,
fontSize: cs.fontSize,
fontWeight: cs.fontWeight
};
});
console.log(styles);
Output: color: rgb(255, 255, 255), background: rgb(45, 120, 200). Plug those into WebAIM's contrast checker, the industry reference for WCAG contrast. Result: 4.53:1, passes AA.
The auditor's tool said 3.8:1; the rendered page said 4.53:1. The tool was reading the hover state's CSS rule and reporting it as the default state. On hover the button background lightens, and against the same white text the contrast really does drop below 4.5:1. So the finding was real, but it applied to the hover state and the report mislabeled it.
I wrote back with the computed styles and the source CSS rule. They updated their rule labeling. The site owner spent zero dollars on remediation for the default state and made a small CSS tweak for hover. Real finding, wrong frame, fixed in twenty minutes.
The standard for AA contrast is 4.5:1 for normal text and 3:1 for large text, defined in WCAG 2.2 Success Criterion 1.4.3. WCAG 2.2 became a W3C Recommendation on October 5, 2023 (W3C news announcement).
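If you'd rather compute the ratio locally than paste values into WebAIM, the formula is small enough to inline. A sketch of the WCAG 2.x relative-luminance math from the spec, using the colors from the example above:
// WCAG 2.x contrast ratio between two sRGB colors: relative luminance
// per channel, then (L_lighter + 0.05) / (L_darker + 0.05).
function luminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map(c => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}
function contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}
console.log(contrast([255, 255, 255], [45, 120, 200]).toFixed(2)); // 4.53, matching WebAIM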
Cross-tool sanity-checking
Run one tool and you adopt its worldview as gospel. Run two and the disagreements are where the interesting findings live.
| Concern | Tool A | Tool B | What disagreement usually means |
|---|---|---|---|
| WCAG accessibility | axe-core | WAVE | Finding in WAVE but not axe: WAVE scanned hidden elements; possibly a false positive. Finding in axe but not WAVE: axe found a structural issue WAVE doesn't check. |
| WCAG accessibility | axe-core | Pa11y | Pa11y switching runners changes the result. Disagreement often means the HTML CodeSniffer rule disagrees with axe. |
| WCAG accessibility | Lighthouse | axe | Lighthouse runs a subset of axe rules, so disagreements usually mean Lighthouse is lighter. Not a real conflict. |
| Cookies and trackers | Ghostery | DuckDuckGo Privacy Essentials | Different blocklists. Ghostery's list is more aggressive; DDG is more conservative on what counts as a tracker. |
| Security headers | OWASP ZAP | Mozilla Observatory | ZAP checks behavior; Observatory checks header values. Disagreements are rare on header presence and common on header quality. |
I won't claim "axe-core finds 47% more WCAG issues than WAVE." I haven't seen a published benchmark I trust on that comparison, and the specific number depends on the test corpus, which is where industry-funded benchmarks usually cheat. What you can claim: when two reputable tools disagree, there's almost always a documented reason in one of their docs, and reading that doc tells you whether the finding is real.
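If you want the axe column of that table from the same polite Puppeteer session you already opened, Deque publishes @axe-core/puppeteer. A sketch, dropped into the scaffold's loop after page.goto() resolves:
import { AxePuppeteer } from '@axe-core/puppeteer';
// Runs the full axe-core rule set inside the already-loaded page.
const results = await new AxePuppeteer(page).analyze();
for (const v of results.violations) {
  console.log(`${v.id}: ${v.nodes.length} node(s), impact ${v.impact}`);
}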
Temporarily relaxing site settings for an auditor's pass
Sometimes the auditor's tool can't get to your site at all. HSTS preload upgrades its HTTP requests in a way it didn't expect. CSP blocks its instrumentation script. The rate limiter eats its IP after 20 pages.
Three patterns I use, all reversible, all monitored.
HSTS audit window. Don't disable HSTS. Confirm the auditor's tool can follow HTTPS redirects. MDN's Strict-Transport-Security reference covers max-age and preload. Tools that fail on HSTS-preloaded domains are usually following HTTP redirects with broken cookie handling, which is on their side, not yours.
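Before blaming the tool, it's worth two lines to see the header exactly as the tool sees it. Assumes Node 18+ with global fetch; the URL is a stand-in:
// Print the live HSTS header; compare max-age/preload with what you expect.
const res = await fetch('https://example-vendor.com/');
console.log(res.headers.get('strict-transport-security'));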
CSP report-only mode. If your strict CSP is blocking the auditor's analysis script, switch to Content-Security-Policy-Report-Only for the audit window (MDN reference). Report-only sends you violation reports but doesn't block. The auditor runs their checks, you see what they triggered, you revert to enforcing mode the moment the audit finishes. This is a "two hours, monitored, reverted same day" pattern, not a long-term relaxation.
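What the swap looks like if an Express app is what sets your headers; `app` is an assumed Express app and the policy string is a placeholder, not a recommendation:
// Audit window only: emit the report-only header instead of the enforcing
// one, then revert. Adapt to whatever layer sets your headers today.
app.use((req, res, next) => {
  res.set(
    'Content-Security-Policy-Report-Only',
    "default-src 'self'; report-uri /csp-reports" // placeholder policy
  );
  next();
});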
Rate-limit allowlist for the auditor's IP. Most rate limiters and WAFs let you exempt a specific IP for a window. Get the source IP, add it, run the audit, remove the entry. Real risk: someone behind the same NAT abuses the exception during the window. Small risk for a small window.
None of these are zero-risk. They're cheaper than refusing the audit, refusing to investigate, or paying for a remediation you didn't need.
Self-checking the tool's output without trusting it
Once you have the rendered page in hand, you can run the same checks the audit tool claims to run.
import { load } from 'cheerio';
// `html` is the rendered page captured by the Puppeteer scaffold above.
const $ = load(html);
// Reproduce: "missing alt text". alt="" is valid decorative markup,
// so only a truly absent alt attribute counts as missing.
const imgs = $('img');
const missing = imgs.filter((_, el) => $(el).attr('alt') === undefined);
console.log(`Images: ${imgs.length}, missing alt: ${missing.length}`);
// Reproduce: "empty heading"
const empties = $('h1, h2, h3, h4, h5, h6')
.filter((_, el) => !$(el).text().trim());
console.log(`Empty headings: ${empties.length}`);
If your reproduction count matches the auditor's count, the finding is real. If it doesn't, you have a question: which selector are they using, what state are they testing, are they counting hidden elements you're excluding? Cheerio is the standard server-side jQuery-like parser for Node (cheeriojs/cheerio on GitHub); for browser-environment checks needing full DOM behavior, jsdom is the heavier alternative.
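One concrete way to answer the hidden-elements question: re-run the alt-text count excluding anything inside an aria-hidden subtree, which approximates what a tool that skips hidden content counts. Builds on the cheerio snippet above:
// Exclude images inside aria-hidden subtrees before comparing counts
// with a tool that ignores hidden content.
const visibleMissing = missing.filter(
  (_, el) => $(el).closest('[aria-hidden="true"]').length === 0
);
console.log(`Missing alt outside aria-hidden: ${visibleMissing.length}`);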
I've used this approach maybe a dozen times in the last year. Roughly half the disagreements were the auditor's tool being slightly wrong. The other half were the rendered page being different from what I assumed (server-side template change, AB-test variant, geo-targeted content). Both halves are useful information.
Where to learn more
A short, opinionated reading list. Each one earns its spot.
- WebAIM contrast checker. The canonical contrast tool.
- MDN HTTP headers reference. The most reliable single page for what every header does.
- W3C WCAG 2.2 spec. The source of truth. Bookmark it. Don't trust third-party summaries.
- Pa11y at pa11y.org. The cleanest CLI accessibility scanner. Easy to script into a custom audit pipeline.
- awesome-puppeteer. Curated list of plugins and example scrapers.
- Puppeteer official API. The only reference for what page.setUserAgent, page.goto, and page.evaluate actually do.
- RFC 9309 (Robots Exclusion Protocol). Eight pages. Read all of them.
Where this connects to the rest of the site
If you came here from one of the audit tools, the closest companion posts:
- Why I built the WCAG audit tool covers the tool itself and the lawsuit landscape.
- DMCA takedowns and Terms of Use for developers walks the inverse case, when your site is being scraped without permission.
- Modern security headers is the deeper guide to CSP, HSTS, and the headers above.
- Cookie + Storage Drift Audit is the companion tool for the tracker-disagreement section.
For the larger picture of running a small site without consultants, The $97 Launch covers the rest of the under-$100 stack: hosting, DNS, email, search, accessibility.
Fact-check notes and sources
Every specific claim above traces back to a primary source. Grouped by section.
Why automated audits get false positives:
- Deque axe-core README on GitHub and Deque blog: WCAG 2.2 support in axe-core 4.5 for Deque's published 57% automatable-issues claim and zero-false-positive-bug commitment.
- WAVE help docs for documented behavior on hidden-element scanning.
- Chrome for Developers, Lighthouse accessibility scoring and GoogleChrome/lighthouse issue 9507 for Lighthouse's relationship to axe-core and to WCAG.
- pa11y/pa11y on GitHub for runner options and rule-set defaults.
Asking permission and the legal lay of the land:
- Van Buren v. United States, 593 U.S. 374 (2021), official opinion, Cornell LII case page.
- hiQ Labs, Inc. v. LinkedIn Corp., 9th Cir. 2022 opinion PDF, Wikipedia summary including the December 2022 settlement.
- RFC 9309, Robots Exclusion Protocol (IETF, 2022) and the Datatracker entry.
Puppeteer setup:
- Puppeteer API: page.setUserAgent.
- puppeteer-extra-plugin-stealth on GitHub (berstend/puppeteer-extra).
Reproducing the contrast finding:
- WebAIM contrast checker.
- WCAG 2.2 SC 1.4.3 Contrast (Minimum).
- W3C announcement: WCAG 2.2 became a Recommendation on Oct 5, 2023.
Cross-tool comparison:
- axe-core, WAVE help, Pa11y, Lighthouse accessibility scoring, Ghostery, DuckDuckGo Privacy Essentials, OWASP ZAP, Mozilla Observatory.
Temporarily relaxing site settings:
- MDN references for Strict-Transport-Security and Content-Security-Policy-Report-Only.
Self-checking the output:
- cheeriojs/cheerio on GitHub; jsdom as the heavier full-DOM alternative.
I haven't seen a public, methodologically sound benchmark comparing axe-core's WCAG-issue catch rate against WAVE's or Pa11y's on a shared corpus. If you've got one, please send it. Until then, I'm not going to repeat the percentages I've seen in vendor marketing.
This post is informational, not legal advice. The CFAA, state privacy laws, and the Terms of Service of any site you scrape are the actual binding texts. Mentions of Deque, WebAIM, W3C, Puppeteer, Cheerio, Mozilla, Ghostery, DuckDuckGo, OWASP, Pa11y, and other third parties are nominative fair use. No affiliation is implied.