Here's the failure pattern nobody talks about in agent-commerce guides.
Step 1: Publish /.well-known/mcp.json with declared callable actions. Feels great. Ahead of the curve.
Step 2: An agent (ChatGPT, Claude Desktop, Gemini copilot) reads the manifest, picks your action, tries to invoke it.
Step 3: The endpoint is broken. Returns 500. Or times out. Or the declared auth is "none" but the endpoint actually rejects unauthenticated requests with a 403.
Step 4: The agent logs the failure. Its framework's retry logic fails again. It downgrades your action-manifest in its discovery weights.
Step 5: For the next N months, that agent framework routes around you for similar queries. You never find out. No error arrives in your inbox. You just slowly lose agent-mediated traffic.
This failure mode is silent and compounding.
What the Agent Action Reliability Probe does
You paste your domain. The tool:
- Fetches all three common agent-manifest paths: /.well-known/mcp.json, /.well-known/agent-actions.json, /.well-known/ai-plugin.json.
- Parses each manifest for declared tools and actions with URLs.
- HEAD-probes each declared action URL, measuring response time, HTTP status, and header compliance.
- Categorizes each action: healthy / slow / broken.
- Computes overall reliability percentage.
- Emits an AI fix prompt that diagnoses root cause per broken endpoint.
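The fetch-and-probe loop above can be sketched in a few lines. This is an illustrative sketch, not the tool's actual implementation; the function names and the 5-second timeout are assumptions:

```python
import time
import urllib.error
import urllib.request

MANIFEST_PATHS = [
    "/.well-known/mcp.json",
    "/.well-known/agent-actions.json",
    "/.well-known/ai-plugin.json",
]

def manifest_urls(domain: str) -> list[str]:
    """Build the three candidate manifest URLs for a domain."""
    return [f"https://{domain}{path}" for path in MANIFEST_PATHS]

def head_probe(url: str, timeout: float = 5.0) -> tuple[int, float]:
    """HEAD-probe a declared action URL; return (HTTP status, elapsed seconds)."""
    req = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, time.monotonic() - start
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses raise, but still carry the status code we need
        return e.code, time.monotonic() - start
```

A real probe would then parse each manifest it finds and run `head_probe` over every declared action URL.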
The three common failure modes
1. 404 — Endpoint doesn't exist. The manifest declares /api/book-inspection but there's no handler. Classic copy-paste-the-template-but-never-implement failure. Fix: actually implement the endpoint.
2. 405 — Method mismatch. The manifest declares "method": "POST" but the server only accepts GET. An ideal server returns 405 "Method Not Allowed" rather than 404; either way, the declared manifest lies to the agent. Fix: update the manifest to match the actual method, or update the server to accept the declared method.
3. Slow (>5 seconds). The endpoint works, but response time is terrible. Agents typically time out after 5-10s on action invocations. Persistent slow responses ≈ broken. Fix: add origin caching, async handling with webhook callback, or move compute closer to request.
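Given a status code and a response time, the healthy / slow / broken bucketing reduces to a small function. The 5-second cutoff mirrors the typical agent timeouts mentioned above; treating all 4xx/5xx responses as broken is an assumption of this sketch:

```python
def categorize(status: int, seconds: float, slow_after: float = 5.0) -> str:
    """Bucket one probed action as healthy, slow, or broken."""
    if status >= 400:            # 404, 405, 403, 500... all read as broken to an agent
        return "broken"
    if seconds > slow_after:     # works, but slower than typical agent timeouts
        return "slow"
    return "healthy"
```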
The 90% reliability threshold
Agent frameworks internally track per-site reliability. Claude Desktop's MCP client, ChatGPT's Responses API agent path, Gemini's tool-call layer — all keep rolling reliability metrics per declared tool.
Observable pattern: at <90% reliability over 2-3 invocation attempts, agents begin deprioritizing your tools in future queries. At <70%, they effectively route around you.
Aim for 95%+ reliability. It's the difference between being the agent's preferred tool and being its fallback.
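A rolling per-tool reliability metric of the kind described might look like the sketch below. The window size and the 90% threshold are illustrative assumptions; the frameworks' actual internals are not public:

```python
from collections import deque

class RollingReliability:
    """Track success rate over the last N invocations of one declared tool."""

    def __init__(self, window: int = 20):
        self.results: deque[bool] = deque(maxlen=window)

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def reliability(self) -> float:
        # No data yet: assume the tool is fine until proven otherwise
        return sum(self.results) / len(self.results) if self.results else 1.0

    @property
    def deprioritized(self) -> bool:
        # Below 90%, the framework starts routing around this tool
        return self.reliability < 0.90
```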
The monitoring cadence
Declared actions need weekly monitoring. Endpoints break when:
- Deploy shipped a bug
- Database schema migration broke a query
- Third-party dependency (payment processor, scheduling service) had an outage
- Rate limit tightened and legitimate calls now get 429'd
A weekly probe catches all of these within 6-7 days instead of 30+. The agent-mediated traffic you keep by not losing agent priority dwarfs the five minutes of weekly probing.
Cron schedule: every Monday morning, probe every declared action. Alert if any drops below 95% reliability or if any new action becomes broken. Re-run after every deploy.
The auth-lying failure
The sneakiest failure mode: the manifest declares "auth": "none" but the endpoint requires an API key. A polite agent trying a no-auth call gets a 401 or 403 and gives up.
Two fixes, both legitimate:
- Change the manifest to match reality. If your endpoint genuinely needs auth, declare it. Agents that support auth flows will supply credentials.
- Change the endpoint to honor "no auth." For read-only verify / search / contact actions, dropping auth is usually safe. Rate-limit hard instead.
The worst option: leave the mismatch. Agents don't debug; they deprioritize.
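Detecting the mismatch is straightforward once you have probe results. This check assumes the manifest exposes a declared auth field with the value "none", matching the example above:

```python
def auth_mismatch(declared_auth: str, status: int) -> bool:
    """Manifest claims no auth is needed, but the server rejected the call."""
    return declared_auth == "none" and status in (401, 403)
```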
The proxy-behind-caching complication
If your actions are behind a CDN with aggressive caching, HEAD probes may return stale 200s even when the actual POST endpoint is broken. The probe catches most failures but not all.
For high-stakes actions (payment-involved, customer-data-involved), supplement this heuristic probe with purpose-built integration tests that exercise the full round trip: authentication, request body, and response parsing. Those tests belong in CI / pre-deploy checks.
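A minimal sketch of such a round-trip test, assuming a hypothetical /api/book-inspection endpoint, a bearer-token auth scheme, and a confirmation_id success field — all illustrative, not part of any real API:

```python
import json
import urllib.request

def build_action_request(url: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the full POST an agent would actually send, auth included."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )

def check_round_trip(resp_status: int, resp_body: bytes) -> None:
    """Assert the response parses and succeeds, not just that something answered."""
    assert resp_status == 200, f"unexpected status {resp_status}"
    body = json.loads(resp_body)        # response must be valid JSON
    assert "confirmation_id" in body    # hypothetical success field
```

Unlike a HEAD probe, this catches stale cached 200s, broken request parsing, and malformed response bodies.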
Related reading
- MCP Advertising Audit — discover manifests (upstream)
- Agent Invocation Sitemap Generator — publish the manifests
- Agentic Commerce Readiness — broader agent-commerce readiness
- MCP Server Audit — full MCP implementation review
Fact-check notes and sources
- Model Context Protocol: modelcontextprotocol.io
- HEAD method semantics: RFC 9110 Section 9.3.2
- Agent framework reliability behavior: observational, synthesized from Claude Desktop, ChatGPT Responses API, and Gemini tool-call patterns as of early 2026
This post is informational, not engineering-consulting advice. Mentions of Anthropic, OpenAI, Google, Model Context Protocol are nominative fair use. No affiliation is implied.