Here's the failure pattern nobody talks about in agent-commerce guides.
Step 1: Publish /.well-known/mcp.json with declared callable actions. Feels great. Ahead of the curve.
Step 2: An agent (ChatGPT, Claude Desktop, Gemini copilot) reads the manifest, picks your action, tries to invoke it.
Step 3: The endpoint is broken. Returns 500. Or times out. Or the declared auth is "none" but the endpoint actually rejects unauthenticated requests with a 403.
Step 4: The agent logs the failure. Its framework's retry logic fails again. It downgrades your action-manifest in its discovery weights.
Step 5: For the next N months, that agent framework routes around you for similar queries. You never find out. No error arrives in your inbox. You just slowly lose agent-mediated traffic.
This failure mode is silent and compounding.
What the Agent Action Reliability Probe does
You paste your domain. The tool:
- Fetches all three common agent-manifest paths: /.well-known/mcp.json, /.well-known/agent-actions.json, /.well-known/ai-plugin.json.
- Parses each manifest for declared tools and actions with URLs.
- HEAD-probes each declared action URL, measuring response time, HTTP status, and header compliance.
- Categorizes each action: healthy / slow / broken.
- Computes overall reliability percentage.
- Emits an AI fix prompt that diagnoses root cause per broken endpoint.
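The fetch-and-probe loop above can be sketched in a few lines. This is an illustrative sketch, not the tool's actual implementation; the function names and the 5-second timeout are assumptions:

```python
import time
import urllib.error
import urllib.request

MANIFEST_PATHS = [
    "/.well-known/mcp.json",
    "/.well-known/agent-actions.json",
    "/.well-known/ai-plugin.json",
]

def manifest_urls(domain: str) -> list[str]:
    """Build the three candidate manifest URLs for a domain."""
    return [f"https://{domain}{path}" for path in MANIFEST_PATHS]

def head_probe(url: str, timeout: float = 5.0) -> tuple[int, float]:
    """HEAD-probe a declared action URL; return (HTTP status, elapsed seconds)."""
    req = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, time.monotonic() - start
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses raise, but still carry the status code we need
        return e.code, time.monotonic() - start
```

A real probe would then parse each manifest it finds and run `head_probe` over every declared action URL.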
The three common failure modes
1. 404 — Endpoint doesn't exist. The manifest declares /api/book-inspection but there's no handler. Classic copy-paste-the-template-but-never-implement failure. Fix: actually implement the endpoint.
2. 405 — Method mismatch. The manifest declares "method": "POST" but the server only accepts GET. An ideal server returns 405 "Method Not Allowed" rather than 404; either way, the declared manifest lies to the agent. Fix: update the manifest to match the actual method, or update the server to accept the declared method.
3. Slow (>5 seconds). The endpoint works, but response time is terrible. Agents typically time out after 5-10s on action invocations. Persistent slow responses ≈ broken. Fix: add origin caching, async handling with webhook callback, or move compute closer to request.
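Given a status code and a response time, the healthy / slow / broken bucketing reduces to a small function. The 5-second cutoff mirrors the typical agent timeouts mentioned above; treating all 4xx/5xx responses as broken is an assumption of this sketch:

```python
def categorize(status: int, seconds: float, slow_after: float = 5.0) -> str:
    """Bucket one probed action as healthy, slow, or broken."""
    if status >= 400:            # 404, 405, 403, 500... all read as broken to an agent
        return "broken"
    if seconds > slow_after:     # works, but slower than typical agent timeouts
        return "slow"
    return "healthy"
```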
The 90% reliability threshold
Agent frameworks internally track per-site reliability. Claude Desktop's MCP client, ChatGPT's Responses API agent path, Gemini's tool-call layer — all keep rolling reliability metrics per declared tool.
Observable pattern: at <90% reliability over 2-3 invocation attempts, agents begin deprioritizing your tools in future queries. At <70%, they effectively route around you.
Aim for 95%+ reliability. It's the difference between being the agent's preferred tool and being its fallback.
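A rolling per-tool reliability metric of the kind described might look like the sketch below. The window size and the 90% threshold are illustrative assumptions; the frameworks' actual internals are not public:

```python
from collections import deque

class RollingReliability:
    """Track success rate over the last N invocations of one declared tool."""

    def __init__(self, window: int = 20):
        self.results: deque[bool] = deque(maxlen=window)

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def reliability(self) -> float:
        # No data yet: assume the tool is fine until proven otherwise
        return sum(self.results) / len(self.results) if self.results else 1.0

    @property
    def deprioritized(self) -> bool:
        # Below 90%, the framework starts routing around this tool
        return self.reliability < 0.90
```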
The monitoring cadence
Declared actions need weekly monitoring. Endpoints break when:
- Deploy shipped a bug
- Database schema migration broke a query
- Third-party dependency (payment processor, scheduling service) had an outage
- Rate limit tightened and legitimate calls now get 429'd
A weekly probe catches all of these within 6-7 days instead of 30+. The agent-mediated traffic you keep by not losing agent priority dwarfs the five minutes of weekly probing.
Cron schedule: every Monday morning, probe every declared action. Alert if any drops below 95% reliability or if any new action becomes broken. Re-run after every deploy.
The auth-lying failure
The sneakiest failure mode: the manifest declares "auth": "none" but the endpoint requires an API key. A polite agent trying a no-auth call gets a 401 or 403 and gives up.
Two fixes, both legitimate:
- Change the manifest to match reality. If your endpoint genuinely needs auth, declare it. Agents that support auth flows will supply credentials.
- Change the endpoint to honor "no auth." For read-only verify / search / contact actions, dropping auth is usually safe. Rate-limit hard instead.
The worst option: leave the mismatch. Agents don't debug; they deprioritize.
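Detecting the mismatch is straightforward once you have probe results. This check assumes the manifest exposes a declared auth field with the value "none", matching the example above:

```python
def auth_mismatch(declared_auth: str, status: int) -> bool:
    """Manifest claims no auth is needed, but the server rejected the call."""
    return declared_auth == "none" and status in (401, 403)
```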
The proxy-behind-caching complication
If your actions are behind a CDN with aggressive caching, HEAD probes may return stale 200s even when the actual POST endpoint is broken. The probe catches most failures but not all.
For high-stakes actions (payment-involved, customer-data-involved), supplement this heuristic probe with purpose-built integration tests that exercise the full round trip: authentication, request body, and response parsing. Those tests belong in CI / pre-deploy checks.
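A minimal sketch of such a round-trip test, assuming a hypothetical /api/book-inspection endpoint, a bearer-token auth scheme, and a confirmation_id success field — all illustrative, not part of any real API:

```python
import json
import urllib.request

def build_action_request(url: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the full POST an agent would actually send, auth included."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )

def check_round_trip(resp_status: int, resp_body: bytes) -> None:
    """Assert the response parses and succeeds, not just that something answered."""
    assert resp_status == 200, f"unexpected status {resp_status}"
    body = json.loads(resp_body)        # response must be valid JSON
    assert "confirmation_id" in body    # hypothetical success field
```

Unlike a HEAD probe, this catches stale cached 200s, broken request parsing, and malformed response bodies.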
Related reading
- MCP Advertising Audit — discover manifests (upstream)
- Agent Invocation Sitemap Generator — publish the manifests
- Agentic Commerce Readiness — broader agent-commerce readiness
- MCP Server Audit — full MCP implementation review
Fact-check notes and sources
- Model Context Protocol: modelcontextprotocol.io
- HEAD method semantics: RFC 9110 Section 9.3.2
- Agent framework reliability behavior: observational, synthesized from Claude Desktop, ChatGPT Responses API, and Gemini tool-call patterns as of early 2026
This post is informational, not engineering-consulting advice. Mentions of Anthropic, OpenAI, Google, Model Context Protocol are nominative fair use. No affiliation is implied.