Two ideas that belong together. The first: an AI agent that automates the mechanical parts of job searching so you can spend your time on the parts that actually require you to be human. The second: an inference routing layer that automatically sends simple requests to cheap models and complex requests to expensive ones, so the agent doesn't burn through your API budget on tasks that a $0.07-per-million-token model handles just fine.
The job search agent
A job search has five stages. Four of them are mostly mechanical. One requires judgment.
- Finding openings — mechanical. Scraping job boards, filtering by criteria, deduplicating across platforms.
- Matching qualifications — partially mechanical. Comparing your resume to job requirements, scoring the fit, ranking opportunities.
- Tailoring materials — mechanical. Adjusting your resume keywords and writing a cover letter that maps your experience to the specific job description.
- Applying — mechanical. Filling forms, uploading documents, clicking submit.
- Interviewing — this is the human part. No agent should do this for you.
The existing platforms (Jobright, Loopcv, AIApply, Huntr, JobCopilot) each handle pieces of this pipeline. But building your own gives you control over the logic, the model behind it, and the data. Here's the architecture.
The pipeline
Stage 1: Discovery. A scheduled script that queries job board APIs (LinkedIn, Indeed, Greenhouse, Lever) or scrapes public listings for your target roles, locations, and salary ranges. Store results in a local SQLite database with deduplication on company + role title + posting URL. Run this daily.
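A minimal sketch of the discovery store, assuming Node with the better-sqlite3 package; table and column names are illustrative, not prescribed:

```ts
import Database from 'better-sqlite3';

const db = new Database('jobs.db');

// Dedup is enforced by the database itself: re-running discovery
// hits the UNIQUE constraint and duplicates are silently skipped.
db.exec(`
  CREATE TABLE IF NOT EXISTS listings (
    id          INTEGER PRIMARY KEY,
    company     TEXT NOT NULL,
    role_title  TEXT NOT NULL,
    url         TEXT NOT NULL,
    description TEXT,
    score       INTEGER,             -- filled in by stage 2
    status      TEXT DEFAULT 'new',  -- new | scored | applied | rejected | interview | offer
    found_at    TEXT DEFAULT (datetime('now')),
    UNIQUE (company, role_title, url)
  )
`);

// Every scraper feeds rows through the same statement.
const insertListing = db.prepare(`
  INSERT OR IGNORE INTO listings (company, role_title, url, description)
  VALUES (@company, @role_title, @url, @description)
`);

insertListing.run({
  company: 'Example Corp',
  role_title: 'Staff Engineer',
  url: 'https://example.com/jobs/123',
  description: '...',
});
```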
Stage 2: Scoring. For each new listing, send the job description and your master resume to the LLM. Prompt: "Score this job on a 1-10 scale for fit with this resume. Consider: skill match, experience level match, industry match, location match. Return the score and a one-sentence explanation." Store the score. Sort by score. Only proceed to stage 3 for listings scoring 7+.
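A sketch of the scoring call, assuming an OpenAI-compatible chat completions endpoint; the base URL, environment variables, and JSON reply format are assumptions, not any specific provider's API:

```ts
interface FitScore {
  score: number;  // 1-10
  reason: string; // one-sentence explanation
}

async function scoreListing(jobDescription: string, resume: string): Promise<FitScore> {
  const res = await fetch(`${process.env.LLM_BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.LLM_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'deepseek-v4-flash', // a cheap model is enough for scoring
      messages: [{
        role: 'user',
        content:
          'Score this job on a 1-10 scale for fit with this resume. ' +
          'Consider: skill match, experience level match, industry match, location match. ' +
          'Reply with JSON only: {"score": <1-10>, "reason": "<one sentence>"}.\n\n' +
          `JOB:\n${jobDescription}\n\nRESUME:\n${resume}`,
      }],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content) as FitScore;
}
```

Listings that come back 7 or higher get their score written to the database and move on to stage 3.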
Stage 3: Tailoring. For each high-scoring listing, send the job description and your master resume to the LLM. Prompt: "Rewrite this resume to emphasize the skills and experience most relevant to this specific job description. Adjust keyword density to match the job posting. Write a three-paragraph cover letter that connects my specific experience to their specific requirements. Do not fabricate experience I don't have."
Stage 4: Application. This is the hardest to automate reliably because every company uses a different application form. Platforms like Loopcv and AIApply handle this via browser automation. If building your own, Playwright or Puppeteer can fill forms, but expect to maintain the automation scripts as form layouts change.
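As a rough sketch of what that looks like with Playwright; the selectors here are hypothetical and will break whenever a site redesigns its form, which is exactly the maintenance cost mentioned above:

```ts
import { chromium } from 'playwright';

async function applyToListing(url: string, resumePath: string, coverLetter: string) {
  const browser = await chromium.launch({ headless: false }); // watch it work before trusting it
  const page = await browser.newPage();
  await page.goto(url);

  // Every ATS names its fields differently; these selectors are examples only.
  await page.fill('input[name="name"]', 'Your Name');
  await page.fill('input[name="email"]', 'you@example.com');
  await page.setInputFiles('input[type="file"]', resumePath);
  await page.fill('textarea[name="cover_letter"]', coverLetter);

  await page.click('button[type="submit"]');
  await browser.close();
}
```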
Stage 5: Tracking. Log every application in the SQLite database with status (applied, rejected, interview scheduled, offer). Review weekly. Adjust your scoring criteria based on which applications get responses.
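The weekly review can be a single query against the same database, for example response rate grouped by fit score (a sketch, assuming the listings table from stage 1):

```ts
import Database from 'better-sqlite3';

const db = new Database('jobs.db');

// Which fit scores actually convert to interviews? If 7s never respond
// but 9s do, raise the stage-2 threshold.
const report = db.prepare(`
  SELECT score,
         COUNT(*) AS applied,
         SUM(CASE WHEN status IN ('interview', 'offer') THEN 1 ELSE 0 END) AS responses
  FROM listings
  WHERE status NOT IN ('new', 'scored')
  GROUP BY score
  ORDER BY score DESC
`).all();

console.table(report);
```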
The tools
You can build this entire pipeline in Claude Code or Codex. The discovery and scoring stages are scripts the model can write and iterate on. The tailoring stage is a prompt the model runs. The application stage is browser automation the model can generate. The tracking stage is a database query.
The CLAUDE.md pattern works well here: store your master resume, your target criteria, and your application history as context files that the agent loads at session start.
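A sketch of what that context file might contain; the structure and filenames are illustrative:

```markdown
# CLAUDE.md — job search agent context

## Master resume
See resume.md. Never fabricate experience not listed there.

## Target criteria
- Roles: staff/senior backend engineer
- Locations: remote (US)
- Minimum fit score to tailor: 7

## Application history
Stored in jobs.db. Check it before suggesting a listing I've already applied to.
```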
The inference routing engine
The job search agent makes dozens of LLM calls per day: scoring 50 job listings, tailoring 5 resumes, generating 5 cover letters. If every call goes to Claude Opus 4.7 at $5 per million input tokens, the daily cost adds up.
But not every call needs Opus 4.7. Scoring a job listing for basic keyword match is a simple task. A $0.07-per-million-token model (DeepSeek V4 Flash) handles it fine. Tailoring a resume to pass an ATS scanner is a medium task. Kimi K2.6 at $0.60 handles it. Writing a cover letter that sounds like a real person who wants this specific job is a hard task. That's where Opus 4.7 earns its price.
The routing engine classifies each request by complexity and sends it to the cheapest model that can handle it.
The architecture
```
Request → Complexity Classifier → Model Router → Response
                                       ↓
                      Simple → DeepSeek V4 Flash ($0.07/M)
                      Medium → Kimi K2.6 ($0.60/M)
                      Hard   → Claude Opus 4.7 ($5.00/M)
```
The classifier can be rule-based or model-based:
- Rule-based: Define complexity by task type. Scoring = simple. Resume keyword adjustment = medium. Cover letter writing = hard. Straightforward, no overhead, but doesn't adapt to edge cases.
- Model-based: Send a one-line description of the task to a cheap model and ask it to classify complexity as simple/medium/hard. Adds one cheap LLM call per request but handles novel tasks better.
Start rule-based. Switch to model-based when you hit tasks that don't fit your rules.
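The rule-based version is a few lines (a sketch; the task names are whatever your pipeline already uses):

```ts
type Tier = 'simple' | 'medium' | 'hard';

const TASK_TIERS: Record<string, Tier> = {
  score_listing: 'simple',
  tailor_resume: 'medium',
  write_cover_letter: 'hard',
};

function classify(taskType: string): Tier {
  // Unknown tasks default to the top tier: expensive beats wrong.
  return TASK_TIERS[taskType] ?? 'hard';
}
```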
The router maps complexity to model and API endpoint. A simple lookup table:
```ts
const ROUTES = {
  simple: { model: 'deepseek-v4-flash', cost: 0.07 },
  medium: { model: 'kimi-k2.6',         cost: 0.60 },
  hard:   { model: 'claude-opus-4-7',   cost: 5.00 },
};
```
The fallback: If the cheap model's response fails a quality check (too short, doesn't contain expected fields, confidence score below threshold), re-route to the next tier up. This catches the cases where the simple model wasn't enough without paying the expensive-model price on every call.
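A sketch of the escalation loop, building on the ROUTES table and classifier above; callModel stands in for whatever API wrapper you use, and the quality check is a deliberately crude placeholder:

```ts
const ESCALATION: Tier[] = ['simple', 'medium', 'hard'];

// Assumed wrapper around your API client(s); not a real library call.
declare function callModel(model: string, prompt: string): Promise<string>;

function passesQualityCheck(output: string): boolean {
  // Placeholder heuristic: real checks should verify expected fields and format.
  return output.trim().length > 50;
}

async function routeWithFallback(taskType: string, prompt: string): Promise<string> {
  const start = ESCALATION.indexOf(classify(taskType));
  for (let i = start; i < ESCALATION.length; i++) {
    const { model } = ROUTES[ESCALATION[i]];
    const output = await callModel(model, prompt);
    if (passesQualityCheck(output)) return output;
    // Failed check: fall through to the next tier up.
  }
  throw new Error('Quality check failed at every tier');
}
```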
The cost math
Without routing (all calls to Opus 4.7):
- 50 scoring calls × ~2K tokens each = 100K tokens = $0.50
- 5 resume tailoring calls × ~5K tokens each = 25K tokens = $0.13
- 5 cover letters × ~3K tokens each = 15K tokens = $0.08
- Daily total: ~$0.71
With routing:
- 50 scoring calls via DeepSeek Flash = 100K tokens × $0.07/M = $0.007
- 5 resume tailoring via Kimi K2.6 = 25K tokens × $0.60/M = $0.015
- 5 cover letters via Opus 4.7 = 15K tokens × $5.00/M = $0.075
- Daily total: ~$0.10
That's roughly an 86% cost reduction for equivalent output quality on the tasks that matter. Over a month of daily job searching, the difference is $21.30 versus $3.00.
The inference routing pattern applies well beyond job searching. Any workflow that mixes simple and complex LLM calls benefits from the same architecture. Audit tools, content pipelines, customer support triage, code review. Match the model to the task, not the other way around.
If the job search itself is part of a bigger plan to escape the paycheck dependency cycle, The W-2 Trap covers why most six-figure earners stay broke and what the exit routes actually look like. Search "The W-2 Trap" on Amazon Kindle.
Related reading
- DeepSeek V4 vs Kimi K2.6 vs Claude vs GPT — the models behind the routing tiers
- Five AI agents that replace five hires — the agent pattern applied to business operations
- A Markdown file is the best memory layer for your AI coding tool — storing your resume and criteria as agent context
- How to validate an AI coding model before you trust it — verify each model in your routing table before deploying
- Escape the rat race: the complete guide — the bigger picture behind the job search
Fact-check notes and sources
- Jobright, Loopcv, AIApply, Huntr, JobCopilot: All publicly available AI job search platforms as of April 2026. jobright.ai, loopcv.pro, aiapply.co, huntr.co, jobcopilot.com.
- DeepSeek V4 Flash pricing ($0.07/M input): llm-stats.com.
- Kimi K2.6 pricing ($0.60/M input): llm-stats.com.
- Claude Opus 4.7 pricing ($5.00/M input): artificialanalysis.ai.
This post is informational, not career or financial advice. Mentions of all platforms and model providers are nominative fair use. No affiliation is implied.