Building Blocks
5 Min

Agentic Design Pattern: Prompt Routing

Send each request to the model, prompt, or tool that's actually good at it — instead of asking one big model to be good at everything.

Filed under: Agentic Design Pattern, LLM, Orchestration

A single prompt asked to "do everything" ends up mediocre at everything. Prompt Routing fixes that. A small, fast classifier reads the incoming request, decides what kind of work it is, and hands it to the prompt, model, or tool built for that job. The router stays dumb on purpose — its only job is to pick a lane.

The hop
  1. Classify.
    A small model labels the request: which lane does it belong in?
  2. Dispatch.
    Send to the specialist prompt, model, or tool for that label.
  3. Respond.
    The specialist answers. The router never sees the output.
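
The hop fits in a few lines of Python. This is a minimal sketch, not a reference implementation: `classify`, `SPECIALISTS`, and `handle` are hypothetical names, and the keyword-based classifier is only a stand-in for the small model a real router would call.

```python
from typing import Callable, Dict


def classify(request: str) -> str:
    """Toy classifier (assumption): a real router would be a small, fast model."""
    text = request.lower()
    if "def " in text or "error" in text:
        return "code"
    if text.endswith("?"):
        return "research"
    return "chitchat"


# Hypothetical specialists: each would normally wrap its own prompt, model, or tool.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "code": lambda r: f"[code specialist] {r}",
    "research": lambda r: f"[research specialist] {r}",
    "chitchat": lambda r: f"[chitchat specialist] {r}",
}


def handle(request: str) -> str:
    label = classify(request)        # 1. Classify
    specialist = SPECIALISTS[label]  # 2. Dispatch
    return specialist(request)       # 3. Respond; the router never sees this output
```

Note that `handle` returns the specialist's reply directly. Nothing flows back through the classifier, which is what keeps the router cheap.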

You're not building one bigger brain. You're building a triage nurse in front of a hallway of specialists.


When to reach for it

Use it
  • Your agent handles clearly different request types (lookup vs. analysis vs. generation)
  • One prompt is getting bloated with "if the user asks X do Y, if Z do W" branches
  • Specialists outperform generalists — coding model for code, search for facts, cheap model for chit-chat
  • Cost or latency matters enough that sending everything to the biggest model is wasteful
Skip it
  • All requests look the same — a router with one lane is just overhead
  • Your specialist set isn't actually specialized — same model, same prompt, different label
  • Latency budget can't afford the extra hop and a single capable model is good enough
  • You can't write crisp labels — a fuzzy taxonomy makes the router worse than no router

The three roles

The router and specialists are usually different models. The router is cheap and fast; the specialists are whatever they need to be; the fallback is a capable generalist that catches what the other two get wrong.

Router: classifies the request into one lane
System Prompt
You are a request classifier. Read the user's message and return exactly one label from this list:

- code: questions about writing, debugging, or explaining code
- research: factual questions that need up-to-date sources
- writing: drafting prose, emails, or long-form content
- chitchat: small talk, greetings, anything off-topic

Rules:
- Return JSON only: { "label": "...", "confidence": <number between 0 and 1> }
- If no label fits, return { "label": "chitchat", "confidence": 0 }.
- Do not answer the question. Do not explain your choice.
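
A router prompt like the one above only helps if the orchestrator checks what comes back. Here is one way to validate the reply, as a sketch: `parse_router_output`, `LABELS`, and `DEFAULT` are hypothetical names, and treating any malformed reply as `chitchat` with confidence 0 is an assumption that mirrors the prompt's own "no label fits" rule.

```python
import json

# Labels copied from the router prompt above.
LABELS = {"code", "research", "writing", "chitchat"}

# Assumption: a malformed reply is treated like "no label fits".
DEFAULT = {"label": "chitchat", "confidence": 0.0}


def parse_router_output(raw: str) -> dict:
    """Return a validated {label, confidence} dict, or DEFAULT if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(DEFAULT)
    if not isinstance(data, dict):
        return dict(DEFAULT)

    label = data.get("label")
    confidence = data.get("confidence")

    if not isinstance(label, str) or label not in LABELS:
        return dict(DEFAULT)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return dict(DEFAULT)
    if not 0 <= confidence <= 1:
        return dict(DEFAULT)

    return {"label": label, "confidence": float(confidence)}
```

Because the default carries a confidence of 0, any downstream confidence threshold sends malformed replies to the fallback instead of to a guessed lane.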
Specialists: one prompt per lane, tuned for that lane only
System Prompt
You are a [code | research | writing | chitchat] specialist.

Rules:
- Assume the router already decided this request belongs to you.
- Do not second-guess the classification. Do not ask "did you mean...".
- Use the tools, sources, and tone appropriate for your lane only.
- If the request truly doesn't fit, return { "escalate": true, "reason": "..." } so the orchestrator can re-route.
Fallback: catches low-confidence picks and escalations
System Prompt
You are a generalist. You only run when:
- The router's confidence is below 0.5, OR
- A specialist escalated with { "escalate": true }.

Rules:
- Answer directly using a capable general-purpose model.
- Log the request so the taxonomy can be improved later — every fallback is a sign the lanes are wrong, not a sign the system is broken.
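
The glue between the three roles can be sketched as a single function. The names here (`orchestrate`, `CONFIDENCE_THRESHOLD`, the `specialists` mapping, the `fallback` callable) are hypothetical, and the sketch assumes the router's decision has already been parsed into a dict and that a specialist returns either a plain string or the escalation dict described above.

```python
import logging
from typing import Callable, Dict, Union

logger = logging.getLogger("router")

CONFIDENCE_THRESHOLD = 0.5  # the cut-off named in the fallback prompt

Reply = Union[str, dict]  # assumption: a specialist returns text or an escalation dict


def orchestrate(
    request: str,
    decision: dict,
    specialists: Dict[str, Callable[[str], Reply]],
    fallback: Callable[[str], str],
) -> str:
    """Dispatch to a specialist; use the fallback on low confidence or escalation."""
    specialist = specialists.get(decision["label"])

    # Low confidence, or no specialist registered for the label: go to the generalist.
    if specialist is None or decision["confidence"] < CONFIDENCE_THRESHOLD:
        logger.info("fallback (low confidence or unknown lane): %r", request)
        return fallback(request)

    reply = specialist(request)

    # The specialist said "not mine": log it and let the generalist answer.
    if isinstance(reply, dict) and reply.get("escalate"):
        logger.info("fallback (escalated: %s): %r", reply.get("reason"), request)
        return fallback(request)

    return reply
```

Both fallback paths are logged, so the log becomes the list of requests the taxonomy did not cover.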

Want to skip the router on obvious cases? Add a deterministic pre-filter — regex for URLs goes straight to research, code fences go straight to code — and only call the router when the pre-filter is unsure.
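
A deterministic pre-filter along those lines might look like the sketch below. `prefilter`, `URL_RE`, and `CODE_FENCE` are hypothetical names, the URL pattern is deliberately loose, and checking for code fences before URLs is an assumption (pasted code often contains links).

```python
import re
from typing import Optional

# Loose URL pattern (assumption): good enough to spot a pasted link.
URL_RE = re.compile(r"https?://\S+")

# Three backticks, built this way so the example itself stays easy to embed.
CODE_FENCE = "`" * 3


def prefilter(request: str) -> Optional[str]:
    """Return a lane for obvious cases, or None when the router should decide."""
    if CODE_FENCE in request:
        return "code"
    if URL_RE.search(request):
        return "research"
    return None
```

When `prefilter` returns `None`, call the router as usual; otherwise skip the model call and dispatch straight to the returned lane.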


Failure modes

  1. Fuzzy labels.
    If "research" and "writing" overlap, the router flips between them and the user sees inconsistent behavior. Write labels that a stranger could apply.
  2. Router too smart.
    If you let the router reason about the answer, it starts solving the problem itself — and then doing it again in the specialist. Keep the router's prompt boring.
  3. No confidence threshold.
    A low-confidence pick is worse than no pick. Route any pick with confidence below 0.5 to the fallback, not to the most likely lane.
  4. Specialists drift.
    Over time, each specialist accretes "just in case" instructions until it's a generalist again. Audit the prompts quarterly and delete anything the lane doesn't need.
  5. Missing escape hatch.
    If the specialist can't escalate, a misrouted request gets answered confidently in the wrong lane. Always give specialists a way to say "not mine".
  6. Latency tax.
    Every request now pays for two model calls. Use a cheap, fast model for the router — and pre-filter the obvious cases deterministically.