Small LLMs Are Eating the Edge

Small is not weak

A 7-billion-parameter model in 2026 outperforms a 70-billion-parameter model from 2024. The gain comes mostly from better data and better training, not raw scale. For much of a product's surface area, small is now the right answer.

Where small wins

Routing. Pick which agent or tool handles a request.
Classification. Tag, score, and bucket inputs.
Extraction. Pull structured fields out of messy text.
Summarisation. Compress a thread or document.
Rewriting. Tone changes, grammar, light editing.

Production pattern: Use a small model to decide what to do, and only call a large model for the part that truly needs reasoning. The cost savings are dramatic and quality often improves.
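A minimal sketch of that pattern, assuming the small model returns a single route label and that anything it cannot place escalates to the large model. The label set and the three stub functions are hypothetical stand-ins for real model calls, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical route labels, loosely mirroring the task list above.
SMALL_ROUTES = {"classify", "extract", "summarise", "rewrite"}


@dataclass
class Routed:
    route: str   # label returned by the router
    model: str   # "small" or "large"
    output: str


def route_request(
    text: str,
    pick_route: Callable[[str], str],
    run_small: Callable[[str, str], str],
    run_large: Callable[[str], str],
) -> Routed:
    """Ask the small model for a route, then dispatch to the cheapest capable model."""
    route = pick_route(text).strip().lower()
    if route in SMALL_ROUTES:
        return Routed(route, "small", run_small(route, text))
    # Unknown or reasoning-heavy label: escalate rather than guess.
    return Routed(route, "large", run_large(text))


# Stubs standing in for real model calls (assumptions for illustration only).
def pick_route_stub(text: str) -> str:
    return "reason" if text.lower().startswith("why") else "summarise"


def run_small_stub(route: str, text: str) -> str:
    return f"[small:{route}] {text[:40]}"


def run_large_stub(text: str) -> str:
    return f"[large] {text[:40]}"


for request in (
    "Summarise this support thread.",
    "Why did churn rise after the pricing change?",
):
    result = route_request(request, pick_route_stub, run_small_stub, run_large_stub)
    print(result.model, result.route)
```

Treating every unrecognised label as an escalation keeps a malformed router output from silently sending a hard request to the small model.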

Where small still loses

Multi-step reasoning over long context.
Open-ended writing that needs taste.
Tool-heavy agents with branching plans.

How we deploy them

For server workloads we run small models on a single GPU with vLLM and batch heavily. For on-device inference we use Ollama on Mac, llama.cpp on Linux servers, and ONNX Runtime on Windows. For mobile we lean on Phi and Gemma quantised to 4-bit.
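To see why 4-bit quantisation matters on constrained hardware, here is a back-of-the-envelope estimate of weight memory. It is only a sketch: it counts the weights alone and ignores the KV cache, activations, and the per-block scale metadata that real quantisation formats add. The 7-billion-parameter figure is illustrative, not a claim about any specific Phi or Gemma variant.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare common precisions for an illustrative 7B-parameter model.
for bits in (16, 8, 4):
    print(f"7B at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
```

Going from 16-bit to 4-bit cuts the weight footprint by a factor of four, which is roughly the difference between needing a dedicated GPU and fitting in the memory of a phone or laptop.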

A simple rule

If you can write the task spec on a sticky note, a small model can probably do it. If you cannot, you need a bigger model or a smarter pipeline.

Related reading

For the model shortlist, see Open-Source LLMs That Matter. For the production architecture, see Designing AI Applications That Survive Production.

Mubashir
