June 26, 2025

Small but Mighty: Enterprises should take note of small language models

Type: Deep Dives

Contributors: Anthony Spaelti

For most of the current AI boom, progress has felt like a straightforward, but expensive, arms race: the bigger the model, the better. This has held especially true for Large Language Models (LLMs), the models responsible for the boom. When we talk about model size, we typically mean the number of “parameters” in the model; these parameters make up the neural networks and other components of modern AI systems. Most publicly available models we consider LLMs have more than 100 billion parameters, and some (such as Meta’s Llama 4 Behemoth or, reportedly, OpenAI’s GPT-4o) have over a trillion. It seems straightforward to assume that the more parameters a model has, the better it will perform. But more parameters also mean the model is more expensive to train and run: these models are huge and require hundreds of gigabytes of memory. To put this into perspective, your laptop likely has on the order of 32 GB of memory (RAM). Multiply that by 10 or 100 and you’re in the realm of what is needed to run these LLMs.
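
To make those memory figures concrete, here is a rough back-of-the-envelope sketch in Python. It assumes 2 bytes per parameter (16-bit weights) and ignores the extra memory needed for activations and serving overhead, so real deployments need even more:

```python
# Back-of-the-envelope memory estimate: parameter count x bytes per parameter.
# Assumes 16-bit (2-byte) weights and ignores activations, KV caches, and
# other serving overhead, so real deployments need noticeably more memory.
BYTES_PER_PARAM = 2  # fp16 / bf16 weights

def weight_memory_gb(num_params: float) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return num_params * BYTES_PER_PARAM / 1e9

for name, params in [
    ("7B small language model", 7e9),
    ("100B+ large language model", 100e9),
    ("1T frontier model", 1e12),
]:
    print(f"{name}: ~{weight_memory_gb(params):,.0f} GB of weights")
```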

Back to the question at hand: for the most part, bigger did mean better. But without becoming too philosophical, “better” isn’t always necessary. AI models have a job to do, and if that job can only be done by large models, then we should use large models. But it turns out that smaller models have become much “better” over the past 18 months as well. They are starting to punch far above their weight class, matching or even surpassing systems that were ten times larger only a year ago.

We call this new category of models “small language models” (SLMs). While there is no consensus definition, we consider SLMs to be models with fewer than 20 billion parameters. Unsurprisingly, these models are far more energy and computationally efficient. In this article, we focus on models with 7-8 billion parameters – a size that was more or less useless just 18 months ago, yet one that almost every open-source model provider now offers.

To grasp how dramatic the leap has been, imagine a standardized test that spans fifty-seven university subjects, from medicine to law. That exam exists: it is known in research circles as MMLU, a model benchmark that evaluates how well models perform across a wide variety of subjects. In February 2023, Meta’s original Llama 7B model answered roughly one in three of its questions correctly. For reference, humans achieve around 25-30% on this general knowledge test, and subject-matter experts around 70-80%. Fast forward to September 2024, and Alibaba’s Qwen2.5 7 billion parameter model achieved a score above 70%. It is likely that we will see 7 billion parameter models in 2025 that score above 80%.
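
To make the idea of a benchmark concrete, here is a minimal sketch of how multiple-choice accuracy is typically computed. The sample questions and the `model_pick` callback are hypothetical stand-ins, not the actual MMLU harness:

```python
# Minimal sketch of multiple-choice benchmark scoring (MMLU-style).
# Each item has a question, four options (A-D), and a gold answer letter.
# `model_pick` is a hypothetical stand-in for querying a real model.
import random
from typing import Callable

ITEMS = [
    {"question": "Which organ produces insulin?",
     "options": {"A": "Liver", "B": "Pancreas", "C": "Kidney", "D": "Spleen"},
     "answer": "B"},
    {"question": "Water boils at which temperature at sea level?",
     "options": {"A": "90 C", "B": "100 C", "C": "110 C", "D": "120 C"},
     "answer": "B"},
]

def accuracy(model_pick: Callable[[dict], str]) -> float:
    """Fraction of items where the model's chosen letter matches the gold answer."""
    correct = sum(model_pick(item) == item["answer"] for item in ITEMS)
    return correct / len(ITEMS)

# A random-guessing baseline lands near 25% on four-option questions,
# which is why scores in the 70-80% range signal real capability.
print(accuracy(lambda item: random.choice(list(item["options"]))))
```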

What made this new world possible?

What pushed these lean machines so far, so fast? The answer is threefold: specialization, better architecture, and more efficient training.

First, we realized models don’t have to do “everything.” The first generation of LLMs was typically general-purpose: the same model could answer questions about the seven dwarfs in Snow White and discuss advanced nuclear physics. For enterprise applications in particular, you don’t necessarily need that breadth. If you want to automate bookkeeping, you’re probably fine with a model that is excellent at accounting but bad at naming dwarfs. And if you need less knowledge, you need fewer parameters, since each parameter ultimately holds a small piece of the model’s knowledge.
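
One common way to build such a specialist is to fine-tune a small open-source base model on domain-specific data, for example with LoRA adapters. The sketch below uses the Hugging Face transformers and peft libraries; the model name and hyperparameters are illustrative assumptions, not a recipe from this article:

```python
# Minimal sketch: specializing a small base model on domain data with LoRA.
# The base model name, target modules, and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "mistralai/Mistral-7B-v0.1"  # assumed 7B open-source base model
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Train only small low-rank adapter matrices instead of all 7B parameters.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ...then fine-tune on domain-specific examples (e.g. bookkeeping Q&A pairs)
# with a standard training loop or the transformers Trainer.
```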

The second piece lies in clever architecture. For example, some models use “rounded” numbers to calculate the output of the neural network: instead of, say, ten digits after the decimal point, the model only keeps five. Scaled across billions of numbers, this saves billions of bytes. Several other architectural innovations from the past 18 months, taken together, have made 7 billion parameter models genuinely powerful.
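
Here is a minimal sketch of what this “rounding” looks like in practice, using NumPy to cast the same weights from 32-bit to 16-bit floats. Real models use more sophisticated quantization schemes (8-bit or even 4-bit integers), but the trade-off is the same: less memory in exchange for a small rounding error.

```python
# Illustration of reduced-precision ("rounded") weights.
# Casting from 32-bit to 16-bit floats halves storage at the cost of a small
# rounding error; real quantization schemes (8-bit, 4-bit) push this further.
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(1_000_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

print(f"fp32 storage: {weights_fp32.nbytes / 1e6:.1f} MB")   # ~4.0 MB
print(f"fp16 storage: {weights_fp16.nbytes / 1e6:.1f} MB")   # ~2.0 MB
print(f"max rounding error: {np.abs(weights_fp32 - weights_fp16).max():.6f}")
```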

The final critical element is how we feed these improved architectures. The process of adjusting model parameters, effectively “teaching” the model, is called training. At its simplest, training involves taking sets of questions and answers, then tuning the model’s parameters so that, given a specific question, the model reliably produces the intended answer. This imprinting is forgiving when you have hundreds of billions of parameters, because there is ample “storage” space; it’s acceptable if some parameters are tuned inefficiently. Every parameter in a smaller model, however, has to “work harder,” making it much more sensitive to noisy or redundant training data. As a result, over the past two years the community has become obsessed with meticulous data curation: aggressively removing duplicates, ensuring factual accuracy, and filtering out low-quality boilerplate. As the signal-to-noise ratio of the training data improves, each parameter captures more meaningful information. A well-trained 7 billion parameter model fed pristine data can now match, or even surpass, models ten times larger that were trained on less carefully curated datasets.
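
As a rough illustration of this kind of curation, here is a minimal sketch of exact deduplication via hashing plus a crude boilerplate filter. Production pipelines are far more elaborate (near-duplicate detection, quality classifiers, factuality checks), and the filter rules here are purely illustrative:

```python
# Minimal sketch of training-data curation: exact deduplication via hashing
# plus a crude length/boilerplate filter. Real pipelines add near-duplicate
# detection, quality classifiers, and factuality checks on top of this.
import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())

def curate(documents: list[str], min_words: int = 5) -> list[str]:
    seen_hashes = set()
    kept = []
    for doc in documents:
        norm = normalize(doc)
        if len(norm.split()) < min_words:      # drop low-content snippets
            continue
        if "click here to subscribe" in norm:  # drop obvious boilerplate
            continue
        digest = hashlib.sha256(norm.encode()).hexdigest()
        if digest in seen_hashes:              # drop exact duplicates
            continue
        seen_hashes.add(digest)
        kept.append(doc)
    return kept

docs = [
    "Accrued expenses are recorded before cash changes hands.",
    "Accrued expenses are recorded   before cash changes hands.",  # duplicate
    "Click here to subscribe!",                                    # boilerplate
    "Short.",                                                      # low content
]
print(curate(docs))  # keeps only the first document
```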

Why Enterprises Should Take Note

While the technical milestones are impressive, the business consequences may be even more significant: SLMs are far cheaper to run and much easier to deploy.

For example, a national retailer recently replaced a 34 billion parameter classification system with a 7 billion parameter Mistral model for customer support triage. The change trimmed twenty GPU servers down to four – and those cost savings translate one-to-one into profit. In life sciences, a medical device manufacturer now equips its field reps in hospitals with laptops loaded with a specially trained Gemma 7 billion parameter model; the on-device model summarizes regulatory PDFs without having to send confidential documents over public hospital Wi-Fi to the cloud.

The model creators describe these anonymized examples on their public websites, but none of these projects made headlines. Together, though, they hint at an inflection point.

For enterprises, the message is simple: the era of “go big or go home” AI is over, or never existed. A right-sized model, trained on the right data and deployed in the right place, can deliver immediate returns while preserving the option to tap larger systems for moonshot projects. The companies that master this “small plus big” strategy – lean local intelligence backed by heavyweight support – will move faster, spend less, and protect their data more closely than competitors still stuck in one-size-fits-all thinking.

Finally, these developments will also advance agentic AI. In a future article, we will explore how combining a number of SLMs can create powerful, enterprise-ready AI agents that perform tasks remarkably well without hallucinating.
