Shift to Smaller Language Models (SLMs)
Context:
The trend of scaling LLMs began with OpenAI's GPT-3 (175 billion parameters) in 2020 and continued with GPT-4 (reportedly around 1.7 trillion parameters, a figure OpenAI has not confirmed). By 2024, however, the focus had shifted towards smaller language models, as further scaling of LLMs yielded diminishing returns.
What are SLMs?
- Small Language Models (SLMs) are generative AI models with a compact architecture.
- They have fewer parameters and are trained on a smaller volume of data.
- This reduced footprint translates into lower memory and processing demands (a back-of-the-envelope sketch follows this list).
- SLMs are well-suited to on-device deployments and to applications that prioritise resource efficiency.
- As compact counterparts to Large Language Models (LLMs), they broaden the range of high-quality, practical model options available to customers.
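The memory gap is easy to quantify: holding a model's weights takes roughly parameters × bytes per parameter. A minimal Python sketch of this arithmetic, assuming 16-bit weights and the parameter counts cited in this article (real deployments also need memory for activations and the KV cache):

```python
# Back-of-the-envelope estimate of the memory needed to hold model weights.
# Parameter counts are the illustrative figures cited in this article.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Weights-only memory in gigabytes (2 bytes/param = 16-bit precision)."""
    return num_params * bytes_per_param / 1e9

models = {
    "GPT-3 (175B)": 175e9,
    "Llama 3 8B": 8e9,
    "Phi-3-mini (3.8B)": 3.8e9,
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights")

# Approximate output:
# GPT-3 (175B): ~350.0 GB of weights
# Llama 3 8B: ~16.0 GB of weights
# Phi-3-mini (3.8B): ~7.6 GB of weights
```

At 16-bit precision, a GPT-3-class model needs hundreds of gigabytes just for its weights, while a 3.8-billion-parameter model fits in under 8 GB, which is why SLMs can run on phones and laptops.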
Development of Smaller Language Models (SLMs):
- In 2024, Big Tech firms began exploring smaller models:
- Google DeepMind released the Gemini family, pairing the large Gemini Ultra with the smaller Gemini Nano and Gemini Flash.
- OpenAI and Meta launched smaller models such as GPT-4o mini and Llama 3 8B.
- Anthropic launched the Claude 3 family, which includes the compact Haiku alongside Sonnet and Opus.
Advantages of Small Language Models:
- Cost and Efficiency: SLMs are cheaper to train and run, require less time and compute, and are well-suited to specialised tasks.
- Specific Use Cases: They excel at focused applications rather than general AI tasks, making them suitable for edge devices like smartphones.
- Examples:
- Mistral AI, a French startup, offers small models that perform comparably to larger LLMs on specific use cases.
- Microsoft's Phi-3-mini, with just 3.8 billion parameters (a loading sketch follows this list).
- Apple Intelligence, running on iPhones and iPads, uses on-device SLMs for specific applications.
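To make the on-device point concrete, here is a minimal sketch of loading and prompting Phi-3-mini with the Hugging Face transformers library. The model identifier, precision, and generation settings are assumptions for illustration, not a prescribed setup:

```python
# Minimal sketch: running a small language model locally with Hugging Face
# transformers. Assumes `pip install transformers torch accelerate` and
# roughly 8 GB of memory for the 3.8B-parameter weights in 16-bit precision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision halves the weight memory
    device_map="auto",           # use a GPU if available, else the CPU
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# A narrow, focused task: the kind of workload SLMs handle well.
prompt = "Translate to French: 'Where is the railway station?'"
result = generator(prompt, max_new_tokens=50, do_sample=False)
print(result[0]["generated_text"])
```

Older versions of transformers may require trust_remote_code=True when loading Phi-3; the key point is that the whole workflow fits on a single consumer device.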
Drawbacks of Small Language Models:
- Limited Complexity: While efficient at basic tasks like language translation, SLMs struggle with complex tasks such as coding or logical reasoning.
- Performance Ceiling: Smaller parameter counts inherently limit their problem-solving capacity compared to Large Language Models (LLMs).
Use Cases for Large vs. Small Models:
- Small Language Models (SLMs): Well-suited to focused, simpler tasks such as translation, basic customer service, and other specific applications. Example: WhatsApp using Llama 3 8B for language learning.
- Large Language Models (LLMs): Excel at more complex tasks like coding, logical reasoning, and solving intricate problems.
- The analogy to human brains: just as humans rely on their large brains for complex tasks, LLMs have larger parameter counts for more advanced capabilities, while SLMs are designed for narrower, simpler tasks (a simple routing sketch follows this section).
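In practice, this division of labour often shows up as model routing: send easy, focused requests to an SLM and escalate complex ones to an LLM. A hypothetical sketch, in which the model names and the keyword-based complexity check are illustrative assumptions rather than a production heuristic:

```python
# Hypothetical model-routing sketch: pick a small or large model per request.
# The model names and the complexity heuristic are illustrative assumptions.

COMPLEX_HINTS = ("write code", "prove", "debug", "step by step", "analyse")

def pick_model(prompt: str) -> str:
    """Route simple prompts to an SLM, complex ones to an LLM."""
    needs_reasoning = any(hint in prompt.lower() for hint in COMPLEX_HINTS)
    long_prompt = len(prompt.split()) > 200  # long inputs often mean harder tasks
    return "large-llm" if (needs_reasoning or long_prompt) else "small-slm"

print(pick_model("Translate 'good morning' into Hindi"))            # small-slm
print(pick_model("Write code to sort a linked list and debug it"))  # large-llm
```

Real routers typically use a classifier or the small model's own confidence rather than keywords, but the cost logic is the same: pay for the large model only when the task demands it.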
Relevance for India:
- In India, where AI adoption is growing but resources may be limited, SLMs are attractive because of their affordability and their ability to meet specific needs.
- Projects like Visvam from IIIT Hyderabad are building small language models tailored to healthcare, agriculture, and education, while promoting linguistic and cultural diversity.
- Sarvam AI is also working to create AI solutions that cater to the needs of a billion Indians.