Microsoft unveils Phi-3 family of compact language models
24 April 2024 | https://www.artificialintelligence-news.com/2024/04/24/microsoft-unveils-phi-3-family-compact-language-models/

Microsoft has announced the Phi-3 family of open small language models (SLMs), touting them as the most capable and cost-effective models of their size. The innovative training approach developed by Microsoft researchers has allowed the Phi-3 models to outperform larger models on language, coding, and math benchmarks.

“What we’re going to start to see is not a shift from large to small, but a shift from a singular category of models to a portfolio of models where customers get the ability to make a decision on what is the best model for their scenario,” said Sonali Yadav, Principal Product Manager for Generative AI at Microsoft.

The first Phi-3 model, Phi-3-mini at 3.8 billion parameters, is now publicly available in the Azure AI Model Catalog, on Hugging Face and Ollama, and as an NVIDIA NIM microservice. Despite its compact size, Phi-3-mini outperforms models twice its size. Additional Phi-3 models, Phi-3-small (7B parameters) and Phi-3-medium (14B parameters), will follow soon.

“Some customers may only need small models, some will need big models and many are going to want to combine both in a variety of ways,” said Luis Vargas, Microsoft VP of AI.

The key advantage of SLMs is that their smaller size enables on-device deployment, delivering low-latency AI experiences without network connectivity. Potential use cases include smart sensors, cameras, farming equipment, and more. Privacy is another benefit, as data stays on the device.
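To make the on-device point concrete, here is a minimal sketch of fully local inference with Phi-3-mini through Ollama’s Python client. It assumes the Ollama daemon is installed and running, and that the tag `phi3` resolves to a Phi-3-mini build; check the Ollama model library for the exact tag.

```python
# Minimal sketch: fully local inference with Phi-3-mini via Ollama.
# Assumes the Ollama daemon is running and that the "phi3" tag resolves
# to a Phi-3-mini build in the Ollama library; verify with `ollama list`.
import ollama  # pip install ollama

response = ollama.chat(
    model="phi3",  # assumed tag for Phi-3-mini
    messages=[
        {"role": "user", "content": "Summarise this sensor reading in one line: temp=82C, vib=0.4g"}
    ],
)
# Everything above runs on-device: no data leaves the machine.
print(response["message"]["content"])
```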


Large language models (LLMs) excel at complex reasoning over vast datasets, a strength suited to applications like drug discovery, where understanding interactions across scientific literature is essential. SLMs, however, offer a compelling alternative for simpler tasks such as query answering, summarisation, and content generation.

“Rather than chasing ever-larger models, Microsoft is developing tools with more carefully curated data and specialised training,” commented Victor Botev, CTO and Co-Founder of Iris.ai.

“This allows for improved performance and reasoning abilities without the massive computational costs of models with trillions of parameters. Fulfilling this promise would mean tearing down a huge adoption barrier for businesses looking for AI solutions.”

Breakthrough training technique

What enabled Microsoft’s SLM quality leap was an innovative data filtering and generation approach inspired by bedtime storybooks.

“Instead of training on just raw web data, why don’t you look for data which is of extremely high quality?” asked Sebastien Bubeck, Microsoft VP leading SLM research.  

Ronen Eldan’s nightly reading routine with his daughter sparked the idea to generate a ‘TinyStories’ dataset of millions of simple narratives, created by prompting a large model with combinations of words a four-year-old would know. Remarkably, a 10-million-parameter model trained on TinyStories could generate fluent stories with perfect grammar.
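The recipe is easy to picture in code. The sketch below is illustrative only: the vocabulary and prompt wording are invented here, not taken from the actual TinyStories dataset, and each generated prompt would be sent to a large model whose completions form the training corpus.

```python
# Illustrative sketch of the TinyStories-style generation recipe: sample a
# few words a young child would know, then ask a large model to write a
# simple story using all of them. Vocabulary and wording are invented for
# illustration; millions of such completions would form the dataset.
import random

CHILD_VOCAB = ["dog", "ball", "happy", "jump", "rain", "cookie", "tree", "sing"]

def tinystories_prompt(rng: random.Random, n_words: int = 3) -> str:
    words = rng.sample(CHILD_VOCAB, n_words)
    return (
        "Write a short story using only words a 4-year-old would understand. "
        f"The story must include the words: {', '.join(words)}."
    )

rng = random.Random(0)
for _ in range(3):
    # Each prompt would be sent to a large model and the completion stored.
    print(tinystories_prompt(rng))
```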

Building on that early success, the team procured high-quality web data vetted for educational value to create the ‘CodeTextbook’ dataset. This was synthesised through rounds of prompting, generation, and filtering by both humans and large AI models.

“A lot of care goes into producing these synthetic data,” Bubeck said. “We don’t take everything that we produce.”

The high-quality training data proved transformative. “Because it’s reading from textbook-like material…you make the task of the language model to read and understand this material much easier,” Bubeck explained.

Mitigating AI safety risks

Despite the thoughtful data curation, Microsoft emphasises that it applied additional safety practices to the Phi-3 release, mirroring its standard processes for all generative AI models.

“As with all generative AI model releases, Microsoft’s product and responsible AI teams used a multi-layered approach to manage and mitigate risks in developing Phi-3 models,” a blog post stated.  

This included additional training examples to reinforce expected behaviours, red-teaming assessments to identify vulnerabilities, and Azure AI tools that help customers build trustworthy applications on top of Phi-3.


See also: Microsoft to forge AI partnerships with South Korean tech leaders


Stability AI unveils 12B parameter Stable LM 2 model and updated 1.6B variant
9 April 2024 | https://www.artificialintelligence-news.com/2024/04/09/stability-ai-unveils-12b-parameter-stable-lm-2-model-updated-1-6b-variant/

Stability AI has introduced the latest additions to its Stable LM 2 language model series: a 12 billion parameter base model and an instruction-tuned variant. These models were trained on an impressive two trillion tokens across seven languages: English, Spanish, German, Italian, French, Portuguese, and Dutch.

The 12 billion parameter model aims to strike a balance between strong performance, efficiency, memory requirements, and speed. It follows the framework established in Stability AI’s previously published Stable LM 2 1.6B technical report. This new release extends the company’s model range, offering developers a transparent and powerful tool for innovating with AI language technology.

Alongside the 12B model, Stability AI has also released a new version of its Stable LM 2 1.6B model. This updated 1.6B variant improves conversation abilities across the same seven languages while maintaining remarkably low system requirements.

Stable LM 2 12B is designed as an efficient open model tailored for multilingual tasks with smooth performance on widely available hardware.

According to Stability AI, the model can handle tasks typically feasible only for significantly larger models, such as large Mixture-of-Experts (MoE) models, which often require substantial compute and memory. The instruction-tuned version is particularly well-suited to a variety of uses, including as the central component of retrieval-augmented generation (RAG) systems, thanks to its strong performance on tool usage and function calling.
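For orientation, a minimal sketch of querying the instruction-tuned model through Hugging Face transformers follows. The model ID `stabilityai/stablelm-2-12b-chat` is assumed from Stability AI’s naming convention; confirm the exact identifier and licence terms (a Stability AI Membership is required for commercial use) on the Hugging Face hub.

```python
# Minimal sketch: chat inference with the instruction-tuned Stable LM 2 12B
# via Hugging Face transformers. The model ID below is an assumption based
# on Stability AI's naming convention; verify it on the Hugging Face hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-12b-chat"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~24 GB at bf16; fits a single large GPU
    device_map="auto",
)

# One of the seven supported languages, to exercise the multilingual tuning.
messages = [{"role": "user", "content": "Answer in French: what is RAG?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```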

In performance comparisons with popular, strong language models such as Mixtral, Llama 2, Qwen 1.5, Gemma, and Mistral, Stable LM 2 12B offers solid results on the zero-shot and few-shot tasks from the general benchmarks outlined in the Open LLM Leaderboard.
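Readers who want to reproduce this kind of comparison can use EleutherAI’s lm-evaluation-harness, the framework behind the Open LLM Leaderboard. A rough sketch follows; the base-model ID is assumed, and the leaderboard’s exact task list and few-shot counts should be checked against its published configuration.

```python
# Rough sketch: leaderboard-style few-shot evaluation with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). The model ID is an assumption;
# the real leaderboard uses task-specific few-shot counts rather than a
# single value, so treat the numbers here as placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=stabilityai/stablelm-2-12b,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=5,
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```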

With this new release, Stability AI extends the Stable LM 2 family into the 12B category, providing an open and transparent model without compromising on power and accuracy. The company is confident that the release will enable developers and businesses to keep innovating while retaining full control over their data.

Developers and businesses can use Stable LM 2 12B now for commercial and non-commercial purposes with a Stability AI Membership.


See also: ML Olympiad returns with over 20 challenges

