Model Archives - AI News

GPT-4o delivers human-like AI interaction with text, audio, and vision integration
14 May 2024

OpenAI has launched its new flagship model, GPT-4o, which seamlessly integrates text, audio, and visual inputs and outputs, promising to enhance the naturalness of machine interactions.

GPT-4o, where the “o” stands for “omni,” is designed to cater to a broader spectrum of input and output modalities. “It accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs,” OpenAI announced.

Users can expect audio response times as quick as 232 milliseconds, with an average of 320 milliseconds, mirroring human conversational speed.

Pioneering capabilities

The introduction of GPT-4o marks a leap from its predecessors by processing all inputs and outputs through a single neural network. This approach enables the model to retain critical information and context that were previously lost in the separate model pipeline used in earlier versions.

Prior to GPT-4o, ‘Voice Mode’ could handle audio interactions with latencies of 2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4. The previous setup involved three distinct models: one for transcribing audio to text, another for textual responses, and a third for converting text back to audio. This segmentation led to loss of nuances such as tone, multiple speakers, and background noise.

As an integrated solution, GPT-4o boasts notable improvements in vision and audio understanding. It can perform more complex tasks such as harmonising songs, providing real-time translations, and even generating outputs with expressive elements like laughter and singing. Examples of its broad capabilities include preparing for interviews, translating languages on the fly, and generating customer service responses.

Nathaniel Whittemore, Founder and CEO of Superintelligent, commented: “Product announcements are going to inherently be more divisive than technology announcements because it’s harder to tell if a product is going to be truly different until you actually interact with it. And especially when it comes to a different mode of human-computer interaction, there is even more room for diverse beliefs about how useful it’s going to be.

“That said, the fact that there wasn’t a GPT-4.5 or GPT-5 announced is also distracting people from the technological advancement that this is a natively multimodal model. It’s not a text model with a voice or image addition; it is a multimodal token in, multimodal token out. This opens up a huge array of use cases that are going to take some time to filter into the consciousness.”

Performance and safety

GPT-4o matches GPT-4 Turbo performance levels in English text and coding tasks but significantly outperforms it in non-English languages, making it a more inclusive and versatile model. It sets a new benchmark in reasoning with a score of 88.7% on 0-shot CoT MMLU (general knowledge questions) and 87.2% on 5-shot no-CoT MMLU.

The model also excels in audio and translation benchmarks, surpassing previous state-of-the-art models like Whisper-v3. In multilingual and vision evaluations, it demonstrates superior performance, enhancing OpenAI’s multilingual, audio, and vision capabilities.

OpenAI has built robust safety measures into GPT-4o by design, applying techniques to filter training data and refining the model’s behaviour through post-training safeguards. The model has been assessed through a Preparedness Framework and complies with OpenAI’s voluntary commitments. Evaluations in areas like cybersecurity, persuasion, and model autonomy indicate that GPT-4o does not exceed a ‘Medium’ risk level in any category.

Further safety assessments involved extensive external red teaming with over 70 experts in various domains, including social psychology, bias, fairness, and misinformation. This comprehensive scrutiny aims to mitigate risks introduced by the new modalities of GPT-4o.

Availability and future integration

Starting today, GPT-4o’s text and image capabilities are available in ChatGPT—including a free tier and extended features for Plus users. A new Voice Mode powered by GPT-4o will enter alpha testing within ChatGPT Plus in the coming weeks.

Developers can access GPT-4o through the API for text and vision tasks, benefiting from its doubled speed, halved price, and enhanced rate limits compared to GPT-4 Turbo.
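
As a minimal sketch of what a combined text-and-vision request might look like through OpenAI’s Python SDK (the gpt-4o model identifier and message format follow OpenAI’s launch documentation; the image URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single chat request mixing text and image inputs.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},  # placeholder URL
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```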

OpenAI plans to expand GPT-4o’s audio and video functionalities to a select group of trusted partners via the API, with broader rollout expected in the near future. This phased release strategy aims to ensure thorough safety and usability testing before making the full range of capabilities publicly available.

“It’s hugely significant that they’ve made this model available for free to everyone, as well as making the API 50% cheaper. That is a massive increase in accessibility,” explained Whittemore.

OpenAI invites community feedback to continuously refine GPT-4o, emphasising the importance of user input in identifying and closing gaps where GPT-4 Turbo might still outperform it.

(Image Credit: OpenAI)

See also: OpenAI takes steps to boost AI-generated content transparency

Mixtral 8x22B sets new benchmark for open models
18 April 2024

Mistral AI has released Mixtral 8x22B, which sets a new benchmark for open source models in performance and efficiency. The model boasts robust multilingual capabilities and superior mathematical and coding prowess.

Mixtral 8x22B operates as a Sparse Mixture-of-Experts (SMoE) model, activating just 39 billion of its 141 billion parameters at inference time.

Beyond its efficiency, Mixtral 8x22B boasts fluency in multiple major languages, including English, French, Italian, German, and Spanish. Its adeptness extends into technical domains with strong mathematical and coding capabilities. Notably, the model supports native function calling paired with a ‘constrained output mode,’ facilitating large-scale application development and tech upgrades.

With a substantial 64K-token context window, Mixtral 8x22B ensures precise information recall from voluminous documents, further appealing to enterprise-level utilisation where handling extensive datasets is routine.
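
For a sense of what experimenting with the released weights involves, here is a minimal sketch of loading the model through Hugging Face’s transformers library (the hub ID below is the instruct checkpoint Mistral AI published; running the full 141-billion-parameter model assumes substantial multi-GPU hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"  # published checkpoint; very large download

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard the weights across available GPUs
    torch_dtype="auto",  # use the checkpoint's native precision
)

# Only ~39B of the 141B parameters are activated per token at inference time.
inputs = tokenizer("Write a Python function that reverses a string.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```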

In line with fostering a collaborative and innovative AI research environment, Mistral AI has released Mixtral 8x22B under the Apache 2.0 license. This highly permissive open-source license permits unrestricted usage and enables widespread adoption.

Statistically, Mixtral 8x22B outclasses many existing models. In head-to-head comparisons on standard industry benchmarks, ranging from common-sense reasoning to subject-specific knowledge, Mistral’s new model excels. Figures released by Mistral AI illustrate that Mixtral 8x22B significantly outperforms the LLaMA 2 70B model across critical reasoning and knowledge benchmarks in varied linguistic contexts.

Furthermore, in coding and maths, Mixtral continues its dominance among open models, with updated results showing an impressive performance improvement on mathematical benchmarks following the release of an instructed version of the model.

Prospective users and developers are urged to explore Mixtral 8x22B on La Plateforme, Mistral AI’s interactive platform, where they can engage directly with the model.

In an era where AI’s role is ever-expanding, Mixtral 8x22B’s blend of high performance, efficiency, and open accessibility marks a significant milestone in the democratisation of advanced AI tools.

(Photo by Joshua Golde)

See also: SAS aims to make AI accessible regardless of skill set with packaged AI models

Hugging Face launches Idefics2 vision-language model
16 April 2024

Hugging Face has announced the release of Idefics2, a versatile model capable of understanding and generating text responses based on both images and texts. The model sets a new benchmark for answering visual questions, describing visual content, story creation from images, document information extraction, and even performing arithmetic operations based on visual input.

Idefics2 leapfrogs its predecessor, Idefics1, with just eight billion parameters, the versatility afforded by its open license (Apache 2.0), and remarkably enhanced Optical Character Recognition (OCR) capabilities.

The model not only showcases exceptional performance in visual question answering benchmarks but also holds its ground against far larger contemporaries such as LLava-Next-34B and MM1-30B-chat.

Central to Idefics2’s appeal is its integration with Hugging Face’s Transformers from the outset, ensuring ease of fine-tuning for a broad array of multimodal applications. For those eager to dive in, models are available for experimentation on the Hugging Face Hub.
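A minimal sketch of querying the model via transformers might look like the following (HuggingFaceM4/idefics2-8b is the checkpoint published on the Hub; the image URL is a placeholder):

```python
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/idefics2-8b"  # published checkpoint on the Hugging Face Hub

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto")

# Fetch an example document image (placeholder URL).
image = Image.open(requests.get("https://example.com/invoice.png", stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is the total amount on this invoice?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```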

A standout feature of Idefics2 is its comprehensive training philosophy, blending openly available datasets including web documents, image-caption pairs, and OCR data. Furthermore, it introduces an innovative fine-tuning dataset dubbed ‘The Cauldron,’ amalgamating 50 meticulously curated datasets for multifaceted conversational training.

Idefics2 exhibits a refined approach to image manipulation, maintaining native resolutions and aspect ratios—a notable deviation from conventional resizing norms in computer vision. Its architecture benefits significantly from advanced OCR capabilities, adeptly transcribing textual content within images and documents, and boasts improved performance in interpreting charts and figures.

Idefics2 also marks a shift from its predecessor’s architecture by simplifying the integration of visual features into the language backbone: it adopts a learned Perceiver pooling followed by an MLP modality projection, enhancing the model’s overall efficacy.

This advancement in vision-language models opens up new avenues for exploring multimodal interactions, with Idefics2 poised to serve as a foundational tool for the community. Its performance enhancements and technical innovations underscore the potential of combining visual and textual data in creating sophisticated, contextually-aware AI systems.

For enthusiasts and researchers looking to leverage Idefics2’s capabilities, Hugging Face provides a detailed fine-tuning tutorial.

See also: OpenAI makes GPT-4 Turbo with Vision API generally available

Anthropic says Claude 3 Haiku is the fastest model in its class
14 March 2024

Anthropic has released Claude 3 Haiku, the fastest and most affordable AI model in its intelligence class. Boasting state-of-the-art vision capabilities and strong performance on industry benchmarks, Haiku is touted as a versatile solution for a wide range of enterprise applications.

The model is now available alongside Anthropic’s Sonnet and Opus models in the Claude API and on Claude.ai for Claude Pro subscribers.

“Speed is essential for our enterprise users who need to quickly analyse large datasets and generate timely output for tasks like customer support,” an Anthropic spokesperson said.

“Claude 3 Haiku is three times faster than its peers for the vast majority of workloads, processing 21K tokens (~30 pages) per second for prompts under 32K tokens.”

Haiku is designed to generate swift output, enabling responsive, engaging chat experiences, and the execution of many small tasks simultaneously.

Anthropic’s pricing model for Haiku has an input-to-output token ratio of 1:5, designed explicitly for enterprise workloads, which often involve longer prompts. The company says businesses can rely on Haiku to quickly analyse large volumes of documents, such as quarterly filings, contracts, or legal cases, for half the cost of other models in its performance tier.

As an example, Claude 3 Haiku can process and analyse 400 Supreme Court cases or 2,500 images for just one US dollar.
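As a back-of-envelope check on that claim, assuming Haiku’s launch pricing of $0.25 per million input tokens (a figure not stated in the article), one US dollar buys roughly four million input tokens, or about 10,000 tokens per case across 400 cases:

```python
# Rough arithmetic behind the "$1 for 400 cases" claim.
# Assumes launch pricing of $0.25 per million input tokens (not stated above).
price_per_million_input_tokens = 0.25  # USD, assumed
tokens_per_dollar = 1_000_000 / price_per_million_input_tokens  # 4,000,000 tokens
tokens_per_case = tokens_per_dollar / 400
print(f"{tokens_per_case:,.0f} tokens per case")  # 10,000 tokens per case
```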

Alongside its speed and affordability, Anthropic says Claude 3 Haiku prioritises enterprise-grade security and robustness. The company conducts rigorous testing to reduce the likelihood of harmful outputs and jailbreaks. Additional security layers include continuous systems monitoring, endpoint hardening, secure coding practices, strong data encryption protocols, and stringent access controls.

Anthropic also conducts regular security audits and works with experienced penetration testers to proactively identify and address vulnerabilities.

From today, customers can use Claude 3 Haiku through Anthropic’s API or with a Claude Pro subscription. Haiku is available on Amazon Bedrock and will be coming soon to Google Cloud Vertex AI.
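
A minimal sketch of calling the model via Anthropic’s Python SDK (the dated claude-3-haiku-20240307 identifier is the one Anthropic published at launch):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarise the key arguments in this contract clause: ..."}
    ],
)
print(message.content[0].text)
```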

(Image Credit: Anthropic)

See also: EU approves controversial AI Act to mixed reactions

Stability AI previews Stable Diffusion 3 text-to-image model
23 February 2024

London-based AI lab Stability AI has announced an early preview of its new text-to-image model, Stable Diffusion 3. The advanced generative AI model aims to create high-quality images from text prompts with improved performance across several key areas.

The announcement comes just days after Stability AI’s largest rival, OpenAI, unveiled Sora: a brand new AI model capable of generating remarkably realistic, high-definition videos from simple text prompts.

Sora, which isn’t available to the general public yet either, sparked concerns about its potential to create realistic-looking fake footage. OpenAI said it’s working with experts in misinformation and hateful content to test the tool before making it widely available. 

According to Stability AI, Stable Diffusion 3 handles multi-subject image generation significantly better than previous versions, allowing users to write more detailed prompts with multiple elements and achieve better results.

In addition to improvements with complex prompts, the new model boasts upgraded overall image quality and spelling accuracy. Stability AI claims these upgrades solve some consistency and coherence issues that have impacted past text-to-image models. 

While the model is not yet publicly available, Stability AI has opened a waitlist for people interested in early access to Stable Diffusion 3. The preview phase will allow Stability AI to gather feedback and continue refining the model before a full release planned for later this year.

Stability AI said it is also working with experts to test Stable Diffusion 3 and ensure it mitigates potential harms, similar to OpenAI’s approach with Sora.

“We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment,” said Stability AI.

“In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.”

Stable Diffusion 3 is being offered in a range of model sizes from 800 million parameters on the low-end to 8 billion on the high-end. Stability AI said this spectrum of options aims to balance creative performance and accessibility to users with varying computational resources.  

“Our commitment to ensuring generative AI is open, safe, and universally accessible remains steadfast,” explained Stability AI.

“With Stable Diffusion 3, we strive to offer adaptable solutions that enable individuals, developers, and enterprises to unleash their creativity, aligning with our mission to activate humanity’s potential.”

(Image Credit: Stability AI)

See also: Google pledges to fix Gemini’s inaccurate and biased image generation

Google pledges to fix Gemini’s inaccurate and biased image generation
22 February 2024

Google’s Gemini model has come under fire for its production of historically-inaccurate and racially-skewed images, reigniting concerns about bias in AI systems.

The controversy arose as users on social media platforms flooded feeds with examples of Gemini generating pictures depicting racially-diverse Nazis, black medieval English kings, and other improbable scenarios.

Meanwhile, critics also pointed out Gemini’s refusal to depict Caucasians, to render churches in San Francisco out of respect for indigenous sensitivities, or to portray sensitive historical events like Tiananmen Square in 1989.

In response to the backlash, Jack Krawczyk, the product lead for Google’s Gemini Experiences, acknowledged the issue and pledged to rectify it, taking to social media platform X to reassure users.

For now, Google says it is pausing the image generation of people.

While acknowledging the need to address diversity in AI-generated content, some argue that Google’s response has been an overcorrection.

Marc Andreessen, the co-founder of Netscape and a16z, recently created an “outrageously safe” parody AI model called Goody-2 LLM that refuses to answer questions deemed problematic. Andreessen warns of a broader trend towards censorship and bias in commercial AI systems, emphasising the potential consequences of such developments.

Addressing the broader implications, experts highlight the centralisation of AI models under a few major corporations and advocate for the development of open-source AI models to promote diversity and mitigate bias.

Yann LeCun, Meta’s chief AI scientist, has stressed the importance of fostering a diverse ecosystem of AI models, akin to the need for a free and diverse press.

Bindu Reddy, CEO of Abacus.AI, has expressed similar concerns about the concentration of power without a healthy ecosystem of open-source models.

As discussions around the ethical and practical implications of AI continue, the need for transparent and inclusive AI development frameworks becomes increasingly apparent.

(Photo by Matt Artz on Unsplash)

See also: Reddit is reportedly selling data for AI training

Reddit is reportedly selling data for AI training
19 February 2024

Reddit has negotiated a content licensing deal to allow its data to be used for training AI models, according to a Bloomberg report.

Just ahead of a potential $5 billion initial public offering (IPO) in March, Reddit has reportedly signed a $60 million deal with an undisclosed major AI company. This move could be seen as a last-minute effort to showcase to prospective investors the potential revenue streams on offer in the rapidly growing AI industry.

Although Reddit has yet to confirm the deal, the decision could have significant implications. If true, it would mean that Reddit’s vast trove of user-generated content – including posts from popular subreddits, comments from both prominent and obscure users, and discussions on a wide range of topics – could be used to train and enhance existing large language models (LLMs) or provide the foundation for the development of new generative AI systems.

However, this decision by Reddit may not sit well with its user base, as the company has faced increasing opposition from its community regarding its recent business decisions.

Last year, when Reddit announced plans to start charging for access to its application programming interfaces (APIs), thousands of Reddit forums temporarily shut down in protest. Days later, a group of Reddit hackers threatened to release previously stolen site data unless the company reversed the API plan or paid a ransom of $4.5 million.

Reddit has recently made other controversial decisions, such as removing years of private chat logs and messages from users’ accounts. The platform also implemented new automatic moderation features and removed the option for users to turn off personalised advertising, fuelling additional discontent among its users.

This latest reported deal to sell Reddit’s data for AI training could generate even more backlash from users, as the debate over the ethics of using public data, art, and other human-created content to train AI systems continues to intensify across various industries and platforms.

(Photo by Brett Jordan on Unsplash)

See also: Amazon trains 980M parameter LLM with ’emergent abilities’

Google launches Gemini to replace Bard chatbot
9 February 2024

Google has launched its AI chatbot called Gemini, which replaces its short-lived Bard service.

Unveiled in February 2023, Bard was touted as a competitor to chatbots like ChatGPT but failed to impress in demos. Google staff even called the launch “botched” and slammed CEO Sundar Pichai.

Now rebranded as Gemini, Google says it represents the company’s “most capable family of models” for natural conversations. Two experiences are being launched: Gemini Advanced and a mobile app.

Gemini Advanced grants access to Ultra 1.0, billed by Google as its “largest and most capable state-of-the-art AI model.” In blind evaluations, third-party raters preferred Gemini Advanced with Ultra 1.0 over alternatives in complex tasks like coding, logical reasoning, and creative collaboration.  

The AI can serve as a tutor by creating personalised lessons and quizzes, aid developers with trickier coding problems, and help creators spark ideas and strategise ways to grow their audiences.

Google plans to expand Gemini Advanced’s capabilities over time with exclusive features like expanded multimodal interactions, interactive coding, deeper data analysis, and more. The service already supports over 150 countries in English and will add more languages soon.  

Access to Gemini Advanced is granted through a new $19.99 (£18.99) per month Google One AI Premium Plan, including a free two-month trial. Subscribers get the latest Google AI advancements plus 2TB of storage from the existing Premium plan.  

Google claims Gemini Advanced underwent extensive trust and safety checks before its launch, including external reviews, to mitigate issues around unsafe content and bias. More details are available in an updated technical report (PDF).

Lastly, Google launched new mobile apps on Android and iOS that provide access to basic Gemini features on the go. Users can ask for help with images, tasks, and more while out and about. Over time, the plan is for Gemini to become a true personal AI assistant.

The Gemini mobile apps are now available in the US as a dedicated app on Android and in the Google app on iOS, supporting English conversations initially. Next week, the apps expand to Japan and Korea, followed by more countries and languages thereafter.

(Image Credit: Google)

See also: DeepMind framework offers breakthrough in LLMs’ reasoning

OpenAI releases new models and lowers API pricing
26 January 2024

OpenAI has announced several updates that will benefit developers using its AI services, including new embedding models, a lower price for GPT-3.5 Turbo, an updated GPT-4 Turbo preview, and more robust content moderation capabilities.

The San Francisco-based AI lab said its new text-embedding-3-small and text-embedding-3-large models offer upgraded performance over previous generations. For example, text-embedding-3-large achieves average scores of 54.9 percent on the MIRACL benchmark and 64.6 percent on the MTEB benchmark, up from 31.4 percent and 61 percent respectively for the older text-embedding-ada-002 model. 

Additionally, OpenAI revealed the price per 1,000 tokens has dropped 5x for text-embedding-3-small compared to text-embedding-ada-002, from $0.0001 to $0.00002. The company said developers can also shorten embeddings to reduce costs without significantly impacting accuracy.
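The shortening mechanism is exposed through the dimensions parameter on the embeddings endpoint; a minimal sketch with OpenAI’s Python SDK:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Sample text to embed",
    dimensions=256,  # shorten from the native 1536 dimensions to cut storage costs
)
vector = response.data[0].embedding
print(len(vector))  # 256
```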

Next week, OpenAI plans to release an updated GPT-3.5 Turbo model and cut its pricing by 50 percent for input tokens and 25 percent for output tokens. This will mark the third price reduction for GPT-3.5 Turbo in the past year as OpenAI aims to drive more adoption.

OpenAI has additionally updated its GPT-4 Turbo preview to version gpt-4-0125-preview, noting that over 70 percent of requests have transitioned to the model since its debut. Improvements include completing tasks such as code generation more thoroughly.

To support developers building safe AI apps, OpenAI has also rolled out its most advanced content moderation model yet, text-moderation-007. The company said it identifies potentially harmful text more accurately than previous versions.

Finally, developers now have more control over API keys and visibility into usage metrics. OpenAI says developers can assign permissions to keys and view consumption on a per-key level to better track individual products or projects.

OpenAI says that more platform improvements are planned over the coming months to cater for larger development teams.

(Photo by Jonathan Kemper on Unsplash)

See also: OpenAI suspends developer of politician-impersonating chatbot

Microsoft unveils 2.7B parameter language model Phi-2
13 December 2023

Microsoft’s 2.7 billion-parameter model Phi-2 showcases outstanding reasoning and language understanding capabilities, setting a new standard for performance among base language models with fewer than 13 billion parameters.

Phi-2 builds upon the success of its predecessors, Phi-1 and Phi-1.5, by matching or surpassing models up to 25 times larger—thanks to innovations in model scaling and training data curation.

The compact size of Phi-2 makes it an ideal playground for researchers, facilitating exploration in mechanistic interpretability, safety improvements, and fine-tuning experimentation across various tasks.
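
For those experiments, a minimal sketch of loading the microsoft/phi-2 checkpoint Microsoft published on the Hugging Face Hub via the transformers library (the ‘Instruct/Output’ prompt format follows the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # published checkpoint on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # early releases of the checkpoint shipped custom modelling code
)

prompt = "Instruct: Explain why the sky appears blue.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```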

Phi-2’s achievements are underpinned by two key aspects:

  • Training data quality: Microsoft emphasises the critical role of training data quality in model performance. Phi-2 leverages “textbook-quality” data, focusing on synthetic datasets designed to impart common sense reasoning and general knowledge. The training corpus is augmented with carefully selected web data, filtered based on educational value and content quality.
  • Innovative scaling techniques: Microsoft adopts innovative techniques to scale up Phi-2 from its predecessor, Phi-1.5. Knowledge transfer from the 1.3 billion parameter model accelerates training convergence, leading to a clear boost in benchmark scores.

Performance evaluation

Phi-2 has undergone rigorous evaluation across various benchmarks, including Big Bench Hard, commonsense reasoning, language understanding, math, and coding.

With only 2.7 billion parameters, Phi-2 outperforms larger models – including Mistral and Llama-2 – and matches or outperforms Google’s recently-announced Gemini Nano 2.

Beyond benchmarks, Phi-2 showcases its capabilities in real-world scenarios. Tests involving prompts commonly used in the research community reveal Phi-2’s prowess in solving physics problems and correcting student mistakes, demonstrating its versatility beyond standard evaluations.

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4 trillion tokens from synthetic and web datasets. Training was conducted on 96 A100 GPUs over 14 days with a focus on safety, and Microsoft claims the model exhibits less toxicity and bias than comparable open-source models.

With the announcement of Phi-2, Microsoft continues to push the boundaries of what smaller base language models can achieve.

(Image Credit: Microsoft)

See also: AI & Big Data Expo: Demystifying AI and seeing past the hype
