GPT-4o delivers human-like AI interaction with text, audio, and vision integration
AI News | Tue, 14 May 2024
OpenAI has launched its new flagship model, GPT-4o, which seamlessly integrates text, audio, and visual inputs and outputs, promising to enhance the naturalness of machine interactions.

GPT-4o, where the “o” stands for “omni,” is designed to cater to a broader spectrum of input and output modalities. “It accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs,” OpenAI announced.

The model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human conversational response times.

Pioneering capabilities

The introduction of GPT-4o marks a leap from its predecessors by processing all inputs and outputs through a single neural network. This approach enables the model to retain critical information and context that were previously lost in the separate model pipeline used in earlier versions.

Prior to GPT-4o, ‘Voice Mode’ could handle audio interactions with latencies of 2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4. The previous setup involved three distinct models: one for transcribing audio to text, another for textual responses, and a third for converting text back to audio. This segmentation led to loss of nuances such as tone, multiple speakers, and background noise.
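The three-stage pipeline described above can be sketched as follows. This is an illustrative outline only, not OpenAI's actual implementation; the function names `transcribe`, `chat`, and `synthesise` are hypothetical stand-ins for the three separate models.

```python
# Illustrative sketch of the pre-GPT-4o Voice Mode pipeline (hypothetical
# function names, not OpenAI's code). Each hand-off between stages is
# text-only, so paralinguistic signal such as tone, speaker identity, and
# background noise cannot survive it.
def voice_mode_pipeline(audio, transcribe, chat, synthesise):
    text_in = transcribe(audio)    # audio -> text: tone and speakers are dropped
    text_out = chat(text_in)       # text -> text: the LLM never hears the audio
    return synthesise(text_out)    # text -> audio: expressiveness is reinvented
```

A natively multimodal model collapses these three stages into a single network that consumes and emits audio directly, which is why the nuances the old pipeline lost can now be preserved.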

As an integrated solution, GPT-4o boasts notable improvements in vision and audio understanding. It can perform more complex tasks such as harmonising songs, providing real-time translations, and even generating outputs with expressive elements like laughter and singing. Examples of its broad capabilities include preparing for interviews, translating languages on the fly, and generating customer service responses.

Nathaniel Whittemore, Founder and CEO of Superintelligent, commented: “Product announcements are going to inherently be more divisive than technology announcements because it’s harder to tell if a product is going to be truly different until you actually interact with it. And especially when it comes to a different mode of human-computer interaction, there is even more room for diverse beliefs about how useful it’s going to be.

“That said, the fact that there wasn’t a GPT-4.5 or GPT-5 announced is also distracting people from the technological advancement that this is a natively multimodal model. It’s not a text model with a voice or image addition; it is a multimodal token in, multimodal token out. This opens up a huge array of use cases that are going to take some time to filter into the consciousness.”

Performance and safety

GPT-4o matches GPT-4 Turbo performance levels in English text and coding tasks but significantly outperforms it in non-English languages, making it a more inclusive and versatile model. It sets a new benchmark in reasoning with a high score of 88.7% on 0-shot CoT MMLU (general knowledge questions) and 87.2% on the 5-shot no-CoT MMLU.

The model also excels in audio and translation benchmarks, surpassing previous state-of-the-art models like Whisper-v3. In multilingual and vision evaluations, it demonstrates superior performance, enhancing OpenAI’s multilingual, audio, and vision capabilities.

OpenAI has built safety measures into GPT-4o by design, filtering training data and refining the model's behaviour through post-training safeguards. The model has been assessed under OpenAI's Preparedness Framework and complies with the company's voluntary commitments. Evaluations in areas such as cybersecurity, persuasion, and model autonomy indicate that GPT-4o does not exceed a 'Medium' risk level in any category.

Further safety assessments involved extensive external red teaming with over 70 experts in various domains, including social psychology, bias, fairness, and misinformation. This comprehensive scrutiny aims to mitigate risks introduced by the new modalities of GPT-4o.

Availability and future integration

Starting today, GPT-4o's text and image capabilities are available in ChatGPT, including on the free tier, with extended features for Plus users. A new Voice Mode powered by GPT-4o will enter alpha testing within ChatGPT Plus in the coming weeks.

Developers can access GPT-4o through the API for text and vision tasks, benefiting from its doubled speed, halved price, and enhanced rate limits compared to GPT-4 Turbo.

OpenAI plans to expand GPT-4o’s audio and video functionalities to a select group of trusted partners via the API, with broader rollout expected in the near future. This phased release strategy aims to ensure thorough safety and usability testing before making the full range of capabilities publicly available.

“It’s hugely significant that they’ve made this model available for free to everyone, as well as making the API 50% cheaper. That is a massive increase in accessibility,” explained Whittemore.

OpenAI invites community feedback to continuously refine GPT-4o, emphasising the importance of user input in identifying and closing gaps where GPT-4 Turbo might still outperform.

(Image Credit: OpenAI)

See also: OpenAI takes steps to boost AI-generated content transparency

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Inflection-2 beats Google's PaLM 2 across common benchmarks
AI News | Thu, 23 Nov 2023
Inflection, an AI startup aiming to create “personal AI for everyone”, has announced a new large language model dubbed Inflection-2 that beats Google’s PaLM 2.

Inflection-2 was trained on 5,000 NVIDIA H100 GPUs, using roughly 10^25 floating point operations (FLOPs) of compute, putting it in the same league as PaLM 2 Large. However, early benchmarks show Inflection-2 outperforming Google's model on tests of reasoning ability, factual knowledge, and stylistic prowess.
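Taking the training compute to be on the order of 10^25 FLOPs (the widely reported figure), a back-of-envelope calculation shows the scale involved. The per-GPU throughput and utilisation below are our assumptions for illustration, not numbers reported by Inflection.

```python
# Back-of-envelope estimate: wall-clock time for ~1e25 FLOPs on 5,000 GPUs.
# Assumes ~1 PFLOP/s of low-precision throughput per GPU and 40% utilisation;
# both are illustrative assumptions, not reported figures.
gpus = 5_000
peak_flops_per_gpu = 1.0e15   # assumed ~1 PFLOP/s per GPU
utilisation = 0.40            # assumed fraction of peak actually achieved
total_flops = 1.0e25
seconds = total_flops / (gpus * peak_flops_per_gpu * utilisation)
days = seconds / 86_400       # works out to roughly two months
```

Under these assumptions the run lands at around 58 days, which illustrates why training at this scale is restricted to organisations with clusters of this size.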

On a range of common academic AI benchmarks, Inflection-2 achieved higher scores than PaLM 2 on most. This included outscoring the search giant's flagship on the diverse Massive Multitask Language Understanding (MMLU) tests, as well as the TriviaQA, HellaSwag, and Grade School Math (GSM8k) benchmarks.

The startup’s new model will soon power its personal assistant app Pi to enable more natural conversations and useful features.

Inflection said its transition from NVIDIA A100 to H100 GPUs for inference – combined with optimisation work – will increase serving speed and reduce costs despite Inflection-2 being much larger than its predecessor.  

An Inflection spokesperson said this latest model brings them “a big milestone closer” towards fulfilling the mission of providing AI assistants for all. They added the team is “already looking forward” to training even larger models on their 22,000 GPU supercluster.

Safety is said to be a top priority for the researchers, with Inflection being one of the first signatories to the White House’s July 2023 voluntary AI commitments. The company said its safety team continues working to ensure models are rigorously evaluated and rely on best practices for alignment.

With impressive benchmarks and plans to scale further, Inflection’s latest effort poses a serious challenge to tech giants like Google and Microsoft who have so far dominated the field of large language models. The race is on to deliver the next generation of AI.

(Photo by Johann Walter Bantz on Unsplash)

See also: Anthropic upsizes Claude 2.1 to 200K tokens, nearly doubling GPT-4

