Multimodal Archives

GPT-4o delivers human-like AI interaction with text, audio, and vision integration

OpenAI has launched its new flagship model, GPT-4o, which integrates text, audio, and visual inputs and outputs and promises more natural human-machine interaction.

GPT-4o, where the “o” stands for “omni,” is designed to cater to a broader spectrum of input and output modalities. “It accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs,” OpenAI announced.

Users can...

14 May 2024 | Applications

Google’s next-gen AI model Gemini outperforms GPT-4

Google has unveiled Gemini, its most capable and versatile AI model to date.

Demis Hassabis, CEO and Co-Founder of Google DeepMind, introduced Gemini as a multimodal model capable of understanding and combining different types of information, including text, code, audio, images, and video.

https://youtu.be/jV1vkHv4zq8

Gemini comes in three optimised versions: Ultra, Pro, and Nano. The Ultra model boasts...