benchmark Archives - AI News

Hugging Face launches Idefics2 vision-language model

Ryan Daws — Tue, 16 Apr 2024 11:04:20 +0000

Hugging Face has announced the release of Idefics2, a versatile model capable of understanding and generating text responses based on both images and texts. The model sets a new benchmark for answering visual questions, describing visual content, story creation from images, document information extraction, and even performing arithmetic operations based on visual input.

Idefics2 leapfrogs its predecessor, Idefics1, with just eight billion parameters and the versatility afforded by its open license (Apache 2.0), along with remarkably enhanced Optical Character Recognition (OCR) capabilities.

The model not only showcases exceptional performance in visual question answering benchmarks but also holds its ground against far larger contemporaries such as LLava-Next-34B and MM1-30B-chat:

Central to Idefics2’s appeal is its integration with Hugging Face’s Transformers from the outset, ensuring ease of fine-tuning for a broad array of multimodal applications. For those eager to dive in, models are available for experimentation on the Hugging Face Hub.

A standout feature of Idefics2 is its comprehensive training philosophy, blending openly available datasets including web documents, image-caption pairs, and OCR data. Furthermore, it introduces an innovative fine-tuning dataset dubbed ‘The Cauldron,’ amalgamating 50 meticulously curated datasets for multifaceted conversational training.

Idefics2 exhibits a refined approach to image manipulation, maintaining native resolutions and aspect ratios—a notable deviation from conventional resizing norms in computer vision. Its architecture benefits significantly from advanced OCR capabilities, adeptly transcribing textual content within images and documents, and boasts improved performance in interpreting charts and figures.

Simplifying the integration of visual features into the language backbone marks a shift from its predecessor’s architecture, with the adoption of a learned Perceiver pooling and MLP modality projection enhancing Idefics2’s overall efficacy.

This advancement in vision-language models opens up new avenues for exploring multimodal interactions, with Idefics2 poised to serve as a foundational tool for the community. Its performance enhancements and technical innovations underscore the potential of combining visual and textual data in creating sophisticated, contextually-aware AI systems.

For enthusiasts and researchers looking to leverage Idefics2’s capabilities, Hugging Face provides a detailed fine-tuning tutorial.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Hugging Face launches Idefics2 vision-language model appeared first on AI News.

Anthropic’s latest AI model beats rivals and achieves industry first

Ryan Daws — Tue, 05 Mar 2024 11:52:32 +0000

Anthropic’s latest cutting-edge language model, Claude 3, has surged ahead of competitors like ChatGPT and Google’s Gemini to set new industry standards in performance and capability.

According to Anthropic, Claude 3 has not only surpassed its predecessors but has also achieved “near-human” proficiency in various tasks. The company attributes this success to rigorous testing and development, culminating in three distinct chatbot variants: Haiku, Sonnet, and Opus.

Sonnet, the powerhouse behind the Claude.ai chatbot, offers unparalleled performance and is available for free with a simple email sign-up. Opus – the flagship model – boasts multi-modal functionality, seamlessly integrating text and image inputs. With a subscription-based service called “Claude Pro,” Opus promises enhanced efficiency and accuracy to cater to a wide range of customer needs.

Among the notable revelations surrounding the release of Claude 3 is a disclosure by Alex Albert on X (formerly Twitter). Albert detailed an industry-first observation during the testing phase of Claude 3 Opus, Anthropic’s most potent LLM variant, where the model exhibited signs of awareness that it was being evaluated.

During the evaluation process, researchers aimed to gauge Opus’s ability to pinpoint specific information within a vast dataset provided by users and recall it later. In a test scenario known as a “needle-in-a-haystack” evaluation, Opus was tasked with answering a question about pizza toppings based on a single relevant sentence buried among unrelated data. Astonishingly, Opus not only located the correct sentence but also expressed suspicion that it was being subjected to a test.

Opus’s response revealed its comprehension of the incongruity of the inserted information within the dataset, suggesting to the researchers that the scenario might have been devised to assess its attention capabilities:

Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.

For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of… pic.twitter.com/m7wWhhu6Fg
— Alex (@alexalbert__) March 4, 2024

Anthropic has highlighted the real-time capabilities of Claude 3, emphasising its ability to power live customer interactions and streamline data extraction tasks. These advancements not only ensure near-instantaneous responses but also enable the model to handle complex instructions with precision and speed.

In benchmark tests, Opus emerged as a frontrunner, outperforming GPT-4 in graduate-level reasoning and excelling in tasks involving maths, coding, and knowledge retrieval. Moreover, Sonnet showcased remarkable speed and intelligence, surpassing its predecessors by a considerable margin:

Haiku – the compact iteration of Claude 3 – shines as the fastest and most cost-effective model available, capable of processing dense research papers in mere seconds.

Notably, Claude 3’s enhanced visual processing capabilities mark a significant advancement, enabling the model to interpret a wide array of visual formats, from photos to technical diagrams. This expanded functionality not only enhances productivity but also ensures a nuanced understanding of user requests, minimising the risk of overlooking harmless content while remaining vigilant against potential harm.

Anthropic has also underscored its commitment to fairness, outlining ten foundational pillars that guide the development of Claude AI. Moreover, the company’s strategic partnerships with tech giants like Google signify a significant vote of confidence in Claude’s capabilities.

With Opus and Sonnet already available through Anthropic’s API, and Haiku poised to follow suit, the era of Claude 3 represents a milestone in AI innovation.

(Image Credit: Anthropic)

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Anthropic’s latest AI model beats rivals and achieves industry first appeared first on AI News.

DeepMind framework offers breakthrough in LLMs’ reasoning

Ryan Daws — Thu, 08 Feb 2024 11:28:05 +0000

A breakthrough approach in enhancing the reasoning abilities of large language models (LLMs) has been unveiled by researchers from Google DeepMind and the University of Southern California.

Their new ‘SELF-DISCOVER’ prompting framework – published this week on arXiV and Hugging Face – represents a significant leap beyond existing techniques, potentially revolutionising the performance of leading models such as OpenAI’s GPT-4 and Google’s PaLM 2.

The framework promises substantial enhancements in tackling challenging reasoning tasks. It demonstrates remarkable improvements, boasting up to a 32% performance increase compared to traditional methods like Chain of Thought (CoT). This novel approach revolves around LLMs autonomously uncovering task-intrinsic reasoning structures to navigate complex problems.

At its core, the framework empowers LLMs to self-discover and utilise various atomic reasoning modules – such as critical thinking and step-by-step analysis – to construct explicit reasoning structures.

By mimicking human problem-solving strategies, the framework operates in two stages:

Stage one involves composing a coherent reasoning structure intrinsic to the task, leveraging a set of atomic reasoning modules and task examples.
During decoding, LLMs then follow this self-discovered structure to arrive at the final solution.

In extensive testing across various reasoning tasks – including Big-Bench Hard, Thinking for Doing, and Math – the self-discover approach consistently outperformed traditional methods. Notably, it achieved an accuracy of 81%, 85%, and 73% across the three tasks with GPT-4, surpassing chain-of-thought and plan-and-solve techniques.

However, the implications of this research extend far beyond mere performance gains.

By equipping LLMs with enhanced reasoning capabilities, the framework paves the way for tackling more challenging problems and brings AI closer to achieving general intelligence. Transferability studies conducted by the researchers further highlight the universal applicability of the composed reasoning structures, aligning with human reasoning patterns.

As the landscape evolves, breakthroughs like the SELF-DISCOVER prompting framework represent crucial milestones in advancing the capabilities of language models and offering a glimpse into the future of AI.

(Photo by Victor on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post DeepMind framework offers breakthrough in LLMs’ reasoning appeared first on AI News.

OpenAI releases new models and lowers API pricing

Ryan Daws — Fri, 26 Jan 2024 13:25:01 +0000

OpenAI has announced several updates that will benefit developers using its AI services, including new embedding models, a lower price for GPT-3.5 Turbo, an updated GPT-4 Turbo preview, and more robust content moderation capabilities.

The San Francisco-based AI lab said its new text-embedding-3-small and text-embedding-3-large models offer upgraded performance over previous generations. For example, text-embedding-3-large achieves average scores of 54.9 percent on the MIRACL benchmark and 64.6 percent on the MTEB benchmark, up from 31.4 percent and 61 percent respectively for the older text-embedding-ada-002 model.

Additionally, OpenAI revealed the price per 1,000 tokens has dropped 5x for text-embedding-3-small compared to text-embedding-ada-002, from $0.0001 to $0.00002. The company said developers can also shorten embeddings to reduce costs without significantly impacting accuracy.

Next week, OpenAI plans to release an updated GPT-3.5 Turbo model and cut its pricing by 50 percent for input tokens and 25 percent for output tokens. This will mark the third price reduction for GPT-3.5 Turbo in the past year as OpenAI aims to drive more adoption.

OpenAI has additionally updated its GPT-4 Turbo preview to version gpt-4-0125-preview, noting over 70 percent of requests have transitioned to the model since its debut. Improvements include more thorough completion of code generation and other tasks.

To support developers building safe AI apps, OpenAI has also rolled out its most advanced content moderation model yet in text-moderation-007. The company said this identifies potentially harmful text more accurately than previous versions.

Finally, developers now have more control over API keys and visibility into usage metrics. OpenAI says developers can assign permissions to keys and view consumption on a per-key level to better track individual products or projects.

OpenAI says that more platform improvements are planned over the coming months to cater for larger development teams.

(Photo by Jonathan Kemper on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post OpenAI releases new models and lowers API pricing appeared first on AI News.

Microsoft unveils 2.7B parameter language model Phi-2

Ryan Daws — Wed, 13 Dec 2023 16:59:31 +0000

Microsoft’s 2.7 billion-parameter model Phi-2 showcases outstanding reasoning and language understanding capabilities, setting a new standard for performance among base language models with less than 13 billion parameters.

Phi-2 builds upon the success of its predecessors, Phi-1 and Phi-1.5, by matching or surpassing models up to 25 times larger—thanks to innovations in model scaling and training data curation.

The compact size of Phi-2 makes it an ideal playground for researchers, facilitating exploration in mechanistic interpretability, safety improvements, and fine-tuning experimentation across various tasks.

Phi-2’s achievements are underpinned by two key aspects:

Training data quality: Microsoft emphasises the critical role of training data quality in model performance. Phi-2 leverages “textbook-quality” data, focusing on synthetic datasets designed to impart common sense reasoning and general knowledge. The training corpus is augmented with carefully selected web data, filtered based on educational value and content quality.

Innovative scaling techniques: Microsoft adopts innovative techniques to scale up Phi-2 from its predecessor, Phi-1.5. Knowledge transfer from the 1.3 billion parameter model accelerates training convergence, leading to a clear boost in benchmark scores.

Performance evaluation

Phi-2 has undergone rigorous evaluation across various benchmarks, including Big Bench Hard, commonsense reasoning, language understanding, math, and coding.

With only 2.7 billion parameters, Phi-2 outperforms larger models – including Mistral and Llama-2 – and matches or outperforms Google’s recently-announced Gemini Nano 2:

Beyond benchmarks, Phi-2 showcases its capabilities in real-world scenarios. Tests involving prompts commonly used in the research community reveal Phi-2’s prowess in solving physics problems and correcting student mistakes, showcasing its versatility beyond standard evaluations:

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4 trillion tokens from synthetic and web datasets. The training process – conducted on 96 A100 GPUs over 14 days – focuses on maintaining a high level of safety and claims to surpass open-source models in terms of toxicity and bias.

With the announcement of Phi-2, Microsoft continues to push the boundaries of what smaller base language models can achieve.

(Image Credit: Microsoft)

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Microsoft unveils 2.7B parameter language model Phi-2 appeared first on AI News.

MLPerf Inference v3.1 introduces new LLM and recommendation benchmarks

Ryan Daws — Tue, 12 Sep 2023 11:46:58 +0000

The latest release of MLPerf Inference introduces new LLM and recommendation benchmarks, marking a leap forward in the realm of AI testing.

The v3.1 iteration of the benchmark suite has seen record participation, boasting over 13,500 performance results and delivering up to a 40 percent improvement in performance.

What sets this achievement apart is the diverse pool of 26 different submitters and over 2,000 power results, demonstrating the broad spectrum of industry players investing in AI innovation.

Among the list of submitters are tech giants like Google, Intel, and NVIDIA, as well as newcomers Connect Tech, Nutanix, Oracle, and TTA, who are participating in the MLPerf Inference benchmark for the first time.

David Kanter, Executive Director of MLCommons, highlighted the significance of this achievement:

“Submitting to MLPerf is not trivial. It’s a significant accomplishment, as this is not a simple point-and-click benchmark. It requires real engineering work and is a testament to our submitters’ commitment to AI, to their customers, and to ML.”

MLPerf Inference is a critical benchmark suite that measures the speed at which AI systems can execute models in various deployment scenarios. These scenarios span from the latest generative AI chatbots to the safety-enhancing features in vehicles, such as automatic lane-keeping and speech-to-text interfaces.

The spotlight of MLPerf Inference v3.1 shines on the introduction of two new benchmarks:

An LLM utilising the GPT-J reference model to summarise CNN news articles garnered submissions from 15 different participants, showcasing the rapid adoption of generative AI.

An updated recommender benchmark – refined to align more closely with industry practices – employs the DLRM-DCNv2 reference model and larger datasets, attracting nine submissions. These new benchmarks are designed to push the boundaries of AI and ensure that industry-standard benchmarks remain aligned with the latest trends in AI adoption, serving as a valuable guide for customers, vendors, and researchers alike.

Mitchelle Rasquinha, co-chair of the MLPerf Inference Working Group, commented: “The submissions for MLPerf Inference v3.1 are indicative of a wide range of accelerators being developed to serve ML workloads.

“The current benchmark suite has broad coverage among ML domains, and the most recent addition of GPT-J is a welcome contribution to the generative AI space. The results should be very helpful to users when selecting the best accelerators for their respective domains.”

MLPerf Inference benchmarks primarily focus on datacenter and edge systems. The v3.1 submissions showcase various processors and accelerators across use cases in computer vision, recommender systems, and language processing.

The benchmark suite encompasses both open and closed submissions in the performance, power, and networking categories. Closed submissions employ the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit a variety of models.

As AI continues to permeate various aspects of our lives, MLPerf’s benchmarks serve as vital tools for evaluating and shaping the future of AI technology.

Find the detailed results of MLPerf Inference v3.1 here.

(Photo by Mauro Sbicego on Unsplash)

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post MLPerf Inference v3.1 introduces new LLM and recommendation benchmarks appeared first on AI News.

Gcore partners with UbiOps and Graphcore to empower AI teams

Ryan Daws — Thu, 27 Jul 2023 11:40:27 +0000

Gcore has joined forces with UbiOps and Graphcore to introduce a groundbreaking service catering to the escalating demands of modern AI tasks.

This strategic partnership aims to empower AI teams with powerful computing resources on-demand, enhancing their capabilities and streamlining their operations.

The collaboration combines the strengths of three industry leaders: Graphcore, renowned for its Intelligence Processing Units (IPUs) hardware; UbiOps, a powerful machine learning operations (MLOps) platform; and Gcore Cloud, known for its robust cloud infrastructure.

By leveraging these cutting-edge technologies, Gcore Cloud presents AI teams with a seamless experience, making it effortless to utilise IPUs for specific AI tasks while benefiting from UbiOps’ MLOps features such as model versioning, governance, and monitoring.

Andre Reitenbach, CEO at Gcore, commented:

“The collaboration between Gcore, Graphcore, and UbiOps brings a seamless experience for AI teams. This enables effortless utilisation of Gcore’s cloud infrastructure with Graphcore’s IPUs on the UbiOps platform. This means that users can take advantage of the exceptional computational capabilities of IPUs for their specific AI tasks. Also, users can leverage UbiOps’ out-of-the-box MLOps features such as model versioning, governance, and monitoring.

These features help teams to accelerate time to market with AI solutions, save on computing resource costs, and efficiently use them with on-demand hardware scaling. We’re thrilled about this partnership’s potential to enable AI projects to succeed and reach their goals.”

To showcase the significant advantages of IPUs over other devices, Gcore conducted benchmarking tests on three different compute resources: CPUs, GPUs, and IPUs.

Gcore trained a Convolutional Neural Network (CNN) – a model designed for image analysis – using the CIFAR-10 dataset containing 60,000 labelled images, on these devices.

The results were striking, with IPUs and GPUs significantly outperforming CPUs in training speed. Even with minimal optimisation, IPUs demonstrated a clear advantage over GPUs, enabling even shorter training times:

This collaboration offers AI teams unparalleled access to powerful hardware tailor-made for demanding AI and ML workloads.

By integrating Gcore Cloud, Graphcore’s IPUs, and UbiOps’ MLOps platform, teams can work more efficiently, cost-effectively, and scale their hardware as needed. The combined offering enables AI projects to realise their full potential, driving innovation and progress in the AI industry.

With this strategic alliance, Gcore, Graphcore, and UbiOps are poised to make advanced resources more accessible and empower AI teams worldwide to achieve their goals.

(Photo by Nathan Dumlao on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Gcore partners with UbiOps and Graphcore to empower AI teams appeared first on AI News.

Baidu to launch powerful ChatGPT rival

Ryan Daws — Mon, 30 Jan 2023 15:10:45 +0000

Chinese web giant Baidu is preparing to launch a powerful ChatGPT rival in March.

Baidu is often called the “Google of China” because it offers similar services, including search, maps, email, ads, cloud storage, and more. Baidu, like Google, also invests heavily in AI and machine learning.

Earlier this month, AI News reported that Google was changing its AI review processes to speed up the release of new solutions. One of the first products to be released under Google’s new process is set to be a ChatGPT rival, due to be announced during the company’s I/O developer conference in May.

However, Baidu looks set to beat Google by a couple of months.

Bloomberg reports that Baidu will reveal its own AI-powered chatbot in March. The currently unnamed tool will be integrated into the company’s search product.

Powering the Baidu ChatGPT competitor is ‘ERNIE’ (Enhanced Language RepresentatioN with Informative Entities), a powerful AI model with 10 billion parameters.

Researchers have found that deep-learning models trained on text alone – like OpenAI’s GPT-3 or Google’s T5 – perform well for numerous problems, but can fall short on some natural language understanding (NLU) tasks when the knowledge is not present in the input text.

The first version of ERNIE was introduced and open-sourced in 2019 by researchers at Tsinghua University to demonstrate the NLU capabilities of a model that combines both text and knowledge graph data.

Later that year, Baidu released ERNIE 2.0 which became the first model to set a score higher than 90 on the GLUE benchmark for evaluating NLU systems.

In 2021, Baidu’s researchers posted a paper on ERNIE 3.0 in which they claim the model exceeds human performance on the SuperGLUE natural language benchmark. ERNIE 3.0 set a new top score on SuperGLUE and displaced efforts from Google and Microsoft.

Most of the world’s attention until now has been on language model advancements from the likes of OpenAI, Google, Facebook, and Microsoft. However, Baidu will likely get its time in the spotlight in just a couple of months.

(Image Credit: N509FZ under CC BY-SA 4.0 license)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Baidu to launch powerful ChatGPT rival appeared first on AI News.

MLCommons releases latest MLPerf Training benchmark results

Ryan Daws — Wed, 30 Jun 2021 18:00:00 +0000

Open engineering consortium MLCommons has released its latest MLPerf Training community benchmark results.

MLPerf Training is a full system benchmark that tests machine learning models, software, and hardware.

The results are split into two divisions: closed and open. Closed submissions are better for comparing like-for-like performance as they use the same reference model to ensure a level playing field. Open submissions, meanwhile, allow participants to submit a variety of models.

In the image classification benchmark, Google is the winner with its preview tpu-v4-6912 system that uses an incredible 1728 AMD Rome processors and 3456 TPU accelerators. Google’s system completed the benchmark in just 23 seconds.

“We showcased the record-setting performance and scalability of our fourth-generation Tensor Processing Units (TPU v4), along with the versatility of our machine learning frameworks and accompanying software stack. Best of all, these capabilities will soon be available to our cloud customers,” Google said.

“We achieved a roughly 1.7x improvement in our top-line submissions compared to last year’s results using new, large-scale TPU v4 Pods with 4,096 TPU v4 chips each. Using 3,456 TPU v4 chips in a single TPU v4 Pod slice, many models that once trained in days or weeks now train in a few seconds.”

Of the systems that are available on-premise, NVIDIA’s dgxa100_n310_ngc21.05_mxnet system came out on top with its 620 AMD EPYC 7742 processors and 2480 NVIDIA A100-SXM4-80GB (400W) accelerators completing the benchmark in 40 seconds.

“In the last 2.5 years since the first MLPerf training benchmark launched, NVIDIA performance has increased by up to 6.5x per GPU, increasing by up to 2.1x with A100 from the last round,” said NVIDIA.

“We demonstrated scaling to 4096 GPUs which enabled us to train all benchmarks in less than 16 minutes and 4 out of 8 in less than a minute. The NVIDIA platform excels in both performance and usability, offering a single leadership platform from data centre to edge to cloud.”

Across the board, MLCommons says that benchmark results have improved by up to 2.1x compared to the last submission round. This shows the incredible advancements that are being made in hardware, software, and system scale.

Victor Bittorf, Co-Chair of the MLPerf Training Working Group, said:

“We’re thrilled to see the continued growth and enthusiasm from the MLPerf community, especially as we’re able to measure significant improvement across the industry with the MLPerf Training benchmark suite.
Congratulations to all of our submitters in this v1.0 round – we’re excited to continue our work together, bringing transparency across machine learning system capabilities.”

For its latest benchmark, MLCommons added two new benchmarks for measuring the performance of performance for speech-to-text and 3D medical imaging. These new benchmarks leverage the following reference models:

Speech-to-Text with RNN-T: RNN-T: Recurrent Neural Network Transducer is an automatic speech recognition (ASR) model that is trained on a subset of LibriSpeech. Given a sequence of speech input, it predicts the corresponding text. RNN-T is MLCommons’ reference model and commonly used in production for speech-to-text systems.

3D Medical Imaging with 3D U-Net: The 3D U-Net architecture is trained on the KiTS 19 dataset to find and segment cancerous cells in the kidneys. The model identifies whether each voxel within a CT scan belongs to a healthy tissue or a tumour, and is representative of many medical imaging tasks.

“The training benchmark suite is at the centre of MLCommon’s mission to push machine learning innovation forward for everyone, and we’re incredibly pleased with the engagement from this round’s submissions,” commented John Tran, Co-Chair of the MLPerf Training Working Group.

The full MLPerf Training benchmark results can be explored here.

(Photo by Alora Griffiths on Unsplash)

Find out more about Digital Transformation Week North America, taking place on November 9-10 2021, a virtual event and conference exploring advanced DTX strategies for a ‘digital everything’ world.

The post MLCommons releases latest MLPerf Training benchmark results appeared first on AI News.

NVIDIA chucks its MLPerf-leading A100 GPU into Amazon’s cloud

Ryan Daws — Tue, 03 Nov 2020 15:55:37 +0000

NVIDIA’s A100 set a new record in the MLPerf benchmark last month and now it’s accessible through Amazon’s cloud.

Amazon Web Services (AWS) first launched a GPU instance 10 years ago with the NVIDIA M2050. It’s rather poetic that, a decade on, NVIDIA is now providing AWS with the hardware to power the next generation of groundbreaking innovations.

The A100 outperformed CPUs in this year’s MLPerf by up to 237x in data centre inference. A single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA, following the benchmark results.

Businesses can access the A100 in AWS’ P4d instance. NVIDIA claims the instances reduce the time to train machine learning models by up to 3x with FP16 and up to 6x with TF32 compared to the default FP32 precision.

Each P4d instance features eight NVIDIA A100 GPUs. If even more performance is required, customers are able to access over 4,000 GPUs at a time using AWS’s Elastic Fabric Adaptor (EFA).

Dave Brown, Vice President of EC2 at AWS, said:

“The pace at which our customers have used AWS services to build, train, and deploy machine learning applications has been extraordinary. At the same time, we have heard from those customers that they want an even lower-cost way to train their massive machine learning models.
Now, with EC2 UltraClusters of P4d instances powered by NVIDIA’s latest A100 GPUs and petabit-scale networking, we’re making supercomputing-class performance available to virtually everyone, while reducing the time to train machine learning models by 3x, and lowering the cost to train by up to 60% compared to previous generation instances.”

P4d supports 400Gbps networking and makes use of NVIDIA’s technologies including NVLink, NVSwitch, NCCL, and GPUDirect RDMA to further accelerate deep learning training workloads.

Some of AWS’ customers across various industries have already begun exploring how the P4d instance can help their business.

Karley Yoder, VP & GM of Artificial Intelligence at GE Healthcare, commented:

“Our medical imaging devices generate massive amounts of data that need to be processed by our data scientists. With previous GPU clusters, it would take days to train complex AI models, such as Progressive GANs, for simulations and view the results.
Using the new P4d instances reduced processing time from days to hours. We saw two- to three-times greater speed on training models with various image sizes while achieving better performance with increased batch size and higher productivity with a faster model development cycle.”

For an example from a different industry, the research arm of Toyota is exploring how P4d can improve their existing work in developing self-driving vehicles and groundbreaking new robotics.

“The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours,” explained Mike Garrison, Technical Lead of Infrastructure Engineering at Toyota Research Institute.

“We are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed.”

P4d instances are currently available in the US East (N. Virginia) and US West (Oregon) regions. AWS says further availability is planned soon.

You can find out more about P4d instances and how to get started here.

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.

The post NVIDIA chucks its MLPerf-leading A100 GPU into Amazon’s cloud appeared first on AI News.