ai inference Archives - AI News

Google expands partnership with Anthropic to enhance AI safety
10 November 2023

Google has announced the expansion of its partnership with Anthropic to work towards achieving the highest standards of AI safety.

The collaboration between the two companies dates back to Anthropic’s founding in 2021 and has since seen Anthropic build one of the largest Google Kubernetes Engine (GKE) clusters in the industry.

“Our longstanding partnership with Google is founded on a shared commitment to develop AI responsibly and deploy it in a way that benefits society,” said Dario Amodei, co-founder and CEO of Anthropic.

“We look forward to our continued collaboration as we work to make steerable, reliable and interpretable AI systems available to more businesses around the world.”

Anthropic utilises Google’s AlloyDB, a fully managed PostgreSQL-compatible database, for handling transactional data with high performance and reliability. Additionally, Google’s BigQuery data warehouse is employed to analyse vast datasets, extracting valuable insights for Anthropic’s operations.
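
As a rough illustration of that split (not Anthropic’s actual code): AlloyDB speaks the standard PostgreSQL wire protocol, so any Postgres driver works against it, while BigQuery has an official Python client library. Every hostname, credential, table, and query below is hypothetical.

```python
# Hypothetical sketch: AlloyDB is PostgreSQL-compatible, so a standard
# PostgreSQL driver works; BigQuery ships an official Python client.
# All hostnames, credentials, tables, and queries here are made up.
import psycopg2                      # standard PostgreSQL driver
from google.cloud import bigquery    # official BigQuery client

# Transactional write against AlloyDB over the PostgreSQL protocol.
conn = psycopg2.connect(
    host="10.0.0.5",          # AlloyDB instance IP (hypothetical)
    dbname="app",
    user="app_user",
    password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO events (kind, payload) VALUES (%s, %s)",
        ("inference_request", "{}"),
    )

# Analytical query against BigQuery for large-scale aggregation.
bq = bigquery.Client(project="my-project")   # hypothetical project ID
rows = bq.query(
    "SELECT kind, COUNT(*) AS n FROM `my-project.logs.events` GROUP BY kind"
).result()
for row in rows:
    print(row.kind, row.n)
```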

As part of the expanded partnership, Anthropic will leverage Google’s latest generation Cloud TPU v5e chips for AI inference. Anthropic will use the chips to efficiently scale its powerful Claude large language model, which ranks only behind GPT-4 in many benchmarks.
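
For a flavour of what TPU-backed inference looks like from code (a toy stand-in, not Anthropic’s serving stack), JAX discovers attached Cloud TPU cores automatically and compiles functions for them via XLA:

```python
# Minimal sketch of TPU-backed inference with JAX. On a Cloud TPU VM,
# jax.devices() reports the attached TPU cores and jit-compiled
# functions run on them; the "model" here is a toy stand-in.
import jax
import jax.numpy as jnp

print(jax.devices())  # e.g. [TpuDevice(id=0), ...] on a TPU VM

@jax.jit
def predict(weights, x):
    # Stand-in model: a single dense layer with a GELU activation.
    return jax.nn.gelu(x @ weights)

key = jax.random.PRNGKey(0)
weights = jax.random.normal(key, (512, 512))
batch = jax.random.normal(key, (8, 512))
print(predict(weights, batch).shape)  # (8, 512), computed on the TPU
```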

The announcement comes on the heels of both companies participating in the inaugural AI Safety Summit (AISS) at Bletchley Park, hosted by the UK government. The summit brought together government officials, technology leaders, and experts to address concerns around frontier AI.

Google and Anthropic are also engaged in collaborative efforts with the Frontier Model Forum and MLCommons, contributing to the development of robust measures for AI safety.

To enhance security for organisations deploying Anthropic’s models on Google Cloud, Anthropic is now utilising Google Cloud’s security services. This includes Chronicle Security Operations, Secure Enterprise Browsing, and Security Command Center, providing visibility, threat detection, and access control.

“Anthropic and Google Cloud share the same values when it comes to developing AI: it needs to be done in both a bold and responsible way,” commented Thomas Kurian, CEO of Google Cloud.

“This expanded partnership with Anthropic – built on years of working together – will bring AI to more people safely and securely, and provides another example of how the most innovative and fastest growing AI startups are building on Google Cloud.”

Google and Anthropic’s expanded partnership promises to be a critical step in advancing AI safety standards and fostering responsible development.

(Photo by charlesdeluvio on Unsplash)

See also: Amazon is building an LLM to rival OpenAI and Google

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Damian Bogunowicz, Neural Magic: On revolutionising deep learning with CPUs
24 July 2023

AI News spoke with Damian Bogunowicz, a machine learning engineer at Neural Magic, to shed light on the company’s innovative approach to deep learning model optimisation and inference on CPUs.

One of the key challenges in developing and deploying deep learning models lies in their size and computational requirements. However, Neural Magic tackles this issue head-on through a concept called compound sparsity.

Compound sparsity combines techniques such as unstructured pruning, quantisation, and distillation to significantly reduce the size of neural networks while maintaining their accuracy. 
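
As a loose sketch of two of those ingredients, and emphatically not Neural Magic’s own pipeline, stock PyTorch ships utilities for unstructured magnitude pruning and dynamic INT8 quantisation; distillation would additionally fine-tune the pruned “student” against a larger teacher’s outputs:

```python
# Loose sketch of two compound-sparsity ingredients using stock PyTorch
# utilities (not Neural Magic's pipeline): unstructured magnitude
# pruning followed by dynamic INT8 quantisation of Linear layers.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 10))

# Zero out the 90% of weights with the smallest magnitudes in each layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")   # bake the sparsity mask in

# Quantise the remaining weights to INT8 for Linear layers.
quantised = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

sparsity = (model[0].weight == 0).float().mean().item()
print(f"layer-0 sparsity: {sparsity:.0%}")   # ~90%
```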

“We have developed our own sparsity-aware runtime that leverages CPU architecture to accelerate sparse models. This approach challenges the notion that GPUs are necessary for efficient deep learning,” explains Bogunowicz.

Bogunowicz emphasised the benefits of this approach, highlighting that more compact models lead to faster deployments and can be run on ubiquitous CPU-based machines. The ability to optimise and run sparsified networks efficiently without relying on specialised hardware is a game-changer for machine learning practitioners, empowering them to overcome the limitations and costs associated with GPU usage.
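
Going by Neural Magic’s public DeepSparse examples of the era, running a sparse model on a CPU is a few lines of Python; treat the exact task name and SparseZoo model stub below as assumptions rather than verified paths:

```python
# Sketch based on Neural Magic's public DeepSparse examples; the exact
# SparseZoo stub is hypothetical and shown only for shape.
from deepsparse import Pipeline

pipeline = Pipeline.create(
    task="question-answering",
    # Stub for a pruned-and-quantised BERT (hypothetical exact path)
    model_path="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned90_quant-none",
)
answer = pipeline(
    question="What does the sparsity-aware runtime target?",
    context="DeepSparse accelerates sparse models on commodity CPUs.",
)
print(answer)
```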

When asked about the suitability of sparse neural networks for enterprises, Bogunowicz explained that the vast majority of companies can benefit from using sparse models.

By removing up to 90 percent of parameters without impacting accuracy, enterprises can achieve more efficient deployments. While safety-critical domains such as autonomous driving or autonomous aircraft may require maximum accuracy and minimal sparsity, for the majority of businesses the advantages of sparse models outweigh the limitations.

Looking ahead, Bogunowicz expressed his excitement about the future of large language models (LLMs) and their applications.

“I’m particularly excited about the future of large language models (LLMs). Mark Zuckerberg discussed enabling AI agents, acting as personal assistants or salespeople, on platforms like WhatsApp,” says Bogunowicz.

One example that caught his attention was a chatbot used by Khan Academy—an AI tutor that guides students to solve problems by providing hints rather than revealing solutions outright. This application demonstrates the value that LLMs can bring to the education sector, facilitating the learning process while empowering students to develop problem-solving skills.

“Our research has shown that you can optimise LLMs efficiently for CPU deployment. We have published a research paper on SparseGPT that demonstrates the removal of around 100 billion parameters using one-shot pruning without compromising model quality,” explains Bogunowicz.

“This means there may not be a need for GPU clusters in the future of AI inference. Our goal is to soon provide open-source LLMs to the community and empower enterprises to have control over their products and models, rather than relying on big tech companies.”

As for Neural Magic’s future, Bogunowicz revealed two exciting developments they will be sharing at the upcoming AI & Big Data Expo Europe.

Firstly, they will showcase their support for running AI models on edge devices, specifically x86 and ARM architectures. This expands the possibilities for AI applications in various industries.

Secondly, they will unveil their model optimisation platform, Sparsify, which enables the seamless application of state-of-the-art pruning, quantisation, and distillation algorithms through a user-friendly web app and simple API calls. Sparsify aims to accelerate inference without sacrificing accuracy, providing enterprises with an elegant and intuitive solution.

Neural Magic’s commitment to democratising machine learning infrastructure by leveraging CPUs is impressive. Their focus on compound sparsity and their upcoming advancements in edge computing demonstrate their dedication to empowering businesses and researchers alike.

As we eagerly await the developments presented at AI & Big Data Expo Europe, it’s clear that Neural Magic is poised to make a significant impact in the field of deep learning.

You can watch our full interview with Bogunowicz below:

(Photo by Google DeepMind on Unsplash)

Neural Magic is a key sponsor of this year’s AI & Big Data Expo Europe, which is being held in Amsterdam on 26-27 September 2023.

Swing by Neural Magic’s booth at stand #178 to learn more about how the company enables organisations to use compute-heavy models in a cost-efficient and scalable way.

NVIDIA sets another AI inference record in MLPerf
22 October 2020

NVIDIA has set yet another record for AI inference in MLPerf with its A100 Tensor Core GPUs.

MLPerf consists of five inference benchmarks covering three of today’s main AI applications: image classification, object detection, and translation.

“Industry-standard MLPerf benchmarks provide relevant performance data on widely used AI networks and help make informed AI platform buying decisions,” said Rangan Majumder, VP of Search and AI at Microsoft.

Last year, NVIDIA led all five benchmarks for both server and offline data centre scenarios with its Turing GPUs. A dozen companies participated.

This year, 23 companies participated in MLPerf, but NVIDIA maintained its lead, with the A100 outperforming CPUs by up to 237x in data centre inference.

For perspective, NVIDIA notes that a single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.
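
As a back-of-the-envelope check, the two claims line up: 1,000 dual-socket servers contain roughly 2,000 CPUs, and 2,000 CPUs divided across eight GPUs works out to about 250 CPUs’ worth of work per A100, in the same ballpark as the up-to-237x figure above.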

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA.

“The work we’ve done to achieve these results on MLPerf gives companies a new level of AI performance to improve our everyday lives.”

The widespread availability of NVIDIA’s AI platform through every major cloud and data centre infrastructure provider is unlocking huge potential for companies across various industries to improve their operations.

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.
