large language models Archives - AI News

NLEPs: Bridging the gap between LLMs and symbolic reasoning
Fri, 14 Jun 2024

Researchers have introduced a novel approach called natural language embedded programs (NLEPs) to improve the numerical and symbolic reasoning capabilities of large language models (LLMs). The technique involves prompting LLMs to generate and execute Python programs to solve user queries, then output solutions in natural language.

While LLMs like ChatGPT have demonstrated impressive performance on various tasks, they often struggle with problems requiring numerical or symbolic reasoning.

NLEPs follow a four-step problem-solving template: calling necessary packages, importing natural language representations of required knowledge, implementing a solution-calculating function, and outputting results as natural language with optional data visualisation.
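As an illustration, a program generated under this template might look like the following Python sketch. The query, knowledge string, and function here are invented to show the four-step shape; they are not taken from the paper.

```python
# Hypothetical NLEP-style program for the query: "Is 2024 a leap year?"

# Step 1: call necessary packages
import calendar

# Step 2: import a natural language representation of required knowledge
knowledge = ("A year is a leap year if divisible by 4, "
             "except century years not divisible by 400.")

# Step 3: implement a function that calculates the solution
def solve(year: int) -> bool:
    return calendar.isleap(year)

# Step 4: output the result as natural language
year = 2024
answer = solve(year)
print(f"{knowledge} Therefore, {year} is "
      f"{'a leap year' if answer else 'not a leap year'}.")
```

Because the answer is produced by executed code rather than token prediction, a user can inspect and correct the program directly, as the researchers describe.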

This approach offers several advantages, including improved accuracy, transparency, and efficiency. Users can investigate generated programs and fix errors directly, avoiding the need to rerun entire models for troubleshooting. Additionally, a single NLEP can be reused for multiple tasks by replacing certain variables.

The researchers found that NLEPs enabled GPT-4 to achieve over 90% accuracy on various symbolic reasoning tasks, outperforming task-specific prompting methods by 30%.

Beyond accuracy improvements, NLEPs could enhance data privacy by running programs locally, eliminating the need to send sensitive user data to external companies for processing. The technique may also boost the performance of smaller language models without costly retraining.

However, NLEPs rely on a model’s program generation capability and may not work as well with smaller models trained on limited datasets. Future research will explore methods to make smaller LLMs generate more effective NLEPs and investigate the impact of prompt variations on reasoning robustness.

The research, supported in part by the Center for Perceptual and Interactive Intelligence of Hong Kong, will be presented at the Annual Conference of the North American Chapter of the Association for Computational Linguistics later this month.

(Photo by Alex Azabache)

See also: Apple is reportedly getting free ChatGPT access

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

NVIDIA unveils Blackwell architecture to power next GenAI wave
Tue, 19 Mar 2024

NVIDIA has announced its next-generation Blackwell GPU architecture, designed to usher in a new era of accelerated computing and enable organisations to build and run real-time generative AI on trillion-parameter large language models.

The Blackwell platform promises up to 25 times lower cost and energy consumption compared to its predecessor: the Hopper architecture. Named after pioneering mathematician and statistician David Harold Blackwell, the new GPU architecture introduces six transformative technologies.

“Generative AI is the defining technology of our time. Blackwell is the engine to power this new industrial revolution,” said Jensen Huang, Founder and CEO of NVIDIA. “Working with the most dynamic companies in the world, we will realise the promise of AI for every industry.”

The key innovations in Blackwell include the world’s most powerful chip with 208 billion transistors, a second-generation Transformer Engine to support double the compute and model sizes, fifth-generation NVLink interconnect for high-speed multi-GPU communication, and advanced engines for reliability, security, and data decompression.

Central to Blackwell is the NVIDIA GB200 Grace Blackwell Superchip, which combines two B200 Tensor Core GPUs with a Grace CPU over an ultra-fast 900GB/s NVLink interconnect. Multiple GB200 Superchips can be combined into systems like the liquid-cooled GB200 NVL72 platform with up to 72 Blackwell GPUs and 36 Grace CPUs, offering 1.4 exaflops of AI performance.

NVIDIA has already secured support from major cloud providers like Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure to offer Blackwell-powered instances. Other partners planning Blackwell products include Dell Technologies, Meta, Microsoft, OpenAI, Oracle, Tesla, and many others across hardware, software, and sovereign clouds.

Sundar Pichai, CEO of Alphabet and Google, said: “We are fortunate to have a longstanding partnership with NVIDIA, and look forward to bringing the breakthrough capabilities of the Blackwell GPU to our Cloud customers and teams across Google to accelerate future discoveries.”

The Blackwell architecture and supporting software stack will enable new breakthroughs across industries from engineering and chip design to scientific computing and generative AI.

Mark Zuckerberg, Founder and CEO of Meta, commented: “AI already powers everything from our large language models to our content recommendations, ads, and safety systems, and it’s only going to get more important in the future.

“We’re looking forward to using NVIDIA’s Blackwell to help train our open-source Llama models and build the next generation of Meta AI and consumer products.”

With its massive performance gains and efficiency, Blackwell could be the engine to finally make real-time trillion-parameter AI a reality for enterprises.

See also: Elon Musk’s xAI open-sources Grok

AIs in India will need government permission before launching
Mon, 04 Mar 2024

In an advisory issued by India’s Ministry of Electronics and Information Technology (MeitY) last Friday, it was declared that any AI technology still in development must acquire explicit government permission before being released to the public.

Developers will also only be able to deploy these technologies after labelling the potential fallibility or unreliability of the output generated.

Furthermore, the document outlines plans for implementing a “consent popup” mechanism to inform users about potential defects or errors produced by AI. It also mandates the labelling of deepfakes with permanent unique metadata or other identifiers to prevent misuse.

In addition to these measures, the advisory orders all intermediaries or platforms to ensure that any AI model product – including large language models (LLMs) – does not permit bias or discrimination, or threaten the integrity of the electoral process.

Some industry figures have criticised India’s plans as going too far.

Developers are requested to comply with the advisory within 15 days of its issuance. It has been suggested that after compliance and application for permission to release a product, developers may be required to perform a demo for government officials or undergo stress testing.

Although the advisory is not legally binding at present, it signifies the government’s expectations and hints at the future direction of regulation in the AI sector.

“We are doing it as an advisory today asking you (the AI platforms) to comply with it,” said IT minister Rajeev Chandrasekhar. He added that this stance would eventually be encoded in legislation.

“Generative AI or AI platforms available on the internet will have to take full responsibility for what the platform does, and cannot escape the accountability by saying that their platform is under testing,” continued Chandrasekhar, as reported by local media.

(Photo by Naveed Ahmed on Unsplash)

See also: Elon Musk sues OpenAI over alleged breach of nonprofit agreement

DeepMind framework offers breakthrough in LLMs’ reasoning
Thu, 08 Feb 2024

A breakthrough approach in enhancing the reasoning abilities of large language models (LLMs) has been unveiled by researchers from Google DeepMind and the University of Southern California.

Their new ‘SELF-DISCOVER’ prompting framework – published this week on arXiv and Hugging Face – represents a significant leap beyond existing techniques, potentially revolutionising the performance of leading models such as OpenAI’s GPT-4 and Google’s PaLM 2.

The framework promises substantial enhancements in tackling challenging reasoning tasks. It demonstrates remarkable improvements, boasting up to a 32% performance increase compared to traditional methods like Chain of Thought (CoT). This novel approach revolves around LLMs autonomously uncovering task-intrinsic reasoning structures to navigate complex problems.

At its core, the framework empowers LLMs to self-discover and utilise various atomic reasoning modules – such as critical thinking and step-by-step analysis – to construct explicit reasoning structures.

By mimicking human problem-solving strategies, the framework operates in two stages:

  • Stage one involves composing a coherent reasoning structure intrinsic to the task, leveraging a set of atomic reasoning modules and task examples.
  • During decoding, LLMs then follow this self-discovered structure to arrive at the final solution.
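The two stages above can be sketched in Python. The module descriptions and prompt wording below are paraphrased assumptions, and `llm` stands in for any chat-completion call (such as GPT-4 or PaLM 2) rather than a specific API.

```python
# Illustrative sketch of the two-stage SELF-DISCOVER flow; prompts are
# paraphrased, and `llm` is any callable mapping a prompt to a response.

ATOMIC_MODULES = [
    "Use critical thinking to question assumptions.",
    "Break the problem into step-by-step sub-problems.",
    "Simplify the problem to a core abstraction.",
]

def self_discover(llm, task_examples: list[str], task: str) -> str:
    # Stage 1: compose a coherent reasoning structure intrinsic to the
    # task, from the atomic modules and a few task examples.
    structure = llm(
        "Select and adapt the reasoning modules below into a "
        "step-by-step reasoning structure for tasks like these.\n"
        f"Modules: {ATOMIC_MODULES}\nExamples: {task_examples}"
    )
    # Stage 2: during decoding, follow the self-discovered structure
    # to arrive at the final solution.
    return llm(
        "Follow this reasoning structure to solve the task.\n"
        f"Structure: {structure}\nTask: {task}"
    )
```

The key design choice is that stage one runs once per task type, so the discovered structure can be reused across instances of the same task at decoding time.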

In extensive testing across various reasoning tasks – including Big-Bench Hard, Thinking for Doing, and Math – the SELF-DISCOVER approach consistently outperformed traditional methods. Notably, it achieved accuracies of 81%, 85%, and 73% on the three tasks with GPT-4, surpassing chain-of-thought and plan-and-solve techniques.

However, the implications of this research extend far beyond mere performance gains.

By equipping LLMs with enhanced reasoning capabilities, the framework paves the way for tackling more challenging problems and brings AI closer to achieving general intelligence. Transferability studies conducted by the researchers further highlight the universal applicability of the composed reasoning structures, aligning with human reasoning patterns.

As the landscape evolves, breakthroughs like the SELF-DISCOVER prompting framework represent crucial milestones in advancing the capabilities of language models and offering a glimpse into the future of AI.

(Photo by Victor on Unsplash)

See also: The UK is outpacing the US for AI hiring

NCSC: AI to significantly boost cyber threats over next two years
Wed, 24 Jan 2024

A report published by the UK’s National Cyber Security Centre (NCSC) warns that AI will substantially increase cyber threats over the next two years. 

The centre warns of a surge in ransomware attacks in particular; involving hackers deploying malicious software to encrypt a victim’s files or entire system and demanding a ransom payment for the decryption key.

The NCSC assessment predicts AI will enhance threat actors’ capabilities mainly in carrying out more persuasive phishing attacks that trick individuals into providing sensitive information or clicking on malicious links.

“Generative AI can already create convincing interactions like documents that fool people, free of the translation and grammatical errors common in phishing emails,” the report states. 

The advent of generative AI, capable of creating convincing interactions and documents free of common phishing red flags, is identified as a key contributor to the rising threat landscape over the next two years.

The NCSC assessment identifies challenges in cyber resilience, citing the difficulty in verifying the legitimacy of emails and password reset requests due to generative AI and large language models. The shrinking time window between security updates and threat exploitation further complicates rapid vulnerability patching for network managers.

James Babbage, director general for threats at the National Crime Agency, commented: “AI services lower barriers to entry, increasing the number of cyber criminals, and will boost their capability by improving the scale, speed, and effectiveness of existing attack methods.”

However, the NCSC report also outlined how AI could bolster cybersecurity through improved attack detection and system design. It calls for further research on how developments in defensive AI solutions can mitigate evolving threats.

Access to quality data, skills, tools, and time currently makes advanced AI-powered cyber operations feasible mainly for highly capable state actors. But the NCSC warns these barriers to entry will progressively fall as capable groups monetise and sell AI-enabled hacking tools.

The report includes a table summarising the extent of capability uplift by AI over the next two years (credit: NCSC).

Lindy Cameron, CEO of the NCSC, stated: “We must ensure that we both harness AI technology for its vast potential and manage its risks – including its implications on the cyber threat.”

The UK government has allocated £2.6 billion under its Cyber Security Strategy 2022 to strengthen the country’s resilience to emerging high-tech threats.

AI is positioned to substantially change the cyber risk landscape in the near future. Continuous investment in defensive capabilities and research will be vital to counteract its potential to empower attackers.

A full copy of the NCSC’s report can be found here.

(Photo by Muha Ajjan on Unsplash)

See also: AI-generated Biden robocall urges Democrats not to vote

AI & Big Data Expo: Demystifying AI and seeing past the hype
Thu, 07 Dec 2023

In a presentation at AI & Big Data Expo Global, Adam Craven, Director at Y-Align, shed light on the practical applications of AI and the pitfalls often overlooked in the hype surrounding it.

Craven — with an extensive background in engineering and leadership roles at McKinsey & Company, HSBC, Nokia, among others — shared his experiences as a consultant helping C-level executives navigate the complex landscape of AI adoption. The core message revolved around understanding AI beyond the hype to make informed decisions that align with organisational goals.

Breaking down the AI hype

Craven introduced a systematic approach to demystifying AI, emphasising the need to break down the overarching concept into smaller, manageable components. He outlined key attributes of neural networks, embeddings, and transformers, focusing on large language models as a shared foundation.

  • Neural networks — described as probabilistic and adaptable — form the backbone of AI, mimicking human learning processes.
  • Embeddings allow computers to navigate between levels of abstraction, somewhat akin to human cognition.
  • Transformers — the “attention” mechanism — are the linchpin of the AI revolution, allowing machines to understand context and meaning.
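The “attention” mechanism Craven referred to can be illustrated with a minimal scaled dot-product attention function in NumPy. This is a textbook sketch, not code from the presentation.

```python
# Minimal scaled dot-product attention: each position weighs every
# other position by relevance, letting the model pull in context.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # context-weighted values

# Three tokens with 4-dimensional representations
x = np.random.default_rng(0).normal(size=(3, 4))
out = attention(x, x, x)   # self-attention: Q, K, V from the same input
print(out.shape)           # (3, 4)
```

Each output row is a weighted blend of all input rows, which is how the mechanism lets a model “understand context and meaning” across a sequence.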

LLMs as search and research engines

Craven assessed whether LLMs alone make good search engines. They understand search intent exceptionally well, but they lack access to vast data, accurate results, and source references, all of which are key search requirements.

However, Craven highlighted that large language models (LLMs) are powerful summarising engines for research. He emphasised their ability to summarise data, translate between languages, and serve as research assistants.

Craven went on to caution against relying solely on LLMs for complex tasks—showcasing a study where consultants using language models underperformed in nuanced analysis.

De-hyping AI: Setting realistic expectations

The presentation concluded with practical use cases for organisations, such as documentation tools, high-level decision-making, code review tools, and multimodal decision-makers. Craven advised a thoughtful evaluation of when LLMs are useful, ensuring they align with organisational values and principles.

However, Craven warned against inflated claims about AI’s performance, citing examples where language models enhanced certain tasks but fell short in others. He urged the audience to consider the context and nuances when evaluating AI’s impact, avoiding unwarranted expectations.

Craven offered actionable insights for implementation, urging organisations to capture data for future use, create test cases for specific use cases, and apply a systematic framework to develop a strategy. The emphasis remained on seeing through the hype, saving millions by strategically incorporating AI into existing workflows.

In a world inundated with AI promises, Adam Craven’s pragmatic approach provides a roadmap for organisations to leverage the power of AI while avoiding common pitfalls.

Amdocs, NVIDIA and Microsoft Azure build custom LLMs for telcos
Thu, 16 Nov 2023

Amdocs has partnered with NVIDIA and Microsoft Azure to build custom Large Language Models (LLMs) for the $1.7 trillion global telecoms industry.

Leveraging the power of NVIDIA’s AI foundry service on Microsoft Azure, Amdocs aims to meet the escalating demand for data processing and analysis in the telecoms sector.

The telecoms industry processes hundreds of petabytes of data daily. With the anticipation of global data transactions surpassing 180 zettabytes by 2025, telcos are turning to generative AI to enhance efficiency and productivity.

NVIDIA’s AI foundry service – comprising the NVIDIA AI Foundation Models, NeMo framework, and DGX Cloud AI supercomputing – provides an end-to-end solution for creating and optimising custom generative AI models.

Amdocs will utilise the AI foundry service to develop enterprise-grade LLMs tailored for the telco and media industries, facilitating the deployment of generative AI use cases across various business domains.

This collaboration builds on the existing Amdocs-Microsoft partnership, ensuring the adoption of applications in secure, trusted environments, both on-premises and in the cloud.

Enterprises are increasingly focusing on developing custom models to perform industry-specific tasks. Amdocs serves over 350 of the world’s leading telecom and media companies across 90 countries. This partnership with NVIDIA opens avenues for exploring generative AI use cases, with initial applications focusing on customer care and network operations.

In customer care, the collaboration aims to accelerate the resolution of inquiries by leveraging information from across company data. In network operations, the companies are exploring solutions to address configuration, coverage, or performance issues in real-time.

This move by Amdocs positions the company at the forefront of ushering in a new era for the telecoms industry by harnessing the capabilities of custom generative AI models.

(Photo by Danist Soh on Unsplash)

See also: Wolfram Research: Injecting reliability into generative AI

Damian Bogunowicz, Neural Magic: On revolutionising deep learning with CPUs
Mon, 24 Jul 2023

AI News spoke with Damian Bogunowicz, a machine learning engineer at Neural Magic, to shed light on the company’s innovative approach to deep learning model optimisation and inference on CPUs.

One of the key challenges in developing and deploying deep learning models lies in their size and computational requirements. However, Neural Magic tackles this issue head-on through a concept called compound sparsity.

Compound sparsity combines techniques such as unstructured pruning, quantisation, and distillation to significantly reduce the size of neural networks while maintaining their accuracy. 
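One ingredient of compound sparsity, unstructured magnitude pruning, can be sketched in a few lines of NumPy. This toy example simply zeroes the smallest-magnitude weights; Neural Magic's actual pipeline combines this with quantisation and distillation, and is not shown here.

```python
# Toy unstructured magnitude pruning: zero the `sparsity` fraction of
# weights with the smallest absolute value. Illustrative only.
import numpy as np

def prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy with the smallest-|w| fraction set to zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.random.default_rng(1).normal(size=(100, 100))
w_sparse = prune(w, 0.9)          # remove 90% of parameters
print((w_sparse == 0).mean())     # fraction of zeroed weights, ≈ 0.9
```

In practice pruning is done gradually during fine-tuning with accuracy checks, which is what lets sparsity levels like 90% be reached without the accuracy loss a one-shot zeroing of a trained model would cause.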

“We have developed our own sparsity-aware runtime that leverages CPU architecture to accelerate sparse models. This approach challenges the notion that GPUs are necessary for efficient deep learning,” explains Bogunowicz.

Bogunowicz emphasised the benefits of their approach, highlighting that more compact models lead to faster deployments and can be run on ubiquitous CPU-based machines. The ability to optimise and run specified networks efficiently without relying on specialised hardware is a game-changer for machine learning practitioners, empowering them to overcome the limitations and costs associated with GPU usage.

When asked about the suitability of sparse neural networks for enterprises, Bogunowicz explained that the vast majority of companies can benefit from using sparse models.

By removing up to 90 percent of parameters without impacting accuracy, enterprises can achieve more efficient deployments. While extremely critical domains like autonomous driving or autonomous aeroplanes may require maximum accuracy and minimal sparsity, the advantages of sparse models outweigh the limitations for the majority of businesses.

Looking ahead, Bogunowicz expressed his excitement about the future of large language models (LLMs) and their applications.

“I’m particularly excited about the future of large language models (LLMs). Mark Zuckerberg discussed enabling AI agents, acting as personal assistants or salespeople, on platforms like WhatsApp,” says Bogunowicz.

One example that caught his attention was a chatbot used by Khan Academy—an AI tutor that guides students to solve problems by providing hints rather than revealing solutions outright. This application demonstrates the value that LLMs can bring to the education sector, facilitating the learning process while empowering students to develop problem-solving skills.

“Our research has shown that you can optimise LLMs efficiently for CPU deployment. We have published a research paper on SparseGPT that demonstrates the removal of around 100 billion parameters using one-shot pruning without compromising model quality,” explains Bogunowicz.

“This means there may not be a need for GPU clusters in the future of AI inference. Our goal is to soon provide open-source LLMs to the community and empower enterprises to have control over their products and models, rather than relying on big tech companies.”

As for Neural Magic’s future, Bogunowicz revealed two exciting developments they will be sharing at the upcoming AI & Big Data Expo Europe.

Firstly, they will showcase their support for running AI models on edge devices, specifically x86 and ARM architectures. This expands the possibilities for AI applications in various industries.

Secondly, they will unveil their model optimisation platform, Sparsify, which enables the seamless application of state-of-the-art pruning, quantisation, and distillation algorithms through a user-friendly web app and simple API calls. Sparsify aims to accelerate inference without sacrificing accuracy, providing enterprises with an elegant and intuitive solution.
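
Sparsify's own API is not shown here, but the quantisation step such platforms automate can be illustrated with a toy affine int8 scheme; all names and parameters below are illustrative assumptions, not Sparsify's actual interface:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float values onto the int8 grid with an affine scale/zero-point."""
    scale = (x.max() - x.min()) / 255.0          # one step of the 256-level grid
    zero_point = np.round(-x.min() / scale) - 128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)  # a stand-in for layer weights
q, scale, zp = quantize_int8(w)
w_hat = dequantize_int8(q, scale, zp)
print(f"max round-trip error: {np.max(np.abs(w - w_hat)):.4f}")
```

The pay-off is that each weight shrinks from 4 bytes to 1, at the cost of a bounded rounding error per weight; production tools layer calibration and per-channel scales on top of this basic scheme.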

Neural Magic’s commitment to democratising machine learning infrastructure by leveraging CPUs is impressive. Their focus on compound sparsity and their upcoming advancements in edge computing demonstrate their dedication to empowering businesses and researchers alike.

As we eagerly await the developments presented at AI & Big Data Expo Europe, it’s clear that Neural Magic is poised to make a significant impact in the field of deep learning.

You can watch our full interview with Bogunowicz below:

(Photo by Google DeepMind on Unsplash)

Neural Magic is a key sponsor of this year’s AI & Big Data Expo Europe, which is being held in Amsterdam on 26-27 September 2023.

Swing by Neural Magic’s booth at stand #178 to learn more about how the company enables organisations to use compute-heavy models in a cost-efficient and scalable way.

The post Damian Bogunowicz, Neural Magic: On revolutionising deep learning with CPUs appeared first on AI News.

Databricks acquires LLM pioneer MosaicML for $1.3B
https://www.artificialintelligence-news.com/2023/06/28/databricks-acquires-llm-pioneer-mosaicml-for-1-3b/
Wed, 28 Jun 2023 09:22:15 +0000

Databricks has announced its definitive agreement to acquire MosaicML, a pioneer in large language models (LLMs).

This strategic move aims to make generative AI accessible to organisations of all sizes, allowing them to build, own, and secure their own generative AI models using their own data.

The acquisition – valued at approximately $1.3 billion, inclusive of retention packages – underscores Databricks’ commitment to democratising AI and reinforces the company’s Lakehouse platform as a leading environment for building generative AI and LLMs.

Naveen Rao, Co-Founder and CEO at MosaicML, said:

“At MosaicML, we believe in a world where everyone is empowered to build and train their own models, imbued with their own opinions and viewpoints — and joining forces with Databricks will help us make that belief a reality.

We started MosaicML to solve the hard engineering and research problems necessary to make large-scale training more accessible to everyone. With the recent generative AI wave, this mission has taken centre stage.

Together with Databricks, we will tip the scales in the favour of many — and we’ll do it as kindred spirits: researchers turned entrepreneurs sharing a similar mission. We look forward to continuing this journey together with the AI community.”

MosaicML has gained recognition for its cutting-edge MPT large language models, with millions of downloads for MPT-7B and the recent release of MPT-30B.

The platform has demonstrated how organisations can swiftly construct and train their own state-of-the-art models cost-effectively by utilising their own data. Esteemed customers like AI2, Generally Intelligent, Hippocratic AI, Replit, and Scatter Labs have leveraged MosaicML for a diverse range of generative AI applications.

The primary objective of this acquisition is to provide organisations with a simple and rapid method to develop, own, and secure their models. By combining the capabilities of Databricks’ Lakehouse Platform with MosaicML’s technology, customers can maintain control, security, and ownership of their valuable data without incurring exorbitant costs.

MosaicML’s automatic optimisation of model training enables 2x-7x faster training than standard approaches, and near-linear scaling of resources allows multi-billion-parameter models to be trained within hours. Consequently, Databricks and MosaicML aim to reduce the cost of training and utilising LLMs from millions to thousands of dollars.
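
As a back-of-envelope illustration of what near-linear scaling implies, the widely used estimate of roughly 6 × parameters × tokens for total transformer training FLOPs lets us sketch how training time falls as GPUs are added. The throughput and efficiency figures below are illustrative assumptions, not MosaicML benchmarks:

```python
def training_hours(params: float, tokens: float, num_gpus: int,
                   flops_per_gpu: float = 150e12, efficiency: float = 0.4) -> float:
    """Estimate wall-clock training time under near-linear scaling.

    Uses the common ~6 * params * tokens approximation of total training
    FLOPs; per-GPU throughput and utilisation are assumed, not measured.
    """
    total_flops = 6 * params * tokens
    sustained_flops = num_gpus * flops_per_gpu * efficiency
    return total_flops / sustained_flops / 3600

# A hypothetical 7B-parameter model trained on 140B tokens:
t256 = training_hours(params=7e9, tokens=140e9, num_gpus=256)
t512 = training_hours(params=7e9, tokens=140e9, num_gpus=512)
print(f"256 GPUs: {t256:.0f} h   512 GPUs: {t512:.0f} h")  # doubling GPUs halves the time
```

Under perfectly linear scaling the cost in GPU-hours stays constant while wall-clock time shrinks, which is why efficient scaling, not just raw hardware, drives the cost reductions described above.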

The integration of Databricks’ unified Data and AI platform with MosaicML’s generative AI training capabilities will result in a robust and flexible platform capable of serving the largest organisations and addressing various AI use cases.

Upon the completion of the transaction, the entire MosaicML team – including its renowned research team – is expected to join Databricks.

MosaicML’s machine learning and neural network experts are at the forefront of AI research, striving to enhance model training efficiency. They have contributed to popular open-source foundational models like MPT-30B, as well as the training algorithms powering MosaicML’s products.

The MosaicML platform will be progressively supported, scaled, and integrated to provide customers with a seamless unified platform where they can build, own, and secure their generative AI models. The partnership between Databricks and MosaicML empowers customers with the freedom to construct their own models, train them using their unique data, and develop differentiating intellectual property for their businesses.

The completion of the proposed acquisition is subject to customary closing conditions, including regulatory clearances.

(Photo by Glen Carrie on Unsplash)

See also: MosaicML’s latest models outperform GPT-3 with just 30B parameters

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The event is co-located with Digital Transformation Week.

The post Databricks acquires LLM pioneer MosaicML for $1.3B appeared first on AI News.
