Reddit is reportedly selling data for AI training

Reddit has negotiated a content licensing deal to allow its data to be used for training AI models, according to a Bloomberg report.

Just ahead of a potential $5 billion initial public offering (IPO) debut in March, Reddit has reportedly signed a $60 million deal with an undisclosed major AI company. This move could be seen as a last-minute effort to showcase potential revenue streams in the rapidly growing AI industry to prospective investors.

Although Reddit has yet to...

OpenAI: Copyrighted data ‘impossible’ to avoid for AI training

OpenAI made waves this week with its bold assertion to a UK parliamentary committee that it would be "impossible" to develop today's leading AI systems without using vast amounts of copyrighted data.

The company argued that advanced AI tools like ChatGPT require such broad training that adhering to copyright law would be utterly unworkable.

In written testimony, OpenAI stated that between expansive copyright laws and the ubiquity of protected online content, "virtually...

Nightshade ‘poisons’ AI models to fight copyright theft

University of Chicago researchers have unveiled Nightshade, a tool designed to disrupt AI models attempting to learn from artistic imagery.

The tool – still in its developmental phase – allows artists to protect their work by subtly altering pixels in images, rendering them imperceptibly different to the human eye but confusing to AI models.

Many artists and creators have expressed concern over the use of their work in training commercial AI products without their...

OpenAI is not currently training GPT-5

Experts calling for a pause on AI development will be glad to hear that OpenAI isn’t currently training GPT-5.

OpenAI CEO Sam Altman spoke remotely at an MIT event and was quizzed about AI by computer scientist and podcaster Lex Fridman.

Altman confirmed that OpenAI is not currently developing a fifth version of its Generative Pre-trained Transformer model and is instead focusing on enhancing the capabilities of GPT-4, the latest version.

Altman was asked...

Adobe may train its algorithms with your work unless you opt-out

Unless you specifically opt-out, Adobe may assume that it’s ok to use your work to train its algorithms.

An eagle-eyed developer at the Krita Foundation noticed that Adobe had automatically opted them into a “content analysis” initiative. The program allows Adobe to “analyze your content using techniques such as machine learning (e.g. for pattern recognition) to develop and improve our products and services.”

The rule was implemented in August 2022 but managed...

Devang Sachdev, Snorkel AI: On easing the laborious process of labelling data

Correctly labelling training data for AI models is vital to avoid serious problems, as is using sufficiently large datasets. However, manually labelling massive amounts of data is time-consuming and laborious.

Using pre-labelled datasets can be problematic, as evidenced by MIT having to pull its 80 Million Tiny Images datasets. For those unaware, the popular dataset was found to contain thousands of racist and misogynistic labels that could have been used to train AI...

Meta claims its new AI supercomputer will set records

Meta (formerly Facebook) has unveiled an AI supercomputer that it claims will be the world’s fastest.

The supercomputer is called the AI Research SuperCluster (RSC) and is yet to be fully complete. However, Meta’s researchers have already begun using it for training large natural language processing (NLP) and computer vision models.

https://www.youtube.com/watch?v=fZnykn1tDSE

RSC is set to be fully built in mid-2022. Meta says that it will be the fastest in the...

OpenAI now allows developers to customise GPT-3 models

OpenAI is making it easy for developers to “fine-tune” GPT-3, enabling custom models for their applications.

The company says that existing datasets of “virtually any shape and size” can be used for custom models.

A single command in the OpenAI command-line tool, alongside a user-provided file, is all that it takes to begin training. The custom GPT-3 model will then be available for use in OpenAI’s API immediately.

One customer says that it was...

MLCommons releases latest MLPerf Training benchmark results

Open engineering consortium MLCommons has released its latest MLPerf Training community benchmark results.

MLPerf Training is a full system benchmark that tests machine learning models, software, and hardware.

The results are split into two divisions: closed and open. Closed submissions are better for comparing like-for-like performance as they use the same reference model to ensure a level playing field. Open submissions, meanwhile, allow participants to submit a...

Enterprise AI platform Dataiku announces fully-managed online service

Dataiku has announced a fully-managed online version of its enterprise AI platform to help smaller companies get started.

The data science platform enables raw data to be converted into actionable insights through data visualisation or the creation of dashboards and also supports training machine learning models.

https://www.youtube.com/watch?v=XTBms2LeGII

“Accessibility has always been of the utmost importance at Dataiku. We developed Dataiku Online to address...