datasets Archives

Devang Sachdev, Snorkel AI: On easing the laborious process of labelling data

Correctly labelling training data for AI models is vital to avoid serious problems, as is using sufficiently large datasets. However, manually labelling massive amounts of data is time-consuming and laborious.

Using pre-labelled datasets can be problematic, as evidenced by MIT having to pull its 80 Million Tiny Images datasets. For those unaware, the popular dataset was found to contain thousands of racist and misogynistic labels that could have been used to train AI...

30 September 2022 | Development

Editorial: Our predictions for the AI industry in 2022

The AI industry continued to thrive this year as companies sought ways to support business continuity through rapidly-changing situations. For those already invested, many are now doubling-down after reaping the benefits.

As we wrap up the year, it’s time to look ahead at what to expect from the AI industry in 2022.

Tackling bias

Our ‘Ethics & Society’ category got more use than most others this year, and with good reason. AI cannot thrive when it’s...

23 December 2021 | Applications

Nvidia and Microsoft develop 530 billion parameter AI model, but it still suffers from bias

Nvidia and Microsoft have developed an incredible 530 billion parameter AI model, but it still suffers from bias.

The pair claim their Megatron-Turing Natural Language Generation (MT-NLG) model is the "most powerful monolithic transformer language model trained to date".

For comparison, OpenAI’s much-lauded GPT-3 has 175 billion parameters.

The duo trained their impressive model on 15 datasets with a total of 339 billion tokens. Various sampling weights...