MIT has apologised for, and taken offline, a dataset which trains AI models with misogynistic and racist tendencies.
The dataset in question is called 80 Million Tiny Images and was created in 2008. Designed for training AIs to detect objects, the dataset is a huge collection of pictures which are individually labelled based on what they feature.
Machine-learning models are trained using these images and their labels. Feed an image of a street into an AI trained on such a dataset and it could tell you what the scene contains: cars, streetlights, pedestrians, and bikes.
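As a rough illustration of how a dataset like this is typically consumed, here is a minimal, hypothetical sketch of training a small classifier on 32×32 labelled images with PyTorch. The random tensors and ten-class setup are stand-ins of my own; nothing here reflects MIT's actual pipeline.

```python
# Hypothetical sketch: training a tiny classifier on 32x32 labelled images.
# The random tensors below are stand-ins; this is not MIT's pipeline.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 1,000 random 32x32 RGB images, each with one of 10 labels.
images = torch.rand(1000, 3, 32, 32)
labels = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

# A small convolutional network mapping each image to 10 class scores.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),
)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# The model learns whatever the labels say -- including any offensive
# terms that made it into the label set.
for epoch in range(3):
    for batch_images, batch_labels in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(batch_images), batch_labels)
        loss.backward()
        optimiser.step()
```

The point of the sketch is simply that the labels are taken at face value during training, which is why derogatory labels propagate directly into the resulting model.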
Two researchers – Vinay Prabhu, chief scientist at UnifyID, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland – analysed the images and found thousands of concerning labels.
MIT’s training set was found to label women as “bitches” or “whores,” and people from BAME communities with the kind of derogatory terms I’m sure you don’t need me to write. The Register notes the dataset also contained close-up images of female genitalia labelled with the C-word.
The Register alerted MIT to the issues Prabhu and Birhane found with the dataset, and the institute promptly took it offline. MIT went a step further, urging anyone with a copy of the dataset to stop using it and delete it.
A statement on MIT’s website claims it was unaware of the offensive labels and they were “a consequence of the automated data collection procedure that relied on nouns from WordNet.”
The statement goes on to explain that, with 80 million images at just 32×32 pixels each, manual inspection of the dataset would be almost impossible and could not guarantee that every offensive image is removed.
“Biases, offensive and prejudicial images, and derogatory terminology alienates an important part of our community – precisely those that we are making efforts to include. It also contributes to harmful biases in AI systems trained on such data,” wrote Antonio Torralba, Rob Fergus, and Bill Freeman from MIT.
“Additionally, the presence of such prejudicial images hurts efforts to foster a culture of inclusivity in the computer vision community. This is extremely unfortunate and runs counter to the values that we strive to uphold.”
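To illustrate the kind of automated collection the statement describes, the sketch below (my own assumption of how such a pipeline might look, not MIT's actual code) enumerates noun lemmas from WordNet using NLTK. Without a curated blocklist or a manual review step, every derogatory noun in the lexicon becomes a candidate label or search term just like any benign one.

```python
# Illustrative only: enumerating WordNet nouns as candidate image labels,
# the general approach the MIT statement describes. Not MIT's actual code.
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)  # fetch the WordNet corpus if missing

# Collect every noun lemma WordNet knows about.
noun_labels = set()
for synset in wordnet.all_synsets(pos="n"):
    for lemma in synset.lemma_names():
        noun_labels.add(lemma.lower())

print(f"{len(noun_labels)} candidate labels collected")
# With no human review here, slurs and other offensive nouns present in
# WordNet would be used as labels or search terms exactly like benign ones.
```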
You can find a full pre-print copy of Prabhu and Birhane’s paper here (PDF).
(Photo by Clay Banks on Unsplash)