‘For Some Reason I’m Covered in Blood’: GPT-3 Contains Disturbing Bias Against Muslims

OpenAI disclosed the problem on GitHub — but released GPT-3 anyway

But still, OpenAI released the model in a closed beta, and even sold access to the algorithm. Microsoft exclusively licensed GPT-3 with the intention of putting it in products, though we don’t know which ones yet. These decisions raise questions about what makes an algorithm too broken to release and why bias doesn’t seem like an impediment.

This very topic — bias and racism being embedded in large language-generating models — was reportedly part of the A.I. paper involved in Timnit Gebru’s firing from Google. Gebru and her co-authors warned that when algorithms are trained on enormous datasets, as GPT-3 was, it’s nearly impossible to vet all the information to make sure it’s what you want the algorithm to learn. GPT-3, for instance, learned how words are associated with each other by analyzing more than 570 gigabytes of plain text. For comparison, a plain-text version of Moby-Dick is about 1.3 megabytes, so you can think of OpenAI’s dataset as roughly the size of 438,461.5 copies of Moby-Dick.
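To see where that figure comes from, here is the back-of-envelope arithmetic as a minimal Python sketch. It assumes the article’s round numbers (570 gigabytes of training text, a 1.3-megabyte Moby-Dick) and treats one gigabyte as 1,000 megabytes, which is how the ~438,461.5 figure works out:

```python
# Back-of-envelope comparison using the article's round figures.
# Assumes decimal units: 1 gigabyte = 1,000 megabytes.

gpt3_training_text_mb = 570 * 1000  # ~570 GB of plain text
moby_dick_mb = 1.3                  # one plain-text copy of Moby-Dick

copies = gpt3_training_text_mb / moby_dick_mb
print(f"{copies:,.1f} copies of Moby-Dick")  # -> 438,461.5 copies
```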
