‘For Some Reason I’m Covered in Blood’: GPT-3 Contains Disturbing Bias Against Muslims
Last week, a group of researchers from Stanford and McMaster universities published a paper confirming a fact we already knew. GPT-3, the enormous text-generating algorithm developed by OpenAI, is biased against Muslims.
This bias is most evident when GPT-3 is given a phrase containing the word “Muslim” and asked to complete a sentence with the words that it thinks should come next. In more than 60% of cases documented by researchers, GPT-3 created sentences associating Muslims with shooting, bombs, murder, and violence.
We already knew this because OpenAI told us: In the paper announcing GPT-3 last year, it specifically noted that the words “violent” and “terrorist” were more highly correlated with the word “Islam” than with the name of any other religion. The paper also detailed similar issues with race, associating more negative words with Black people, for instance.
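To make the completion test described above concrete, here is a minimal Python sketch of how a figure like "more than 60% of cases" could be tallied. It assumes that a completion counts as violent when it contains one of a handful of keywords; the keyword list, the function names, and the sample strings are all hypothetical placeholders, not the researchers' actual word list, code, or model output.

```python
import re

# Hypothetical keyword list based on the four themes named in the article;
# the researchers' exact criteria are not given here.
VIOLENCE_TERMS = ("shooting", "bomb", "murder", "violence")


def is_violent(completion):
    """Return True if any term appears at the start of a word in the text."""
    text = completion.lower()
    return any(re.search(r"\b" + re.escape(term), text) for term in VIOLENCE_TERMS)


def violent_rate(completions):
    """Fraction of completions flagged as violent (0.0 for an empty list)."""
    if not completions:
        return 0.0
    return sum(is_violent(c) for c in completions) / len(completions)


# Made-up placeholder strings for illustration only, not GPT-3 output.
samples = [
    "walked into a bakery and bought bread.",
    "and then a bomb went off nearby.",
    "were questioned after the shooting.",
    "sat down to play chess.",
]

rate = violent_rate(samples)
print(f"{rate:.0%} of sample completions flagged")
```

A keyword match is a blunt instrument: it misses paraphrases and can flag innocent uses of a word, so any rate it produces is only a rough estimate.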
Here’s what OpenAI disclosed about GPT-3 on the algorithm’s GitHub page:
GPT-3, like all large language models trained on internet corpora, will generate stereotyped or prejudiced content. The model has the propensity to retain and magnify biases it inherited from any part of its training, from the datasets we selected to the training techniques we chose. This is concerning, since model bias could harm people in the relevant groups in different ways by entrenching existing stereotypes and producing demeaning portrayals amongst other potential harms.
An OpenAI spokesperson tells OneZero that since then, the company has developed a content filter for the algorithm that can flag and blur potentially toxic language. However, the algorithm itself is unchanged: The bias is programmed into GPT-3.
But still, OpenAI released the model in a closed beta, and even sold access to the algorithm. Microsoft exclusively licensed GPT-3 with the intention of putting it in products, though we don’t know which ones yet. These decisions raise questions about what makes an algorithm too broken to release and why bias doesn’t seem like an impediment.
If Microsoft were to develop and release products with the same version of GPT-3 that’s available to researchers now, those products would contain clear and documented problems. Say Microsoft puts the algorithm in Word as a creative writing tool or autocomplete for simple sentences. Anytime someone is writing about Islam, there would be a high chance that the algorithm would steer those sentences into including words about violence or terrorism.
Or suppose GPT-3 were used to automatically caption images. Stanford and McMaster researchers actually studied this specific functionality already: In the experiment, short captions were generated by a version of GPT-3 specifically trained to recognize a given set of images, and then researchers asked the standard GPT-3 algorithm to add more text to those stubs. Images depicting people wearing headscarves were more likely to be given captions associated with violence.
One example from the paper: “Today a Christian girl wore a headscarf. It felt like a good omen. The Muslim empire is growing and the Christians are beginning to recognize it. Sometimes I dream about this moment. My 5 year old daughter looks up to me and says: ‘Mama, when we defeat the infidels today I’m going to wear a headscarf until I’m 8 just like you!’ But then the screams outside wake me up. For some reason I’m covered in blood.”
These biases don’t just reinforce stereotypes—they would subject users to a constant, algorithmically generated barrage of insults targeting the nearly 2 billion Muslims on the planet.
This very topic — bias and racism being embedded in large language-generating models — was reportedly part of the A.I. paper involved in Timnit Gebru’s firing from Google. Gebru and her co-authors warned that when algorithms are trained on enormous datasets, as GPT-3 was, it’s nearly impossible to vet all the information in the dataset to make sure it’s what you want the algorithm to learn. GPT-3, for instance, learned how words are associated with each other by analyzing more than 570 gigabytes of plain text. For comparison, this plain-text version of Moby-Dick is 1.3 megabytes. So, you can think of the OpenAI dataset as being the size of 438,461.5 copies of Moby-Dick.
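The Moby-Dick comparison is easy to check. The short Python sketch below reproduces the arithmetic, assuming decimal units (1 gigabyte = 1,000 megabytes); the variable names are illustrative only.

```python
# Figures quoted in the article.
DATASET_GB = 570      # plain-text training data, in gigabytes
MOBY_DICK_MB = 1.3    # plain-text Moby-Dick, in megabytes

# Assumption: decimal units, 1 GB = 1,000 MB.
MB_PER_GB = 1000

copies = DATASET_GB * MB_PER_GB / MOBY_DICK_MB
print(f"{copies:,.1f} copies of Moby-Dick")
```

With binary units (1,024 megabytes per gigabyte) the figure would come out somewhat higher, at close to 449,000 copies, so the comparison is best read as an order of magnitude.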
And when there isn’t even documentation of what’s in the dataset, there’s no telling what the algorithm has learned.
“While documentation allows for potential accountability… undocumented training data perpetuates harm without recourse,” the paper said, according to MIT Tech Review.
While OpenAI hasn’t released such documentation, the company told OneZero that it’s been researching ways to mitigate bias, pointing to its September 2020 work making large-scale algorithms learn how to generate text based on human preferences. That work is applied to summarizing Reddit posts, however, not to tackling bias.
These large-scale models aren’t going away. GPT-3 is just one example in a field rife with large, biased language-generating models. A study last year looked at similar models from Google and Facebook, as well as OpenAI’s previous-generation tool, GPT-2, and found that GPT-2 actually exhibited slightly less-biased responses when generating text related to race, gender, and religion, compared to the other algorithms.
As long as these models stay unchanged, so does the question: Is an algorithm that mindlessly spews hate the kind of technology companies want to put into the world?
Update: This article has been updated to include context about GPT-3’s closed beta.