General Intelligence

Facebook Scraped 1 Billion Pictures From Instagram to Train Its A.I. — But Spared European Users

The team purposely excluded Instagram images from the European Union, likely because of GDPR

Dave Gershgorn
Mar 5, 2021
Photo illustration source: Alexander Koerner/Stringer/Getty Images

OneZero’s General Intelligence is a roundup of the most important artificial intelligence and facial recognition news of the week.

Facebook researchers announced a breakthrough yesterday: They have trained a “self-supervised” algorithm using 1 billion Instagram images, proving that the algorithm doesn’t need human-labeled images to learn to accurately recognize objects.

Typically, the most accurate image recognition algorithms require humans to label images as containing dogs, horses, people, or any other subject; the algorithm then finds similarities among images that humans have marked as containing the same objects. Facebook’s chief A.I. scientist Yann LeCun has for decades been on a mission to change A.I.’s reliance on labels, calling label-free learning the “holy grail” of A.I.

But Facebook didn’t just select any billion Instagram images to train the algorithm. The team purposely excluded Instagram images from the European Union, noting in its paper that images were “random, public, and non-EU images.” While the rest of the world’s Instagram images are fair game, EU residents don’t have to worry about their images being used to generate Facebook’s next big algorithm.

OneZero asked Facebook whether the exclusion was motivated by the EU’s General Data Protection Regulation (GDPR), which gives users greater insight into how companies use their data and protects against data use without consent. A Facebook spokesperson acknowledged the question but did not immediately reply to the request for comment.

Whether using the data would have violated GDPR or Facebook simply wanted to avoid the appearance of impropriety, it’s likely that the law had a chilling effect on the use of personal data.

Jules Polonetsky, CEO of Future of Privacy Forum, told OneZero in a message that it’s not unusual for companies to err on the side of caution when collecting data in Europe.



Dave Gershgorn

Senior Writer at OneZero covering surveillance, facial recognition, DIY tech, and artificial intelligence. Previously: Qz, PopSci, and NYTimes.