What Algorithms Know About You Based on Your Grocery Cart
Anyone who has ever visited Jones Beach on Long Island, New York, will have driven under a series of bridges on their way to the ocean. These bridges, primarily built to filter people on and off the highway, have an unusual feature. As they gently arc over the traffic, they hang extraordinarily low, sometimes leaving as few as nine feet of clearance from the tarmac.
There’s a reason for this strange design. In the 1920s, Robert Moses, a powerful urban planner in New York, was keen to keep his newly finished, award-winning state park at Jones Beach the preserve of white and wealthy Americans. Knowing that his preferred clientele would travel to the beach in their private cars, while people from poor black neighborhoods would get there by bus, Moses deliberately tried to limit access by building hundreds of low-lying bridges along the highway. Too low for the 12-foot buses to pass under.
Racist bridges aren’t the only inanimate objects that have had quiet, clandestine control over people. History is littered with examples of objects and inventions with a power beyond their professed purpose. Sometimes it’s deliberately and maliciously factored into their design, but at other times, it’s a result of thoughtless omissions.
Modern inventions are no different. Just ask the residents of Scunthorpe, in the north of England, who were blocked from opening AOL accounts after the internet giant created a new profanity filter that objected to the name of their town. Or Chukwuemeka Afigbo, the Nigerian man who discovered an automatic hand-soap dispenser that perfectly released soap whenever his white friend placed their hand under the machine but refused to acknowledge his darker skin. Or Mark Zuckerberg, who, when writing the code for Facebook in his dorm room at Harvard in 2004, would never have imagined his creation would go on to be accused of helping manipulate votes in elections around the globe.
Behind each of these inventions is an algorithm. The invisible pieces of code that form the gears and cogs of the modern machine age, algorithms have given the world everything from social media feeds to search engines and satellite navigation to music recommendation systems. They are as much a part of our modern infrastructure as bridges, buildings, and factories ever were. Algorithms have learned our likes and dislikes; they tell us what to watch, what to read, and who to date. And all the while, they have the hidden power to slowly and subtly change how we live as humans.
Put simply, an algorithm is just a series of logical steps that take you from some input to some output. In theory, a cake recipe counts as an algorithm. The ingredients are the input; the cake is the output. Normally, however, when people use the word “algorithm,” they tend to be describing a recipe that happens within a computer. The output can take on a variety of forms, but the ingredients are almost always our data.
Supermarkets were among the first to recognize the value of our data. Early in the days of online shopping, British supermarket Tesco introduced a feature known as “My Favourites,” in which any items that were bought using the store’s loyalty card would appear prominently when the customer logged on to the Tesco website. Shortly after the launch of the feature, one woman contacted Tesco to complain that her data was wrong. She’d been shopping online and saw condoms among her list of “My Favourites.” They couldn’t be her husband’s, she explained, because he didn’t use them. At her request, the Tesco analysts looked into the data and discovered that her list was accurate. Rather than be the cause of a marital rift, however, they took the diplomatic decision to apologize for “corrupted data” and remove the offending items from her favorites. It was an important lesson: Shopping isn’t just what we buy. Groceries are personal.
By now, some quarter-century after the launch of the first supermarket loyalty cards, the algorithms retailers use can burrow into our data and uncover the most remarkable insights about who we are and what our future holds for us.
About a year ago, I got chatting to the chief data officer of a company that sells insurance. They had access to the full details of people’s shopping habits via a supermarket loyalty scheme. In their analysis, they discovered that home cooks were less likely to make claims on their home insurance and were therefore more profitable. It’s a finding that makes good intuitive sense. There probably isn’t much crossover between the group of people who are willing to invest time, effort, and money into creating an elaborate dish from scratch and the group who would let their children play football in the house.
But how did the analysts know which shoppers were home cooks? Well, a few items in someone’s basket were linked to low claim rates. The most significant item, the chief data officer told me—the one that gives you away as a responsible, house-proud person more than any other—was fresh fennel.
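To make the idea concrete, here is a minimal sketch of how an item in a basket could be linked to a claim rate. It is not the insurer's actual method, which the chief data officer did not describe; the shopper records, the item names other than fennel, and the function name `claim_rate_by_item` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (items in a shopper's basket, whether they claimed).
# Every value here is invented for illustration; none of it is real insurance data.
SHOPPERS = [
    ({"fennel", "olive oil", "flour"}, False),
    ({"fennel", "flour"}, False),
    ({"frozen pizza", "cola"}, True),
    ({"frozen pizza", "flour"}, False),
    ({"cola", "crisps"}, True),
    ({"fennel", "cola"}, False),
]


def claim_rate_by_item(shoppers):
    """Return {item: share of that item's buyers who made a claim}."""
    buyers = defaultdict(int)   # how many shoppers bought each item
    claims = defaultdict(int)   # how many of those shoppers made a claim
    for basket, made_claim in shoppers:
        for item in basket:
            buyers[item] += 1
            if made_claim:
                claims[item] += 1
    return {item: claims[item] / buyers[item] for item in buyers}


rates = claim_rate_by_item(SHOPPERS)
# List items from the lowest claim rate to the highest.
for item, rate in sorted(rates.items(), key=lambda pair: (pair[1], pair[0])):
    print(f"{item}: {rate:.2f}")
```

In this toy data the fennel buyers never claim, so fennel sits at the bottom of the list. A real analysis would need far more shoppers and some check that the pattern is not a coincidence.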
This is the reason companies are so hungry for our data. Every time you shop online, every time you sign up for a newsletter, or register on a website, or enquire about a new car, or fill out a warranty card, or buy a new home, or register to vote, you are unwittingly handing over a small clue as to who you are and how you behave. Behind the scenes, a data broker, one of the multibillion-dollar companies most of us have never heard of, will combine all that data, cross-reference the different pieces of information, and then create a single detailed file on virtually every single one of us: a data profile of our digital shadow.
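The cross-referencing step can be sketched in a few lines. This is only an illustration under stated assumptions: it supposes every source shares one identifier (an email address here), and the source lists, field names, and the function `build_profiles` are hypothetical. Real brokers' matching methods are not described in this piece.

```python
from collections import defaultdict

# Hypothetical records from three unrelated sources. The shared "email" key
# is an assumption made for this sketch; all values are invented.
NEWSLETTER = [{"email": "A@example.com", "newsletter": "technology weekly"}]
WARRANTY = [{"email": "a@example.com", "product": "espresso machine"}]
CAR_ENQUIRIES = [{"email": "b@example.com", "model": "hatchback"}]


def build_profiles(**sources):
    """Merge records from named sources into one profile per email address."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            # Normalise the key so "A@example.com" and "a@example.com" match.
            key = record["email"].strip().lower()
            fields = {k: v for k, v in record.items() if k != "email"}
            profiles[key].setdefault(source_name, []).append(fields)
    return dict(profiles)


profiles = build_profiles(
    newsletter=NEWSLETTER, warranty=WARRANTY, car=CAR_ENQUIRIES
)
for email, profile in sorted(profiles.items()):
    print(email, profile)
```

Each clue is small on its own. The single file only becomes revealing once the sources are joined on a common key.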
Many of the insights they have on us are inferred. A subscription to Wired might imply that you’re interested in technology; a firearms license might imply that you’re interested in hunting. But the amount they can tell is staggering: whether you’ve had an abortion, whether your parents are divorced, whether you are a rape victim, your opinions on gun control, your projected sexual orientation, your real sexual orientation, and your gullibility.
It’s a system that, thanks to either thoughtless omission or deliberate design, has the potential to be exploitative. It means payday lenders can directly target people with bad credit ratings; betting ads can be directed to people who frequent gambling websites. And there are concerns about this kind of data profiling being used in an exclusionary way: motorbike enthusiasts being deemed to have a risky hobby or people who eat sugar-free sweets being flagged as diabetic and turned down for insurance as a result. A study from 2015 demonstrated that Google was serving far fewer ads for high-paying executive jobs to women who were surfing the web than to men. And after one African-American Harvard professor learned that Googling her own name returned ads targeting people with a criminal record (and as a result was forced to prove to a potential employer that she’d never been in trouble with the police), she began researching the ads delivered to different ethnic groups. She discovered that searches for “black-sounding names” were more likely to be linked to ads containing the word “arrest” (for example, “Have you been arrested?”) than searches for “white-sounding names.”
There is also the unintended negative side of this great modern invention. When Heidi Waterhouse lost a much-wanted pregnancy, she unsubscribed from all the weekly emails updating her on her baby’s growth, telling her which fruit the fetus now matched in size. She unsubscribed from all the mailing lists and wish lists she had signed up for in eager anticipation of the birth. But, as she told an audience of developers at a conference in 2018, there was no power on earth that could unsubscribe her from the pregnancy ads that followed her around the internet. This digital shadow of a pregnancy continued to circulate alone, without the mother or the baby. “Nobody who built that system thought of that consequence,” Waterhouse explained.
We’ve led ourselves down a path that is difficult to turn back from. But it’s important to be aware of the potential dangers of collecting these vast, interconnected data sets in the first place. One dystopian application of such data sets sounds like it belongs in the popular Netflix show Black Mirror, but it exists in reality.
It’s known as Sesame Credit, a citizen scoring system used by the Chinese government.
Imagine every piece of information that a data broker might have on you collapsed down into a single score. Everything goes into it. Your credit history, your mobile phone number, your address — the usual stuff. But also all your day-to-day behavior. Your social media posts, the data from your ride-hailing app, even records from your online matchmaking service. The result is a single number between 350 and 950 points. Li Yingyun, the company’s technology director, explained, “Someone who plays video games for 10 hours a day, for example, would be considered an idle person. Someone who frequently buys diapers would be considered as probably a parent, who on balance is more likely to have a sense of responsibility.”
If you’re Chinese, these scores matter. If your rating is above 600 points, you can take out a special credit card. More than 666 and you’ll be rewarded with a higher credit limit. Those with scores above 650 can hire a car without a deposit and use a VIP lane at the Beijing airport. Anyone above 750 can apply for a fast-tracked visa to Europe.
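The thresholds quoted above amount to a simple lookup table. The sketch below encodes only the figures given in the text (the 350 to 950 range and the four cut-offs); the function name `perks_for` and the choice to treat "above" as strictly greater than are assumptions, and nothing here reflects how the real system is implemented.

```python
# Cut-offs as quoted in the text, paired with the perk each one unlocks.
PERKS = [
    (600, "special credit card"),
    (650, "car hire without a deposit"),
    (650, "VIP lane at the Beijing airport"),
    (666, "higher credit limit"),
    (750, "fast-tracked visa application to Europe"),
]


def perks_for(score):
    """Return the perks unlocked by a score, reading "above" as strictly greater."""
    if not 350 <= score <= 950:
        raise ValueError("score must be between 350 and 950")
    return [perk for threshold, perk in PERKS if score > threshold]


for score in (600, 660, 800):
    print(score, perks_for(score))
```

Under this reading, a score of exactly 600 unlocks nothing, 660 unlocks the credit card, the car hire, and the VIP lane, and 800 unlocks every perk on the list.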
It’s all fun and games now, while the scheme is voluntary. But in 2020, when the citizen scoring system becomes mandatory, people with low scores stand to feel the repercussions in every aspect of their lives. The government’s own document on the system outlines examples of punishments that could be meted out to anyone deemed disobedient: “Restrictions on leaving the borders, restrictions on the purchase of… property, travelling on aircraft, on tourism and holidays or staying in star-ranked hotels.” It also warns that in the case of “gravely trust-breaking subjects,” it will “guide commercial banks… to limit their provision of loans, sales insurance, and other such services.” Loyalty is praised. Breaking trust is punished.
As Rogier Creemers, an academic specializing in Chinese law and governance at the Van Vollenhoven Institute at Leiden University, puts it, “The best way to understand it is as a sort of bastard love child of a loyalty scheme.”
I don’t have much comfort to offer in the case of Sesame Credit, but I don’t want to fill you completely with doom and gloom, either. There are glimmers of hope. However grim the journey ahead appears, there are signs that the tide is slowly turning. Many in the data science community have known about and objected to the exploitation of people’s information for profit for quite some time. But until the furor over Cambridge Analytica, these issues hadn’t drawn sustained, international front-page attention. When that scandal broke, in early 2018, the general public saw for the first time how algorithms are silently harvesting their data and acknowledged that, without oversight or regulation, it could have dramatic repercussions.
And regulation is coming. If you live in the EU, a new piece of legislation called the General Data Protection Regulation (GDPR) should, in theory, mean companies are no longer allowed to store your data without an explicit purpose. That doesn’t necessarily mean the end of these kinds of practices, however. For one thing, we often don’t pay attention to the T&Cs when we’re clicking around online, so we may find ourselves consenting without realizing. But the tech companies are increasingly on our side: Apple has built “intelligent tracking prevention” into its Safari browser. Firefox has done the same. Facebook is severing ties with its data brokers. Argentina, Brazil, South Korea, and many more countries have all pushed through GDPR-like legislation. Europe might be ahead of the curve, but there is a global trend that is heading in the right direction.
Still, we would do well to remember that there’s no such thing as a free lunch. As the law catches up and the battle between corporate profits and social good plays out, we need to be careful not to be lulled into a false sense of privacy.
If data is the new gold, then we’ve been living in the Wild West. But I’m optimistic that — for many of us — the worst will soon be behind us.
Correction: The original version of this piece misstated the year when Sesame Credit is set to become mandatory. It is in 2020.