Machine Learning’s Crumbling Foundations

Doing ‘data science’ with bad data.

Cory Doctorow
Aug 19, 2021


An industrial meat-grinder; on its intake belt is a procession of recycling bins heaped high with garbage; its output cone has been replaced with the glowing eye of HAL 9000, and it empties into a giant wheeled hopper full of ground-up trash. Image: Seydelmann (modified), CC BY-SA; Cryteria (modified), CC BY.

Technological debt is insidious, a kind of socio-infrastructural subprime crisis that’s unfolding around us in slow motion. Our digital infrastructure is built atop layers and layers and layers of code that’s insecure due to a combination of bad practices and bad frameworks.

Even people who write secure code import insecure libraries, or plug their code into insecure authorization systems or databases. Like asbestos in the walls, this cruft has been fragmenting, drifting into our air a crumb at a time.

We ignored these little breaches, treating them as containable, and now the walls are rupturing and choking clouds of toxic waste are everywhere.

The infosec apocalypse was decades in the making. The machine learning apocalypse, on the other hand…

ML has serious institutional problems, the kind you’d expect in a nascent discipline, and the kind you’d hope would be worked out before it went into wide deployment.

ML is rife with all forms of statistical malpractice, AND it’s being used for high-speed, high-stakes automated classification and decision-making, as if it were a proven science…