How Big Data Fails

How small errors and quick fixes in “little data” can lead to major problems in big data

Matt Parker
OneZero

A cut out illustration of a human head with an abstract code background.
Image: Greg Bajor/Moment/Getty Images

In the mid-1990s, a new employee of Sun Microsystems in California kept disappearing from their database. Every time his details were entered, the system seemed to eat him whole; he would disappear without a trace. No one in HR could work out why poor Steve Null was database kryptonite. The staff in HR were entering the surname as “Null,” but they were blissfully unaware that, in a database, NULL represents a lack of data, so Steve became a non-entry. To computers, his name was Steve Zero or Steve McDoesNotExist. Apparently, it took a while to work out what was going on, as HR would happily reenter his details each time the issue was raised, never stopping to consider why the database was routinely removing him.
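To see how a surname can turn into a non-entry, here is a minimal sketch in Python using the built-in sqlite3 module. It is an illustration only, not a reconstruction of Sun's system, whose internals are not described here. The `staff` table and the `naive_clean` helper are hypothetical: the helper stands in for any well-meaning piece of code that assumes the text "Null" must mean "nothing was entered."

```python
import sqlite3


def naive_clean(value):
    # Hypothetical buggy helper: treats the text "null" as a missing value.
    return None if value.strip().lower() == "null" else value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (first TEXT, last TEXT)")

# Buggy path: the surname is silently replaced by SQL NULL (no data).
conn.execute("INSERT INTO staff VALUES (?, ?)", ("Steve", naive_clean("Null")))

# Safer path: a parameterized insert stores "Null" as ordinary text.
conn.execute("INSERT INTO staff VALUES (?, ?)", ("Steve", "Null"))

# Searching by surname only finds the row stored as text, because
# comparing anything to SQL NULL with "=" never counts as a match.
found_by_name = conn.execute(
    "SELECT COUNT(*) FROM staff WHERE last = ?", ("Null",)
).fetchone()[0]

# The other Steve can only be found by asking for rows with no surname.
missing_surname = conn.execute(
    "SELECT COUNT(*) FROM staff WHERE last IS NULL"
).fetchone()[0]

print(found_by_name, missing_surname)  # prints: 1 1
```

Both rows were entered as "Steve Null," but only one of them still has a surname. The other Steve is technically in the table, yet a search for his name will never turn him up.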

Since the 1990s, databases have become more sophisticated, but the problem persists. Null is still a legitimate surname and computer code still uses NULL to mean a lack of data. A modern variation on the problem is that a company database will accept an employee with the name Null, but then there is no way to search for them. If you look for people with the name Null, it claims there are, well, null of them. Because computers use NULL to represent a lack of data, you’ll occasionally see it appear when a computer system somewhere…
