The Algorithmic Auditing Trap

‘Bias audits’ for discriminatory tools are a promising idea, but current approaches leave much to be desired


This op-ed was written by Mona Sloane, a sociologist and senior research scientist at the NYU Center for Responsible A.I. and a fellow at the NYU Institute for Public Knowledge. Her work focuses on design and inequality in the context of algorithms and artificial intelligence.

We have a new A.I. race on our hands: the race to define and steer what it means to audit algorithms. Governing bodies know that they must come up with solutions to the disproportionate harm algorithms can inflict.

This technology has disproportionate impacts on racial minorities, the economically disadvantaged, womxn, and people with disabilities, with applications ranging from health care to welfare, hiring, and education. Here, algorithms often serve as statistical tools that analyze data about an individual to infer the likelihood of a future event—for example, the risk of becoming severely sick and needing medical care. That likelihood is quantified as a “risk score,” an approach also found in the lending and insurance industries, and the score then serves as the basis for a decision in the present, such as how resources are distributed and to whom.
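To make the mechanics concrete, here is a minimal, purely hypothetical sketch of how such a risk score might be computed and then turned into a present-day decision. The feature names, weights, and threshold are invented for illustration and do not describe any specific system.

```python
# Illustrative sketch only: a hypothetical risk-scoring model of the kind
# described above. Feature names, weights, and the threshold are invented
# for illustration, not taken from any real system.

import math

# Hypothetical weights a model might have learned from historical data.
WEIGHTS = {
    "age": 0.03,
    "prior_hospitalizations": 0.9,
    "chronic_conditions": 0.7,
}
BIAS = -4.0

def risk_score(person: dict) -> float:
    """Map an individual's data to a probability-like score between 0 and 1."""
    z = BIAS + sum(WEIGHTS[k] * person.get(k, 0) for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))  # logistic function

def allocate_care(person: dict, threshold: float = 0.5) -> bool:
    """A present-day decision (e.g., extra care resources) driven by the score."""
    return risk_score(person) >= threshold

if __name__ == "__main__":
    patient = {"age": 62, "prior_hospitalizations": 2, "chronic_conditions": 1}
    print(round(risk_score(patient), 3), allocate_care(patient))
```

The point of the sketch is that the score itself is just arithmetic over whatever data and weights the builders chose; the consequential part is the threshold decision layered on top of it.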

Now, a potentially impactful approach is materializing: algorithmic auditing. The field is developing fast in both research and practice, and it has spawned a new crop of startups offering different forms of “algorithmic audits” that promise to check algorithmic models for bias or legal compliance.

Audits as a regulatory tool for hiring algorithms

Recently, the issue of algorithmic auditing has become particularly relevant in the context of A.I. used in hiring. New York City policymakers are debating Int. 1894–2020, a proposed bill that would regulate the sale of automated employment decision-making tools. This bill calls for regular “bias audits” of automated hiring and employment tools.

These tools — résumé parsers, tools that purport to predict personality from social media profiles or text written by the candidate, or computer vision technologies that analyze a candidate’s “micro-expressions” — promise to help companies maximize employee performance and gain a competitive advantage by finding the “right” candidate for the “right” job quickly and cost-effectively.

This is big business. The U.S. staffing and recruiting market, which includes firms that assist in recruiting new internal staff and those that directly provide temporary staff to fill specific functions (temporary or agency staffing), was worth $151.8 billion in 2019. In 2016, a company’s average cost per hire was $4,129, according to the Society for Human Resource Management.

Automated hiring and employment tools will play a fundamental role in rebuilding local economies after the Covid-19 pandemic. For example, since March 2020, New Yorkers have been more likely than the national average to live in a household that has lost income. The pandemic’s economic impact also falls along racial lines: In June 2020, 13.9% of white New Yorkers were unemployed, compared to 23.7% of Black residents, 22.7% of Latinx residents, and 21.1% of Asian residents.

Automated hiring tools will reshape how these communities regain access to employment and how local economies are rebuilt. Against that backdrop, it is important and laudable that policymakers are working to mandate algorithmic auditing.

But we are facing an underappreciated concern: To date, there is no clear definition of “algorithmic audit.” Audits, which on their face sound rigorous, can end up as toothless reputation polishers, or even worse: They can legitimize technologies that shouldn’t even exist because they are based on dangerous pseudoscience.

In the context of hiring, the pseudoscience at work is physiognomy, which holds that character can be judged from facial characteristics, and phrenology, which similarly holds that the shape of the skull is indicative of mental faculties.

In the context of A.I. used for hiring, this may take the form of algorithms analyzing a candidate’s facial expressions, intonation, writing, social media behavior, or game performance as part of the assessment process.

Hijacking the audit

Earlier this year, HireVue, a hiring platform that uses an algorithm to assess job candidates, reportedly misrepresented an audit conducted on its technology by O’Neil Risk Consulting and Algorithmic Auditing. In a press release, HireVue said it would stop using facial recognition in its assessments, which it claimed otherwise “work as advertised with regard to fairness and bias issues.” Work by Alex Engler for Fast Company and Brookings highlighted the many limits the company imposed on the auditors, effectively steering the auditing process from the outset.

A similar problem occurred in a recent peer-reviewed paper that set out to audit the algorithm used by Pymetrics, a company that deploys “behavioral assessments to evaluate job seekers” via “engaging games to fairly and accurately measure cognitive and emotional attributes.” The authors of the paper introduce the notion of a “collaborative audit” with the company, which simply describes the many ways in which Pymetrics prescribed the framing of the research questions and steered the data collection itself. Today, Pymetrics claims to use “audited A.I. technology” that results “in more diverse teams and more efficient processes.” In other words, work sponsored by the company is used to prop up problematic research.

This “collaborative audit” sets a dangerous precedent. It ignores the importance of genuine independence in conducting an audit and reduces the notion of an audit to the question: “Does the algorithm do what we say it does?” That question strategically precludes a focus on broader issues of representation and bias across the whole algorithm life cycle, including bias in the dataset, representation in the design team, the context in which the algorithmic tool gets deployed, and the maintenance of the tool.

Legitimizing eugenicist tech

Importantly, this narrow approach precludes questioning underlying assumptions and politics of a given technology. That is a problem if the technology claims to predict ability and performance through the automated analysis of facial features and expressions.

There is a well-known, well-documented, and well-criticized history of physiognomy. This history shows that bodily features have no bearing on personality or ability, let alone future job performance. “Science” that claims otherwise is not science—it is eugenicist ideology. This means technologies built on that assumption not only perpetuate but also scale eugenicist thought and practice.

By gracing such technologies with an audit that is steered by the organization being audited and that only examines whether an algorithm “works as intended,” researchers and technologists alike become complicit in legitimizing and normalizing weak and problematic notions of algorithmic auditing, as well as technologies that, by their very definition, are discriminatory.

As policymakers look to existing precedents for algorithmic audits, we run the risk of enshrining eugenicist tech in weak regulation and superficial standards.

What can be done

Against that backdrop, algorithmic auditing becomes an urgent problem to solve. I suggest we start with three steps.

First, there needs to be transparency about where and how algorithms and automated decision-making tools are deployed—in public institutions and in private contexts. For example, applicants have a right to know when an algorithmic tool is used in the hiring process.

Transparency about the use of these technologies is particularly important in the context of public agencies, because these agencies are accountable to, well, the public. The cities of Helsinki and Amsterdam created public registers of the algorithms and A.I. they use. New York City recently released a list of tools used in city administration, based on voluntary contributions by agencies. Both are notable efforts in creating transparency but are hampered by a lack of clear definitions for “algorithm,” “automated decision-making tool,” and “A.I.” This problem is well known but to date has not been successfully addressed.

We should turn to library science experts, organizational ethnographers, and historians to develop strategies for documenting how digital technologies are used, in what context, and to what end — regardless of whether they are classified as an “automated decision-making tool” or “A.I.” in that moment.

Second, we need to arrive at a clear definition of what “independent audit” means in the context of automated decision-making systems, algorithms, and A.I. What do we audit for? We need a clear overview of how audits have been successfully deployed in other industries. Existing frameworks for risk assessments and impact assessments must also come into view. These must be set against the backdrop of the specifics of algorithmic technologies to generate an overview of what can and must be audited: from the datasets the algorithmic tools are built on, to the models themselves and their performance, to the suitability of those tools for a particular context.

The last point underlines the importance of a holistic approach to algorithmic auditing, one that asks about the assumptions baked into the technology and allows room for the question of whether or not this technology should exist in the first place.

Third, we need to begin a conversation about how, realistically, algorithmic auditing can and must be operationalized in order to be effective. One obvious — and potentially powerful — mechanism for establishing algorithmic auditing is public procurement: 12% of the global GDP is spent following procurement regulation. Algorithmic auditability, once holistically defined, should be made an integral component of the procurement process, which would effectively and quickly establish good auditing practices. Doing so would also provide a unique opportunity to center the concept of “contestability by design,” the idea that systems and tools are designed so that system outputs (such as risk scores) can be contested by citizens.
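As a rough illustration of what “contestability by design” could look like at the level of a system’s outputs, the sketch below assumes a hypothetical decision record that keeps the inputs and reasons behind a score and exposes a channel for the affected person to dispute them. All names and fields are invented; none of this reflects any existing tool.

```python
# Minimal sketch of "contestability by design," under the assumption that a
# system stores, alongside each score, the inputs and reasons needed for a
# person to challenge it. All names and fields here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Decision:
    subject_id: str
    score: float
    inputs: dict                          # the data the score was based on
    reasons: list                         # human-readable factors behind the score
    contests: list = field(default_factory=list)

    def contest(self, claim: str) -> None:
        """Let the affected person dispute the inputs or the reasoning."""
        self.contests.append(claim)

decision = Decision(
    subject_id="applicant-42",
    score=0.73,
    inputs={"years_experience": 3, "assessment_game_score": 55},
    reasons=["low assessment_game_score weighted heavily"],
)
decision.contest("The assessment game mis-recorded my responses; please re-review.")
print(decision.contests)
```

The design choice the sketch gestures at is that contestability is cheap when it is built in from the start: the record of inputs, reasons, and disputes travels with the score, rather than being reconstructed after the fact.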

The new A.I. race to define algorithmic audits will not be won tomorrow. But hopefully, it will be a fair contest, one in which the rules are clear and judges have all the information they need to make the right decision.

Mona Sloane is a sociologist, researcher and writer based at New York University. She works on design, inequality, and technology. Twitter: @mona_sloane.
