Algorithmic Bias: When the Model Is the Policy

In 2018, it was widely reported that Amazon had scrapped an internal AI recruiting tool after discovering it was penalizing resumes that included the word "women's," as in women's chess club or women's college organization. The press coverage at the time was careful to note this was reporting on internal company decisions, so I will treat it the same way: widely reported, not independently verified. But the mechanism that allegedly produced the bias is worth thinking about carefully regardless of the specific case. The tool had apparently been trained on ten years of resumes submitted to Amazon. Amazon, like most large technology companies, had historically hired more men than women. The model learned from that history. It optimized for the patterns in the data it was given. The data reflected a world shaped by existing inequalities, so the model reproduced those inequalities with considerably more speed and scale than any individual recruiter could.
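To make that mechanism concrete, here is a minimal sketch of how a scoring rule fitted to past decisions can end up penalizing a word. It is not a reconstruction of Amazon's tool, whose internals were never published. The function name `token_log_odds`, the smoothing constant, and the toy resume history are all invented for illustration.

```python
import math
from collections import Counter

def token_log_odds(resumes, smoothing=1.0):
    """Smoothed log-odds of each token appearing in hired vs. rejected resumes.

    resumes: list of (tokens, was_hired) pairs, where tokens is a list of strings.
    A negative value means the token is more typical of past rejections.
    """
    hired_counts, rejected_counts = Counter(), Counter()
    n_hired = n_rejected = 0
    for tokens, was_hired in resumes:
        if was_hired:
            n_hired += 1
            hired_counts.update(set(tokens))
        else:
            n_rejected += 1
            rejected_counts.update(set(tokens))
    vocabulary = set(hired_counts) | set(rejected_counts)
    return {
        token: math.log((hired_counts[token] + smoothing) / (n_hired + 2 * smoothing))
        - math.log((rejected_counts[token] + smoothing) / (n_rejected + 2 * smoothing))
        for token in vocabulary
    }

# Invented history: every resume lists the same skills, but past decisions
# were skewed against resumes containing one extra token.
history = (
    [(["python", "chess"], True)] * 8
    + [(["python", "chess", "womens"], True)] * 1
    + [(["python", "chess"], False)] * 4
    + [(["python", "chess", "womens"], False)] * 4
)
weights = token_log_odds(history)
print(weights)
```

In this toy history the token "womens" gets a clearly negative weight while "python" sits near zero. Nothing in the code mentions gender. The penalty comes entirely from the labels, which is the point: the historical outcome was treated as the signal.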

That mechanism is the part I keep coming back to. It is not a story about a malicious algorithm. It is a story about what happens when you treat historical outcomes as a neutral training signal.

ProPublica's 2016 investigation "Machine Bias" documented something similar in a different domain. The investigation examined COMPAS, a recidivism prediction tool used in some US courts to help inform decisions about bail and sentencing. ProPublica's journalists found that the tool's false positive rates differed by race: Black defendants who did not go on to reoffend were more likely than white defendants who did not reoffend to have been classified as high risk. The tool's developers disputed that finding, as did academic researchers who argued that ProPublica's definition of fairness conflicted with alternative mathematical definitions of fairness that COMPAS did satisfy. That dispute is real. Later technical work showed that when the underlying rates of reoffending differ between groups, these definitions generally cannot all be satisfied at once, so what remains unresolved is which definition should govern, not whose arithmetic was right. What is not in dispute is that COMPAS was producing different error rates for different groups, and that courts were using those scores in decisions with serious consequences for people's lives.
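For readers who have not worked with the metric, a false positive rate by group is simple to compute. The sketch below is a minimal illustration, not ProPublica's analysis code. The function name, the record layout, and the toy records are my own assumptions, and the numbers are made up.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Share of people who did NOT reoffend but were flagged high risk, per group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {group: false_positives[group] / negatives[group] for group in negatives}

# Invented toy records for two hypothetical groups, "A" and "B".
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6 + [("A", True, True)] * 5
    + [("B", True, False)] * 2 + [("B", False, False)] * 8 + [("B", True, True)] * 5
)
fpr = false_positive_rate_by_group(toy_records)
print(fpr)  # {'A': 0.4, 'B': 0.2}
```

The denominator is the design choice that matters here: it counts only people who did not reoffend, so the metric asks how often the tool was wrong about people who turned out to be low risk.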

The methodological argument about which definition of fairness to use is genuine and worth taking seriously. But it also illustrates something that I think IS researchers should find uncomfortable. The choice of how to define fairness in an algorithmic system is not a technical choice. It is a values choice. You can formalize it mathematically, but someone decided which formalization to use, which outcomes to optimize for, and which errors were acceptable. Those are design decisions made by people, and they have distributional consequences for other people.
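A small piece of arithmetic shows why this is a values choice and not a calculation error. From the definitions of the confusion matrix, once you fix a group's base rate, the precision of the "high risk" label (PPV), and the true positive rate, the false positive rate is determined. The sketch below uses that identity with invented numbers. The function name and all parameter values are illustrative assumptions, not figures from COMPAS.

```python
def implied_false_positive_rate(base_rate, ppv, tpr):
    """False positive rate implied by base rate, precision (PPV), and true positive rate.

    Derivation: true positives = tpr * base_rate (per person),
    false positives = true positives * (1 - ppv) / ppv,
    and FPR = false positives / (1 - base_rate).
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr

# Two hypothetical groups scored with the SAME precision and the SAME
# true positive rate, differing only in how often the outcome occurs.
fpr_higher_base_rate = implied_false_positive_rate(base_rate=0.5, ppv=0.6, tpr=0.6)
fpr_lower_base_rate = implied_false_positive_rate(base_rate=0.3, ppv=0.6, tpr=0.6)
print(fpr_higher_base_rate, fpr_lower_base_rate)
```

With equal precision and equal true positive rates, the group with the higher base rate ends up with the higher false positive rate. Equalizing the false positive rates would require giving up one of the other two equalities. Someone has to pick, and the picking is not math.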

Joy Buolamwini and Timnit Gebru's "Gender Shades" project, published in 2018, showed accuracy disparities in commercial facial analysis systems, specifically gender classifiers, across gender and skin tone categories. The specific accuracy numbers varied across the systems they tested, so I will not reproduce them here, but the pattern they documented was consistent: darker-skinned faces, particularly those of darker-skinned women, were classified less accurately than lighter-skinned faces. That pattern is what you would expect from systems built and benchmarked primarily on faces that did not represent the full range of people who would encounter them in deployment. The gap between training data composition and deployment population was not a bug that slipped through. It was a consequence of how the systems were built.
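The evaluation move behind that finding is disaggregation: report accuracy per subgroup, not one number for the whole test set. Here is a minimal sketch of the idea. The function names, the subgroup labels, and every count in the toy data are invented for illustration and are not the Gender Shades results.

```python
from collections import defaultdict

def overall_accuracy(results):
    """results: list of (skin_tone, gender, is_correct) tuples."""
    return sum(1 for *_, is_correct in results if is_correct) / len(results)

def accuracy_by_subgroup(results):
    """Accuracy for each (skin_tone, gender) intersection."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for skin_tone, gender, is_correct in results:
        key = (skin_tone, gender)
        total[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}

# Invented test set that over-represents one subgroup, as a skewed benchmark might.
toy_results = (
    [("lighter", "male", True)] * 40
    + [("lighter", "female", True)] * 27 + [("lighter", "female", False)] * 3
    + [("darker", "male", True)] * 18 + [("darker", "male", False)] * 2
    + [("darker", "female", True)] * 6 + [("darker", "female", False)] * 4
)
print(overall_accuracy(toy_results))      # 0.91
print(accuracy_by_subgroup(toy_results))  # darker/female is 0.6
```

In this made-up example the headline figure is 91 percent while the smallest subgroup sits at 60 percent. The aggregate hides the gap because the aggregate is weighted by who is in the test set.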

This is the IS argument I find most important: algorithmic bias is a design problem, not a math problem. The choice of training data, the selection of features, the definition of the target variable, the threshold for classification, and the evaluation metric used to declare the model "good enough" are all design decisions made by people in organizations. When those decisions embed the inequalities that exist in historical data or in the assumptions of the people making the decisions, the resulting algorithm does not just reflect those inequalities. It operationalizes them. It runs them automatically, at scale, without the inconsistency and occasional mercy that human decision-makers (however imperfectly) bring to individual cases.

"The algorithm decided" is almost never an adequate explanation of an outcome. An algorithm does not decide anything in the sense that implies agency or responsibility. What happened is that a series of design choices, made by identifiable people in an identifiable organization, produced a system that generated a particular output. The output can be traced back to the decisions. The decisions can be evaluated. The people who made them can be held responsible.

This matters for how organizations approach bias mitigation. A lot of AI fairness work focuses on post-hoc correction: train the model, evaluate for disparate outcomes, apply a correction algorithm, re-evaluate. That approach treats bias as something that emerges from the model and can be corrected at the model level. My read is that this misses most of the problem. If the training data encodes historical inequalities, if the features selected reflect cultural assumptions about who "looks like" a good candidate or a high-risk defendant, if the evaluation metric was chosen without thinking carefully about whose errors are tolerable, then post-hoc correction is trying to fix a structural problem with a surface intervention.
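To show what I mean by a surface intervention, here is a minimal sketch of one common style of post-hoc correction: choosing a separate decision threshold per group so that false positive rates line up. It is one illustrative technique among several, not a description of any particular vendor's method. The function names, the target rate, and the toy scores are assumptions I made up for the example.

```python
def fpr_at_threshold(scores, labels, threshold):
    """False positive rate when everyone scoring at or above threshold is flagged."""
    negative_scores = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s >= threshold for s in negative_scores) / len(negative_scores)

def threshold_for_target_fpr(scores, labels, target_fpr):
    """Lowest candidate threshold whose false positive rate is at most target_fpr."""
    # FPR can only fall as the threshold rises, so scan candidates in ascending order.
    candidates = sorted(set(scores)) + [float("inf")]
    for threshold in candidates:
        if fpr_at_threshold(scores, labels, threshold) <= target_fpr:
            return threshold

# Invented scores and outcomes (1 = outcome occurred) for two hypothetical groups.
groups = {
    "A": ([0.2, 0.4, 0.6, 0.8, 0.7, 0.9], [0, 0, 0, 0, 1, 1]),
    "B": ([0.1, 0.2, 0.3, 0.6, 0.7, 0.9], [0, 0, 0, 0, 1, 1]),
}

before = {g: fpr_at_threshold(s, y, 0.5) for g, (s, y) in groups.items()}
thresholds = {g: threshold_for_target_fpr(s, y, 0.25) for g, (s, y) in groups.items()}
after = {g: fpr_at_threshold(s, y, thresholds[g]) for g, (s, y) in groups.items()}
print(before, thresholds, after)
```

The false positive rates match after the adjustment, but notice what did not change: the scores, the features that produced them, the training data, and the definition of the outcome. The correction moves the cut-off and leaves everything upstream exactly as it was.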

The deeper correction is upstream. It involves asking whose data is included and why, what the target variable actually measures and whether it is a valid proxy for what you care about, and whether the people most affected by the system had any role in deciding how it was built. These are not questions that most IS development processes are designed to surface, because they require thinking about the social and political dimensions of technical choices at a stage when the team is usually focused on getting the model to converge.

I have been thinking about this alongside my earlier piece on what happens when the algorithm fails and nobody takes the call, where the principal-agent structure breaks down because an algorithm cannot be held responsible in the way a human agent can. Bias adds another layer to that problem. The algorithm not only cannot be held responsible for a bad outcome; it also cannot explain which design decision caused the disparity. You can identify that the false positive rate differs across groups. Tracing that back to a specific choice in the data pipeline or the feature engineering process is much harder. The accountability chain is long and the algorithm is opaque in the middle of it.

I do not think this means algorithmic systems are inherently more biased than human decision-makers. The research on human judgment in hiring, sentencing, and credit decisions is not flattering either. But human bias, at least in principle, can be surfaced through inconsistency, challenged through confrontation, and modified through feedback. A biased model runs consistently, which means the same disparate error rate every time, for every person in the affected group, without the natural variation that might otherwise signal a problem. The scale and consistency are what make it different, and what make the design choices so consequential.