Why the Productivity Paradox Disappears When You Measure Delegation

Brynjolfsson (1993) named four explanations for the productivity paradox: mismeasurement, time lags, redistribution, and mismanagement. I have written before about why the paradox is still alive, and I keep circling back to the same detail. The first explanation, mismeasurement, is not just one of four. It is the one that refuses to die. It showed up in the 1980s, when nobody could find computing power in the productivity statistics. It showed up again when Brynjolfsson and Hitt (1996) resolved the paradox at the firm level, but only because they used better, more granular measures than the aggregate statistics provided. It is showing up again now with AI. The measurement problem returns every time the technology changes faster than the tools we use to evaluate it.

I think the reason mismeasurement keeps recurring is that we keep measuring the wrong thing. Or rather, we keep measuring the same thing, use, even though the relationship between the person and the system has fundamentally changed.

Use was a reasonable construct when the system sat on a desk and a person operated it. As I wrote when I argued that use is the wrong construct for agentic systems, TAM and UTAUT were built for that world. The person intends, the person acts, the system responds. Brynjolfsson and Hitt measured IT capital stock and estimated its marginal product, but the unit was still how much computing a firm deployed and what came out the other end. The mechanism inside that box was use.

Agentic AI breaks the box. The person no longer operates the system. The person transfers a task to the system, which then acts with some autonomy, and the person manages that transfer through appraisal, distribution, and coordination. Baird and Maruping (2021) gave us the vocabulary. Delegation is the transfer of rights and responsibilities for task execution and outcomes to another agent. The three mechanisms are appraisal (judging whether the agent can do it), distribution (allocating subtasks between human and agent), and coordination (managing interdependencies over time). These are not use constructs stretched to fit a new context. They are structurally different. You cannot capture them with a survey item that asks how frequently someone checks a chatbot.

This is where the measurement problem gets sharper for AI than it was for general computing. When Brynjolfsson wrote about mismeasurement in 1993, the complaint was that productivity statistics could not capture quality, variety, customization, or convenience. The value was real. The measures were too coarse. The same problem exists for AI, except now the missing value is not about quality or variety. It is about what the human transfers and what they get back.

Consider what happens when a marketing team delegates content drafting to a large language model. The old measure would ask: are employees using the tool? Logins, session counts, prompt frequency. Those numbers will look great. But they tell you almost nothing about what matters. What tasks were actually delegated? What responsibility transferred? Did the human appraise the agent's capability before delegating, or did they offload the task reflexively? How was work distributed between human judgment and model output? Who coordinated the final product, and how many revision cycles did that require? Baird and Maruping's framework makes each of these questions answerable with a distinct construct. Appraisal, distribution, and coordination are measurable. They just are not measured by use.
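To make the contrast concrete, here is a minimal Python sketch built on a hypothetical event log for the marketing-team example. The record types and field names (`UseEvent`, `DelegationEvent`, `appraised_before`, `human_share`, `revision_cycles`) are illustrative assumptions, not Baird and Maruping's own operationalization and not a validated instrument. The sketch shows only that the same activity collapses to one number under a use measure but stays separable into appraisal, distribution, and coordination under a delegation measure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class UseEvent:
    """Hypothetical use record: all a login or session log can see."""
    user: str


@dataclass
class DelegationEvent:
    """Hypothetical delegation record; fields are illustrative stand-ins."""
    user: str
    task: str
    appraised_before: bool  # appraisal: did the human judge capability first?
    human_share: float      # distribution: fraction of the work the human kept (0 to 1)
    revision_cycles: int    # coordination: rounds needed to finish the product


def use_measure(events: list[UseEvent]) -> int:
    """Use collapses everything into a count of interactions."""
    return len(events)


def delegation_measure(events: list[DelegationEvent]) -> dict[str, float]:
    """Delegation keeps the three mechanisms as separate variables."""
    return {
        "appraisal_rate": mean(1.0 if e.appraised_before else 0.0 for e in events),
        "mean_human_share": mean(e.human_share for e in events),
        "mean_revision_cycles": mean(e.revision_cycles for e in events),
    }


# Two hypothetical teams with identical usage but different delegation patterns.
team_a = [
    DelegationEvent("ana", "blog draft", True, 0.5, 2),
    DelegationEvent("ben", "ad copy", True, 0.25, 1),
]
team_b = [
    DelegationEvent("cai", "blog draft", False, 0.0, 4),
    DelegationEvent("dee", "ad copy", False, 0.0, 6),
]

for name, team in [("A", team_a), ("B", team_b)]:
    uses = [UseEvent(e.user) for e in team]
    print(name, use_measure(uses), delegation_measure(team))
```

Both teams score 2 on the use measure. Only the delegation measure shows that team A appraised before delegating, kept part of the work, and needed fewer revision cycles, while team B offloaded reflexively and paid for it in coordination.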

Markus and Robey (1988) warned about this kind of mismatch. Their three positions on causal agency (the technological imperative, the organizational imperative, and the emergent perspective) are not just positions you declare at the start of a paper. They are assumptions embedded in your measures. A study that treats AI as a technological determinant of productivity is assuming that the causal story runs from spending to outcomes. A study that treats AI as an organizational tool that employees choose to adopt is assuming that the causal story runs through intention and behavior. The emergent position, which is almost always the right one for IS, says outcomes arise from the interaction between technology, human agency, and organizational context. When I wrote about why the IT artifact has to be theorized, the point was that most IS research treats the artifact as a setting rather than an actor. Delegation theory is what happens when you take the emergent position seriously and the artifact actually has agency. The causal story has to include what the system does, what the human delegates, and how they coordinate. Use collapses all of that into one variable.

Orlikowski and Iacono (2001) documented that 88 percent of ISR papers did not take the Ensemble view of the IT artifact, the view that links technology with social action. The other four views (Nominal, Computational, Tool, and Proxy) all let the artifact disappear from the theoretical story. I think the same problem is playing out in AI productivity research right now. When a firm measures AI adoption by counting licenses or tracking logins, it is treating the technology at the Tool level. The system is an instrument. The person is the actor. The causal story is adoption leading to use leading to output. But when the system can draft a legal motion, triage a patient, or route a fleet of delivery vehicles, that story is incomplete. The system has become an actor. The relationship is delegation, not use. Measuring it as use is like measuring a rental agreement by how many times the tenant walks through the door. The frequency of entry is not the mechanism. The transfer of rights and responsibilities is.

My recollection of Yeh et al. (2025) needs verification, but the general pattern in recent empirical work is that AI productivity gains at the task level are real and measurable, while gains at the organizational level are small or absent. I think this gap is a delegation gap, not a technology gap. The tasks that get delegated first are the ones most suited to reflexive or supervisory delegation in Baird and Maruping's typology: routine, bounded, with clear feedback. They show up in individual productivity because they are easy to measure and easy to coordinate. Organizational productivity requires anticipatory and prescriptive delegation, where the human distributes judgment across multiple subtasks and coordinates interdependencies that cross functional boundaries. That is harder to measure, and, as I argued in writing about the spending-growth gap, most organizations are not restructuring around AI in ways that would produce those gains.

The paradox does not disappear because AI suddenly works. It disappears as a paradox when you stop measuring use and start measuring what actually transfers. Brynjolfsson's four explanations are all still active. The measurement explanation is still the largest. But the thing being mismeasured has changed. It is not just output quality or variety that we fail to capture. It is the entire mechanism: what rights moved, what responsibilities shifted, how appraisal happened, what coordination looks like. When you measure delegation, the causal story gets richer. When you measure use, the causal story flattens into a correlation between spending and output that has no mechanism inside it.

Baird and Maruping gave us the reformulation. Markus and Robey gave us the causal structure that demands it. Orlikowski and Iacono gave us the diagnostic that shows why the old measures miss the artifact. The productivity paradox keeps returning because the measure keeps following the technology instead of the relationship.