Endogeneity: The Problem That Invalidates More IS Research Than People Admit

Here is a claim you see regularly in IS and management research: firms that invest in enterprise systems perform better. The number comes from a regression. Firms that adopted the system have higher productivity, or higher revenue per employee, or better customer satisfaction scores. The coefficient is positive and statistically significant. The authors conclude that the investment pays off.

The problem is that this regression is almost certainly wrong, not because the researchers made a calculation error, but because the model has endogeneity. Endogeneity, when present, makes ordinary least squares (OLS) estimates biased and inconsistent. The conclusions drawn from the analysis may be systematically wrong: not randomly noisy, but pointing in the wrong direction or overstating the effect size in ways that collecting more data cannot correct.

Endogeneity occurs when an independent variable in your model is correlated with the error term. This can happen in three main ways. The first is omitted variables: some variable affects both the independent variable and the outcome, and because it is not in the model, its effect gets absorbed into the error term. The second is reverse causality, also called simultaneity: X causes Y, but Y also causes X, or the two are jointly determined, so the direction of influence in your model is ambiguous. The third is measurement error in the independent variable: the gap between the measured value and the true value ends up in the error term, where it is correlated with the measured variable. With classical (purely random) measurement error, this biases the coefficient toward zero.
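For readers who prefer notation, here is a minimal sketch of the omitted variable case with a single regressor. The symbols are my own shorthand, not taken from any particular textbook: x is the adoption variable, q is the omitted organizational characteristic, and u is well-behaved noise that is uncorrelated with x.

```latex
% Model you estimate:
y_i = \alpha + \beta x_i + \varepsilon_i

% Model that actually generated the data (q_i is omitted from the regression):
y_i = \alpha + \beta x_i + \gamma q_i + u_i
\quad\Longrightarrow\quad
\varepsilon_i = \gamma q_i + u_i

% What OLS converges to as the sample grows:
\operatorname{plim}\, \hat{\beta}_{\text{OLS}}
  = \beta + \frac{\operatorname{Cov}(x_i, \varepsilon_i)}{\operatorname{Var}(x_i)}
  = \beta + \gamma \, \frac{\operatorname{Cov}(x_i, q_i)}{\operatorname{Var}(x_i)}
```

The second term does not shrink with sample size. It vanishes only if the omitted variable has no effect on the outcome or is uncorrelated with x.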

The enterprise system example has a classic omitted variable problem. Organizations that adopt major enterprise technology are not a random sample of organizations. They tend to be larger, better resourced, more professionally managed, and more growth-oriented than organizations that do not adopt. These characteristics also independently predict better performance. If you regress performance on technology adoption without controlling adequately for all of those organizational characteristics, the adoption coefficient absorbs the effect of organizational quality. You have estimated a selection effect, not a treatment effect. It may be that high-performing firms were simply the ones that adopted, and the system did not make them high-performing. From OLS regression on observational data alone, you cannot tell the difference.
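The selection story above is easy to reproduce with simulated data. The sketch below is illustrative only: the function name, variable names, and all numbers are hypothetical. It sets the true effect of adoption to zero and lets an unobserved "quality" variable drive both adoption and performance.

```python
import numpy as np


def simulate_selection_bias(n=50_000, true_effect=0.0, seed=0):
    """Return (naive, controlled) OLS estimates of the adoption effect."""
    rng = np.random.default_rng(seed)

    # Unobserved organizational quality (size, management, resources...).
    quality = rng.normal(size=n)

    # Better firms are more likely to adopt: selection on quality.
    adopt = (quality + rng.normal(size=n) > 0).astype(float)

    # Performance depends on quality; adoption adds only `true_effect`.
    performance = true_effect * adopt + 1.0 * quality + rng.normal(size=n)

    def ols(y, *cols):
        # Least squares with an intercept; returns all coefficients.
        X = np.column_stack([np.ones(len(y)), *cols])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta

    naive = ols(performance, adopt)[1]                # quality omitted
    controlled = ols(performance, adopt, quality)[1]  # quality observed
    return naive, controlled


if __name__ == "__main__":
    naive, controlled = simulate_selection_bias()
    print(f"naive OLS estimate:      {naive:.3f}")
    print(f"with quality controlled: {controlled:.3f}")
```

In this setup the naive coefficient comes out large and positive even though adoption does nothing, while the controlled estimate sits near zero. The catch, of course, is that in real data you rarely get to observe "quality" directly.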

This is selection bias operating as endogeneity. My notes from the IS research methods section of my comps preparation (day2.html, line 495) are direct about this: a study that wants to claim causation needs covariation, temporal precedence, and the absence of plausible alternative explanations. SEM and regression establish covariation. They do not by themselves rule out alternatives. Internal validity is the validity type that addresses exactly these conditions.

The econometric solution is instrumental variables (IV) estimation. The idea is to find a variable that predicts technology adoption (relevance) but affects the performance outcome only through adoption and is unrelated to the omitted organizational characteristics (the exclusion restriction). If you can find such an instrument, you can use it to isolate the variation in adoption that is truly exogenous, essentially the part of adoption that the instrument caused, and use only that variation to estimate the effect on performance.

The problem is that good instruments are hard to find and even harder to defend. In IS research on enterprise technology adoption, a candidate instrument might be something like the adoption behavior of industry peers, or the availability of local implementation partners, or prior technology investments that are historically determined. Each of these can be challenged. Relevance can be checked in the first-stage regression, but the exclusion restriction cannot be tested directly and has to be argued from theory.
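The two-stage logic of IV can be sketched with simulated data. This is a minimal illustration, not a recommended workflow: the instrument (`partners`, standing in for local implementation partner availability) is valid here only because the simulation makes it so, adoption is treated as a continuous intensity for simplicity, and all names and coefficients are hypothetical.

```python
import numpy as np


def simulate_iv(n=100_000, true_effect=0.5, seed=1):
    """Return (ols_estimate, iv_estimate) of the adoption effect."""
    rng = np.random.default_rng(seed)

    quality = rng.normal(size=n)    # unobserved confounder
    partners = rng.normal(size=n)   # hypothetical instrument, independent of quality

    # Adoption intensity is driven by the instrument AND by quality.
    adoption = 0.8 * partners + quality + rng.normal(size=n)

    # Performance depends on adoption and quality, never on partners directly.
    performance = true_effect * adoption + quality + rng.normal(size=n)

    def fit(y, x):
        # OLS with an intercept; returns the design matrix and coefficients.
        X = np.column_stack([np.ones(len(y)), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return X, beta

    # Naive OLS: biased upward because quality sits in the error term.
    _, b_ols = fit(performance, adoption)

    # Stage 1: keep only the part of adoption predicted by the instrument.
    X1, gamma = fit(adoption, partners)
    adoption_hat = X1 @ gamma

    # Stage 2: regress performance on the predicted adoption.
    _, b_iv = fit(performance, adoption_hat)

    return b_ols[1], b_iv[1]


if __name__ == "__main__":
    ols_est, iv_est = simulate_iv()
    print(f"OLS estimate: {ols_est:.3f}")
    print(f"IV estimate:  {iv_est:.3f}  (true effect is 0.5)")
```

One caveat on the sketch: running the second stage by hand gives the right point estimate but the wrong standard errors, so for real work you would use a dedicated IV routine rather than two separate regressions.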

The Heckman selection correction is another approach, widely cited in the econometrics literature. It models selection into treatment as a first stage and then corrects the outcome regression with a term (the inverse Mills ratio) built from the predicted probability of selection. I am hedging here because my study-hub notes do not explicitly cover Heckman, but it is widely discussed in IS methodology papers that deal with selection bias, and the basic logic connects directly to the selection threat in Cook and Campbell's validity framework. To be credible, the correction needs its own exclusion restriction, a variable that predicts selection but not the outcome, so it faces validity problems similar to those of IV estimation. Neither approach is free. Both require strong theoretical justification that most IS studies do not provide.
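With the same hedge as above, here is my understanding of the textbook form of the correction when the selected variable is a binary treatment. The notation is my own: D is the adoption dummy, z contains the selection predictors (including the excluded variable), and the two error terms are assumed jointly normal with correlation rho.

```latex
% Selection (first stage, probit):
D_i = \mathbf{1}\!\left[\, z_i'\gamma + u_i > 0 \,\right], \qquad u_i \sim N(0, 1)

% Outcome:
y_i = x_i'\beta + \delta D_i + \varepsilon_i, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2, \quad
\operatorname{Corr}(u_i, \varepsilon_i) = \rho

% Expected error given the adoption decision (the correction terms):
E[\varepsilon_i \mid D_i = 1] = \rho\sigma \, \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)},
\qquad
E[\varepsilon_i \mid D_i = 0] = -\rho\sigma \, \frac{\phi(z_i'\gamma)}{1 - \Phi(z_i'\gamma)}
```

If rho is zero, the correction terms vanish and OLS on the outcome equation is fine. If rho is not zero, the terms stand in for the part of the error that selection made predictable, which is why the first stage has to be specified credibly.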

Why does this matter in practice? Because IS and business journals regularly publish studies that draw causal conclusions from observational regression results without adequate endogeneity controls. The ERP literature is full of this. The IT outsourcing literature is full of this. Studies that show a positive association between some technology adoption decision and firm performance are cited as evidence that the technology delivers ROI. Industry analysts amplify the pattern. Gartner, for instance, regularly reports survey findings about technology adoption and organizational performance at the Gartner newsroom, and practitioners interpret those correlational findings as causal evidence. The issue is not that Gartner is being deceptive; it is that survey data about adoption and performance cannot establish causation even with careful analysis, and the limitation is almost never flagged in the report's summary.

The right response to this is not to give up on observational IS research. Cross-sectional and longitudinal observational studies are often the only feasible option when you cannot randomly assign firms to technology investments. The right response is to take endogeneity seriously as a threat, to look for instruments or natural experiments where they exist, to use panel data designs that allow firm fixed effects (which control for time-invariant organizational characteristics), and to be honest about the limits of what the evidence actually shows.

I wrote about related design issues in my post on why cross-sectional studies miss half the story. The endogeneity problem and the cross-sectional design problem overlap significantly: many endogeneity threats are harder to address when you only have one time point. Panel data with firm fixed effects does not solve endogeneity, because time-varying confounders and reverse causality remain, but it reduces the omitted variable threat by controlling for stable organizational characteristics. That is a meaningful improvement, even if it is not a complete solution.
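To make the fixed effects point concrete, here is a small simulated panel. It is a sketch under convenient assumptions: the confounder (`quality`) is perfectly stable over time, which is exactly the case fixed effects can handle, and the names and numbers are hypothetical.

```python
import numpy as np


def simulate_fixed_effects(n_firms=2_000, n_years=6, true_effect=0.3, seed=2):
    """Return (pooled_ols, within_estimate) of the IT spending effect."""
    rng = np.random.default_rng(seed)

    # Stable, unobserved firm quality: one value per firm, constant over years.
    quality = rng.normal(size=(n_firms, 1))

    # IT spending varies by year but is higher in better firms.
    it_spend = quality + rng.normal(size=(n_firms, n_years))

    # Performance depends on spending and, strongly, on quality.
    performance = (
        true_effect * it_spend
        + 2.0 * quality
        + rng.normal(size=(n_firms, n_years))
    )

    def slope(y, x):
        # Simple regression slope of y on x (both flattened and centered).
        x = x.ravel() - x.mean()
        y = y.ravel() - y.mean()
        return float((x * y).sum() / (x * x).sum())

    # Pooled OLS ignores the panel structure and absorbs the quality effect.
    pooled = slope(performance, it_spend)

    # Within transformation: subtract each firm's own mean over time,
    # which removes anything constant within a firm, including quality.
    x_within = it_spend - it_spend.mean(axis=1, keepdims=True)
    y_within = performance - performance.mean(axis=1, keepdims=True)
    within = slope(y_within, x_within)

    return pooled, within


if __name__ == "__main__":
    pooled, within = simulate_fixed_effects()
    print(f"pooled OLS estimate:    {pooled:.3f}")
    print(f"fixed effects estimate: {within:.3f}  (true effect is 0.3)")
```

If quality drifted over time in step with spending, the within estimate would be biased too, which is the sense in which fixed effects reduce the threat rather than remove it.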

The claim "firms that invest in X perform better" is not necessarily false. It might be true. But the regression that supports it is not sufficient evidence, and saying so is less a criticism of the research than a request for a more honest description of what the evidence can and cannot support.