The Replication Crisis Is an IS Problem Too

I was reading Podsakoff et al. (2003) for the third time last week when a number stopped me. In a typical behavioral research measure, roughly a quarter to forty percent of the variance can come from the measurement method itself, not from the construct being studied. That is not a minor footnote. When the same respondent fills out both the independent and dependent variables at the same moment, using the same Likert scale, the correlation between those variables is partly a mirage created by the shared method. The evidence Podsakoff and his coauthors review suggests that this common method bias can roughly triple the variance two measures appear to share. And the remedy most researchers cite, Harman's single-factor test, does not actually control for the problem; it only checks whether one dominant factor emerges. It is a fig leaf.
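A minimal simulation sketch makes the mechanism concrete. Everything in it is an assumption chosen for illustration, not a figure from Podsakoff et al.: each observed score is 45% trait, 30% method, and 25% random error, the true correlation between the two constructs is 0.2, and the function names `pearson` and `simulate` are hypothetical. The only thing that changes between the two runs is whether both measures share one method factor (same respondent, same sitting) or have independent ones (separate sources).

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(shared_method, n=20000, true_r=0.2, trait=0.45, method=0.30, seed=1):
    """Observed correlation between two measures under assumed variance shares."""
    rng = random.Random(seed)
    error = 1.0 - trait - method  # remaining share is random error
    xs, ys = [], []
    for _ in range(n):
        # True construct scores, correlated at true_r.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x_true = z1
        y_true = true_r * z1 + math.sqrt(1 - true_r ** 2) * z2
        # Method factor: identical for both measures when the source is shared.
        m_x = rng.gauss(0, 1)
        m_y = m_x if shared_method else rng.gauss(0, 1)
        xs.append(math.sqrt(trait) * x_true + math.sqrt(method) * m_x
                  + math.sqrt(error) * rng.gauss(0, 1))
        ys.append(math.sqrt(trait) * y_true + math.sqrt(method) * m_y
                  + math.sqrt(error) * rng.gauss(0, 1))
    return pearson(xs, ys)


same_source = simulate(shared_method=True)
separate_sources = simulate(shared_method=False)
print(f"same source:      r = {same_source:.2f}")
print(f"separate sources: r = {separate_sources:.2f}")
```

Under these assumed shares the expected same-source correlation is trait × true_r + method = 0.45 × 0.2 + 0.30 = 0.39, while the separate-source correlation is only 0.45 × 0.2 = 0.09. Most of the same-source figure is the shared method, not the constructs.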

I keep noticing this because I am living inside a comps reading list that forces me to reread the same foundational papers, and the gap between what those papers warn about and what the field actually publishes is getting harder to ignore.

Walk through any recent issue of a major IS journal and you will find the same dominant design: a single cross-sectional survey, all perceptual measures, same respondent providing every variable. By the account in Podsakoff et al. (2003), this is a recipe for common method bias. They recommend procedural fixes built into the design: obtain the independent and dependent variables from different sources, introduce temporal separation between measurements, vary response formats, counterbalance question order. These are design choices, not post-hoc statistical rituals. Yet I rarely see them treated as standard practice. When a paper does address common method bias, it is usually a single sentence about Harman's test. That sentence lets the authors claim they handled the problem. It does not handle the problem.
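To see why a passing Harman's test proves so little, here is a rough sketch under stated assumptions. It approximates the test the way it is commonly reported: take the first unrotated component of the item correlation matrix and check whether it explains more than 50% of the variance (the 50% cutoff is a rule of thumb, not a formal criterion). The data are simulated with three constructs whose true correlations are zero, four items each, and a method factor that contributes an assumed 20% of every item's variance. All names (`simulate_items`, `correlation_matrix`, `largest_eigenvalue`) and all variance shares are hypothetical.

```python
import math
import random


def simulate_items(n=3000, constructs=3, items_per=4, trait=0.45, method=0.20, seed=7):
    """Survey rows where independent constructs share one method factor."""
    rng = random.Random(seed)
    error = 1.0 - trait - method
    rows = []
    for _ in range(n):
        m = rng.gauss(0, 1)  # one method factor per respondent
        row = []
        for _ in range(constructs):
            t = rng.gauss(0, 1)  # traits are independent: true r = 0
            for _ in range(items_per):
                row.append(math.sqrt(trait) * t + math.sqrt(method) * m
                           + math.sqrt(error) * rng.gauss(0, 1))
        rows.append(row)
    return rows


def correlation_matrix(rows):
    """Item-by-item Pearson correlation matrix."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / n for j in range(k)] for i in range(k)]


def largest_eigenvalue(matrix, iterations=200):
    """Top eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    value = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        value = math.sqrt(sum(x * x for x in w))
        v = [x / value for x in w]
    return value


corr = correlation_matrix(simulate_items())
first_factor_share = largest_eigenvalue(corr) / len(corr)
cross_construct_r = corr[0][4]  # item from construct 1 vs item from construct 2

print(f"first component explains {first_factor_share:.0%} of variance")
print(f"cross-construct item correlation: {cross_construct_r:.2f}")
```

In this setup the first component stays well under the 50% threshold, so the test would be reported as passed, yet the cross-construct correlation of about 0.20 is produced entirely by the shared method factor. The check can come back clean while every between-construct relationship in the data is an artifact.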

The issue runs deeper than survey design. It touches how we choose and justify our statistical tools. Chin (1998) presented PLS-SEM to IS researchers as a component-based alternative to covariance-based SEM (CB-SEM), one suited to prediction-oriented research, formative constructs, and less mature theories. The logic was clear: PLS maximizes explained variance, so use it when the goal is prediction or theory development, not when you want to confirm a well-established theory. Over the last decade, that logic has been inverted. PLS-SEM has become the default for anyone whose sample is small, whose data are non-normal, or whose model fails to converge in AMOS. I have watched this in paper after paper. Researchers justify PLS not by research goal but by data weakness. That is not a methodological choice. It is a shortcut. PLS is not a rescue tool. It is a different methodology with different goals, and using it to dodge data problems turns a prediction technique into a black box that generates significant paths under almost any conditions.

If the method is flexible and the design is fragile, the findings become difficult to trust. Multiply that by the most cited theory in the field. Davis (1989) built the Technology Acceptance Model as a clean, parsimonious chain from perceived ease of use to perceived usefulness to behavioral intention. Venkatesh et al. (2003) later consolidated acceptance research into UTAUT. Both models were elegant. The field, however, has run thousands of survey-based TAM studies, and the vast majority measure perceived usefulness and self-reported use from the same respondent at the same time. When common method bias inflates those paths by design, some unknown share of each significant coefficient is an artifact of the measurement process. I keep asking myself how many of those published TAM effects would survive if the independent variables were collected from a different source, or if use were measured with objective system logs rather than a self-report item. I am not sure we know, because almost no one tries to replicate TAM under stricter conditions. The theory has been tested to death, but it has rarely been tested well.

Psychology and medicine have been loud about their replication crises. IS has been quieter, and I think that quietness comes from a mistaken belief that our methods are somehow exempt. They are not. Our dominant empirical design is an all-perceptual, same-source, cross-sectional survey. Our most popular statistical technique is often applied for the wrong reasons. And our most reused theory has been validated so many times under the same biased conditions that the accumulated evidence looks more robust than it actually is. That is what a replication problem looks like.

Positivist research is supposed to be evaluated by reliability, validity, and replication. We have turned the first two into checklists everyone completes in the methods section. The third one, replication, barely appears in the conversation. We do not need to abandon surveys, SEM, or TAM. We need to stop pretending that a significant path coefficient in a cross-sectional PLS model is enough. Podsakoff et al. gave us the playbook for stronger designs. Chin gave us the logic for choosing PLS honestly. What we do with them is up to us.

If the next TAM study in your reading queue still collects every variable from the same person on the same afternoon, you should ask what exactly is being replicated.