I rolled my eyes when a study partner said critical realism was important. Three reads later I apologized, at least internally.
A study partner told me critical realism was important. I rolled my eyes. Not visibly, I am polite enough for that, but internally I had already filed it under background philosophy that the field treats as significant without ever showing me why. I was wrong about that and I want to walk through how I found out.
The first time I read about critical realism it sounded like common sense dressed up in language it did not earn. The world exists whether you study it or not. You cannot observe reality directly. Your measurements are always fallible. These felt obvious, not revolutionary. I wondered why anyone would build a whole paradigm around something so straightforward when there was actual research to design. Positivism gave you clean hypotheses and testable predictions. Interpretivism gave you rich accounts of meaning and context. Critical realism gave you a lecture about ontology. I filed it under philosophical background, not practically useful, and moved on to theories I actually needed to know for comps.
The second reading came during paradigm comparison. Day 2 of my study hub had the full treatment. Positivism assumes objective reality and tests falsifiable hypotheses. Interpretivism assumes socially constructed reality and pursues understanding through meaning. Critical realism assumes a stratified ontology: the real which is mechanisms and structures and powers, the actual which is events generated when mechanisms are triggered, and the empirical which is the subset of events we observe. Wynn and Williams (2012) explained that observed regularities may not reveal underlying causal mechanisms because context activates or suppresses them in open systems. I read that paragraph. I underlined it. I put it in my comparison table. And I still did not absorb it. I could define retroduction, reasoning backward from observed events to the mechanisms that could have generated them. I could name Mingers et al. (2013) as the citation for CR bridging positivist and interpretive research. I could write a clean exam paragraph explaining stratified ontology and explanatory power as the evaluation criterion. But I could not feel why any of it mattered beyond the exam table. The distinction between the real, the actual, and the empirical felt like a philosophical refinement that did not change what I would actually do in a study. I would still collect data. I would still analyze it. The labels would be different but the work would look the same.
I think the problem was that I had no concrete problem that required the stratified ontology. I had the framework in my head but nothing pressing to apply it to. The real test of a theory is not whether you can define it. It is whether the theory does work that would not get done without it. And I had not yet encountered a question that CR answered better than the alternatives.
The third reading was not a reading. It came from watching what happens when machine learning models go into production and fail.
The pattern is well known to anyone who has deployed a model. A neural network trains on historical data, achieves strong performance on a held out test set, and then drops in accuracy once it faces live data. Every ML engineer who has shipped a model has experienced this. The standard explanation is concept drift or distribution shift. The statistical properties of the target variable change over time. The model was optimized for the old distribution. The new distribution produces different patterns. So the model degrades and needs retraining.
I had always accepted that explanation as a technical problem. Better monitoring pipelines. More frequent retraining cycles. More robust training procedures that generalize across distributions. Then it hit me that concept drift is not a technical problem. It is an ontological one.
A machine learning model learns patterns in the empirical domain. It processes observations, historical records, whatever data was collected during the training window. It does not and cannot access the generative mechanism that produced those observations. The model has a representation of the empirical trace. It has no representation of the real domain, the structures and mechanisms and powers that actually generate events in the world. When the generative mechanism changes, because the economy shifts or user behavior changes or the operational environment is restructured, the model keeps predicting from the old pattern. It has no way of knowing the mechanism changed. It does not even have the concept of a mechanism.
The stratified ontology does not describe an abstract philosophical position. It describes exactly why this failure happens. The real domain contains the mechanisms that generate events. The actual domain contains the events that occur when those mechanisms are triggered. The empirical domain contains the subset of events we observe and record. A model operates entirely in the empirical domain. It has no access to the actual domain, to events that happened but were not captured in the training set. It has no access to the real domain, to the causal forces that produced whatever regularities appeared in the data. When you deploy an AI system and it fails six months later, you are watching the stratified ontology in motion. A mechanism changed in the real domain. The empirical trace shifted. The model had no representation of the mechanism, so it could not anticipate the shift.
This changed how I understood retroduction. Wynn and Williams (2012) define it as reasoning backward from observed phenomena to the mechanisms that could have generated them. Before the concept drift realization, I read retroduction as a methodological preference, a third option between induction and deduction that you choose based on your paradigm commitment. After, it became the thing you actually do when a model fails and you need to understand why. You look at the failure pattern. You ask: what mechanism could have produced this? Was it a change in user behavior? A shift in regulatory policy that altered the incentive structure? A modification to the upstream data pipeline that introduced a previously absent bias? The mechanism is not directly observable. You infer its existence from the pattern it leaves behind. That is retroduction. It is not an abstract inference mode. It is everyday diagnostic reasoning applied whenever the empirical layer stops matching the generative reality. CR gave that reasoning a name and a philosophical justification, but the reasoning itself is what working engineers do when something breaks and they need to trace the failure to its cause.
Mingers et al. (2013) emphasize that CR is not a compromise between positivism and interpretivism. It is a distinct position with its own ontology and its own logic of inquiry. I finally understood why that distinction matters. Positivism describes the empirical layer and tests whether observed patterns are statistically significant. Interpretivism describes how actors make sense of events at the actual and empirical levels. CR insists that there is a real layer beneath both, that mechanisms have causal power independent of whether anyone observes them or makes sense of them, and that those mechanisms can change without warning. The stratified ontology is not an extra philosophical commitment that you add to your research design for completeness. It is the minimum framework you need if your question is about why a system that worked stops working. And that question, as AI systems get deployed into more organizations, is going to become more central to IS research, not less.
I never formally apologized to my study partner. But the next time she mentioned critical realism I did not roll my eyes. I said she was right and I had been slow to see it. That is the closest thing to an apology that either of us needed.
About the author
Share
More notes
Related notes