RLHF Is Structuration Theory: We Train AI, Then AI Trains Us

I was reading about reinforcement learning from human feedback (RLHF) last week, the training pipeline that major AI labs now use to align language models with human preferences. I had understood the mechanism before. Human raters rank model outputs. The model learns from the ranking. The model gets better at producing the kind of answers raters prefer. But I was also reading Giddens again, prepping for comps, and the structural loop jumped out at me. RLHF is not just a clever training technique. It is the duality of structure running as software.

Giddens (1984) built structuration theory around the idea that social structures are simultaneously the medium through which action is organized and the outcome of that action. Structures enable action and constrain it at the same time. The patterned action that people produce then reproduces or transforms the structures. The loop is recursive. RLHF makes that loop concrete in a way I think the information systems (IS) field has not fully recognized.

In an RLHF pipeline, human raters rank model outputs according to a preference criterion, typically some combination of helpfulness, honesty, and harmlessness. Those rankings are used to train a reward model, a structure encoded as neural network weights. The language model then trains against this reward model, adjusting its parameters to produce outputs that score highly. The preference structure constrains the model. Outputs that deviate are punished by low scores. Outputs that conform are reinforced. This is the duality of structure at the technical level. The structure enables certain outputs and constrains others.
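To make the step from rankings to reward model concrete, here is a minimal sketch in Python. It assumes the rankings arrive as pairwise comparisons and fits a Bradley-Terry model, which is one common way to turn pairwise preferences into scalar scores. A production reward model is a neural network that scores arbitrary text; in this toy version each output simply gets one number. The function name fit_reward_scores, the comparison data, and the three example outputs are all hypothetical.

```python
import math

def fit_reward_scores(comparisons, n_items, lr=0.1, epochs=500):
    """Fit one scalar reward per output from (winner, loser) index pairs."""
    scores = [0.0] * n_items
    for _ in range(epochs):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            # Bradley-Terry: P(winner preferred) = sigmoid(score_w - score_l)
            p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient of log(p_win) with respect to each of the two scores
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        # Gradient ascent on the log-likelihood of the observed rankings
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores

# Hypothetical rater data: output 0 = bland but safe, 1 = useful, 2 = risky.
comparisons = [(1, 0), (1, 2), (0, 2), (1, 0), (0, 2), (2, 0)]
scores = fit_reward_scores(comparisons, n_items=3)
best = max(range(3), key=lambda i: scores[i])
print("scores:", [round(s, 2) for s in scores], "best output:", best)
```

Once fitted, the scores are the structure in the sense used above: the language model never sees the raters, only the numbers their rankings left behind.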

But the recursive part is what makes it structuration rather than simple conditioning. The model generates new outputs optimized for the preference structure. Those outputs go into production, where users read them, form expectations, and internalize the AI's style as normal. When the next round of human raters ranks outputs, their preferences have shifted. They have been reading AI-generated text for months. The structure has been transformed by the agent's outputs. The loop closes. The structure shaped the agent, and the agent's outputs reshaped the structure.
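The recursion can be sketched as two coupled updates. This is a toy illustration under strong assumptions, not a model of any real pipeline: rater preference and model style are each collapsed to a single number, and the parameters learn and exposure are hypothetical rates I chose for the example.

```python
def co_evolve(rater_pref, model_style, rounds=5, learn=0.5, exposure=0.3):
    """Track (rater_pref, model_style) over successive training rounds."""
    history = []
    for _ in range(rounds):
        # Structure shapes agent: training pulls the model toward rater preferences.
        model_style += learn * (rater_pref - model_style)
        # Agent reshapes structure: exposure pulls preferences toward model output.
        rater_pref += exposure * (model_style - rater_pref)
        history.append((rater_pref, model_style))
    return history

history = co_evolve(rater_pref=1.0, model_style=0.0)
final_pref, final_style = history[-1]
print("final rater preference:", round(final_pref, 3))
print("final model style:", round(final_style, 3))
```

The two values converge on each other, and the meeting point is not where the raters started. Setting exposure to zero recovers simple conditioning, where the model just moves to the raters' original preference.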

Orlikowski (1992) applied structuration to technology by arguing that technology is both a product of human action and a medium of human action. RLHF fits this framing tightly. The reward model is a product of human ranking behavior. It is also a medium that shapes what users see, expect, and eventually prefer. In later work, Orlikowski (2000) introduced technology-in-practice to emphasize that the structural properties of a technology are enacted through recurrent use and are never fixed at design time. The reward model shifts with every round of training data, and the training data is itself partially generated by the system. The technology-in-practice of an RLHF system is never the same twice.

The two most discussed problems with RLHF are structuration problems in disguise. The first is model drift toward safe generic answers. When raters consistently rank safer, blander outputs higher, the preference distribution tightens around the center of the safe zone. The model becomes constrained to that zone. This drift is not a training failure. It is the structure restricting the range of possible action, exactly as structuration predicts. Markus and Robey (1988) would recognize this as the technological imperative reversed: the human preferences encoded in the model take on a constraining force that looks deterministic from the model's perspective.

The second is the recursive contamination problem where AI-generated content becomes training data for the next model. When a model's outputs enter the training pool for the next RLHF round, human raters rank AI-generated text alongside human-written text. When the pool is dominated by AI text, the preference distribution converges on the average of what the model was already producing. The structure is no longer encoding human preferences. It is encoding the model's own previous output distribution, which was itself shaped by human preferences from an earlier round. The duality becomes a closed loop that shrinks with each cycle. This is what structuration theory would predict when the medium starts producing the conditions of its own reproduction. Leonardi (2011) warned that structuration can slip into structural determinism when the recursive dynamic is not examined. The RLHF data loop is that warning realized at scale.
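A small simulation shows the shrinking loop. It is a sketch under simplifying assumptions of my own: each piece of text is reduced to a single number, human-written text is drawn from a fixed standard normal distribution, raters are assumed to prefer whatever sits closest to the center of the pool they are shown, and the next model is refit to the preferred half. The function run_rounds and all of its parameters are hypothetical.

```python
import random
import statistics

def run_rounds(human_share, rounds=8, pool_size=2000, keep=0.5, seed=0):
    """Return the model's output spread (std dev) after each round."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the model starts out matching human text
    spreads = []
    for _ in range(rounds):
        n_human = int(pool_size * human_share)
        # Human-written text keeps its original diversity (std dev 1.0).
        pool = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        # The rest of the pool is the model's own previous output.
        pool += [rng.gauss(mu, sigma) for _ in range(pool_size - n_human)]
        # Assumed rater behavior: prefer what sits nearest the pool's center.
        center = statistics.fmean(pool)
        pool.sort(key=lambda x: abs(x - center))
        preferred = pool[: int(pool_size * keep)]
        # The next model is fit to the preferred half of the pool.
        mu, sigma = statistics.fmean(preferred), statistics.pstdev(preferred)
        spreads.append(sigma)
    return spreads

ai_heavy = run_rounds(human_share=0.05)
human_heavy = run_rounds(human_share=0.80)
print("AI-dominated pool, final spread:", ai_heavy[-1])
print("human-dominated pool, final spread:", human_heavy[-1])
```

With the pool dominated by model output, the spread keeps contracting round after round. With mostly human text in the pool, the spread narrows at first but then settles, because the human text keeps reintroducing variation the model no longer produces.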

Alignment cannot be a one-time fix for exactly this reason. You cannot train a model to be aligned, lock the weights, and walk away. The preference structure changes as users interact with the model's outputs. It changes as the training data composition shifts. Maintaining alignment requires ongoing recursive engagement where humans update the preference model and the model continues to shape what humans expect. Orlikowski argued that technology-in-practice is enacted through recurrent use. Alignment works the same way. It is not a property you embed into a system at training time. It is a relationship you maintain over time, and the relationship itself changes the terms. Every alignment technique that treats the problem as a one-shot optimization is fighting against the recursive logic that RLHF itself creates.

I think the IS field should stop treating RLHF as a technical detail and start analyzing it as a structural phenomenon. Sarker et al. (2019) classify research in which the social and technical dimensions genuinely interact as Type IV. RLHF is textbook Type IV territory. The social dimension is human raters with shifting preferences embedded in institutional contexts. The technical dimension is the reward model and the language model. Neither determines the other. They co-evolve through the RLHF loop. A Type IV analysis would ask questions that the optimization literature never touches. Who gets to be in the rater pool, and whose preferences become structural? How does the institutional context of the AI lab that designs the reward model shape what counts as helpful or harmless? What happens when the preference structure propagates across cultural and linguistic boundaries where the original ranking criteria do not transfer? These are not engineering questions. They are structuration questions.

Burton-Jones et al. (2021) proposed four shifts in how IS should think about theorizing. The shift from theory as representational to theory as performative is the one that connects to RLHF most directly. Theories do not just describe the world, they argued. They shape it. When practitioners adopt a theory's vocabulary, the theory starts influencing the phenomenon it was describing. RLHF makes structuration performative in a literal engineering sense. Giddens described the duality of structure as a way of conceptualizing social life. RLHF engineers did not set out to implement Giddens. But they built a system where the duality runs as a reward loop. The theory describes the system, and the system enacts the theory. That is performativity at the infrastructure level, and it is one of the most interesting things I have seen in the years I have been studying IS theory.

The practical implication is that every decision about an RLHF system is a structural decision, whether the people making it think in those terms or not. Choosing the rater pool is choosing whose interpretive schemes feed the structure of signification. Choosing the ranking criteria is choosing the norms that ground legitimation. Choosing what data goes into the training mix is an exercise of control over resources, which is the domination dimension, and it determines whether the structure reproduces itself or stays open to transformation. These are the three dimensions of structure Giddens identified: signification, domination, and legitimation, each linked to action through its own modality (interpretive schemes, facilities, and norms). RLHF engineers are building systems that operationalize all three, usually without knowing that structuration theory exists.

I have written before about how structuration explains why the same tool produces different outcomes in different departments. RLHF scales that insight from the department level to the societal level. The preference-training pipeline is the same mechanism running across millions of users. The structural context is different depending on who builds the system, who rates the outputs, what content fills the training pool, and what users the model serves. The outcome is different every time. And the duality of structure never stops running, whether you designed for it or not.