EdTech and the Learning Analytics Problem

A few semesters ago I noticed something in one of my online course platforms. The dashboard showed that I had spent less time on a particular module than the average for my peers. The system flagged it in a slightly different color, I think to suggest I might be at risk of falling behind. What the system could not know is that I had already read three papers on that topic, taken notes in a separate notebook, and had no intention of re-reading the introductory content that the module was mostly made of. The dashboard had data. It had almost no information about my learning.

That experience stuck with me because it points to a problem that runs through most of the learning analytics conversation: the data is plentiful, but the gap between what gets measured and what actually constitutes learning is often enormous.

Learning analytics is the practice of collecting data about student behavior within digital learning environments and using it to understand, predict, or improve learning. The data available to platforms like Canvas, Coursera, and edX is substantial. Logins, session durations, video watch rates, quiz scores, assignment completion timestamps, forum participation, click paths through content. All of that gets recorded. And the aspiration, fully reasonable on its face, is that if you have enough of this data, you can identify students who are falling behind before they fail, adapt the content to individual learning patterns, and close the gaps that traditional instruction misses because a single lecturer cannot track forty students simultaneously.

The problem is that time on task is not the same as learning. A student who reads slowly, carefully, and annotates a text will look identical to a student who opens a browser tab and goes to cook dinner. Both spent the same number of minutes "on" the module. A student who clicks through video lessons at 2x speed while taking careful notes will show lower average watch time than a student who plays every video at 1x while doing something else. Engagement metrics are easy to collect. They are much harder to interpret, and the gap between the metric and the construct it is supposed to measure is not a small data quality issue. It is a fundamental validity problem.
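To make the validity gap concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration: the `Session` record, the field names, and the numbers are my own assumptions, not the schema of any real platform. The point is only that the recorded metric cannot distinguish between the cases described above.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical record of one student's visit to a module."""
    student: str
    minutes_open: float     # what the platform can record
    minutes_reading: float  # what actually happened (invisible to the platform)


def time_on_task(session: Session) -> float:
    """The metric as a dashboard would compute it: only open time is visible."""
    return session.minutes_open


def watch_minutes(video_minutes: float, playback_speed: float) -> float:
    """Wall-clock watch time for a video played end to end at a given speed."""
    return video_minutes / playback_speed


# Two students with very different behavior and the same recorded metric.
careful_reader = Session("careful_reader", minutes_open=45.0, minutes_reading=45.0)
tab_left_open = Session("tab_left_open", minutes_open=45.0, minutes_reading=2.0)

same_metric = time_on_task(careful_reader) == time_on_task(tab_left_open)

# A note-taker at 2x speed logs half the watch time of a distracted 1x viewer.
note_taker_watch = watch_minutes(60.0, playback_speed=2.0)
distracted_watch = watch_minutes(60.0, playback_speed=1.0)
```

The `minutes_reading` field exists here only because I invented it. A real platform has no such column, which is the whole problem.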

The at-risk flagging systems that some platforms use to identify students who might disengage or fail are trying to do something genuinely useful. Early warning is valuable if it leads to effective intervention. The concern I have is about what happens when those systems are trained on historical data that reflects structural inequalities rather than learning potential. If students from certain backgrounds historically showed lower engagement metrics for reasons that had nothing to do with their ability, and if an algorithm trained on that history then flags students from similar backgrounds as higher risk, the system has encoded a historical pattern as a predictive signal. The student who logs in less frequently because they share a computer, or because they work more hours, or because their broadband is unreliable, shows up in the data the same way as a student who is genuinely disengaged. The algorithm does not know the difference.
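A toy version of that mechanism can be sketched in a few lines of Python. This is an illustrative assumption on my part, not how any specific platform works: the historical records are made up, and the "model" is just a login-frequency threshold fitted to past outcomes.

```python
from statistics import mean

# Hypothetical historical records: (logins_per_week, passed_course).
# The low-login students may have failed for reasons unrelated to ability.
history = [(6, True), (7, True), (5, True), (2, False), (1, False), (3, False)]


def fit_threshold(records):
    """Midpoint between the mean logins of students who passed and who failed."""
    passed = [logins for logins, ok in records if ok]
    failed = [logins for logins, ok in records if not ok]
    return (mean(passed) + mean(failed)) / 2


def flag_at_risk(logins_per_week, threshold):
    """Flag any student whose login count falls below the fitted threshold."""
    return logins_per_week < threshold


threshold = fit_threshold(history)

# Three current students with different reasons for logging in rarely.
current_students = {
    "shares_a_computer": 2,
    "works_long_shifts": 3,
    "genuinely_disengaged": 2,
}

flags = {
    name: flag_at_risk(logins, threshold)
    for name, logins in current_students.items()
}
```

All three students receive the same flag, because the only input is the login count. The reason behind the count never enters the system.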

The algorithmic recommendation side of this has its own complications. Platforms that use behavioral data to make personalized content recommendations or to adjust pacing are building on the same measurement base. If the signals being used to drive adaptation are noisy, the adaptation can drift in unhelpful directions. A student who skips introductory content because they already know it might receive a recommendation to go back and complete the basics. A student who flies through foundational material because they are answering questions randomly rather than thinking carefully might get pushed to advanced content too quickly. The system is optimizing for signals. Whether those signals correlate with learning is the empirical question that often gets assumed rather than demonstrated.
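The two failure cases above can be sketched as a simple rule in Python. The function name, the signals, and the ten-second cutoff are all hypothetical choices of mine for illustration. Real adaptive systems are more elaborate, but they work from similar behavioral signals.

```python
def recommend(completed_intro: bool,
              seconds_per_question: float,
              fast_cutoff: float = 10.0) -> str:
    """Hypothetical adaptation rule driven only by behavioral signals."""
    if not completed_intro:
        # Skipping the intro is read as a gap, whatever the actual reason.
        return "review_basics"
    if seconds_per_question < fast_cutoff:
        # Fast answers are read as mastery, whether or not they were considered.
        return "advance"
    return "continue"


# A student who skipped the intro because they already know the material.
already_knows_it = recommend(completed_intro=False, seconds_per_question=40.0)

# A student who finished the intro but is answering questions at random.
guessing_randomly = recommend(completed_intro=True, seconds_per_question=3.0)
```

The first student is sent back to the basics and the second is pushed ahead, which is the opposite of what each one needs.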

FERPA, the Family Educational Rights and Privacy Act, gives students in the US some control over their educational records and restricts how that data can be shared without consent. In K-12 contexts, additional protections apply. But FERPA was written for a world of paper transcripts and administrative records. The behavioral trace data that a modern learning management system generates (click patterns, timestamps, even emotional inference if the platform uses cameras or attention monitoring) sits in a complicated space relative to those rules. And in corporate training contexts, where Gartner has noted that learning experience platforms are evolving toward adaptive and personalized delivery, many of the student data protections simply do not apply at all. An employee completing a training module has far fewer rights over the data that module generates than a university student would. Whether the adaptive features those platforms are developing will actually improve learning outcomes or just measure completion more granularly remains, in my read of the space, an open question.

The deeper issue is one that information systems (IS) researchers know well from adjacent domains. Data availability creates pressure to use the data, even when the validity of the measurement is unclear. Organizations that invested in learning management systems with analytics dashboards face pressure to make use of the dashboards. Making use of the dashboards means treating the metrics as meaningful. Treating the metrics as meaningful means potentially acting on signals that are noisy proxies rather than direct measures of learning. This is not unique to EdTech. The same pattern shows up in performance management, health monitoring, and organizational surveillance more generally. But in education, the stakes for the individual students on the receiving end of those decisions are high enough that the validity gap is worth taking seriously as an IS design problem rather than just a data quality footnote.

What would it look like to treat learning analytics as an IS problem rather than a data problem? It would start with being honest about what each metric measures versus what it is being used to predict. It would require building feedback mechanisms that let students contest or contextualize their own data rather than just being flagged by an algorithm they cannot see. And it would require treating the equity implications of historically trained predictive systems as a design constraint, not a post-hoc diversity concern. The platforms that get this right will probably look less impressive on a feature sheet and more defensible in an audit. That tradeoff is worth thinking about.