Surveillance Capitalism and What Your Data Is Actually For

You probably know that Google collects data about your searches. The part that took me a while to fully understand is what that data is for. The obvious answer is: to sell you targeted advertising. That is true but incomplete. Shoshana Zuboff's "The Age of Surveillance Capitalism," published in 2019, argues that the economic logic at work is more specific than targeting, and understanding the specificity changes how you think about what is actually being built.

Zuboff's argument centers on a concept she calls "behavioral surplus." The data that companies like Google and Facebook collect exceeds what would be needed to improve their services. You could improve search results and email filtering with a fraction of the data that Google actually collects. The excess, the behavioral surplus, is extracted and used for a different purpose: building what Zuboff calls "prediction products." These are products sold to advertisers and others that promise to predict what users will do, buy, click, or feel. The customer of a prediction product is not you. You are the raw material.

This reframes the business model in a way that I think is more accurate than the standard "if you're not paying, you're the product" formulation. The "you're the product" framing is catchy but slightly wrong. You are not the product being sold. Your predicted future behavior is the product being sold. The advertiser is not buying access to you. The advertiser is buying a probabilistic guarantee about what you will do when you see their message. The difference matters because it means the goal of the data collection is not just to know what you have done. It is to use what you have done to build a model that predicts and, in Zuboff's framing, modifies what you will do next.

"Behavioral modification at scale" is how Zuboff describes the endpoint of this logic. Advertisers do not only want to reach people who already want their product. They want to make people want their product. The most valuable prediction products are not just targeting tools. They are influence tools. And the behavioral data that makes them work is extracted from your daily digital activity without your being aware of the mechanism or consenting to the use.

I want to be careful about the specific claims here because Zuboff's book is ambitious and some of its arguments are contested. I am not reproducing any extended quotes from the book because I do not have the text in front of me and I do not want to fabricate her words. The concepts I am describing (behavioral surplus, prediction products, behavioral modification) are her concepts and I am attributing them to her. The general argument, that major platforms extract behavioral data beyond what is needed for service improvement and use it to build influence tools sold to third parties, is the argument she makes in the book.

The information systems (IS) dimension is the one I find most relevant for my own work. The data practices of major platforms are not technical necessities. They are design decisions. How much data to collect, how long to retain it, whether to use it only for service improvement or also for prediction product development, what consent mechanisms to present to users, and how to design those consent interfaces: all of these are choices made by people in organizations. GDPR attempted to regulate some of these choices by requiring that the processing of personal data have a lawful basis and that consent, where it is the basis relied on, be freely given, specific, informed, and unambiguous. The implementation of that regulation is a useful illustration of the gap between legal compliance and the spirit of what the regulation was trying to achieve. Many GDPR consent interfaces are technically compliant and structurally designed to maximize acceptance. The consent is real in a legal sense. The "freely given" part is questionable when the alternative to clicking "accept all" is a degraded or inaccessible service.
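To make the "design decisions" point concrete, here is a minimal sketch of what those choices look like once they are written down as configuration. Everything in it is my own illustration: the DataPolicy class, the field names, and the two example policies are hypothetical and are not taken from Zuboff, from GDPR, or from any real platform. The only point is that collection scope, retention period, and permitted purposes are parameters someone sets.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical policy object: each field is a choice, not a technical given."""
    collected_fields: FrozenSet[str]   # what is collected
    retention_days: int                # how long it is kept
    allowed_purposes: FrozenSet[str]   # what it may be used for


def is_use_permitted(policy: DataPolicy, purpose: str) -> bool:
    """A use is permitted only if its purpose was explicitly allowed."""
    return purpose in policy.allowed_purposes


def is_expired(policy: DataPolicy, collected_on: date, today: date) -> bool:
    """A record is expired once it is older than the retention period."""
    return today - collected_on > timedelta(days=policy.retention_days)


# Two illustrative policies over the same kind of service (values are made up).
minimal = DataPolicy(
    collected_fields=frozenset({"query"}),
    retention_days=30,
    allowed_purposes=frozenset({"service_improvement"}),
)
expansive = DataPolicy(
    collected_fields=frozenset({"query", "location", "click_history"}),
    retention_days=3650,
    allowed_purposes=frozenset({"service_improvement", "prediction_products"}),
)

print(is_use_permitted(minimal, "prediction_products"))
print(is_use_permitted(expansive, "prediction_products"))
```

Nothing about the underlying service forces one policy or the other. The difference between the two objects is entirely in values that a person chose.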

The privacy calculus framework, which I wrote about in the privacy paradox post, helps explain why people accept these arrangements. The perceived benefit of immediate access is concrete and present. The cost of surveillance capitalism (the behavioral modification, the prediction products, the influence at scale) is diffuse, deferred, and mostly invisible to the person clicking "accept." The information asymmetry is almost complete. The user cannot see what is being built from their data, cannot see the prediction products that will be used on them, and has no mechanism for understanding how their future behavior is being influenced by systems they have no access to. The privacy calculus is being run on incomplete information, by design.
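A toy calculation shows how a deferred and mostly invisible cost can flip the outcome of the calculus. This is a sketch under my own assumptions, not a model from the privacy calculus literature: the perceived_value function, the hyperbolic-style discount, the visibility weight, and all of the numbers are hypothetical and chosen only to illustrate the shape of the problem.

```python
def perceived_value(benefit: float, cost: float, delay_years: float,
                    visibility: float, discount_rate: float = 0.5) -> float:
    """Net value as the user perceives it at the moment of clicking "accept".

    Assumptions (all illustrative):
    - the benefit is immediate and fully visible, so it counts in full;
    - the cost is scaled by `visibility` (0 = invisible, 1 = fully visible);
    - the cost is discounted by its delay: cost / (1 + discount_rate * delay).
    """
    perceived_cost = cost * visibility / (1.0 + discount_rate * delay_years)
    return benefit - perceived_cost


benefit = 1.0   # immediate access to the service (arbitrary units)
cost = 3.0      # eventual cost of being modeled and influenced (arbitrary units)

# With full information and no delay, the trade looks bad.
actual_net = benefit - cost

# With the cost five years away and only 20% visible, the same trade looks good.
perceived_net = perceived_value(benefit, cost, delay_years=5, visibility=0.2)

print(actual_net, perceived_net)
```

Under these made-up numbers the user accepts a trade they would reject with full information. The inputs that drive the flip, delay and visibility, are the ones the platform controls.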

The platform governance question runs underneath all of this. I wrote about how platform design is governance and how the rules of a platform ecosystem are written to serve the platform's interests. Surveillance capitalism is the economic logic that makes those governance rules coherent. The platform is not just choosing what content to show you. It is building a model of your behavior to sell predictions about your future actions to parties who want to influence those actions. The governance of that system (who gets to look at your behavioral data, who gets to buy prediction products built from it, and what those products can be used for) is almost entirely controlled by the platform owner.

What I take from Zuboff's argument is that privacy debates framed around "what data are companies collecting" are asking a slightly wrong question. The more fundamental question is what the data is for. The same data can be used to improve a service or to build an influence product. The behavioral surplus concept says that major platforms are doing both, and that the second use, not the first, is where the economic value and the social harm are concentrated. Regulating data collection without addressing what the data is used for may produce compliance without changing the underlying dynamic.

I do not think this means the data economy is irredeemable or that digital services cannot be built on different terms. Some services do operate with minimal data collection and without prediction product business models. But they are not the dominant model, and the dominant model is what shapes the infrastructure that most people use most of the time. Understanding that infrastructure accurately requires understanding what behavioral surplus is for.