The Cloud Migration Nobody Talks About Honestly

The pitch is always the same. Move to the cloud and you will cut infrastructure costs, scale faster, and stop worrying about hardware. Your on-premise servers are a liability. The cloud is a variable expense you can dial up and down. The slide deck is clean and the math is usually compelling. Then the bill arrives twelve months later and it is three times what anyone projected, and suddenly the CTO is sitting in a room explaining to the CFO why "cloud is cheaper" turned out to be wrong.

I have seen this story play out in enough organizations that I stopped being surprised by it. What surprises me now is how consistently the industry press keeps repeating the original pitch even as a growing number of companies quietly move workloads back on-premise.

The cost problem is not a secret. As argued in a widely circulated piece published by Andreessen Horowitz in 2021, cloud costs can become the second or third largest expense item for software companies, after salaries and, sometimes, sales. Their argument was that cloud repatriation, meaning moving workloads back to owned infrastructure, could save large companies hundreds of millions of dollars over time. These are their own estimates and their own modeling, not peer-reviewed accounting, but the directional claim lines up with what a lot of IT leaders will tell you in private. The cloud is expensive when you actually run production workloads at scale for years.

Dropbox is the case that gets cited most often. Around 2016 and 2017, they moved a large portion of their storage workloads off AWS and onto their own hardware. By their own account, this saved them roughly $75 million over two years. Dropbox is a specific kind of company: massive amounts of relatively predictable storage demand, the engineering capability to build and operate their own infrastructure, and enough scale to make that investment worthwhile. Most companies are not Dropbox. But the case demonstrates something important: the on-premise model is not obsolete, and for certain workload profiles it is substantially cheaper.

The trouble starts before you sign any contracts, in how migration projects get scoped and costed in the first place. Total cost of ownership is the standard framework for these decisions, but the way TCO gets calculated in migration business cases is often incomplete in the same predictable ways. The cost of retraining staff for cloud-native tools gets underestimated. Refactoring legacy applications to work efficiently in the cloud gets skipped entirely. Downtime during migration is treated as a one-time event rather than an ongoing risk. And egress fees, the charges you pay to get your own data out of the cloud, are almost never in the original spreadsheet.
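To make the gap concrete, here is a minimal sketch of a TCO comparison that includes the four lines the paragraph above says tend to go missing. Every name and dollar figure in it is a hypothetical placeholder, not data from any real migration; the point is only the structure of the calculation.

```python
from dataclasses import dataclass


@dataclass
class MigrationCosts:
    """Cost lines for a migration business case. All values are hypothetical."""
    compute_per_year: float
    storage_per_year: float
    retraining_one_time: float = 0.0     # staff retraining for cloud-native tools
    refactoring_one_time: float = 0.0    # reworking legacy applications
    downtime_risk_per_year: float = 0.0  # expected cost of outages, treated as recurring
    egress_per_year: float = 0.0         # data transfer out of the provider


def total_cost(costs: MigrationCosts, years: int) -> float:
    """Sum recurring lines over the horizon, then add the one-time lines."""
    recurring = (
        costs.compute_per_year
        + costs.storage_per_year
        + costs.downtime_risk_per_year
        + costs.egress_per_year
    )
    one_time = costs.retraining_one_time + costs.refactoring_one_time
    return recurring * years + one_time


# The "slide deck" version: compute and storage only (illustrative numbers).
slide_deck = MigrationCosts(compute_per_year=400_000, storage_per_year=100_000)

# The same case with the usually omitted lines filled in (illustrative numbers).
fuller = MigrationCosts(
    compute_per_year=400_000,
    storage_per_year=100_000,
    retraining_one_time=150_000,
    refactoring_one_time=300_000,
    downtime_risk_per_year=50_000,
    egress_per_year=60_000,
)

if __name__ == "__main__":
    for label, case in (("slide deck", slide_deck), ("fuller", fuller)):
        print(f"{label}: {total_cost(case, years=3):,.0f}")
```

With these made-up inputs the fuller three-year total is roughly half again as large as the slide-deck total. The specific ratio means nothing; what matters is that every omitted line pushes in the same direction.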

Egress fees deserve more attention than they usually get. Every major cloud provider charges you to transfer data out of their environment. Moving data to another cloud, pulling it back on-premise, and sending it to a third-party analytics platform all cost money per gigabyte. When organizations are evaluating a migration, they are thinking about compute costs and storage costs. The exit costs are not front of mind because at the evaluation stage, nobody is thinking about leaving. By the time the egress costs become real, you are already in. The data is there. The applications depend on it being there. And leaving now means paying to extract everything you put in.
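A rough way to see why exit costs creep up is to model the one-time bill for pulling everything out as the stored data grows. The sketch below assumes a flat per-gigabyte price and linear data growth, both simplifications; real provider pricing is tiered and varies by destination, and the rate used here is a hypothetical placeholder rather than any provider's published price.

```python
def egress_cost(data_tb: float, price_per_gb: float) -> float:
    """One-time cost to transfer `data_tb` terabytes out at a flat per-GB price."""
    return data_tb * 1000 * price_per_gb  # decimal terabytes to gigabytes


def exit_cost_by_year(
    initial_tb: float,
    growth_tb_per_year: float,
    price_per_gb: float,
    years: int,
) -> list[float]:
    """Cost of a full exit at the end of year 0, 1, ..., `years` (linear growth)."""
    return [
        egress_cost(initial_tb + growth_tb_per_year * year, price_per_gb)
        for year in range(years + 1)
    ]


if __name__ == "__main__":
    # Hypothetical inputs: 500 TB today, growing 200 TB a year, $0.05 per GB.
    for year, cost in enumerate(exit_cost_by_year(500, 200, 0.05, years=4)):
        print(f"exit after year {year}: ${cost:,.0f}")
```

The exit bill rises with every terabyte stored, which is the mechanism behind "by the time the egress costs become real, you are already in."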

This is not accidental. Oliver Williamson's transaction cost economics gives a clean explanation for why this structure persists. Once an organization builds on a cloud provider's proprietary services, the switching costs become enormous. Using AWS Lambda, Azure Cognitive Services, or Google's Vertex AI means your code is written against those specific APIs and those specific abstractions. That code does not port to another cloud easily, and it does not port back on-premise at all without significant refactoring. The asset specificity, in Williamson's terms, is very high: the investments you make become specific to that particular vendor relationship. The higher the asset specificity, the more locked in you are, and the more leverage the vendor has in every subsequent negotiation. I wrote about how platform governance works as a mechanism for capturing and retaining value in "The platform decides who wins," and cloud lock-in follows the same structural logic. The platform is designed so that leaving costs more than staying.

The lift-and-shift approach makes this worse. Lift and shift means taking an existing workload, picking it up, and putting it in the cloud with minimal changes. It is the fastest path to migration and the path that most organizations take because the alternative, actually redesigning the application to be cloud-native, takes far longer and costs far more upfront. The problem is that on-premise architecture makes assumptions that cloud pricing does not respect. On-premise, you pay for hardware once and run it continuously. A server running at 20% utilization costs you the same as one running at 90%. In the cloud, you pay for what you use, and an instance that sits mostly idle is still billed for every hour it is running. The pricing model rewards sustained, committed use and workloads that scale down when idle, and it penalizes the always-on, provisioned-for-peak patterns that legacy applications were designed around. A workload that was economical on owned hardware can be shockingly expensive in the cloud if it was not designed with cloud pricing in mind. Lift and shift transfers the workload without addressing any of these assumptions, and the bill reflects that immediately.
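The utilization point can be put in arithmetic. The sketch below compares three monthly bills for one server-sized workload: owned hardware at a fixed amortized cost, a lift-and-shift instance billed for every hour of the month, and a redesigned version billed only for the hours it actually does work. The hourly rate and the owned-hardware figure are hypothetical, and the model ignores discounts, storage, and networking.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month


def monthly_bills(
    owned_monthly_cost: float,
    cloud_hourly_rate: float,
    utilization: float,
) -> dict[str, float]:
    """Monthly cost of one workload under three setups.

    `utilization` is the fraction of hours the workload does useful work (0 to 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return {
        # Fixed cost: the same at 20% utilization as at 90%.
        "owned": owned_monthly_cost,
        # Always-on instance: billed for every hour, busy or idle.
        "lift_and_shift": cloud_hourly_rate * HOURS_PER_MONTH,
        # Redesigned to run only when there is work to do.
        "cloud_native": cloud_hourly_rate * HOURS_PER_MONTH * utilization,
    }


if __name__ == "__main__":
    # Hypothetical inputs: $300 a month owned, $0.80 an hour in the cloud, 20% busy.
    for setup, cost in monthly_bills(300.0, 0.80, 0.20).items():
        print(f"{setup}: ${cost:,.2f}")
```

With these illustrative inputs the lift-and-shift bill is the highest of the three, while the redesigned version is the lowest. At high utilization the ordering changes and owned hardware wins, which is the selectivity argument in miniature.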

I think about this as an instance of the same problem I described in "Why ERP implementations fail structurally, not technically." The technology decision is made at the executive level, with projections that assume the organization will adapt its processes to the technology. The actual adaptation is harder than projected, the costs accumulate, and the gap between the business case and reality grows. Cloud migration has the same shape: a confident projection, an underestimated adaptation cost, and an outcome that looks nothing like the slide deck.

What is happening now as a result is repatriation, a word that sounds more dramatic than the reality but captures something real. Organizations are not abandoning the cloud entirely. They are becoming more selective. Workloads with predictable, sustained compute needs, particularly storage-intensive or high-throughput data processing jobs, are moving back on-premise or to colocation facilities where the economics are cleaner. Workloads with genuinely variable demand, customer-facing applications that need to scale for unpredictable traffic, and services that benefit from geographic distribution are staying in the cloud. This is not a repudiation of cloud computing. It is the industry slowly arriving at the conclusion that cloud is a good fit for some workloads and a bad fit for others, and that the original pitch that cloud is always better was a simplification that benefited vendors more than customers.
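As a rough illustration of that selectivity, the sorting rule described above can be written as a small heuristic. The peak-to-average ratio as a proxy for "variable demand" and the threshold of 3.0 are both assumptions made for this sketch, not an established rule; a real decision would also weigh the cost lines discussed earlier.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    peak_to_average: float    # peak demand divided by average demand
    needs_multi_region: bool  # benefits from geographic distribution


def suggest_placement(workload: Workload, burst_threshold: float = 3.0) -> str:
    """Suggest where a workload fits best, using the essay's two criteria.

    `burst_threshold` is a hypothetical cutoff for "genuinely variable" demand.
    """
    if workload.needs_multi_region or workload.peak_to_average >= burst_threshold:
        return "cloud"
    return "on-premise or colocation"


if __name__ == "__main__":
    # Hypothetical workloads for illustration.
    examples = [
        Workload("nightly batch processing", peak_to_average=1.2, needs_multi_region=False),
        Workload("customer storefront", peak_to_average=8.0, needs_multi_region=True),
    ]
    for workload in examples:
        print(f"{workload.name}: {suggest_placement(workload)}")
```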

The honest version of the cloud decision is not "cloud vs. on-premise." It is "which workloads belong where, and what will it actually cost to move them, run them, and, if necessary, move them again?" That calculation is harder than the migration vendor wants it to be, because part of the answer involves modeling a future where you might want to leave. Vendors are not motivated to help you model that scenario. The egress fees, the proprietary service dependencies, and the migration inertia are all features of a system designed to make staying the default. An organization that goes in without understanding this will eventually learn it on the bill. Most of them do.