AI & Agentic Systems

By 2028, Half of Organizations Will Adopt Zero-Trust Data Governance Because of AI

When AI generates unverified data at scale, the assumption that data is trustworthy by default becomes the most dangerous assumption in your architecture.

2026-05-14 · 6 min read · AI & Agentic Systems · Comps & Reflections · IT Governance & Strategy
Part 1 of 6 in a series on zero trust

Gartner published a prediction earlier this year that I find genuinely unsettling as an IS researcher: by 2028, 50 percent of organizations will adopt zero-trust data governance frameworks specifically because of the growth of AI-generated, unverified data flowing into their systems (https://www.gartner.com/en/newsroom/press-releases/2026-04-07-gartner-forecasts-worldwide-it-spending-to-grow-9-8-percent-in-2026). I keep coming back to the word "unverified." Not incorrect data. Not low-quality data. Unverified: data whose origin and relationship to real-world events cannot be traced with confidence. That is a new kind of data quality problem, and it is being introduced at scale right now. McKinsey reports that 88 percent of organizations already use AI (https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai), and whatever those organizations are generating with it is landing somewhere in their data environments. Whether they can trust that output, and whether their governance frameworks can even ask the question, is what the Gartner prediction is really about.

Traditional data governance was built on an assumption so fundamental that nobody needed to state it explicitly: data came from somewhere real. A transaction occurred. A sensor measured something. A person filled out a form. The record might be inaccurate, the form might contain errors, the sensor might have drifted. But there was a traceable origin in a real-world event. Data quality frameworks were designed to catch errors in records that represented real events. Lineage tools tracked how data moved through pipelines and what transformations it underwent. The question was whether the data accurately represented what happened, not whether it represented anything at all.

Generative AI breaks that assumption at the root. When an LLM produces a summary, a report, a synthetic data record, or a paragraph of analysis, it generates output that reflects what the model calculated was plausible given its training distribution. That output may or may not correspond to any real-world event. Syntactically and semantically, it looks exactly like text a human would write. It carries none of the epistemic markers that distinguish "I observed this" from "my model predicted this was likely." Once that text enters a knowledge system, a database, a training dataset, or a regulatory filing, a standard data governance framework cannot distinguish it from human-produced content. It is in the pipeline, and it will be treated as information.

Zero-trust data governance applies the same logic to data that zero-trust network security applies to users and devices. In a zero-trust security model, no user or device is assumed trustworthy simply because it is inside the network perimeter. Continuous verification, least-privilege access, and access controls enforced at each transaction point are the operating assumptions. Zero-trust data governance extends this to the data itself: no data record is assumed accurate, authoritative, or grounded in the real world until its provenance is verified. Every piece of data used in a consequential decision must carry traceable attribution. AI-generated content must be labeled as such. Data that passed through any generative inference step must carry a flag that distinguishes it from data produced by direct observation or measurement.
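Here is a minimal Python sketch of that default-deny posture applied to records. It assumes a hypothetical `Record` type with an explicit origin flag; the field names, the three-way `Origin` taxonomy, and the sample source names are my illustration, not a standard schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Origin(Enum):
    """How a record came into existence (illustrative taxonomy)."""
    OBSERVED = "observed"    # direct observation or measurement
    GENERATED = "generated"  # passed through at least one generative inference step
    UNKNOWN = "unknown"      # provenance could not be established


@dataclass(frozen=True)
class Record:
    record_id: str
    value: str
    origin: Origin
    source: Optional[str] = None       # attributed source system, if any
    provenance_verified: bool = False  # True only after an explicit verification step


def admit(record: Record) -> Tuple[bool, str]:
    """Default-deny check before a record feeds a consequential decision."""
    if record.origin is Origin.UNKNOWN:
        return False, "origin unknown"
    if record.source is None:
        return False, "no attributed source"
    if not record.provenance_verified:
        return False, "provenance not verified"
    # Admitted records keep their origin label, so downstream users can
    # still tell generated content from observed content.
    return True, f"admitted as {record.origin.value}"


sensor = Record("r1", "21.4 C", Origin.OBSERVED,
                source="plant-sensor-7", provenance_verified=True)
summary = Record("r2", "Q3 churn rose", Origin.GENERATED, source="llm-summarizer")

print(admit(sensor))   # (True, 'admitted as observed')
print(admit(summary))  # (False, 'provenance not verified')
```

Note that generated content is not banned here. It passes through the same gate as everything else, but it cannot pass unattributed or unverified, and it keeps its label once admitted.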

The IS theory that frames this best for me is sociotechnical systems thinking from Trist and Bamforth (1951). Their core insight was that technical changes do not happen in a social vacuum. Every change to the technical system changes the social system around it. The introduction of AI into organizational content production is a profound technical change, and it is reshaping the social system of knowledge production inside organizations. When an analyst produces a report using an LLM, the social process of authorship and verification changes. The analyst may not check every claim the LLM generates. The review process may not be designed to catch AI confabulation because it was designed to catch human errors. The governance infrastructure was built for a different sociotechnical configuration, and organizations are discovering the mismatch only when something goes wrong.

The provenance problem is what makes zero-trust data governance technically demanding in practice. Traditional data lineage tools track transformations: this field in the reporting database came from this join, which came from this ETL job, which pulled from this operational system. That lineage is about process flow. Zero-trust data governance requires more: not just which systems the data passed through, but whether any step in that flow involved generative inference, and if so, which model produced the output, which version of that model, and what data it was trained on. That is a different kind of tracking requirement, and as far as I can tell, most current data lineage tools were not designed to capture it.
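To make the difference concrete, here is a minimal sketch of lineage steps that carry the extra fields zero-trust governance would need. The `Step` schema, its field names, and the sample model name and version are hypothetical; real lineage tools use their own metadata models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One hop in a record's lineage (hypothetical schema)."""
    name: str                                # the ETL job or transformation
    system: str                              # system the data passed through
    generative: bool = False                 # did this step run generative inference?
    model: Optional[str] = None              # which model produced the output
    model_version: Optional[str] = None      # which version of that model
    training_data_ref: Optional[str] = None  # pointer to what it was trained on


def systems_touched(lineage: List[Step]) -> List[str]:
    """What conventional process-flow lineage answers: which systems."""
    return [s.system for s in lineage]


def generative_gaps(lineage: List[Step]) -> List[str]:
    """Generative steps missing model, version, or training-data attribution."""
    return [
        s.name
        for s in lineage
        if s.generative and None in (s.model, s.model_version, s.training_data_ref)
    ]


lineage = [
    Step("extract_orders", "erp"),
    Step("join_customers", "warehouse"),
    Step("summarize_notes", "llm-service", generative=True,
         model="summarizer", model_version="2025-11"),
]

print(systems_touched(lineage))  # ['erp', 'warehouse', 'llm-service']
print(generative_gaps(lineage))  # ['summarize_notes']
```

The first function is roughly what process-flow lineage gives you today; the second only becomes possible once each step records whether it generated rather than transformed.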

The synthetic data problem compounds this further. Synthetic data, generated by AI to mimic the statistical properties of real data, has legitimate uses: training models when real data is scarce, testing systems without exposing personal information, augmenting datasets to improve coverage. But synthetic data that enters a production pipeline without clear labeling creates a recursive governance problem. If an organization trains a model on data that includes synthetic records generated by an earlier model, and that new model produces outputs that enter another dataset, each generation removes the data further from real-world grounding. Lineage tracking tells you which system the data came from. It does not tell you whether that system was itself generating rather than observing. Without a framework for tracking the generative chain, data quality audits become meaningless for AI-touched records.
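The recursive problem can be expressed as a graph walk. This sketch assumes a hypothetical catalog in which every dataset records its parents and whether a generative model produced it, which is exactly the flag conventional lineage lacks. It computes the longest chain of generative steps between a dataset and observed data, and returns `None` when the chain cannot be traced. The dataset names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    name: str
    generated: bool  # True if a generative model produced this dataset
    parents: List[str] = field(default_factory=list)


def generative_depth(name: str, catalog: Dict[str, Dataset]) -> Optional[int]:
    """Longest chain of generative steps between `name` and observed data.

    Returns None if any ancestor is missing from the catalog, meaning the
    chain cannot be traced. Assumes the lineage graph is acyclic.
    """
    ds = catalog.get(name)
    if ds is None:
        return None
    own = 1 if ds.generated else 0
    if not ds.parents:
        return own
    depths = [generative_depth(p, catalog) for p in ds.parents]
    if any(d is None for d in depths):
        return None
    return own + max(d for d in depths if d is not None)


catalog = {
    "transactions": Dataset("transactions", generated=False),
    "synthetic_v1": Dataset("synthetic_v1", generated=True, parents=["transactions"]),
    # A later model trained on a mix of real and synthetic records.
    "model_b_outputs": Dataset("model_b_outputs", generated=True,
                               parents=["synthetic_v1", "transactions"]),
    "vendor_feed": Dataset("vendor_feed", generated=False, parents=["unlisted_upstream"]),
}

for name in catalog:
    print(name, generative_depth(name, catalog))
# transactions 0
# synthetic_v1 1
# model_b_outputs 2
# vendor_feed None
```

A depth of 2 is "each generation removes the data further from real-world grounding" expressed as a number; `None` is the case where an audit has nothing to anchor to.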

The absorptive capacity framework from Cohen and Levinthal (1990) is relevant to how organizations will respond to this problem. Building zero-trust data governance requires understanding a complex and technically specific set of requirements: how to implement provenance tracking at the record level, how to label AI-generated content in a way that persists through transformations, how to build exception handling for cases where provenance cannot be established. Organizations with high absorptive capacity, meaning those that have been investing in data engineering talent and data governance maturity, will be able to build this infrastructure. Organizations with low absorptive capacity will not understand what is required until something fails visibly, and even then may not have the internal capacity to fix it without substantial external help.
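Two of those requirements, labels that persist through transformations and exception handling for missing provenance, can be sketched in a few lines. This is an illustration under assumptions: the `ai_generated` label vocabulary, the summing transformation, the source names, and the quarantine policy are all hypothetical choices, not a standard.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Labeled:
    value: float
    labels: FrozenSet[str]  # e.g. {"ai_generated"}; hypothetical label vocabulary
    source: Optional[str]   # None means provenance could not be established


class ProvenanceMissing(Exception):
    """Raised when an input to a transformation has no established source."""


def combine(inputs: List[Labeled], source: str) -> Labeled:
    """Example transformation (a sum) that carries labels forward.

    The output inherits the union of its inputs' labels, so an
    "ai_generated" flag on any single input survives the aggregation.
    """
    missing = [i for i in inputs if i.source is None]
    if missing:
        raise ProvenanceMissing(f"{len(missing)} input(s) lack provenance")
    labels = frozenset().union(*(i.labels for i in inputs))
    return Labeled(sum(i.value for i in inputs), labels, source)


def run_with_quarantine(
    batches: List[List[Labeled]], source: str
) -> Tuple[List[Labeled], List[List[Labeled]]]:
    """Apply `combine` per batch; hold back batches with missing provenance."""
    ok: List[Labeled] = []
    quarantined: List[List[Labeled]] = []
    for batch in batches:
        try:
            ok.append(combine(batch, source))
        except ProvenanceMissing:
            quarantined.append(batch)  # held for review, not silently dropped
    return ok, quarantined


observed = Labeled(10.0, frozenset(), "pos-system")
generated = Labeled(2.5, frozenset({"ai_generated"}), "llm-forecaster")
orphan = Labeled(1.0, frozenset(), None)

ok, held = run_with_quarantine([[observed, generated], [observed, orphan]], "daily-rollup")
print(ok[0].value, sorted(ok[0].labels))  # 12.5 ['ai_generated']
print(len(held))                          # 1
```

Quarantining rather than raising is one possible policy; the harder organizational question is who reviews the quarantine and on what evidence.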

What I find most unsettling about the Gartner prediction is the implied timeline. "By 2028" means within roughly the next two years. From what I see in the field, most of the organizations that will make up that 50 percent have not finished building zero-trust data governance today, and many have not started. The organizations already generating significant volumes of AI content and routing it into knowledge systems, databases, and analytical pipelines are the ones most exposed to this problem right now. For most of them, the gap between current practice and where Gartner predicts they need to be in two years is large.

For my own work as an IS researcher, the question this raises is which organizational capabilities and governance configurations allow organizations to detect and respond to data quality problems whose root cause is a change in how data is produced, rather than measurement error or process failure. Those are different failure modes, and they require different detection mechanisms. An AI confabulation is not caught by the same audit process that catches a duplicate record or an out-of-range sensor reading. If half of organizations adopt zero-trust data governance by 2028, the more interesting research question is what the other half are doing with their AI-generated data, and what consequences they are not yet seeing.
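A toy contrast makes the point about detection mechanisms. The checks, the field names (`origin`, `observed_event_id`), and the 0 to 100 valid range are hypothetical; the point is only that a plausible generated row passes the conventional checks and is caught only by a check on how it was produced.

```python
from typing import Dict, List


def conventional_audit(rows: List[Dict]) -> List[str]:
    """Checks built for measurement and process failures."""
    issues: List[str] = []
    seen = set()
    for r in rows:
        if r["id"] in seen:
            issues.append(f"{r['id']}: duplicate")
        seen.add(r["id"])
        if not 0 <= r["reading"] <= 100:  # hypothetical valid range
            issues.append(f"{r['id']}: out of range")
    return issues


def provenance_audit(rows: List[Dict]) -> List[str]:
    """Checks a different failure mode: how the row was produced."""
    return [
        f"{r['id']}: generated, no observation behind it"
        for r in rows
        if r.get("origin") == "generated" and not r.get("observed_event_id")
    ]


rows = [
    {"id": "a1", "reading": 42, "origin": "observed", "observed_event_id": "evt-9"},
    {"id": "a1", "reading": 42, "origin": "observed", "observed_event_id": "evt-9"},
    {"id": "a2", "reading": 140, "origin": "observed", "observed_event_id": "evt-10"},
    {"id": "a3", "reading": 57, "origin": "generated"},  # plausible, but not observed
]

print(conventional_audit(rows))  # ['a1: duplicate', 'a2: out of range']
print(provenance_audit(rows))    # ['a3: generated, no observation behind it']
```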

About the author

Ali Safari
PhD Student in IS, University of North Texas

Researching AI governance, trust in intelligent systems, and agentic AI. Writing while studying for comps.
