Tiny Teams, Big Questions: What Happens When AI Writes the Code

I kept coming back to two numbers from the Gartner 2026 Strategic Technology Trends report that crossed my desk this semester in BCIS 6670. Number one: 80% of organizations will evolve large software engineering teams into smaller, AI-augmented teams by 2030. Number two: 40% of enterprise application portfolios will include custom applications built using AI-native platforms by 2030, up from 2% in 2025. The report talks about tiny teams of two people delivering what teams of twenty used to build. It uses the phrase vibe coding. It describes AI agents orchestrated together to create software. The numbers are dramatic and the report is from a practitioner source, not an academic paper, so I take the exact percentages as directional rather than precise. But the direction is what stopped me. Not because of the productivity claim, but because I had already read the theoretical framework for exactly this shift.

I wrote about Baird and Maruping (2021) in detail before, in a post about why "use" is the wrong construct for agentic information systems (it is at `/blog/stop-counting-users-start-measuring-delegation` if the theoretical framing is useful). The argument there is that when a system has agency, the relationship between human and artifact changes from operator-to-tool to delegator-to-proxy. The human does not use the system in the ordinary sense. The human appraises whether the system can perform a task, distributes subtasks between themselves and the artifact, and coordinates the ongoing relationship. What I did not see clearly in that earlier post was that Gartner was describing this exact mechanism at industrial scale. Tiny teams are not smaller teams. They are delegation arrangements. The two people in a tiny team are appraising what the AI can build, distributing the coding work across human and agentic tools, and coordinating the output into a working application. The language in the report confirms it: AI-native development platforms range from prompt-driven one-shot tools through vibe coding environments for people without deep technical knowledge, all the way to AI agents that build software together. That is not a simple extension of developers adopting a new tool. It is a delegation spectrum.

The three mechanisms of delegation from Baird and Maruping map onto the Gartner scenario in a way that I think most organizations are not ready for. Appraisal becomes the question of whether the AI can actually write production-ready code, not just prototype code. The Gartner report imagines a world where non-developers ship applications. But appraisal of AI-generated code requires a different kind of expertise than writing code yourself. You need to know what good code looks like in order to evaluate what the AI produced, and that expertise is exactly what non-developers do not have. The vibe coder may not know that the generated code has a race condition, an exposed credential, or a dependency that introduces a supply chain vulnerability. Appraisal breaks down when the appraiser lacks the knowledge to appraise, and the Gartner vision depends on non-expert appraisal functioning at scale. I am skeptical.

Distribution is the second mechanism, and the Gartner report is explicit about the shift it describes: larger teams with numerous employees replaced by tiny teams of two people, enabled by AI-native platforms. The question Gartner does not answer is what stays with the human. Architecture decisions, security review, ethical boundaries, domain verification, deployment governance. If the two-person team distributes all of these to the AI platform, then the platform becomes the architect, the security reviewer, the ethicist, and the deployment engineer. That is not a tiny team. That is a team that has outsourced its judgment to an opaque system. The difference matters because outsourcing judgment to a platform is not the same as having judgment on the team. The Gartner report mentions establishing security guardrails, but guardrails are not the same as judgment. Guardrails catch known failure modes. They do not catch novel ones.

Coordination is the third mechanism and the one I think is most neglected in the current conversation. When a human and an AI agent write code together, who manages the interdependencies between the AI-written module and the human-written module? When five AI agents each generate a different microservice, who ensures that the interfaces align, that the error handling is consistent, that the logging infrastructure works across all of them? The Gartner report says tiny teams can deliver more faster. But the coordination cost of AI-generated components does not disappear just because an AI wrote them. It shifts from internal coordination among human developers to human-AI coordination across modules that the human may not fully understand. That is a harder coordination problem, not an easier one, because the human has less visibility into the components the AI wrote. Liu et al. (2025) show that delegation is dynamic, not static. Willingness to delegate shifts as performance feedback comes in. When the AI-written module fails in production, the human does not just fix the module. The human recalibrates their entire delegation model. That recalibration cost is real and it is not in the Gartner productivity calculation.
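One way to make the coordination question concrete is a contract check across AI-generated services. The sketch below is purely illustrative: the `ServiceContract` type, its field names, and the example services are hypothetical and come from neither the Gartner report nor Baird and Maruping. It assumes each generated microservice declares what it provides and what it expects from the others, and it flags the mismatches a human coordinator would otherwise have to find by reading code they did not write.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceContract:
    """Hypothetical declaration of one service's interface."""
    name: str
    # endpoint name -> schema version the service offers
    provides: dict[str, str] = field(default_factory=dict)
    # "service.endpoint" -> schema version the service expects
    requires: dict[str, str] = field(default_factory=dict)
    log_format: str = "json"


def find_mismatches(contracts: list[ServiceContract]) -> list[str]:
    """Return human-readable coordination problems across all contracts."""
    by_name = {c.name: c for c in contracts}
    problems = []
    for c in contracts:
        for target, expected in sorted(c.requires.items()):
            service, _, endpoint = target.partition(".")
            provider = by_name.get(service)
            if provider is None:
                problems.append(f"{c.name} requires unknown service {service}")
            elif endpoint not in provider.provides:
                problems.append(f"{c.name} requires missing endpoint {target}")
            elif provider.provides[endpoint] != expected:
                offered = provider.provides[endpoint]
                problems.append(
                    f"{c.name} expects {target} {expected}, provider offers {offered}"
                )
    # Logging only works across services if they agree on a format.
    formats = {c.log_format for c in contracts}
    if len(formats) > 1:
        problems.append(f"inconsistent log formats: {sorted(formats)}")
    return problems


# Two made-up services, as if generated by separate agents.
orders = ServiceContract(
    "orders", provides={"get_order": "v2"}, requires={"billing.charge": "v1"}
)
billing = ServiceContract("billing", provides={"charge": "v2"}, log_format="text")

for problem in find_mismatches([orders, billing]):
    print(problem)
```

A check like this only surfaces mismatches that someone thought to declare. It does not remove the coordination work; it makes part of it visible to the human who has to do it.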

The vibe coding trend that the report describes makes all of these mechanisms more acute. People without deep technical knowledge are shipping applications built by AI. Some of these applications will be useful. Some will be trivial. Some will introduce security vulnerabilities into enterprise portfolios at a rate that human development teams never could, because no single person on the tiny team understands the full stack. The practitioner reports on GitHub Copilot usage that I have seen suggest that developers accept AI suggestions at high rates and verify the output at much lower rates. If that pattern holds for vibe coding outside enterprise engineering teams, the security review gap is already on a crisis trajectory. Not a future crisis. A present one.

I think most organizations are treating the Gartner report as a staffing memo. Reduce team size. Improve productivity. Save costs. That framing misses what I think is the actual structural change. Tiny teams are not a size reduction. They are a delegation architecture change. The organization is redesigning who builds software at all, and more importantly, who is responsible for what the software does. When an AI agent writes a function that causes a data breach, who is accountable? The two-person team that reviewed the output? The platform vendor whose model generated the vulnerable code? The organization that deployed it without sufficient testing? The Gartner report does not answer that question because it is not a question about productivity. It is a question about governance. I wrote about the theoretical shift from use to delegation at `/blog/stop-counting-users-start-measuring-delegation`, and the argument there is about constructs and mechanisms. What I did not emphasize enough then is that delegation is not just a theoretical construct shift. It is an organizational governance problem. When you delegate software development to an AI platform, you are not using a tool. You are transferring rights and responsibilities for a critical business function to a system whose failure modes you do not fully understand.

The organizations that will succeed with tiny teams are not the ones that cut headcount fastest. They are the ones that treat the shift as a governance redesign. They will specify what the AI may and may not build. They will define the appraisal criteria that the two-person team must apply before accepting AI-generated code. They will design the coordination protocols that manage interdependencies between human-written and AI-written modules. They will build the review and rollback infrastructure that makes delegation safe rather than fast. Baird and Maruping (2021) gave the field the mechanisms to think about this. Appraisal, distribution, and coordination are not just constructs for a research model. They are the dimensions of the governance framework that every organization deploying AI-native development platforms needs. Most organizations do not have that framework yet. They have a Gartner report and a cost-cutting mandate. That is a recipe for delegation without governance, and delegation without governance is just technical debt generated at machine speed.
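To show what treating the three mechanisms as governance dimensions might look like in practice, here is a minimal sketch of an acceptance gate for AI-generated changes. Everything in it is an assumption made for illustration: the `ChangeRequest` fields, the `HUMAN_ONLY_COMPONENTS` set, and the specific checks are hypothetical placeholders, not a framework proposed by Baird and Maruping or by Gartner. A gate like this is itself a guardrail. It records whether judgment was exercised; it does not exercise it.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical record of an AI-generated change awaiting acceptance."""
    component: str        # e.g. "ui", "auth", "payments"
    human_reviewed: bool  # appraisal: did a qualified person read the code?
    tests_passed: bool    # appraisal: did the agreed test suite pass?
    rollback_plan: bool   # coordination: can the change be reverted?


# Distribution: components the organization has decided stay with humans.
# These names are placeholders, not a recommended list.
HUMAN_ONLY_COMPONENTS = {"auth", "payments"}


def gate(change: ChangeRequest) -> list[str]:
    """Return reasons to block the change; an empty list means it may proceed."""
    reasons = []
    if change.component in HUMAN_ONLY_COMPONENTS:
        reasons.append(f"{change.component} is reserved for human authorship")
    if not change.human_reviewed:
        reasons.append("no human appraisal recorded")
    if not change.tests_passed:
        reasons.append("test suite did not pass")
    if not change.rollback_plan:
        reasons.append("no rollback plan")
    return reasons


# An AI-generated UI change that was reviewed, tested, and is revertible.
print(gate(ChangeRequest("ui", True, True, True)))
# An AI-generated change to a component reserved for humans.
print(gate(ChangeRequest("auth", True, True, True)))
```

The point of writing the policy down as data is that the organization, not the platform, decides what is delegated and on what terms.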