Why 95% of AI pilots fail
Overview
MIT's State of AI in Business 2025 report found that 95% of generative AI pilots inside large enterprises deliver no measurable P&L impact. The models work. The operating model around them doesn't.
Year: 2026
Industry: Operating Model / GenAI

Challenge
Every COO I speak with has the same story. The firm spent twelve to eighteen months running GenAI pilots. Procurement signed the enterprise license. A handful of engineers built impressive demos. Leadership saw the demos. Everyone agreed it was the future.

Then nothing changed. Headcount didn't move. Cycle times didn't shorten. The cost-to-income ratio didn't budge. The pilot quietly stopped being mentioned in steering committee updates, and a new pilot started somewhere else.

This is not a technology problem. The underlying models are extraordinary: frontier LLMs now outperform junior analysts on a wide range of structured tasks, and the cost per token has dropped roughly 90% in eighteen months. The technology is ready.

What's not ready is the operating model. Most enterprises are trying to bolt GenAI onto processes that were designed for a pre-AI world: handoff-heavy, control-heavy, exception-heavy. You can drop the most capable model in the world into a process with fourteen handoffs and six reconciliation steps, and you will get a faster version of the same broken process. The model is not the bottleneck. The process is.

The firms getting real value are doing something different. They are not asking "where can we add AI?" They are asking "what would this process look like if we redesigned it for an AI-native team?" That is a fundamentally different question, and it produces fundamentally different answers.
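The handoff argument above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the `Step` type, the hour figures, the 8x speedup, and the four-step redesign are all made-up assumptions, not data from the MIT report or from any client. It compares end-to-end lead time when a model only speeds up the hands-on work against lead time when the handoffs themselves are removed.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    touch_hours: float  # time someone (or a model) actively works the item
    wait_hours: float   # time the item sits in a queue before the next handoff


def lead_time(steps):
    """End-to-end elapsed hours: active work plus queueing at every step."""
    return sum(s.touch_hours + s.wait_hours for s in steps)


def with_faster_touch(steps, speedup):
    """Speed up only the active work; queues between handoffs are unchanged."""
    return [Step(s.name, s.touch_hours / speedup, s.wait_hours) for s in steps]


# Hypothetical current state: 14 handoffs, 2h of work and 8h of waiting each.
current = [Step(f"handoff_{i + 1}", 2.0, 8.0) for i in range(14)]

# "Bolt-on" scenario: same 14 handoffs, the model makes the work 8x faster.
bolted_on = with_faster_touch(current, speedup=8)

# "Redesign" scenario (assumed): same faster work, but only 4 handoffs remain.
redesigned = [Step(f"step_{i + 1}", 0.25, 8.0) for i in range(4)]

print(lead_time(current))     # 140.0
print(lead_time(bolted_on))   # 115.5
print(lead_time(redesigned))  # 33.0
```

Under these assumed numbers, an 8x faster model trims lead time by under a fifth, because most of the elapsed time is waiting between handoffs. Removing handoffs is what moves the total.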
Impact
Second, they redesign the process before they deploy the model. They map the current state, strip out steps that exist only because a human couldn't hold enough context, and rebuild the workflow around what an AI-augmented team can actually do. The Lean Six Sigma discipline matters here: value-stream mapping is not optional, it is the work.

Third, they treat change management as the primary cost line, not a footnote. The technology is maybe 20% of the spend. The other 80% is rewiring how teams actually operate: new role definitions, new controls, new performance metrics, new escalation paths. Firms that underweight this consistently end up in the 95%.

None of this is glamorous. It does not produce demos that play well at the all-hands. But it produces the only thing that matters at the COO level: measurable change in the operating P&L.

If your firm is in pilot purgatory, the question is not "do we need a better model?" The question is "are we willing to redesign the operating model around the technology, or are we just looking for a faster way to do what we already do?"