How the Wrong KPI Kills AI Pilots
Overview
There's a single question that tells you whether an AI pilot will deliver, and most teams can't answer it. Here's the test, and what it reveals about the project before a single line of code is written.
Year
2026
Industry
Operating Model / GenAI

Challenge
Walk into any AI pilot kick-off and ask one question: "What is the number this pilot is supposed to move?" Then watch the room. You usually get one of three responses, and none of them is a number. You get a goal ("improve efficiency"). You get an activity ("reduce manual work"). Or you get a category ("better risk management"). What you almost never get is a specific KPI with a current value and a target value: the false-positive rate is at 38%, and we need it under 20% by year-end. If a team can't answer that question on day one, the pilot has already failed. They just don't know it yet.
This is most visible in risk functions, because risk teams are fluent in metrics. They have dashboards everywhere: alert volumes, escalation rates, breach counts, model performance scores. So when an AI pilot lands in a risk team, everyone assumes the KPIs already exist; just look at all these dashboards. But look harder. Most of those metrics measure activity: number of alerts generated, number of cases reviewed, number of reports filed. Activity metrics tell you whether the team is busy. They don't tell you whether the business is better off. A pilot pointed at an activity metric will succeed in moving the activity (generating more alerts faster, processing more cases per hour) without ever moving the thing the business actually cares about.
Impact
Here is the test I run before agreeing to scope any pilot: name the business outcome. Not the activity. Not the tool's output. The outcome.
For a financial-crime monitoring pilot, the activity metric is "alerts generated." The business outcome is something like this: the percentage of suspicious activity reports we file that go on to trigger regulatory action, divided by the analyst-hours spent producing them. That is harder to define. It is also the only metric that tells you whether the AI is creating value or just generating noise faster.
For a model-validation pilot, the activity metric is "validations completed per quarter." The business outcome is the time from model build to production deployment, weighted by the model's risk grade. Activity says you are validating more. Outcome says you are unblocking the business.
Notice what is happening in both examples. Defining the real outcome forces you to define the customer of the work. Who is this risk function actually serving: the regulator, the business line, or the firm's capital position? Each customer cares about a different number. Most risk teams have never had that conversation explicitly, which is why their KPIs default to activity. Activity is easy to measure and offends no one. Outcome metrics force a fight about what the function is for.
This is why the KPI question stands in for a much deeper one: who is this pilot serving, and what does winning look like for that customer? If the team cannot answer the deeper question, it will not answer the KPI question either. Without a target number tied to a real customer outcome, the pilot will optimize whatever is easy to measure, and the P&L will not move.
The good news is that this test is cheap: you can run it before spending a dollar. Before approving an AI pilot, require a one-page brief with three lines:
The customer of this process is (named, specific).
The number we will move is (named KPI, current value, target value, deadline).
If the number doesn't move, the pilot failed, regardless of whether the model worked.
Most teams cannot fill in line one. Many cannot fill in line two. Almost none will sign up for line three. That refusal is the most useful signal in the room. It tells you the pilot is set up to look busy, not to win, and that the right answer is not "build the model anyway." The right answer is to send the team back to define the customer and the number before any technology gets involved. The model is the easy part. Naming the number it is supposed to move, and the customer it is supposed to serve, is the work that decides whether anything else matters.
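The three-line brief is simple enough to encode as a checklist. The Python sketch below is one hypothetical way to do it, not a prescribed tool: the `PilotBrief` class, its field names (including `accepts_kill_criterion` for line three), and the `pilot_succeeded` helper are illustrative assumptions. It refuses to judge a pilot until all three lines are filled in, and it decides success on the KPI alone, so model accuracy is deliberately not an input. The usage example reuses the false-positive figures from the Challenge section (38% today, under 20% by year-end); the customer string and the 2026 year-end date are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PilotBrief:
    """Hypothetical encoding of the one-page pilot brief; field names are illustrative."""

    customer: Optional[str] = None        # line one: the named, specific customer
    kpi_name: Optional[str] = None        # line two: the named KPI...
    current: Optional[float] = None       # ...its current value...
    target: Optional[float] = None        # ...its target value...
    deadline: Optional[date] = None       # ...and the deadline
    lower_is_better: bool = True          # e.g. a false-positive rate should fall
    accepts_kill_criterion: bool = False  # line three: the team agrees to be judged on the KPI

    def missing_lines(self) -> List[str]:
        """Return the lines of the brief that are still blank (empty list = complete)."""
        missing = []
        if not self.customer:
            missing.append("line one: customer")
        kpi_fields = (self.current, self.target, self.deadline)
        if not self.kpi_name or any(v is None for v in kpi_fields):
            missing.append("line two: KPI, current value, target value, deadline")
        if not self.accepts_kill_criterion:
            missing.append("line three: kill criterion")
        return missing


def pilot_succeeded(brief: PilotBrief, kpi_at_deadline: float) -> bool:
    """Judge the pilot on the KPI alone; model accuracy is deliberately not an input."""
    if brief.missing_lines():
        raise ValueError("brief is incomplete: define the customer and the number first")
    # The target must be beaten strictly, e.g. "under 20%" means below 0.20.
    if brief.lower_is_better:
        return kpi_at_deadline < brief.target
    return kpi_at_deadline > brief.target


if __name__ == "__main__":
    # False-positive example from the text: 38% today, under 20% by year-end.
    # The customer string and the 2026 year-end date are placeholders.
    brief = PilotBrief(
        customer="Financial-crime operations (placeholder)",
        kpi_name="false-positive rate",
        current=0.38,
        target=0.20,
        deadline=date(2026, 12, 31),
        accepts_kill_criterion=True,
    )
    print(brief.missing_lines())         # []
    print(pilot_succeeded(brief, 0.24))  # False: the model may work, but the number missed
    print(PilotBrief(kpi_name="alerts generated").missing_lines())
```

A brief that names only a metric, such as "alerts generated," with no customer, no current or target value, and no signed kill criterion, comes back with all three lines missing. The sketch cannot tell an activity metric from an outcome metric; that judgment stays with the people reviewing the brief.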