Reejig Blog

The alchemy of AI: why most AI experiments fail to deliver ROI

Written by Gordon Ritchie | Mar 4, 2026 7:30:26 AM

If AI worked like King Midas, every experiment would turn to gold.

In reality, according to McKinsey, only about 5% of AI experiments produce measurable ROI.
Organizations are experimenting aggressively, but few are converting those experiments into bottom-line impact.

The real question for CIOs and CHROs is not whether to experiment with AI.
It is how to turn experimentation into economic value.

Why most AI experiments fail to deliver ROI

AI experiments fail when they are disconnected from the work actually happening inside the enterprise.

Across organizations we see the same pattern:

  • teams experiment with new AI tools
  • pilots run in isolated environments
  • results look promising but rarely scale
  • ROI remains unclear

The core problem is that most experiments are not grounded in a clear understanding of the work itself.

Instead, organizations rely on outdated artifacts such as job descriptions, which were designed for hiring, not for understanding how work actually happens.

The scientific method for AI experimentation

AI experimentation should follow the same discipline as scientific research.

That means:

  1. Define the current environment.
  2. Form a hypothesis.
  3. Control variables and inputs.
  4. Observe outcomes.
  5. Measure results.
  6. Refine and repeat.
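These six steps can be sketched as a toy experiment check in Python (the function name, numbers, and 20% threshold are invented for illustration, not a real framework):

```python
# Toy sketch of the discipline above: baseline, hypothesis, controlled run,
# measurement, and a decision on whether to refine and repeat.
def time_reduction(baseline_min: float, measured_min: float) -> float:
    """Fractional reduction in handling time achieved by the experiment."""
    return (baseline_min - measured_min) / baseline_min

# Hypothesis: the AI agent cuts average handling time by at least 20%.
# Observation: handling time fell from 30 minutes to 22 minutes per case.
reduction = time_reduction(baseline_min=30, measured_min=22)
hypothesis_met = reduction >= 0.20

print(f"Reduction: {reduction:.0%}, hypothesis met: {hypothesis_met}")
```

If the hypothesis is not met, the loop does not end; the variables are refined and the experiment repeats.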

Breakthroughs like penicillin, Post-it notes, and Velcro began as chance observations, but they became usable products only through exactly this process of hypothesis, measurement, and refinement.

Without this rigor, AI experimentation becomes something else entirely.

It becomes playing with technology rather than deploying it responsibly.

The missing ingredient: understanding work at the task level

Organizations need a clear, structured understanding of how work actually happens before deploying AI agents.

Leading companies are starting with work ontology.

A work ontology defines the metadata of work in natural language so that both humans and AI systems can understand it.

This means describing work at the level where automation actually occurs:

  Traditional workforce view | AI-ready workforce view
  ---------------------------|-------------------------------
  jobs                       | tasks
  roles                      | work activities
  job descriptions           | work metadata
  headcount planning         | task coverage and productivity

When organizations map work this way, they can:

  • Identify where AI can realistically contribute
  • Estimate potential productivity gains
  • Define measurable outcomes for experimentation
  • Track AI performance like any other part of the workforce

This creates the environment for meaningful experimentation.
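As an illustration, a task-level view of work can be sketched as structured records. The field names and numbers below are hypothetical, not Reejig's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One unit of work, described at the level where automation occurs."""
    name: str
    hours_per_week: float       # share of the role's time spent on this task
    automatable: bool = False   # flagged after assessing AI suitability

@dataclass
class Role:
    """A role modeled as a bundle of tasks rather than a job description."""
    title: str
    tasks: list = field(default_factory=list)

    def automatable_share(self) -> float:
        """Fraction of weekly hours that sit in AI-suitable tasks."""
        total = sum(t.hours_per_week for t in self.tasks)
        exposed = sum(t.hours_per_week for t in self.tasks if t.automatable)
        return exposed / total if total else 0.0

# Example: a payroll analyst mapped to tasks instead of a job description
analyst = Role("Payroll Analyst", [
    Task("Validate timesheets", 10, automatable=True),
    Task("Resolve employee queries", 15),
    Task("Run payroll batch", 5, automatable=True),
])
print(f"{analyst.automatable_share():.0%} of work hours are AI-candidate tasks")
```

With work described this way, "where can AI contribute?" becomes a query over data rather than a guess.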

From AI curiosity to confident investment

When AI experimentation is grounded in work data, leaders can make investment decisions with measurable outcomes.

Instead of asking:

  • “What can this AI tool do?”

Leaders can ask:

  • “Which tasks could this agent perform?”
  • “What percentage of the work does that represent?”
  • “What productivity or cost impact would that create?”

This shift enables organizations to manage AI like a workforce.

Leaders can track:

  • Task coverage
  • Productivity gains
  • Reliability
  • Business outcomes

In other words, AI becomes governable, measurable, and scalable.
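A minimal sketch of what tracking those four measures might look like, with entirely invented numbers:

```python
# Illustrative workforce-style metrics for one AI agent (all figures invented).
tasks_in_scope = 40           # tasks the team performs in a period
tasks_handled_by_agent = 12   # tasks the agent completed end-to-end
agent_outputs = 500
agent_outputs_accepted = 460  # outputs that passed human review
avg_hours_per_task = 2.5      # assumed average human effort per task

task_coverage = tasks_handled_by_agent / tasks_in_scope
reliability = agent_outputs_accepted / agent_outputs
hours_saved = tasks_handled_by_agent * avg_hours_per_task

print(f"Task coverage: {task_coverage:.0%}")
print(f"Reliability:   {reliability:.0%}")
print(f"Hours saved:   {hours_saved:.0f} per period")
```

The point is not the specific formulas but that each measure has a denominator grounded in mapped work.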

Why task-level thinking matters: lessons from the MIT “Iceberg Index”

Recent research from MIT illustrates why this shift matters.

Researchers created what they call the Iceberg Index, mapping the capabilities of 13,000+ AI tools against 32,000 human skills across 151 million U.S. workers.

Their finding: AI systems could technically perform tasks representing roughly 11.7% of total wage value (about $1.2 trillion).
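As a quick sanity check (a back-of-envelope calculation, not part of the MIT study), those two figures together imply a total U.S. wage base of roughly $10 trillion:

```python
# Back-of-envelope check on the Iceberg Index figures:
# $1.2T of wages in AI-exposed tasks at 11.7% of total wage value.
exposed_wage_value = 1.2e12   # dollars of wages in AI-exposed tasks
exposed_share = 0.117
implied_total_wages = exposed_wage_value / exposed_share
print(f"Implied total U.S. wage base: ${implied_total_wages / 1e12:.1f} trillion")
```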

Predictably, headlines quickly declared that AI will replace 11.7% of jobs.

That is not what the research says.

The index measures task exposure, not job displacement.

Adoption timelines, integration challenges, governance requirements, and organizational readiness will ultimately determine what happens in practice.

The real insight is this:

Workforce disruption happens at the level of tasks, not jobs.

And if leaders want to manage that disruption responsibly, they must first understand the tasks themselves.

Why “time in skills” is the wrong metric for the AI era

Another misconception in workforce analytics is the reliance on time as a proxy for skill or expertise.

In practice, time has always been a weak signal.

Across decades of enterprise learning and workforce systems, time has repeatedly proven misleading:

  • Time on a resume does not equal expertise.
  • Skill half-life does not mean old skills lose all value.
  • Learning time does not predict capability development.

Now a new misuse has emerged.

Some analyses compare AI training time with human skill development time as a way to estimate workforce disruption.

This comparison is flawed.

AI capability is determined by task suitability, not how long a system took to train.

The relevant question is not how long it takes AI to learn.

The question is what work it can reliably perform.

The leadership imperative: define the job of an AI agent

For CIOs and CHROs, the most practical governance question is surprisingly simple.

What job does the AI agent actually do?

If leaders cannot clearly answer that question, they cannot:

  • Measure its impact
  • Govern its behavior
  • Evaluate risk
  • Scale it responsibly

The same principles used to manage human workers must apply to AI systems.

That means defining:

  • The tasks the agent performs
  • The conditions in which it operates
  • The outputs it must produce
  • The metrics used to evaluate performance

Without this clarity, AI experimentation will remain exactly that.

Experiments.

A simple checklist for responsible AI experimentation

Leaders can apply a straightforward framework before launching AI initiatives.

  1. Map the work: Understand tasks, activities, and workflows.
  2. Define the hypothesis: What improvement do you expect to see?
  3. Identify candidate tasks for AI: Focus on repeatable, measurable work.
  4. Run controlled experiments: Introduce AI agents with clear success criteria.
  5. Measure outcomes: Track productivity, cost, quality, and risk.
  6. Evaluate repeatability: Can the results be reproduced at scale?
  7. Scale with governance: Treat AI agents as part of the workforce.

Conclusion: the real alchemy of AI

The ancient alchemists spent centuries trying to turn base metals into gold.

They never succeeded.

Yet the analogy is instructive.

AI is not magic.

Turning experimentation into value requires method, structure, and discipline.

Organizations that begin with the science of work itself will be able to experiment faster, measure impact earlier, and scale AI responsibly.

Those that skip that step will continue chasing the illusion that deploying AI tools alone will transform the enterprise.

Because in the end, AI does not transform organizations.

Work does.

Join the waitlist to access certified AI workflows for secure, enterprise task-level redesign as soon as they are released.