Reejig Blog

The alchemy of AI: why most AI experiments fail to deliver ROI

Written by Gordon Ritchie | Mar 4, 2026 7:30:26 AM

If AI worked like King Midas, every experiment would turn to gold.

In reality, according to McKinsey, only about 5% of AI experiments produce measurable ROI.
Organizations are experimenting aggressively, but few are converting those experiments into bottom-line impact.

The real question for CIOs and CHROs is not whether to experiment with AI.
It is how to turn experimentation into economic value.

Why most AI experiments fail to deliver ROI

AI experiments fail when they are disconnected from the work actually happening inside the enterprise.

Across organizations we see the same pattern:

  • teams experiment with new AI tools
  • pilots run in isolated environments
  • results look promising but rarely scale
  • ROI remains unclear

The core problem is that most experiments are not grounded in a clear understanding of the work itself.

Instead, organizations rely on outdated artifacts such as job descriptions, which were designed for hiring, not for understanding how work actually happens.

The scientific method for AI experimentation

AI experimentation should follow the same discipline as scientific research.

That means:

  1. Define the current environment.
  2. Form a hypothesis.
  3. Control variables and inputs.
  4. Observe outcomes.
  5. Measure results.
  6. Refine and repeat.
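These six steps can be sketched as a toy experiment check in Python (the function name, numbers, and 20% threshold are invented for illustration, not a real framework):

```python
# Toy sketch of the discipline above: baseline, hypothesis, controlled run,
# measurement, and a decision on whether to refine and repeat.
def time_reduction(baseline_min: float, measured_min: float) -> float:
    """Fractional reduction in handling time achieved by the experiment."""
    return (baseline_min - measured_min) / baseline_min

# Hypothesis: the AI agent cuts average handling time by at least 20%.
# Observation: handling time fell from 30 minutes to 22 minutes per case.
reduction = time_reduction(baseline_min=30, measured_min=22)
hypothesis_met = reduction >= 0.20

print(f"Reduction: {reduction:.0%}, hypothesis met: {hypothesis_met}")
```

If the hypothesis is not met, the loop does not end; the variables are refined and the experiment repeats.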

Breakthroughs like penicillin, Post-it notes, and Velcro began as chance observations, but they became usable products only through exactly this process of hypothesis, measurement, and refinement.

Without this rigor, AI experimentation becomes something else entirely.

It becomes playing with technology rather than deploying it responsibly.

The missing ingredient: understanding work at the task level

Organizations need a clear, structured understanding of how work actually happens before deploying AI agents.

Leading companies are starting with work ontology.

A work ontology defines the metadata of work in natural language so that both humans and AI systems can understand it.

This means describing work at the level where automation actually occurs:

  Traditional workforce view | AI-ready workforce view
  ---------------------------|-------------------------------
  jobs                       | tasks
  roles                      | work activities
  job descriptions           | work metadata
  headcount planning         | task coverage and productivity

When organizations map work this way, they can:

  • Identify where AI can realistically contribute
  • Estimate potential productivity gains
  • Define measurable outcomes for experimentation
  • Track AI performance like any other part of the workforce

This creates the environment for meaningful experimentation.
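As an illustration, a task-level view of work can be sketched as structured records. The field names and numbers below are hypothetical, not Reejig's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One unit of work, described at the level where automation occurs."""
    name: str
    hours_per_week: float       # share of the role's time spent on this task
    automatable: bool = False   # flagged after assessing AI suitability

@dataclass
class Role:
    """A role modeled as a bundle of tasks rather than a job description."""
    title: str
    tasks: list = field(default_factory=list)

    def automatable_share(self) -> float:
        """Fraction of weekly hours that sit in AI-suitable tasks."""
        total = sum(t.hours_per_week for t in self.tasks)
        exposed = sum(t.hours_per_week for t in self.tasks if t.automatable)
        return exposed / total if total else 0.0

# Example: a payroll analyst mapped to tasks instead of a job description
analyst = Role("Payroll Analyst", [
    Task("Validate timesheets", 10, automatable=True),
    Task("Resolve employee queries", 15),
    Task("Run payroll batch", 5, automatable=True),
])
print(f"{analyst.automatable_share():.0%} of work hours are AI-candidate tasks")
```

With work described this way, "where can AI contribute?" becomes a query over data rather than a guess.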

From AI curiosity to confident investment

When AI experimentation is grounded in work data, leaders can make investment decisions with measurable outcomes.

Instead of asking:

  • “What can this AI tool do?”

Leaders can ask:

  • “Which tasks could this agent perform?”
  • “What percentage of the work does that represent?”
  • “What productivity or cost impact would that create?”

This shift enables organizations to manage AI like a workforce.

Leaders can track:

  • Task coverage
  • Productivity gains
  • Reliability
  • Business outcomes

In other words, AI becomes governable, measurable, and scalable.
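A minimal sketch of what tracking those four measures might look like, with entirely invented numbers:

```python
# Illustrative workforce-style metrics for one AI agent (all figures invented).
tasks_in_scope = 40           # tasks the team performs in a period
tasks_handled_by_agent = 12   # tasks the agent completed end-to-end
agent_outputs = 500
agent_outputs_accepted = 460  # outputs that passed human review
avg_hours_per_task = 2.5      # assumed average human effort per task

task_coverage = tasks_handled_by_agent / tasks_in_scope
reliability = agent_outputs_accepted / agent_outputs
hours_saved = tasks_handled_by_agent * avg_hours_per_task

print(f"Task coverage: {task_coverage:.0%}")
print(f"Reliability:   {reliability:.0%}")
print(f"Hours saved:   {hours_saved:.0f} per period")
```

The point is not the specific formulas but that each measure has a denominator grounded in mapped work.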

Why task-level thinking matters: lessons from the MIT “Iceberg Index”

Recent research from MIT illustrates why this shift matters.

Researchers created what they call the Iceberg Index, mapping the capabilities of 13,000+ AI tools against 32,000 human skills across 151 million U.S. workers.

Their finding: AI systems could technically perform tasks representing roughly 11.7% of total wage value (about $1.2 trillion).
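As a quick sanity check (a back-of-envelope calculation, not part of the MIT study), those two figures together imply a total U.S. wage base of roughly $10 trillion:

```python
# Back-of-envelope check on the Iceberg Index figures:
# $1.2T of wages in AI-exposed tasks at 11.7% of total wage value.
exposed_wage_value = 1.2e12   # dollars of wages in AI-exposed tasks
exposed_share = 0.117
implied_total_wages = exposed_wage_value / exposed_share
print(f"Implied total U.S. wage base: ${implied_total_wages / 1e12:.1f} trillion")
```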

Predictably, headlines quickly declared that AI will replace 11.7% of jobs.

That is not what the research says.

The index measures task exposure, not job displacement.

Adoption timelines, integration challenges, governance requirements, and organizational readiness will ultimately determine what happens in practice.

The real insight is this:

Workforce disruption happens at the level of tasks, not jobs.

And if leaders want to manage that disruption responsibly, they must first understand the tasks themselves.

Why “time in skills” is the wrong metric for the AI era

Another misconception in workforce analytics is the reliance on time as a proxy for skill or expertise.

In practice, time has always been a weak signal.

Across decades of enterprise learning and workforce systems, time has repeatedly proven misleading:

  • Time on a resume does not equal expertise.
  • Skill half-life does not mean old skills lose all value.
  • Learning time does not predict capability development.

Now a new misuse has emerged.

Some analyses compare AI training time with human skill development time as a way to estimate workforce disruption.

This comparison is flawed.

AI capability is determined by task suitability, not how long a system took to train.

The relevant question is not how long it takes AI to learn.

The question is what work it can reliably perform.

The leadership imperative: define the job of an AI agent

For CIOs and CHROs, the most practical governance question is surprisingly simple.

What job does the AI agent actually do?

If leaders cannot clearly answer that question, they cannot:

  • Measure its impact
  • Govern its behavior
  • Evaluate risk
  • Scale it responsibly

The same principles used to manage human workers must apply to AI systems.

That means defining:

  • The tasks the agent performs
  • The conditions in which it operates
  • The outputs it must produce
  • The metrics used to evaluate performance

Without this clarity, AI experimentation will remain exactly that.

Experiments.

A simple checklist for responsible AI experimentation

Leaders can apply a straightforward framework before launching AI initiatives.

  1. Map the work: Understand tasks, activities, and workflows.
  2. Define the hypothesis: What improvement do you expect to see?
  3. Identify candidate tasks for AI: Focus on repeatable, measurable work.
  4. Run controlled experiments: Introduce AI agents with clear success criteria.
  5. Measure outcomes: Track productivity, cost, quality, and risk.
  6. Evaluate repeatability: Can the results be reproduced at scale?
  7. Scale with governance: Treat AI agents as part of the workforce.

Conclusion: the real alchemy of AI

The ancient alchemists spent centuries trying to turn base metals into gold.

They never succeeded.

Yet the analogy is instructive.

AI is not magic.

Turning experimentation into value requires method, structure, and discipline.

Organizations that begin with the science of work itself will be able to experiment faster, measure impact earlier, and scale AI responsibly.

Those that skip that step will continue chasing the illusion that deploying AI tools alone will transform the enterprise.

Because in the end, AI does not transform organizations.

Work does.

Join the waitlist to access certified AI workflows for secure, enterprise task-level redesign as soon as they are released.