Successful AI projects make your team more capable, not redundant

Introduction

In 2022, Air Canada’s chatbot told passenger Jake Moffatt he could book a full-price ticket to his grandmother’s funeral and claim a bereavement discount afterwards. When Moffatt asked for his refund, the airline refused and blamed the bot, arguing the chatbot was “a separate legal entity” responsible for its own mistakes. In February 2024 a tribunal rejected that defense, ordered Air Canada to pay $812, and the story went viral.

Air Canada’s real mistake wasn’t the technology. It was treating GenAI as a customer service replacement instead of a tool that needs human oversight. That distinction matters more than most organizations realize.

GenAI projects succeed when humans stay in the loop, not when AI faces customers alone.

This insight changes how you phase your project, measure quality, train users, and build your team. Get it right and you join the 30% of implementations that succeed. Get it wrong and you become one of the 95% of generative AI pilots that fail, according to an August 2025 MIT report.


What makes human-in-the-loop projects different

Traditional software projects have endpoints. You gather requirements, build features, test functionality, ship the product. Users learn the system. It behaves predictably.

GenAI projects with humans in the loop work differently. You’re not deploying a static tool; you’re creating a system where AI and humans work together. McKinsey’s State of AI research found that only 27% of organizations review all AI-generated content before it reaches end users. That gap explains a lot of high-profile failures.

The differences cascade through every project decision:

| Aspect | Traditional (non-AI) projects | Human-in-the-loop AI projects |
| --- | --- | --- |
| Phasing | Fixed phases: requirements, build, test, deploy | Continuous refinement loops |
| Success | Features working as specified | Human + AI performance together |
| Training | Happens once at deployment | Ongoing as capabilities evolve |
| Team | IT + developers | IT + developers + model experts + power users |
| Workflows | Stay mostly unchanged | Adapt as the organization learns and AI capabilities expand |

These aren’t minor variations. They require different approaches to five critical areas.


1. Project phasing: embracing continuous refinement

Traditional projects follow waterfall or agile sprints toward a finish line. Human-in-the-loop AI projects never truly “finish.” The model evolves, the technology improves, and the human’s understanding deepens all at the same time.

Why this matters: Your initial deployment is your first learning cycle, not your final delivery. The human figures out what the AI does well, where it stumbles, and how to work with it effectively. Meanwhile, the model learns from feedback. Traditional project plans don’t have room for this refinement loop.

How to structure it:

Real example: Octopus Energy’s customer service team uses Kraken Tech’s “Magic Ink” AI to draft responses. TechUK’s case study shows agents initially reviewed every AI message carefully. Over time they learned about one-third needed minimal changes. The system has now generated over 9.4 million messages with a 70% customer satisfaction rating, which is higher than messages written without AI. The workflow evolved naturally because the project plan expected it to.
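
One practical way to plan for this loop is to instrument it. The sketch below is illustrative, not how Kraken’s Magic Ink works: it assumes you log each AI draft alongside the message the agent actually sent, and it uses simple text similarity to track what share of drafts needed only minimal edits.

```python
from difflib import SequenceMatcher

def edit_ratio(ai_draft: str, sent_message: str) -> float:
    """How much of the AI draft survived into the sent message (1.0 = sent unchanged)."""
    return SequenceMatcher(None, ai_draft, sent_message).ratio()

def minimal_edit_share(pairs: list[tuple[str, str]], threshold: float = 0.9) -> float:
    """Share of drafts that needed only minimal changes before sending."""
    if not pairs:
        return 0.0
    minimal = sum(1 for draft, sent in pairs if edit_ratio(draft, sent) >= threshold)
    return minimal / len(pairs)

# Reviewed monthly, a rising share tells you which message categories are ready
# for lighter-touch review; a falling share flags where oversight should stay tight.
```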

Model technology advances every few months. Each release brings new capabilities and potential changes to how your system performs. Your project approach must account for this.


2. Ongoing iterations: workflows that change

In traditional projects you define the process, analyze the workflow, then automate it. With human-in-the-loop AI, the process itself changes as new capabilities emerge.

Why this matters: What was impossible six months ago becomes routine. Tasks you did sequentially might run in parallel. Steps you thought essential might vanish. You discover these changes by using the system in production, not during requirements gathering.

How to adapt:

Concrete example: Technical support teams following this pattern initially use AI to suggest troubleshooting steps while humans validate each one. Over time they discover AI can handle routine diagnostics automatically, letting humans focus on edge cases. Fundrise automated nearly 60% of IT support tasks this way. Hazel Health saw ticket deflection rates jump from 3-5% to over 20%. These workflow changes emerged from usage patterns, not initial design.
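
One way to let that evolution happen safely is to make autonomy a configuration decision rather than a rebuild. A minimal sketch, assuming the AI returns a category and a confidence score per ticket; the categories and thresholds below are hypothetical and would be tuned from your own override data.

```python
from dataclasses import dataclass

# Hypothetical per-category autonomy policy: which ticket types the AI may resolve
# end-to-end, and above what confidence. Tightening a threshold (or setting it to
# None) rolls the workflow back to human validation without rebuilding anything.
AUTONOMY_POLICY = {
    "password_reset": 0.85,    # routine diagnostic: AI may auto-resolve
    "vpn_access": 0.90,
    "hardware_failure": None,  # always goes to a human
}

@dataclass
class Suggestion:
    category: str
    steps: list[str]
    confidence: float

def route(suggestion: Suggestion) -> str:
    threshold = AUTONOMY_POLICY.get(suggestion.category)
    if threshold is not None and suggestion.confidence >= threshold:
        return "auto_resolve"   # AI handles it; humans focus on edge cases
    return "human_review"       # human validates the suggested troubleshooting steps
```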

The key difference: your requirements document is a starting hypothesis, not a final specification.


3. Quality assurance: measuring human plus AI performance

Traditional QA measures whether software works correctly. Human-in-the-loop QA must measure whether the combined human-AI system delivers better outcomes than either alone.

Why traditional metrics fail: An AI model can be technically accurate while the overall system fails in practice, because humans can’t use its output effectively. Conversely, a less accurate model that gives humans the right context might deliver better results.

What to measure instead:

| Metric category | What to track |
| --- | --- |
| AI performance | Accuracy, confidence scores, error rates |
| Human efficiency | Time saved per task, volume handled, cognitive load |
| Combined impact | Overall timing, cost savings, quality improvement |
| Trust indicators | Override frequency, feedback submission, adoption rate |
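
A simple way to make those categories measurable is to log one small record per AI-assisted interaction and aggregate it. A minimal sketch; the field names and the pre-rollout baseline are assumptions about how you capture the data.

```python
from dataclasses import dataclass

@dataclass
class InteractionRecord:
    task_id: str
    model_confidence: float     # AI performance
    ai_output_accepted: bool    # trust indicator: False means the human overrode the AI
    human_seconds_spent: float  # human efficiency with AI assistance
    baseline_seconds: float     # typical time without AI, from a pre-rollout sample

def summarize(records: list[InteractionRecord]) -> dict:
    """Aggregate per-interaction logs into the metric categories above."""
    if not records:
        return {}
    n = len(records)
    return {
        "override_rate": sum(not r.ai_output_accepted for r in records) / n,
        "avg_time_saved_s": sum(r.baseline_seconds - r.human_seconds_spent for r in records) / n,
        "avg_confidence": sum(r.model_confidence for r in records) / n,
    }
```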

Create validation sets from real interactions and use them to test each model update. When OpenAI or Anthropic releases a new model, or you switch providers, re-run those validation sets to confirm the new technology maintains or improves performance for your specific use case.
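
A minimal sketch of that regression check. `call_model` is a placeholder for whichever provider SDK you use, and exact-match scoring stands in for whatever “correct” means in your use case.

```python
import json

def call_model(model_name: str, prompt: str) -> str:
    """Placeholder: swap in your provider SDK call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def run_validation(model_name: str, validation_path: str) -> float:
    """Score a candidate model on real past interactions before adopting it."""
    with open(validation_path) as f:
        cases = json.load(f)  # e.g. [{"input": "...", "expected": "..."}, ...]
    correct = sum(
        1
        for case in cases
        if call_model(model_name, case["input"]).strip() == case["expected"].strip()
    )
    return correct / len(cases)

# Gate the upgrade: adopt the new model only if it matches or beats the current baseline.
# if run_validation("candidate-model", "validation_set.json") >= current_score: switch()
```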

Critical insight: Don’t just measure if the AI is “better.” Measure if the human working with the AI produces better outcomes than before. That’s your real success metric.


4. Training: preparing humans to guide AI

Traditional software training teaches users how to operate features. Human-in-the-loop training must teach users how to collaborate with a system that will make brilliant suggestions and confusing mistakes.

What users need to understand:

Why this differs from traditional training: Users aren’t just learning a tool. They’re learning to be part of a collaborative system. They need mental models for how AI works (high-level, not technical), not button-clicking instructions.

Practical approach:

Mindset shift: Frame the system as “AI assisting humans” not “AI replacing humans with humans checking.” This distinction affects adoption dramatically.


5. Team composition: developing model experts internally

Traditional software projects need IT people: analysts, project managers, developers. Human-in-the-loop AI projects need model experts who understand your specific use case.

Why you can’t just hire this expertise: A specialist with general AI knowledge can’t tell you how the system performs on your customer service emails. That expertise comes from working with your specific data and workflows.

How to develop them:

Strategic benefit: Developing internal model experts does two things at once. It improves your AI system’s performance and empowers employees with new skills. This is key for change management.

Important note: This doesn’t replace your IT team or developers. It complements them. You need developers to build the system and model experts to guide how it learns and adapts.


To conclude

Managing a GenAI project with humans in the loop means managing a living system, not deploying static software.

The project manager must orchestrate ongoing collaboration, learning, and adaptation between humans and machines across five dimensions: project phasing, ongoing workflow iteration, quality assurance, training, and team composition.

These aren’t exotic requirements. They’re adaptations of proven project management principles for technology that learns and changes. The PM fundamentals remain: clear objectives, stakeholder engagement, iterative testing, budget control, change management. The difference is recognizing that your “product” isn’t the AI alone. It’s the collaborative system of humans working with AI to achieve outcomes neither could accomplish independently.

A word on project economics: Human-in-the-loop projects require continuous refinement, ongoing training, and dedicated model experts. That means higher initial investment and ongoing costs (both human and token). Your target benefits need to be substantially higher too. Don’t pursue GenAI projects aiming for 7% cost reductions. Look for opportunities where you can realistically save 30% or more of time, cost, or effort. The Octopus Energy and Fundrise examples above demonstrate this scale of impact. We’ll explore the economics and ROI calculations for GenAI projects in more depth in a future post.
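
Until then, here is a back-of-the-envelope illustration of why the target matters; every figure below is a hypothetical placeholder, not a benchmark.

```python
# Illustrative only: all numbers are hypothetical placeholders, not benchmarks.
baseline_annual_cost = 400_000   # fully loaded cost of the workflow today
time_saved_fraction = 0.30       # the ~30%+ target argued for above
ongoing_ai_cost = 60_000         # tokens, hosting, model-expert time, retraining
one_time_build_cost = 120_000

annual_benefit = baseline_annual_cost * time_saved_fraction   # 120,000
net_annual_benefit = annual_benefit - ongoing_ai_cost         # 60,000
payback_years = one_time_build_cost / net_annual_benefit      # 2.0 years

# At a 7% saving, the same workflow yields only 28,000 a year in gross benefit,
# which doesn't even cover the ongoing costs: the project never pays back.
```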

This is why Eanis emphasizes human-in-the-loop from the start. Employees ask questions, the AI provides grounded answers with confidence scores, and low-confidence responses get routed to a human. Managers have dashboards and get weekly digests showing what’s working and where gaps remain. The system assumes humans remain essential, not temporary.
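
As an illustration of that routing pattern (not Eanis’s actual implementation), the sketch below sends low-confidence or ungrounded answers to a human queue; the threshold and response shape are assumptions.

```python
CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff; tune it from override and feedback data

def answer_or_escalate(ai_answer: str, confidence: float, sources: list[str]) -> dict:
    """Send confident, grounded answers directly; route everything else to a human."""
    if confidence >= CONFIDENCE_THRESHOLD and sources:
        return {"route": "auto", "answer": ai_answer, "sources": sources}
    # Low confidence or no grounding: a human reviews before the employee sees anything.
    return {"route": "human_review", "draft": ai_answer, "confidence": confidence}
```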

Ready to build AI systems designed for human collaboration? Start by identifying one workflow where AI assistance (not replacement) could help your team. Involve the humans who’ll use it from day one. Plan for continuous refinement, not one-time deployment.

The technology is powerful. Your implementation approach should keep humans where they belong: in the loop, in control, and continuously learning alongside the AI.




Learn more about Eanis - See how it works - Schedule a call