
Introduction
Air Canada’s chatbot told passenger Jake Moffatt he could book a full-price ticket to his grandmother’s funeral and claim a bereavement discount later. When Moffatt tried to get his refund, the airline said no and blamed the bot. Its defense? The chatbot was “a separate legal entity” responsible for its own mistakes. The tribunal didn’t buy it: in February 2024 it ordered Air Canada to pay $812, and the story went viral.
Air Canada’s real mistake wasn’t the technology. It was treating GenAI as a customer service replacement instead of a tool that needs human oversight. That distinction matters more than most organizations realize.
GenAI projects succeed when humans stay in the loop, not when AI faces customers alone.
This insight changes how you phase your project, measure quality, train users, and build your team. Get it right and you join the 30% of implementations that succeed. Get it wrong and your project joins the 95% of generative AI pilots that are failing, according to an August 2025 MIT report.
What makes human-in-the-loop projects different
Traditional software projects have endpoints. You gather requirements, build features, test functionality, ship the product. Users learn the system. It behaves predictably.
GenAI projects with humans in the loop work differently. You’re not deploying a static tool; you’re creating a system where AI and humans work together. McKinsey’s State of AI research found that only 27% of organizations review all AI-generated content before it reaches end users. That gap explains a lot of high-profile failures.
The differences cascade through every project decision:
| Aspect | Traditional (Non-AI) Projects | Human-in-the-Loop AI Projects |
|---|---|---|
| Phasing | Fixed phases: requirements, build, test, deploy | Continuous refinement loops |
| Success | Features working as specified | Human + AI performance together |
| Training | Happens once at deployment | Ongoing as capabilities evolve |
| Team | IT + developers | IT + developers + model experts + power users |
| Workflows | Stay mostly unchanged | Adapt as the organization learns and AI capabilities expand |
These aren’t minor variations. They require different approaches to five critical areas.
1. Project phasing: embracing continuous refinement
Traditional projects follow waterfall or agile sprints toward a finish line. Human-in-the-loop AI projects never truly “finish.” The model evolves, the technology improves, and the human’s understanding deepens all at the same time.
Why this matters: Your initial deployment is your first learning cycle, not your final delivery. The human figures out what the AI does well, where it stumbles, and how to work with it effectively. Meanwhile, the model learns from feedback. Traditional project plans don’t have room for this refinement loop.
How to structure it:
- Plan 3-month refinement cycles instead of a single deployment date
- Budget for ongoing updates covering model improvements and human feedback
- Expect workflows to shift as humans learn what to delegate and what to keep
- Build feedback mechanisms into the system from day one (see the sketch below)
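A feedback mechanism doesn’t need to be elaborate to be useful. Here is a minimal sketch, assuming a Python backend; the `ReviewFeedback` fields and `record_feedback` helper are hypothetical names, but the idea is simply to capture every reviewer decision next to the AI draft so refinement cycles work from real data:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

# Hypothetical structure: one record per human review of an AI draft.
@dataclass
class ReviewFeedback:
    task_id: str
    task_type: str            # e.g. "refund_request", "password_reset"
    ai_draft: str
    human_final: str          # what the reviewer actually kept or sent
    action: str               # "accepted", "edited", or "rejected"
    comment: str = ""         # optional free-text reason

def record_feedback(feedback: ReviewFeedback, path: str = "feedback_log.jsonl") -> None:
    """Append one feedback record to a JSON Lines log for later analysis."""
    record = asdict(feedback)
    record["timestamp"] = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: a reviewer lightly edits an AI-drafted reply.
record_feedback(ReviewFeedback(
    task_id="ticket-1042",
    task_type="refund_request",
    ai_draft="Your refund was processed on Monday.",
    human_final="Your refund was processed on Monday, 3 June.",
    action="edited",
    comment="Added the exact date.",
))
```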
Real example: Octopus Energy’s customer service team uses Kraken Tech’s “Magic Ink” AI to draft responses. TechUK’s case study shows agents initially reviewed every AI message carefully. Over time they learned that roughly one-third of drafts needed only minimal changes. The system has now generated over 9.4 million messages with a 70% customer satisfaction rating, higher than the rating for messages written without AI. The workflow evolved naturally because the project plan expected it to.
Model technology advances every few months. Each release brings new capabilities and potential changes to how your system performs. Your project approach must account for this.
2. Ongoing iterations: workflows that change
In traditional projects you define the process, analyze the workflow, then automate it. With human-in-the-loop AI, the process itself changes as new capabilities emerge.
Why this matters: What was impossible six months ago becomes routine. Tasks you did sequentially might run in parallel. Steps you thought essential might vanish. You discover these changes by using the system in production, not during requirements gathering.
How to adapt:
- Review workflows quarterly based on actual usage patterns
- Create space for user suggestions on process improvements
- Track override patterns and log which tasks humans consistently change versus accept (see the sketch after this list)
- Test new model releases against your current workflows when they ship
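Override tracking can come straight out of the feedback log sketched earlier. Assuming that log is a JSON Lines file with `task_type` and `action` fields (both assumed names), a short script can show where humans consistently accept versus rework AI output:

```python
import json
from collections import Counter, defaultdict

def override_report(path: str = "feedback_log.jsonl") -> dict:
    """Summarize, per task type, how often humans accept, edit, or reject AI output."""
    counts: dict[str, Counter] = defaultdict(Counter)
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # "task_type" and "action" are assumed field names in the feedback log.
            counts[record.get("task_type", "unknown")][record["action"]] += 1

    report = {}
    for task_type, actions in counts.items():
        total = sum(actions.values())
        report[task_type] = {
            "total": total,
            "accept_rate": actions["accepted"] / total,
            "edit_rate": actions["edited"] / total,
            "reject_rate": actions["rejected"] / total,
        }
    return report

# Task types with consistently high accept rates are candidates for lighter review;
# high edit or reject rates flag workflows that still need close human oversight.
print(override_report())
```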
Concrete example: Technical support teams following this pattern initially use AI to suggest troubleshooting steps while humans validate each one. Over time they discover AI can handle routine diagnostics automatically, letting humans focus on edge cases. Fundrise automated nearly 60% of IT support tasks this way. Hazel Health saw ticket deflection rates jump from 3-5% to over 20%. These workflow changes emerged from usage patterns, not initial design.
The key difference: your requirements document is a starting hypothesis, not a final specification.
3. Quality assurance: measuring human plus AI performance
Traditional QA measures whether software works correctly. Human-in-the-loop QA must measure whether the combined human-AI system delivers better outcomes than either alone.
Why traditional metrics fail: The AI can score perfectly on accuracy while the overall system fails in practice because humans can’t use its output effectively. Conversely, a less accurate model that gives humans the right context might deliver better results.
What to measure instead:
| Metric Category | What to Track |
|---|---|
| AI performance | Accuracy, confidence scores, error rates |
| Human efficiency | Time saved per task, volume handled, cognitive load |
| Combined impact | End-to-end turnaround time, cost savings, quality improvement |
| Trust indicators | Override frequency, feedback submission, adoption rate |
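To make the “combined impact” row concrete, here is a rough illustration with made-up numbers; the comparison that matters is human-plus-AI against the human-only baseline, not the model’s accuracy in isolation:

```python
# Hypothetical figures purely for illustration; substitute your own measurements.
baseline = {"tickets_per_hour": 6.0, "error_rate": 0.04}   # humans working alone
assisted = {"tickets_per_hour": 9.5, "error_rate": 0.03}   # humans working with AI

throughput_gain = assisted["tickets_per_hour"] / baseline["tickets_per_hour"] - 1
error_reduction = 1 - assisted["error_rate"] / baseline["error_rate"]

print(f"Throughput: {throughput_gain:+.0%}")      # +58%
print(f"Errors: {error_reduction:.0%} lower")     # 25% lower

# The system "wins" only if the combined team beats the human-only baseline
# on both throughput and quality, whatever the model scores in isolation.
```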
Create validation sets from real interactions and test each model update against them. When OpenAI or Anthropic releases a new model, or you switch providers, rerun the validation sets to confirm the new technology maintains or improves performance for your specific use case.
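A minimal regression harness for this could look like the sketch below, assuming a stored set of real interactions with expected keywords and a hypothetical `generate_answer(model, prompt)` wrapper around whichever provider you use:

```python
import json

def generate_answer(model: str, prompt: str) -> str:
    """Placeholder for your provider call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def passes(answer: str, expected_keywords: list[str]) -> bool:
    """Simple check: the answer must mention every expected keyword."""
    return all(k.lower() in answer.lower() for k in expected_keywords)

def run_validation(model: str, path: str = "validation_set.jsonl") -> float:
    """Replay real interactions against a candidate model and return its pass rate."""
    results = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            case = json.loads(line)  # e.g. {"prompt": "...", "expected_keywords": ["..."]}
            answer = generate_answer(model, case["prompt"])
            results.append(passes(answer, case["expected_keywords"]))
    return sum(results) / len(results)

# Compare the current model against a candidate before switching:
#   current_score = run_validation("current-model")
#   candidate_score = run_validation("new-model")
# Promote the candidate only if candidate_score >= current_score.
```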
Critical insight: Don’t just measure if the AI is “better.” Measure if the human working with the AI produces better outcomes than before. That’s your real success metric.
4. Training: preparing humans to guide AI
Traditional software training teaches users how to operate features. Human-in-the-loop training must teach users how to collaborate with a system that will make brilliant suggestions and confusing mistakes.
What users need to understand:
- What the AI handles well versus where it struggles
- How to spot AI errors before they cause problems
- When to trust AI output versus when to verify carefully
- How to give feedback that improves the system
- When to escalate issues to a manager or colleague
Why this differs from traditional training: Users aren’t just learning a tool. They’re learning to be part of a collaborative system. They need mental models for how AI works (high-level, not technical), not button-clicking instructions.
Practical approach:
- Start with AI basics covering what it can and can’t do
- Use real examples of both successes and failures from your system
- Create quick reference guides for recognizing common AI mistakes
- Build a user team to share tips for effective collaboration
- Emphasize humans remain essential, not temporary placeholders
Mindset shift: Frame the system as “AI assisting humans” not “AI replacing humans with humans checking.” This distinction affects adoption dramatically.
5. Team composition: developing model experts internally
Traditional software projects need IT people: analysts, project managers, developers. Human-in-the-loop AI projects need model experts who understand your specific use case.
Why you can’t just hire this expertise: Specialists with general AI knowledge don’t know how the system performs on your customer service emails; that understanding comes from working with your specific data and workflows. Model experts are:
- Power users who deeply understand both the workflow and AI behavior
- Not necessarily technical, but curious and analytical
- Pattern spotters who see where AI succeeds or fails
- Comfortable giving structured feedback to improve the system
How to develop them:
- Identify early adopters during your pilot phase
- Give them access to AI performance metrics
- Create a direct feedback channel to the project team
- Recognize their contributions publicly
- Gradually expand their role in training others
Strategic benefit: Developing internal model experts does two things at once. It improves your AI system’s performance and it equips employees with new skills, which is key for change management.
Important note: This doesn’t replace your IT team or developers. It complements them. You need developers to build the system and model experts to guide how it learns and adapts.
To conclude
Managing a GenAI project with humans in the loop means managing a living system, not deploying static software.
The project manager must orchestrate ongoing collaboration, learning, and adaptation between humans and machines across five dimensions:
- Phasing: Continuous refinement cycles instead of fixed deployments
- Workflows: Iterative improvement as capabilities expand
- Quality: Measuring combined human + AI performance
- Training: Building collaboration skills, not just tool operation
- Team: Developing model experts alongside technical staff
These aren’t exotic requirements. They’re adaptations of proven project management principles for technology that learns and changes. The PM fundamentals remain: clear objectives, stakeholder engagement, iterative testing, budget control, change management. The difference is recognizing that your “product” isn’t the AI alone. It’s the collaborative system of humans working with AI to achieve outcomes neither could accomplish independently.
A word on project economics: Human-in-the-loop projects require continuous refinement, ongoing training, and dedicated model experts. That means higher initial investment and ongoing costs, both in human time and in token spend. Your target benefits need to be substantially higher too. Don’t pursue GenAI projects aiming for 7% cost reductions. Look for opportunities where you can realistically save 30% or more of time, cost, or effort. The Octopus Energy and Fundrise examples above demonstrate this scale of impact. We’ll explore the economics and ROI calculations for GenAI projects in more depth in a future post.
This is why Eanis emphasizes human-in-the-loop from the start. Employees ask questions, the AI provides grounded answers with confidence scores, and low-confidence responses get routed to a human. Managers have dashboards and get weekly digests showing what’s working and where gaps remain. The system assumes humans remain essential, not temporary.
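As a generic illustration of that routing pattern (a sketch, not Eanis’s actual implementation; the `Answer` type and the 0.8 threshold are assumptions), a confidence cutoff decides whether a response goes straight to the employee or into a human review queue:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the retrieval/generation pipeline

def route(answer: Answer, threshold: float = 0.8) -> str:
    """Deliver high-confidence answers directly; send the rest to a human."""
    if answer.confidence >= threshold:
        return "deliver_to_employee"
    return "queue_for_human_review"

print(route(Answer("Your PTO balance resets on 1 January.", confidence=0.93)))  # deliver_to_employee
print(route(Answer("Policy is unclear for contractors.", confidence=0.41)))     # queue_for_human_review
```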
Ready to build AI systems designed for human collaboration? Start by identifying one workflow where AI assistance (not replacement) could help your team. Involve the humans who’ll use it from day one. Plan for continuous refinement, not one-time deployment.
The technology is powerful. Your implementation approach should keep humans where they belong: in the loop, in control, and continuously learning alongside the AI.
References
- Air Canada Chatbot Lawsuit - CBC News
- MIT Report: 95% of Generative AI Pilots Failing - Fortune
- The State of AI: How Organizations Are Rewiring to Capture Value - McKinsey
- Case Study: Kraken Tech’s Generative AI Tool for Customer Service - TechUK
- 10 Best AI IT Help Desk Software in 2025 - Risotto