
Introduction
Air Canada’s chatbot told passenger Jake Moffatt he could book a full-price ticket to his grandmother’s funeral and claim a bereavement discount later. When Moffatt tried to get his refund, the airline said no and blamed the bot. Its defense? The chatbot was “a separate legal entity” responsible for its own mistakes. The tribunal didn’t buy it: in February 2024 it ordered Air Canada to pay $812, and the story went viral.
Air Canada’s real mistake wasn’t the technology. It was treating GenAI as a customer service replacement instead of a tool that needs human oversight. That distinction matters more than most organizations realize.
GenAI projects succeed when humans stay in the loop, not when AI faces customers alone.
This insight changes how you phase your project, measure quality, train users, and build your team. Get it right and you join the 30% of implementations that succeed. Get it wrong and your project joins the 95% of generative AI pilots that are failing, according to an August 2025 MIT report.
What makes human-in-the-loop projects different
Traditional software projects have endpoints. You gather requirements, build features, test functionality, ship the product. Users learn the system. It behaves predictably.
GenAI projects with humans in the loop work differently. You’re not deploying a static tool; you’re creating a system where AI and humans work together. McKinsey’s State of AI research found that only 27% of organizations review all AI-generated content before it reaches end users. That gap explains a lot of high-profile failures.
The differences cascade through every project decision:
| Aspect | Traditional (Non-AI) Projects | Human-in-the-Loop AI Projects |
|---|---|---|
| Phasing | Fixed phases: requirements, build, test, deploy | Continuous refinement loops |
| Success | Features working as specified | Human + AI performance together |
| Training | Happens once at deployment | Ongoing as capabilities evolve |
| Team | IT + developers | IT + developers + model experts + power users |
| Workflows | Stay mostly unchanged | Adapt as the organization learns and AI capabilities expand |
These aren’t minor variations. They require different approaches to five critical areas.
1. Project phasing: embracing continuous refinement
Traditional projects follow waterfall or agile sprints toward a finish line. Human-in-the-loop AI projects never truly “finish.” The model evolves, the technology improves, and the human’s understanding deepens all at the same time.
Why this matters: Your initial deployment is your first learning cycle, not your final delivery. The human figures out what the AI does well, where it stumbles, and how to work with it effectively. Meanwhile, the model learns from feedback. Traditional project plans don’t have room for this refinement loop.
How to structure it:
- Plan 3-month refinement cycles instead of a single deployment date
- Budget for ongoing updates covering model improvements and human feedback
- Expect workflows to shift as humans learn what to delegate and what to keep
- Build feedback mechanisms into the system from day one (see the sketch below)
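A feedback mechanism doesn’t need to be elaborate to be useful. Here is a minimal sketch, assuming a Python backend; the `ReviewFeedback` fields and `record_feedback` helper are hypothetical names, but the idea is simply to capture every reviewer decision next to the AI draft so refinement cycles work from real data:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

# Hypothetical structure: one record per human review of an AI draft.
@dataclass
class ReviewFeedback:
    task_id: str
    task_type: str            # e.g. "refund_request", "password_reset"
    ai_draft: str
    human_final: str          # what the reviewer actually kept or sent
    action: str               # "accepted", "edited", or "rejected"
    comment: str = ""         # optional free-text reason

def record_feedback(feedback: ReviewFeedback, path: str = "feedback_log.jsonl") -> None:
    """Append one feedback record to a JSON Lines log for later analysis."""
    record = asdict(feedback)
    record["timestamp"] = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: a reviewer lightly edits an AI-drafted reply.
record_feedback(ReviewFeedback(
    task_id="ticket-1042",
    task_type="refund_request",
    ai_draft="Your refund was processed on Monday.",
    human_final="Your refund was processed on Monday, 3 June.",
    action="edited",
    comment="Added the exact date.",
))
```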
Real example: Octopus Energy’s customer service team uses Kraken Tech’s “Magic Ink” AI to draft responses. TechUK’s case study shows agents initially reviewed every AI message carefully. Over time they learned that roughly one-third of drafts needed only minimal changes. The system has now generated over 9.4 million messages with a 70% customer satisfaction rating, higher than the rating for messages written without AI. The workflow evolved naturally because the project plan expected it to.
Model technology advances every few months. Each release brings new capabilities and potential changes to how your system performs. Your project approach must account for this.
2. Ongoing iterations: workflows that change
In traditional projects you define the process, analyze the workflow, then automate it. With human-in-the-loop AI, the process itself changes as new capabilities emerge.
Why this matters: What was impossible six months ago becomes routine. Tasks you did sequentially might run in parallel. Steps you thought essential might vanish. You discover these changes by using the system in production, not during requirements gathering.
How to adapt:
- Review workflows quarterly based on actual usage patterns
- Create space for user suggestions on process improvements
- Track override patterns and log which tasks humans consistently change versus accept (see the sketch after this list)
- Test new model releases against your current workflows when they ship
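Override tracking can come straight out of the feedback log sketched earlier. Assuming that log is a JSON Lines file with `task_type` and `action` fields (both assumed names), a short script can show where humans consistently accept versus rework AI output:

```python
import json
from collections import Counter, defaultdict

def override_report(path: str = "feedback_log.jsonl") -> dict:
    """Summarize, per task type, how often humans accept, edit, or reject AI output."""
    counts: dict[str, Counter] = defaultdict(Counter)
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # "task_type" and "action" are assumed field names in the feedback log.
            counts[record.get("task_type", "unknown")][record["action"]] += 1

    report = {}
    for task_type, actions in counts.items():
        total = sum(actions.values())
        report[task_type] = {
            "total": total,
            "accept_rate": actions["accepted"] / total,
            "edit_rate": actions["edited"] / total,
            "reject_rate": actions["rejected"] / total,
        }
    return report

# Task types with consistently high accept rates are candidates for lighter review;
# high edit or reject rates flag workflows that still need close human oversight.
print(override_report())
```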
Concrete example: Technical support teams following this pattern initially use AI to suggest troubleshooting steps while humans validate each one. Over time they discover AI can handle routine diagnostics automatically, letting humans focus on edge cases. Fundrise automated nearly 60% of IT support tasks this way. Hazel Health saw ticket deflection rates jump from 3-5% to over 20%. These workflow changes emerged from usage patterns, not initial design.
The key difference: your requirements document is a starting hypothesis, not a final specification.
3. Quality assurance: measuring human plus AI performance
Traditional QA measures whether software works correctly. Human-in-the-loop QA must measure whether the combined human-AI system delivers better outcomes than either alone.
Why traditional metrics fail: The AI can score perfectly on accuracy while the overall system fails in practice because humans can’t use its output effectively. Conversely, a less accurate model that gives humans the right context might deliver better results.
What to measure instead:
| Metric Category | What to Track |
|---|---|
| AI performance | Accuracy, confidence scores, error rates |
| Human efficiency | Time saved per task, volume handled, cognitive load |
| Combined impact | End-to-end turnaround time, cost savings, quality improvement |
| Trust indicators | Override frequency, feedback submission, adoption rate |
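To make the “combined impact” row concrete, here is a rough illustration with made-up numbers; the comparison that matters is human-plus-AI against the human-only baseline, not the model’s accuracy in isolation:

```python
# Hypothetical figures purely for illustration; substitute your own measurements.
baseline = {"tickets_per_hour": 6.0, "error_rate": 0.04}   # humans working alone
assisted = {"tickets_per_hour": 9.5, "error_rate": 0.03}   # humans working with AI

throughput_gain = assisted["tickets_per_hour"] / baseline["tickets_per_hour"] - 1
error_reduction = 1 - assisted["error_rate"] / baseline["error_rate"]

print(f"Throughput: {throughput_gain:+.0%}")      # +58%
print(f"Errors: {error_reduction:.0%} lower")     # 25% lower

# The system "wins" only if the combined team beats the human-only baseline
# on both throughput and quality, whatever the model scores in isolation.
```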
Create validation sets from real interactions and test each model update against them. When OpenAI or Anthropic releases a new model, or you switch providers, rerun the validation sets to confirm the new technology maintains or improves performance for your specific use case.
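A minimal regression harness for this could look like the sketch below, assuming a stored set of real interactions with expected keywords and a hypothetical `generate_answer(model, prompt)` wrapper around whichever provider you use:

```python
import json

def generate_answer(model: str, prompt: str) -> str:
    """Placeholder for your provider call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def passes(answer: str, expected_keywords: list[str]) -> bool:
    """Simple check: the answer must mention every expected keyword."""
    return all(k.lower() in answer.lower() for k in expected_keywords)

def run_validation(model: str, path: str = "validation_set.jsonl") -> float:
    """Replay real interactions against a candidate model and return its pass rate."""
    results = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            case = json.loads(line)  # e.g. {"prompt": "...", "expected_keywords": ["..."]}
            answer = generate_answer(model, case["prompt"])
            results.append(passes(answer, case["expected_keywords"]))
    return sum(results) / len(results)

# Compare the current model against a candidate before switching:
#   current_score = run_validation("current-model")
#   candidate_score = run_validation("new-model")
# Promote the candidate only if candidate_score >= current_score.
```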
Critical insight: Don’t just measure if the AI is “better.” Measure if the human working with the AI produces better outcomes than before. That’s your real success metric.
4. Training: preparing humans to guide AI
Traditional software training teaches users how to operate features. Human-in-the-loop training must teach users how to collaborate with a system that will make brilliant suggestions and confusing mistakes.
What users need to understand:
- What the AI handles well versus where it struggles
- How to spot AI errors before they cause problems
- When to trust AI output versus when to verify carefully
- How to give feedback that improves the system
- When to escalate issues to a manager or colleague
Why this differs from traditional training: Users aren’t just learning a tool. They’re learning to be part of a collaborative system. They need mental models for how AI works (high-level, not technical), not button-clicking instructions.
Practical approach:
- Start with AI basics covering what it can and can’t do
- Use real examples of both successes and failures from your system
- Create quick reference guides for recognizing common AI mistakes
- Build a user team to share tips for effective collaboration
- Emphasize humans remain essential, not temporary placeholders
Mindset shift: Frame the system as “AI assisting humans” not “AI replacing humans with humans checking.” This distinction affects adoption dramatically.
5. Team composition: developing model experts internally
Traditional software projects need IT people: analysts, project managers, developers. Human-in-the-loop AI projects need model experts who understand your specific use case.
Why you can’t just hire this expertise: Specialists with general AI knowledge don’t know how the system performs on your customer service emails; that understanding comes from working with your specific data and workflows. Model experts are:
- Power users who deeply understand both the workflow and AI behavior
- Not necessarily technical, but curious and analytical
- Pattern spotters who see where AI succeeds or fails
- Comfortable giving structured feedback to improve the system
How to develop them:
- Identify early adopters during your pilot phase
- Give them access to AI performance metrics
- Create a direct feedback channel to the project team
- Recognize their contributions publicly
- Gradually expand their role in training others
Strategic benefit: Developing internal model experts does two things at once. It improves your AI system’s performance and it equips employees with new skills, which is key for change management.
Important note: This doesn’t replace your IT team or developers. It complements them. You need developers to build the system and model experts to guide how it learns and adapts.
To conclude
Managing a GenAI project with humans in the loop means managing a living system, not deploying static software.
The project manager must orchestrate ongoing collaboration, learning, and adaptation between humans and machines across five dimensions:
- Phasing: Continuous refinement cycles instead of fixed deployments
- Workflows: Iterative improvement as capabilities expand
- Quality: Measuring combined human + AI performance
- Training: Building collaboration skills, not just tool operation
- Team: Developing model experts alongside technical staff
These aren’t exotic requirements. They’re adaptations of proven project management principles for technology that learns and changes. The PM fundamentals remain: clear objectives, stakeholder engagement, iterative testing, budget control, change management. The difference is recognizing that your “product” isn’t the AI alone. It’s the collaborative system of humans working with AI to achieve outcomes neither could accomplish independently.
A word on project economics: Human-in-the-loop projects require continuous refinement, ongoing training, and dedicated model experts. That means higher initial investment and ongoing costs, both in human time and in token spend. Your target benefits need to be substantially higher too. Don’t pursue GenAI projects aiming for 7% cost reductions. Look for opportunities where you can realistically save 30% or more of time, cost, or effort. The Octopus Energy and Fundrise examples above demonstrate this scale of impact. We’ll explore the economics and ROI calculations for GenAI projects in more depth in a future post.
This is why Eanis emphasizes human-in-the-loop from the start. Employees ask questions, the AI provides grounded answers with confidence scores, and low-confidence responses get routed to a human. Managers have dashboards and get weekly digests showing what’s working and where gaps remain. The system assumes humans remain essential, not temporary.
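As a generic illustration of that routing pattern (a sketch, not Eanis’s actual implementation; the `Answer` type and the 0.8 threshold are assumptions), a confidence cutoff decides whether a response goes straight to the employee or into a human review queue:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the retrieval/generation pipeline

def route(answer: Answer, threshold: float = 0.8) -> str:
    """Deliver high-confidence answers directly; send the rest to a human."""
    if answer.confidence >= threshold:
        return "deliver_to_employee"
    return "queue_for_human_review"

print(route(Answer("Your PTO balance resets on 1 January.", confidence=0.93)))  # deliver_to_employee
print(route(Answer("Policy is unclear for contractors.", confidence=0.41)))     # queue_for_human_review
```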
Ready to build AI systems designed for human collaboration? Start by identifying one workflow where AI assistance (not replacement) could help your team. Involve the humans who’ll use it from day one. Plan for continuous refinement, not one-time deployment.
The technology is powerful. Your implementation approach should keep humans where they belong: in the loop, in control, and continuously learning alongside the AI.
References
- Air Canada Chatbot Lawsuit - CBC News
- MIT Report: 95% of Generative AI Pilots Failing - Fortune
- The State of AI: How Organizations Are Rewiring to Capture Value - McKinsey
- Case Study: Kraken Tech’s Generative AI Tool for Customer Service - TechUK
- 10 Best AI IT Help Desk Software in 2025 - Risotto