In 2024, Ernst & Young fired dozens of U.S. employees after discovering that, during its “EY Ignite Learning Week,” they had watched multiple online training courses simultaneously. Some terminated workers claimed they didn’t realize this violated company policy. Others argued the behavior was consistent with the company’s “culture of multitasking,” pointing out that many employees work with three monitors. The incident sparked a broader conversation about the real value of mandatory training videos when employees are clearly not engaging with them.
This story highlights an uncomfortable truth most companies already know but rarely acknowledge: formal training materials often deliver far less value than we expect. Meanwhile, the sources we dismiss as too informal or messy, like FAQ documents, email threads, and troubleshooting notes, contain the exact knowledge employees actually use every day. This disconnect becomes critical when you’re considering AI-powered tools to support your team, because not all documentation types are equally valuable for training an intelligent answer system.
This post reveals the surprising hierarchy of which knowledge sources actually work for AI training, based on real implementation experience. You’ll learn why training or webinar videos are the hardest sources to extract value from, why that chaotic FAQ document is worth more than you think, and how to audit what you already have. Most importantly, you’ll discover that you likely already possess 80% of the valuable training material you need; you just haven’t recognized it yet.
Why formal training materials are surprisingly difficult for AI training
When most companies think about training an AI system, they assume their formal training materials are the obvious starting point. After all, these are the official, polished resources designed to teach employees. But here’s the counterintuitive reality: formal training materials are among the hardest sources to use effectively for AI training.
The issue isn’t quality or effort; it’s structural compatibility with how AI answer systems work. Let’s look at what makes formal training so challenging:
They answer questions that weren’t asked. Training materials are organized by topic (“Module 3: Customer Returns”), not by the specific questions employees actually have (“What do I do when a customer wants to return something they bought 65 days ago?”). AI systems work best when they can match a real question to a specific answer, but training content forces the AI to infer connections between abstract topics and concrete questions.
They lack the context of real problems. A training video explains “how our refund process works” in theory, presenting the ideal scenario. It rarely covers the messy edge cases that employees encounter daily: the customer who lost their receipt, the item that’s been discontinued, the refund that crosses fiscal year boundaries. Those edge cases are exactly what employees need help with, and they’re missing from formal training.
PowerPoints without speaker notes contain almost no useful information. Slides with bullet points like “Customer Communication Best Practices” or “Key Performance Indicators” provide no actual information without the context of what the presenter said. For AI training, these are essentially empty documents.
Video and audio require extensive preprocessing. Even after transcription, spoken language contains fillers, repetitions, incomplete sentences, and references to visual content (“as you see here…”) that make it harder for AI to extract clean, actionable information. Without semantic chunking, timestamp alignment, and manual cleanup, video transcripts underperform compared to written sources.
They’re optimized for learning, not for reference. Training materials are designed to be consumed sequentially to build understanding over time. But employees don’t need to relearn concepts; they need quick answers to specific questions. The structure that works for teaching (build context, introduce concepts, provide examples, test understanding) doesn’t work well for retrieval (give me the answer to this exact question right now).
> The uncomfortable truth: your formal training materials were designed for a different purpose. They teach general concepts, while AI systems need specific answers to specific questions.
Why processing video to feed a GenAI model is often more difficult than written sources
Video content presents specific technical challenges for AI training that make it the most difficult format to work with, not the easiest.
Informal or unstructured language: Training videos and webinars contain spoken language, which is less concise than written documentation. Transcripts are filled with verbal tics, repetitions, and incomplete sentences, which degrades retrieval quality unless the content is carefully cleaned up.
Context fragmentation: Transcripts may lack visual or slide context. When a presenter says “As you see here…” without describing what’s on screen, that reference becomes meaningless. You need to extract text from slides and on-screen content and somehow connect it to the right moment in the transcript.
Topic drift: In long videos and recorded webinars, topics shift gradually without clear boundaries. Without good segmentation, relevant information may be scattered across timestamps. You need time-based or semantic chunking rather than simple fixed-size chunks.
Noise and accuracy of transcripts: Even good speech-to-text models introduce small errors that can degrade information retrieval. Cleanup steps like adding punctuation, fixing casing, and removing filler words require extra effort.
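The cleanup steps above can be scripted. Here is a minimal sketch in Python, assuming a simple regex-based approach; the `FILLERS` list and `clean_transcript` helper are illustrative names, not part of any standard library, and a real pipeline would use a larger filler list tuned to your speakers:

```python
import re

# Common verbal fillers in speech-to-text output. Illustrative list;
# extend it for your speakers and language.
FILLERS = ["um", "uh", "you know", "i mean", "sort of", "kind of"]

def clean_transcript(text: str) -> str:
    """Lightweight transcript cleanup: drop filler words, collapse
    immediate word repetitions, and tidy whitespace."""
    cleaned = text
    for filler in FILLERS:
        # Remove the filler plus any trailing comma and whitespace.
        cleaned = re.sub(rf"\b{re.escape(filler)}\b,?\s*", "", cleaned,
                         flags=re.IGNORECASE)
    # Collapse immediate word repetitions ("the the" -> "the").
    cleaned = re.sub(r"\b(\w+)( \1\b)+", r"\1", cleaned, flags=re.IGNORECASE)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # Re-capitalize in case a leading filler was removed.
    return cleaned[:1].upper() + cleaned[1:]
```

Even a crude pass like this noticeably improves what a retrieval system can do with raw speech-to-text output.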
What works
If you do decide to use video content, here’s what actually makes it effective:
- Combine transcript, slide text, and metadata (titles, modules, timestamps) into one retrievable document per segment
- Use semantic chunking to split the transcript where topics change, not every X tokens
- Optionally embed timestamps so the AI can reference “at 3:42 in the video” when answering
- Add human-readable context like module titles (“Module 2: Handling Customer Objections”) at the start of each section before processing
If you preprocess transcripts well with semantic segmentation, cleanup, and slide context, video content can perform almost as well as written documents for retrieval-based question answering. Sometimes it even performs better, especially when speakers explain things more naturally. But this requires significant extra work compared to simply using written documentation.
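The “one retrievable document per segment” idea from the list above can be sketched in a few lines. The `VideoSegment` class and `to_retrieval_doc` helper below are hypothetical names for illustration; the point is simply that metadata, slide text, and transcript travel together into the index:

```python
from dataclasses import dataclass

@dataclass
class VideoSegment:
    """One topically coherent slice of a training video."""
    module: str        # e.g. "Module 2: Handling Customer Objections"
    start: str         # timestamp like "03:42"
    slide_text: str    # exported or OCR'd text from the slide shown
    transcript: str    # cleaned transcript for this slice

def to_retrieval_doc(seg: VideoSegment) -> str:
    """Merge metadata, slide text, and transcript into one
    self-describing document a retrieval system can index."""
    return (
        f"[{seg.module} | starts at {seg.start}]\n"
        f"Slide: {seg.slide_text}\n"
        f"Spoken: {seg.transcript}"
    )
```

Because the module title and timestamp are embedded in the document itself, the AI can cite “Module 2, at 3:42” in its answer without any extra lookup.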
The value hierarchy that actually works
Not all information sources are created equal. Based on real implementations, here’s what actually delivers value for AI training, ranked by ease of extraction and usefulness:
| Tier | Source type | Why it works/doesn’t work |
|---|---|---|
| Tier 1: Easiest & Most Valuable | FAQs, email threads with specific answers, troubleshooting documents, chat transcripts solving problems | Contains real questions from real people with specific answers. Uses natural language that matches how people actually ask questions. Already in Q&A format. Gets updated organically when processes change. |
| Tier 2: Moderate Difficulty, Good Value | SOPs, internal wikis, process documents, recorded meeting transcripts, customer service scripts | Structured information with clear procedures. Usually up-to-date because teams rely on them. Contains specific steps rather than general concepts. |
| Tier 3: More Difficult, Mixed Value | PDFs with complex formatting, company handbooks, policy documents | May contain valuable information but requires more processing. Often includes outdated sections. General policies rather than specific how-to guidance. |
| Tier 4: Most Difficult, Lowest Value | PowerPoints without notes, formal training videos, recorded webinars, theoretical training materials, company websites, product databases | Too general, lacks specific answers, expensive to process, often contains marketing language rather than operational knowledge. Requires significant preprocessing. |
Why this hierarchy exists
The pattern is clear: sources that answer specific questions with specific solutions are most valuable. FAQs and email threads win because they’re created in response to actual problems. Someone asked “How do I handle a customer who wants to return a product after 60 days?” and someone else answered with the exact process, including the edge cases and workarounds.
These informal sources use natural language that matches how people actually ask questions. When a new employee asks the AI “What do I do if a customer’s discount code isn’t working?”, that question will closely match the language in email threads where customer service reps discussed the same issue. The AI can retrieve and provide the exact answer that worked before.
Formal training materials, by contrast, tend to use abstract language: “Promotional code troubleshooting procedures” instead of “discount code isn’t working.” They describe ideal scenarios instead of the messy reality. They explain general principles instead of specific solutions.
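This vocabulary effect can be illustrated with a toy scoring function. Real answer systems use embedding similarity rather than raw word overlap, but the principle is the same: informal wording shares far more vocabulary with real questions than formal titles do. The helpers below are illustrative only:

```python
import re

def tokens(text: str) -> set:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap_score(question: str, document: str) -> float:
    """Fraction of question words that also appear in the document.
    A crude stand-in for the embedding similarity real systems use."""
    q, d = tokens(question), tokens(document)
    return len(q & d) / max(len(q), 1)

question = "What do I do if a customer's discount code isn't working?"
informal = "If the discount code isn't working, first check whether it expired."
formal = "Promotional code troubleshooting procedures and escalation matrix."

# The informal email-thread wording scores several times higher
# against the real question than the formal training title does.
```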
What about diagrams?
A common question: don’t I need to process diagrams and flowcharts? The short answer is that most content can be extracted from text sources, so you can get quite far without processing every diagram.
Important caveat: Some diagrams (like flowcharts) contain unique information not found elsewhere. These should be converted to text descriptions or tables. For example, a decision flowchart for “Should I escalate this customer complaint?” can be converted to:
| Condition | Action |
|---|---|
| Complaint involves a safety issue | Escalate immediately to safety team |
| Billing dispute of $500 or less | Follow standard refund process |
| Billing dispute over $500 | Escalate to finance manager |
| Product defect | Document in defect tracking system and offer replacement |
This table format is actually more useful for AI training than the flowchart image, and it’s searchable and editable.
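Once a flowchart is expressed as rules, it can even be encoded directly. This sketch encodes the example escalation rules above; the function name, category labels, and threshold handling are illustrative, not a real system:

```python
def route_complaint(category: str, amount: float = 0.0) -> str:
    """Route a customer complaint using the example escalation
    rules above (illustrative categories and $500 threshold)."""
    if category == "safety":
        return "Escalate immediately to safety team"
    if category == "billing":
        if amount <= 500:
            return "Follow standard refund process"
        return "Escalate to finance manager"
    if category == "defect":
        return "Document in defect tracking system and offer replacement"
    # Anything not covered by the flowchart falls through to a default.
    return "Route to general support queue"
```

Whether the rules live in a document the AI retrieves or in code a workflow executes, the text form is the one you can search, diff, and keep current.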
The surprising power of audio transcripts
Getting transcripts from recorded meetings, training sessions, or even voice memos is surprisingly effective with modern AI transcription tools. These transcripts often contain gold because people explain things more naturally when speaking than when writing formal documentation.
A manager explaining to a new employee “Here’s what I do when a vendor misses a delivery deadline” in a recorded onboarding call will use concrete examples, mention specific vendors, and include the unwritten rules that never make it into the official SOP. That informal explanation is often more valuable than the formal “Vendor Management Process” document.
One caveat: here, too, the key is semantic chunking and added context, as described in the video processing section above. Without preprocessing, raw transcripts can be too messy; with proper cleanup, they become Tier 1 or Tier 2 sources.
The secret sauce: values and culture documents
Beyond operational knowledge, there’s an advanced technique for making AI responses feel authentic to your company: training on your values statements, mission documents, and cultural guidelines.
These documents help the AI respond in your company’s voice and approach problems the way your company would. A retail company that values “customer delight above all” will want different AI responses than a financial services company that values “precision and compliance above all.”
For example, when asked “Should I give this customer a refund?”, an AI trained on customer-first values might respond: “Yes, process the refund and include a discount code for their next purchase to rebuild trust.” An AI trained on process-first values might respond: “Check if the return meets the criteria in section 3.2 of the refund policy before proceeding.”
Important caveat: This only works if your mission statement and values are actually followed. If you invented your mission statement because you “needed” one for the website but nobody references it in daily work, it won’t improve your AI’s responses. It might even make responses feel inauthentic.
Treat this as a bonus technique. While not essential for basic functionality, these documents transform generic AI responses into responses that feel like they come from your best employee. It’s the difference between technically correct answers and answers that feel right for your company culture.
Your information audit framework
Before you create any new documentation, audit what you have. Here’s a simple framework you can use immediately:
Step 1: Inventory your sources
Create a list of all information sources you currently have:
- FAQ documents (formal and informal)
- Email threads in shared inboxes or team folders
- Internal wiki pages or knowledge base articles
- Standard operating procedures and process documents
- Training videos and recorded presentations
- Recorded webinars
- PowerPoint decks and training slides
- Meeting transcripts or recordings
- Chat logs (Slack, Teams, WhatsApp) where problems are solved
- Customer service scripts and canned responses
- Product documentation and technical specs
Step 2: Estimate volume and accessibility
For each source type, note:
- Volume: How much of this type do you have? (e.g., “50 FAQ items,” “3 months of support emails”)
- Accessibility: Can you easily export or access it? Is it in a searchable format?
- Currency: How recently was it updated? Is it still accurate?
Step 3: Rate value based on the hierarchy
Categorize each source as Tier 1, 2, 3, or 4 based on the value hierarchy above. Focus your attention on Tier 1 and 2 sources first.
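Step 3 can be as simple as a lookup table. The sketch below maps source types to tiers following the hierarchy above and sorts an inventory so high-value sources surface first; the type labels are illustrative, so adapt them to your own inventory:

```python
# Illustrative mapping from source type to value tier, following
# the hierarchy table above. Unknown types default to Tier 4.
TIER_BY_SOURCE = {
    "faq": 1, "email_thread": 1, "troubleshooting_doc": 1, "chat_transcript": 1,
    "sop": 2, "wiki": 2, "process_doc": 2, "meeting_transcript": 2,
    "pdf_handbook": 3, "policy_doc": 3,
    "powerpoint_no_notes": 4, "training_video": 4, "webinar": 4,
}

def prioritize(inventory):
    """Sort an inventory of sources so Tier 1 and 2 items come first.
    Each item is a dict with at least 'name' and 'type' keys."""
    return sorted(inventory, key=lambda s: TIER_BY_SOURCE.get(s["type"], 4))
```

Running your Step 1 inventory through something like this gives you an ordered work queue instead of an undifferentiated pile.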
Step 4: Identify red flags
Watch for sources that look valuable but aren’t:
- Training materials that employees don’t actually use (check completion rates or ask your team)
- Documentation that hasn’t been updated in over a year (likely contains outdated information)
- PowerPoints that are just bullet points without context or speaker notes
- Videos without clear structure (expensive to process, low return)
- Marketing materials masquerading as training content (focuses on selling, not doing)
Step 5: Identify quick wins
Look for sources you can use immediately without preprocessing:
- Existing FAQ documents, even if they’re messy
- Email threads that have been saved or forwarded as examples
- Wiki pages that your team actually references
- Process documents that are actively maintained
- Troubleshooting guides created by senior team members
What “enough to start” looks like
You can start training an AI system with less than you think. Here’s a realistic minimum viable content set:
- 50-100 FAQ items covering common questions in one role or department
- 3 months of support emails or chat logs showing how problems were solved
- 5-10 key SOPs for the most common processes
- Optionally: Your company values document and mission statement if they’re actually used
> You don’t need to wait until you have “everything” to begin. In fact, starting small helps you discover what types of content your AI actually needs.
According to research, employees spend 1.8 hours every day searching for and gathering information, meaning almost a quarter of the working day simply vanishes. The goal isn’t perfect documentation; it’s making the knowledge you already have accessible when people need it.
The ROI reality check
Let’s talk numbers. The cost difference between organizing existing sources and creating new formal training is dramatic.
Time cost of creating new training materials
Even if you create training content internally rather than hiring professionals, the time investment is significant:
- Recording and editing training videos: 4-8 hours of work per finished minute of video (includes scripting, recording, editing, and revisions)
- Creating comprehensive training documentation from scratch: 20-40 hours for a complete role-specific training guide
- Ongoing maintenance: Every time your process changes, you need to update videos, documentation, and training materials. For a growing SME, this can mean quarterly updates across multiple documents
For a small team, this time cost is real money. If a manager earning $40/hour spends 30 hours creating training videos, that’s $1,200 in direct labor cost, plus the opportunity cost of what else they could have accomplished.
Cost of organizing existing sources
- Exporting FAQs and emails: Essentially free, just requires time to organize
- Cleaning up existing documentation: Mostly internal time, typically 10-20 hours for a comprehensive audit
- Initial AI training setup: One-time setup to upload and structure your documents
What employees actually use
Research on workplace learning is consistent: employees learn roughly 70% of their skills on the job and informally from the people they work with, while formal training programs account for only about 10%.
According to a Gartner survey, 47% of digital workers struggle to find information or data needed to effectively perform their jobs. The problem isn’t a lack of training materials, it’s that existing knowledge isn’t accessible or searchable.
The real ROI calculation
Consider this scenario:
You have 20 employees who spend an average of 1.8 hours per day searching for information. At an average hourly rate of $25, that’s $45 per employee per day, or $900 per day across your team. Over a year, that’s approximately $234,000 in lost productivity.
If organizing your existing FAQs, emails, and documentation into an AI-powered answer system reduces that search time by even 50%, you’ve saved $117,000 annually. Compare that to spending $50,000 on new training videos that achieve a 10% success rate and don’t address specific daily questions.
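The arithmetic in this scenario is easy to parameterize so you can plug in your own headcount and rates. A minimal sketch, using the figures from the example above and assuming roughly 260 workdays per year:

```python
def annual_search_cost(employees: int, hours_per_day: float,
                       hourly_rate: float, workdays: int = 260) -> float:
    """Annual cost of time spent searching for information."""
    return employees * hours_per_day * hourly_rate * workdays

# Figures from the scenario above: 20 employees, 1.8 hours/day, $25/hour.
baseline = annual_search_cost(20, 1.8, 25.0)   # about $234,000 per year
savings = baseline * 0.5                       # about $117,000 if search time halves
```

Swap in your own numbers; even conservative assumptions usually make the case on their own.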
> Organizing existing informal sources delivers 10x the value of creating new formal training materials, at a fraction of the cost.
Start with what works
Here’s the core message to remember: you most likely already have valuable training content in your FAQs and emails. Stop feeling guilty about not having “proper” training materials. What you have is often more valuable for AI training than expensive formal content.
Your messy documentation isn’t a liability, it’s an asset. Those email threads where your senior customer service rep explained how to handle a tricky refund situation? That’s gold. The FAQ document that five different people have edited over the years with increasingly specific edge cases? That’s exactly what an AI needs to provide helpful answers.
Immediate action steps
1. Export your FAQ document or support ticket history. Even if it’s disorganized, it contains real questions and real answers.
2. Collect three months of “how do I…” emails. Search your inbox or shared team folders for threads where someone asked a question and someone else provided a solution.
3. Don’t wait for perfect documentation. Perfect is the enemy of done. Start with what you have, test it, and improve it based on what your team actually asks.
4. Focus on organizing, restructuring, and making accessible what exists, not on creating new content. Your biggest ROI comes from surfacing existing knowledge, not filming new videos.
How Eanis makes this easy
This is exactly where Eanis delivers its value. The system is designed to train on informal but valuable sources, the messy documentation you already have. No complex integrations needed, just upload your FAQs, emails, and existing documents.
Eanis handles the rest, turning your “messy” documentation into instant, accurate answers for your team:
- Grounded answers with source citations: Every answer includes the source document, so your team can verify and learn the context
- Confidence meters: The system indicates how confident it is in each answer, building trust with your team
- Low-confidence routing to humans: When the AI doesn’t have a good answer, it routes the question to a person who does, capturing new knowledge in the process
- Weekly digest for managers: See coverage, gaps, and time-to-first-answer, so you know where documentation needs improvement
You’re not building a fancy, all-encompassing new solution. You’re making the knowledge your team already has accessible when they need it. Start with one role: no integrations, just upload and go.
Your FAQ document is worth more than you think. It’s time to treat it that way.
References
- CBS News - Ernst & Young fires workers for taking 2 online training courses at once
- Ninja Tropic - How Much Does an eLearning Training Video Cost to Produce?
- Level Up - The Surprising Truth About Employee Training Completion Rates
- SC Training - 10 Employee training statistics in 2025
- Devlin Peck - Employee Training Statistics, Trends, and Data in 2025
- Cottrill Research - Various Survey Statistics: Workers Spend Too Much Time Searching for Information
- Workleap - Formal vs informal learning: Developing employees though different training methods
- Gartner - Gartner Survey Reveals 47% of Digital Workers Struggle to Find the Information Needed to Effectively Perform Their Jobs
- Harvard Business Review - Why Leadership Training Fails—and What to Do About It