Training Process
Learn how to train your custom language model with your audio and transcript data.
Training Overview
Training teaches your custom model to recognize the speech patterns, vocabulary, and characteristics specific to your domain. The process involves:
- Training Data - Audio files with matching transcripts
- Data Upload - Adding your files to the model
- Training Execution - Running the training process
- Validation - Confirming successful training
Before Training
Data Requirements
Audio Files:
- Format: WAV, MP3, M4A, or other standard audio formats
- Quality: Clear recordings with minimal background noise
- Duration: Ideally 2-20 hours of audio in total
- Content: Representative of your target transcriptions
Transcript Files:
- Format: Plain text (.txt)
- Content: Exact transcription of the audio
- Accuracy: Must match audio precisely
- Encoding: UTF-8 recommended
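The transcript requirements above can be checked locally before upload. The following is a minimal sketch, not part of the product: the function name `find_transcript_problems` and the folder layout (all .txt transcripts in one directory) are assumptions made for illustration.

```python
from pathlib import Path


def find_transcript_problems(folder):
    """Return {filename: problem} for .txt files that are empty or not valid UTF-8."""
    problems = {}
    for path in sorted(Path(folder).glob("*.txt")):
        try:
            # Decode the raw bytes explicitly so the check does not depend
            # on the platform's default encoding.
            text = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            problems[path.name] = f"not valid UTF-8 (byte offset {err.start})"
            continue
        if not text.strip():
            problems[path.name] = "empty transcript"
    return problems
```

An empty result means every transcript in the folder decoded as UTF-8 and contains some text. It says nothing about whether the text matches the audio.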
Audio-Transcript Pairing:
- Each audio file must have a corresponding transcript
- Filenames should match (e.g., recording1.mp3 + recording1.txt)
- Content must align exactly
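Filename pairing is easy to verify with a short script before uploading. This sketch assumes all files sit in one folder and that pairing is by filename without the extension. The `AUDIO_EXTENSIONS` set and the function name `check_pairs` are illustrative, not part of the product.

```python
from pathlib import Path

# Assumption: adjust this set to the audio formats you actually use.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def check_pairs(folder):
    """Return (audio files without a transcript, transcripts without audio)."""
    audio, transcripts = {}, {}
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in AUDIO_EXTENSIONS:
            audio[path.stem] = path.name
        elif suffix == ".txt":
            transcripts[path.stem] = path.name
    # Compare by stem: recording1.mp3 pairs with recording1.txt.
    missing_transcript = sorted(audio[s] for s in audio.keys() - transcripts.keys())
    missing_audio = sorted(transcripts[s] for s in transcripts.keys() - audio.keys())
    return missing_transcript, missing_audio
```

Both returned lists should be empty before you upload. The check covers filenames only, so content alignment still has to be reviewed separately.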
Data Preparation Tips
Audio Quality:
- Use high-quality recordings when possible
- Consistent recording environment
- Clear speech, minimal overlapping
- Similar audio characteristics to your target use case
Transcript Accuracy:
- Transcripts must be at least 95% accurate
- Include all spoken words
- Proper punctuation and capitalization
- Match the audio exactly, word for word
Data Quantity:
- More data generally yields better results
- Minimum: Usually 1-2 hours of audio
- Recommended: 5-10 hours for good results
- Optimal: 10-20+ hours for best results
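To see where a dataset falls against these figures, you can total the audio duration locally. The sketch below uses Python's standard `wave` module, so it only reads uncompressed WAV files. MP3 or M4A files would need a third-party library. The tier boundaries in `quantity_tier` are one reading of the ranges above (the source does not say how to classify 2-5 hours), and both function names are illustrative.

```python
import wave
from pathlib import Path


def total_wav_hours(folder):
    """Sum the duration of all .wav files in a folder, in hours."""
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600


def quantity_tier(hours):
    """Map total audio hours to a tier (boundaries are an assumption)."""
    if hours < 1:
        return "below minimum"
    if hours < 5:
        return "minimum"
    if hours < 10:
        return "recommended"
    return "optimal"
```

For example, `quantity_tier(total_wav_hours("training_data"))` gives a quick indication of whether more recordings are worth collecting before the first training run. The folder name here is a placeholder.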
Accessing Training Interface
- Navigate to the Custom Models page
- Click your model name or the Edit icon (pencil)
- The model detail page opens and displays the training interface
Training Status States
Your model progresses through these states:
1. Not Running (Gray Badge)
What it means:
- Model created but no training started
- No training data uploaded yet
What to do:
- Upload training data (audio + transcripts)
- Prepare to start training
2. Ready to Run (Amber Badge)
What it means:
- Training data has been uploaded
- Model is ready to begin training
- Waiting for you to start the process
What to do:
- Review uploaded data
- Click "Start Training" (the button may be labeled "Train Model") to begin training
3. Running (Blue Badge)
What it means:
- Training is actively in progress
- System is processing your data
- Model is learning from your audio and transcripts
What to do:
- Wait for training to complete
- Training typically takes several hours
- Monitor progress if status updates are available
- Do not interrupt the process
Duration:
- Depends on data volume
- Typically 2-12 hours
- Larger datasets take longer
4. Success (Green Badge)
What it means:
- Training completed successfully
- Model is ready to use
- Can be selected for transcription
What to do:
- Model is now ready for production use
- Start using it for transcriptions
- Test accuracy with sample audio
5. Failed (Red Badge)
What it means:
- Training encountered an error
- Model is not ready to use
- Issue needs investigation
What to do:
- Check error messages or logs (if available)
- Review training data quality
- Contact support if needed
- May need to retry training
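If you track model status in your own tooling, the five states above map naturally onto an enumeration. This is an illustrative data structure only: the class name, attribute names, and the `status_from_label` helper are assumptions, and the label and badge strings are taken from the headings above.

```python
from enum import Enum


class TrainingStatus(Enum):
    # (label shown in the UI, badge color, suggested next action)
    NOT_RUNNING = ("Not Running", "gray", "Upload training data")
    READY_TO_RUN = ("Ready to Run", "amber", "Review data and start training")
    RUNNING = ("Running", "blue", "Wait for training to complete")
    SUCCESS = ("Success", "green", "Test the model, then use it")
    FAILED = ("Failed", "red", "Check errors and data, then retry")

    def __init__(self, label, badge, next_action):
        self.label = label
        self.badge = badge
        self.next_action = next_action

    @property
    def usable(self):
        """Only a successfully trained model can be selected for transcription."""
        return self is TrainingStatus.SUCCESS


def status_from_label(label):
    """Look up a status by its UI label, ignoring case and surrounding spaces."""
    wanted = label.strip().lower()
    for status in TrainingStatus:
        if status.label.lower() == wanted:
            return status
    raise ValueError(f"unknown training status: {label!r}")
```

For example, `status_from_label("Ready to Run").next_action` returns the suggested next step for that state.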
Uploading Training Data
Via Model Detail Page
- Open your model (click Edit or model name)
- Look for training data upload section
- Upload your files:
  - Audio files
  - Corresponding transcript files
- System validates and processes files
- Status changes to "Ready to Run" when upload complete
Data Organization
Best Practice:
- Organize files in pairs
- Use clear, matching filenames
- Example:
  - session01.mp3 + session01.txt
  - meeting_02.wav + meeting_02.txt
  - call_003.mp3 + call_003.txt
Starting Training
Once data is uploaded and status is "Ready to Run":
- Review uploaded data
- Click "Start Training" or "Train Model" button
- Confirm training start (if prompted)
- Status changes to "Running" (blue)
- Training begins processing
Note: You cannot stop training once started. Ensure everything is ready before starting.
During Training
What Happens
The system:
- Processes your audio files
- Analyzes the matching transcripts
- Learns patterns and vocabulary
- Optimizes the model for your domain
- Validates the training results
Monitoring Progress
Status Updates:
- Check the Training Status badge
- May show progress percentage (if available)
- Estimated completion time (if shown)
What You Can Do:
- Monitor status periodically
- Continue other work
- Do not delete the model while training
- Wait for completion
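Periodic monitoring can be expressed as a simple polling loop. This page describes only the web interface, so the sketch below does not call any real API: `fetch_status` is a hypothetical callable that you would supply (for example, a function that returns the status text you read from the model page). The 24-hour limit mirrors the point at which this guide suggests contacting support.

```python
import time

TERMINAL_STATUSES = {"Success", "Failed"}


def wait_for_training(fetch_status, poll_seconds=300, max_hours=24, sleep=time.sleep):
    """Poll until training reaches a terminal status or max_hours elapse.

    fetch_status is a caller-supplied (hypothetical) function returning the
    current status label as a string.
    """
    max_polls = int(max_hours * 3600 / poll_seconds)
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    return "Timed out"
```

The `sleep` parameter exists so the loop can be exercised without waiting. In normal use the default `time.sleep` applies.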
Training Duration
Typical Timelines:
- Small dataset (1-3 hours audio): 2-4 hours training
- Medium dataset (5-10 hours audio): 4-8 hours training
- Large dataset (10-20 hours audio): 8-12+ hours training
Factors Affecting Duration:
- Amount of training data
- Audio file sizes
- System load
- Model complexity
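The typical timelines can be turned into a rough lookup for planning. This is a sketch under stated assumptions: the source gives no figure for 3-5 hours of audio or for more than 20 hours, so the function below places those cases in the next tier up and the largest tier respectively. Actual training time also depends on the other factors listed above.

```python
# (upper bound of audio hours, typical training time from the timelines above)
DURATION_TIERS = [
    (3, "2-4 hours"),
    (10, "4-8 hours"),
    (20, "8-12+ hours"),
]


def estimate_training_time(audio_hours):
    """Return the typical training-time range for a given amount of audio."""
    if audio_hours <= 0:
        raise ValueError("audio_hours must be positive")
    for upper_bound, estimate in DURATION_TIERS:
        if audio_hours <= upper_bound:
            return estimate
    # Assumption: anything above 20 hours falls under the open-ended top tier.
    return DURATION_TIERS[-1][1]
```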
After Training
Successful Training
When status shows "Success" (green):
- Model is ready to use
- Available for selection in transcription
- Will improve accuracy for your domain
Next Steps:
- Test the model with sample audio
- Compare accuracy to base model
- Start using in production transcriptions
Failed Training
If status shows "Failed" (red):
- Check for error messages in model details
- Review training data quality
- Common issues:
  - Audio-transcript mismatch
  - Poor audio quality
  - Insufficient data
  - File format issues
Troubleshooting:
- Verify audio and transcript pairs match
- Check file formats are supported
- Ensure transcripts are accurate
- Try with different/better data
- Contact support with error details
Retraining
When to Retrain
Consider retraining your model when:
- Vocabulary has evolved significantly
- New terminology emerged
- Accuracy has decreased
- You have additional quality training data
- Scheduled maintenance is due (quarterly or twice a year)
How to Retrain
- Gather new/additional training data
- Open your existing model
- Upload new data
- Start training again
- New training replaces previous version
Note: Retraining overwrites the previous model. Test before deploying to production.
Best Practices
Data Quality
Ensure High Quality:
- Accurate transcripts (95%+ accuracy)
- Clear audio recordings
- Consistent audio quality
- Representative of target use case
Incremental Approach
Start Small, Expand:
- Begin with 2-5 hours of best quality data
- Train and test model
- Evaluate accuracy improvements
- Add more data if needed
- Retrain with expanded dataset
Testing
Validate Training Success:
- Use model on test audio
- Compare to base model results
- Measure accuracy improvement
- Verify domain-specific terms recognized
- Gather user feedback
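One common way to measure accuracy improvement is word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the model's output into a trusted reference transcript, divided by the reference length. The source does not specify a metric, so treating WER as the measure is an assumption. The sketch below lowercases and splits on whitespace only, so punctuation differences count as errors unless you strip them first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "the patient has acute bronchitis"
base_output = "the patient has a cute bronchitis"
custom_output = "the patient has acute bronchitis"

base_wer = word_error_rate(reference, base_output)      # 2 errors / 5 words = 0.4
custom_wer = word_error_rate(reference, custom_output)  # 0 errors / 5 words = 0.0
```

A lower WER for the custom model on the same test audio indicates an improvement. The sentences here are made-up illustrations, not real results.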
Maintenance
Regular Updates:
- Review model performance quarterly
- Add new terminology as needed
- Retrain with fresh data periodically
- Keep training data organized and backed up
Troubleshooting
Training Won't Start
Problem: Can't start training despite uploading data
Solutions:
- Verify status is "Ready to Run"
- Check all data uploaded correctly
- Ensure audio-transcript pairs complete
- Try refreshing the page
Training Stuck at "Running"
Problem: Training status stuck, not progressing
Solutions:
- Wait longer (training can take 12 hours or more for large datasets)
- Refresh page to check for updates
- Contact support if stuck for 24+ hours
Training Failed
Problem: Status shows "Failed"
Solutions:
- Review error messages
- Check training data quality
- Verify file formats supported
- Ensure audio-transcript matching
- Try with different data
- Contact support with details
Model Not Available After Success
Problem: Training succeeded but model not usable
Solutions:
- Refresh the page
- Check model list
- Verify status is green "Success"
- Contact support if issue persists
Next Steps
After successful training:
- Use in Transcription - Apply your trained model
- Test with sample audio
- Compare accuracy improvements
- Deploy to production use
Training complete! Your custom model is ready to improve transcription accuracy.