What Are Custom Models?
Custom models allow you to train specialized speech recognition models with your own data to improve transcription accuracy for specific terms, names, and vocabulary.
Overview
A custom model is a specialized version of Scriptix's base language models trained with your audio files and transcripts. This training helps the model better recognize:
- Domain-specific terminology
- Product and company names
- Specific vocabulary
- Your organization's unique language patterns
How Custom Models Work
Base Models vs Custom Models
Base Models:
- Pre-trained language models
- Available immediately
- Support standard speech recognition
- Cover common vocabulary
Custom Models:
- Built on top of a base language model
- Trained with your specific audio and transcripts
- Optimized for your vocabulary
- Require training time and data
The Training Process
- Create Model - Name the model and select a base language
- Upload Datasets - Provide audio and/or transcript files
- Train Model - The system trains on your data
- Use Model - Select the trained model when creating transcripts
Training Data Requirements
Audio Files:
- Supported formats: .wav, .mp3, .m4a, .flac
- 1-10 files per upload
- Maximum 10 GB per file
Transcript Files:
- Supported formats: .vtt, .srt, .txt
- Must match audio content
- Include accurate text and timestamps
Manifest Files:
- Supported format: .jsonl
- Organize datasets
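As a rough illustration of the .jsonl format, the sketch below writes and reads a manifest with one JSON object per line. The field names `audio` and `transcript` and the file names are assumptions made for this example only. They are not a documented Scriptix schema, so check the expected manifest fields before uploading.

```python
import json

# Hypothetical dataset entries. The keys "audio" and "transcript" are
# illustrative assumptions, not a documented Scriptix manifest schema.
entries = [
    {"audio": "call_001.wav", "transcript": "call_001.vtt"},
    {"audio": "call_002.wav", "transcript": "call_002.vtt"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


manifest_text = to_jsonl(entries)
```

Each line of a .jsonl file parses on its own, so a long manifest can be read line by line.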
Force Alignment:
- Use the Force Alignment feature to add timestamps to existing transcripts
- Creates properly formatted training data
- Available in the workspace under STT Session
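The format, count, and size limits above can be checked locally before uploading. The following is a minimal sketch, not part of the product: the function names are hypothetical, "10 GB" is assumed to mean 10 × 1000³ bytes, and the size limit is applied only to audio files because that is where it is listed.

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac"}
TRANSCRIPT_EXTS = {".vtt", ".srt", ".txt"}
MANIFEST_EXTS = {".jsonl"}
MAX_FILES_PER_UPLOAD = 10
# Assumption: "10 GB" is read as decimal gigabytes.
MAX_AUDIO_BYTES = 10 * 1000**3


def classify(filename):
    """Return 'audio', 'transcript', 'manifest', or None for unsupported files."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in TRANSCRIPT_EXTS:
        return "transcript"
    if ext in MANIFEST_EXTS:
        return "manifest"
    return None


def check_upload(files):
    """Check a list of (filename, size_in_bytes) pairs; return a list of problems."""
    problems = []
    if not 1 <= len(files) <= MAX_FILES_PER_UPLOAD:
        problems.append(f"expected 1-{MAX_FILES_PER_UPLOAD} files, got {len(files)}")
    for name, size in files:
        kind = classify(name)
        if kind is None:
            problems.append(f"{name}: unsupported file format")
        elif kind == "audio" and size > MAX_AUDIO_BYTES:
            problems.append(f"{name}: audio file exceeds 10 GB")
    return problems
```

An empty result means the batch satisfies the limits listed in this section. It does not check that transcripts match the audio content.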
Training Requirements
To train a custom model you need:
- Datasets uploaded (audio and/or transcripts)
- Training credits available in the organization
- Model validation passes
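The three requirements above amount to a simple pre-flight checklist. The sketch below models them in code. The class, its fields, and the idea that credits are a plain integer balance are all assumptions for illustration, and how many credits a training run consumes is not covered here.

```python
from dataclasses import dataclass


@dataclass
class TrainingPreconditions:
    """Hypothetical snapshot of a model's readiness for training."""
    dataset_count: int        # datasets uploaded (audio and/or transcripts)
    training_credits: int     # assumed to be an integer balance
    validation_passed: bool   # result of model validation


def training_blockers(pre):
    """Return the unmet requirements; an empty list means training can start."""
    blockers = []
    if pre.dataset_count < 1:
        blockers.append("no datasets uploaded")
    if pre.training_credits <= 0:
        blockers.append("no training credits available")
    if not pre.validation_passed:
        blockers.append("model validation has not passed")
    return blockers
```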
Using Custom Models
After training completes successfully:
- Navigate to the workspace (Home page)
- Click "STT Session" to create a transcript
- Select a language from the dropdown
- Select your custom model from the model list
- Upload an audio file
- Start transcription
The trained model applies to new transcriptions for improved accuracy.
Custom Models vs Glossaries
Both improve transcription accuracy but work differently:
Custom Models
What They Do:
- Train speech recognition with your audio and transcripts
- Learn vocabulary patterns from training data
- Require training time
Best For:
- Large volumes of specialized content
- Comprehensive domain coverage
- Ongoing regular use
Requirements:
- Training data (audio + transcripts)
- Training credits
- Training time
Glossaries
What They Do:
- Define term pairs (source → target)
- Replace terms in transcripts
- Apply during transcription
Best For:
- Specific terms and names
- Quick implementation
- Limited vocabulary
Requirements:
- Term pairs list
- Source and target language
- No training step; available immediately
Using Together:
- Custom models improve base recognition
- Glossaries handle specific term replacements
- Both can be used in the same transcription
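To make the difference concrete, the sketch below shows the general idea of a glossary as source → target term replacement applied to transcript text. It is a conceptual illustration only, not how Scriptix implements glossaries. The function name, the whole-word matching, and the case-insensitive lookup are assumptions.

```python
import re


def apply_glossary(text, term_pairs):
    """Replace each source term with its target (whole words, case-insensitive)."""
    if not term_pairs:
        return text
    # Try longer source terms first so multi-word terms win over shorter ones.
    sources = sorted(term_pairs, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in sources) + r")\b",
        re.IGNORECASE,
    )
    lookup = {source.lower(): target for source, target in term_pairs.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# Hypothetical term pair: a misrecognized product name and its correct spelling.
corrected = apply_glossary("we use script ix daily", {"script ix": "Scriptix"})
```

A glossary of this kind only rewrites text that was already recognized. A custom model changes what is recognized in the first place, which is why the two can complement each other.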
Training Status
Models have five training statuses:
- Not Running (gray) - Model created but not trained
- Ready to Run (amber) - Datasets uploaded, ready to train
- Running (blue) - Training in progress
- Success (green) - Training completed successfully
- Failed (red) - Training failed
Only models with "Success" status can be used for transcription.
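The status list above maps naturally onto an enumeration. The sketch below is illustrative: the class and function names are hypothetical, while the labels, colors, and the rule that only "Success" models are usable come from this page.

```python
from enum import Enum


class TrainingStatus(Enum):
    """The five training statuses with their display label and badge color."""
    NOT_RUNNING = ("Not Running", "gray")
    READY_TO_RUN = ("Ready to Run", "amber")
    RUNNING = ("Running", "blue")
    SUCCESS = ("Success", "green")
    FAILED = ("Failed", "red")

    def __init__(self, label, color):
        self.label = label
        self.color = color


def is_usable_for_transcription(status):
    """Only models whose training succeeded can be selected for transcription."""
    return status is TrainingStatus.SUCCESS
```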
Model Management
View Models:
- Navigate to Custom Models page
- See all models in list
- View training status
Edit Models:
- Click model name or Edit action
- Update model name
- Upload additional datasets
- View training details
Delete Models:
- Available to ORGADMIN and SYSOP only
- Click three-dot menu on model
- Select Delete
- Confirm deletion
Dataset Management
Upload Datasets:
- Open model details page
- Click "New Dataset"
- Select type (auto-detect, TRANSCRIPT, or AUDIO)
- Upload 1-10 files
- Files added to datasets list
Delete Datasets:
- Click three-dot menu on dataset row
- Select Delete
- Confirm deletion
Force Alignment Integration
Use Force Alignment to prepare training data:
What It Does:
- Adds timestamps to existing transcripts
- Creates properly formatted training files
- Uses audio + plain text transcript
How to Use:
- Navigate to workspace (Home)
- Click "STT Session"
- Select Force Alignment option
- Upload audio file
- Upload or paste transcript text
- Process alignment
- Download timestamped transcript
- Use as training data for custom model
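For reference, a timestamped transcript in .srt form looks like the output of the sketch below, which formats aligned segments as SubRip cues. The segment tuples are made-up example data, and whether Force Alignment produces exactly this layout is an assumption. The sketch only shows what one of the supported timestamped formats contains.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Made-up aligned segments for illustration.
example = to_srt([(0.0, 1.25, "Hello"), (1.25, 3.0, "and welcome")])
```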
Limitations
Custom Models Cannot:
- Transcribe languages they weren't trained for
- Work without training data
- Be used before training succeeds
- Compensate for poor audio quality
Training Limitations:
- Requires quality training data
- Takes time to complete
- Needs training credits
- A model cannot be trained while its training is already in progress
- A model cannot be trained again once its training has succeeded
Access Control
Who Can Create Models:
- All authenticated users can create custom models
- Model creation available on Custom Models page
Who Can Delete Models:
- ORGADMIN role users
- SYSOP role users
- Other roles cannot delete models
Who Can Train Models:
- All users with access to the model
- Training requires credits in the organization
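The access rules above can be summarized as three small checks. This is a sketch with hypothetical function names and parameters. Only the role names ORGADMIN and SYSOP come from this page, and credits are again assumed to be a simple numeric balance.

```python
# Roles allowed to delete models, as listed above.
DELETE_ROLES = frozenset({"ORGADMIN", "SYSOP"})


def can_create_model(is_authenticated):
    """Any authenticated user can create a custom model."""
    return is_authenticated


def can_delete_model(role):
    """Only ORGADMIN and SYSOP users can delete models."""
    return role in DELETE_ROLES


def can_train_model(has_model_access, org_credits):
    """Training needs access to the model and credits in the organization."""
    return has_model_access and org_credits > 0
```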
Getting Started Guide
A getting started guide opens in a modal the first time you visit the Custom Models page:
Guide Sections:
- What are custom models
- What you need (training data requirements)
- Only have audio files (use Force Alignment)
- After training (how to use trained model)
Access Guide:
- Appears automatically on first visit
- Click "Getting Started" button to reopen
- Whether the guide has already been shown is stored in browser localStorage
Transform your transcription accuracy! Custom models bring domain expertise to speech recognition.
Next Steps
- When to Use Custom Models - Determine if they're right for you
- Create Custom Model - Step-by-step creation guide
- Glossaries - Alternative accuracy option