Custom Models API - Overview
Complete guide to the Custom Models API for training specialized speech recognition models tailored to your domain, vocabulary, and audio environment.
What are Custom Models?
Custom models are specialized speech recognition models trained on your specific audio data and vocabulary to achieve higher accuracy than general-purpose models. They learn:
- Domain-specific terminology (medical, legal, technical terms)
- Speaker characteristics (accents, speaking patterns, pronunciation)
- Audio environment (recording quality, background noise patterns)
- Vocabulary patterns (company names, product names, jargon)
Accuracy improvements typically range from 15% to 30% for specialized domains, depending on how much training data you provide.
Use Cases
Medical & Healthcare
- Clinical documentation: Physician dictation, patient notes, diagnoses
- Radiology reports: MRI, CT scan, X-ray descriptions
- Medical consultations: Doctor-patient conversations
- Specialized departments: Cardiology, oncology, pathology
Legal
- Court proceedings: Depositions, hearings, trials
- Legal consultations: Attorney-client meetings
- Contract dictation: Legal document creation
- Case notes: Legal research and case documentation
Technical & IT
- Technical support: Customer support calls, troubleshooting
- Engineering discussions: Design reviews, technical meetings
- Product documentation: Technical writing, specifications
- Software development: Code reviews, sprint meetings
Business & Corporate
- Executive meetings: Board meetings, strategy sessions
- Sales calls: Client meetings, product demos
- Customer service: Support calls, feedback sessions
- Training sessions: Employee training, onboarding
How It Works
1. Create Model
POST /api/v3/custom_models
Define your model with name, language, and base model type.
2. Upload Training Data
PUT /api/v3/custom_models/{id}/data
Upload matched pairs of audio files and transcripts (training + test sets).
Minimum requirements:
- 5+ hours of audio
- Accurate transcripts matching audio
- Representative of production use case
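Before uploading, it can help to confirm that every audio file has a transcript and vice versa. The sketch below is one way to do that locally. It assumes that a pair shares a file stem (as in `cardiology_01.mp3` and `cardiology_01.txt` from the Quick Start); that naming convention and the `pair_files` helper are illustrative assumptions, not something the API is documented to require.

```python
from pathlib import PurePath

# Extensions taken from the supported formats listed under Training Data Requirements.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".aac", ".ogg"}


def pair_files(filenames):
    """Match each audio file to the .txt transcript with the same stem.

    Returns (pairs, unmatched): pairs is a sorted list of
    (audio_name, transcript_name) tuples, unmatched is a sorted list of
    audio or transcript files that have no partner. Other file types are ignored.
    """
    audio, transcripts = {}, {}
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in AUDIO_EXTENSIONS:
            audio[path.stem] = name
        elif suffix == ".txt":
            transcripts[path.stem] = name

    shared = sorted(audio.keys() & transcripts.keys())
    pairs = [(audio[stem], transcripts[stem]) for stem in shared]
    unmatched = sorted(
        [audio[stem] for stem in audio.keys() - transcripts.keys()]
        + [transcripts[stem] for stem in transcripts.keys() - audio.keys()]
    )
    return pairs, unmatched
```

Any entry in `unmatched` points to a file that should be fixed or removed before the upload step.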
3. Start Training
POST /api/v3/custom_models/{id}/run
Trigger the training process (typically 6-12 hours depending on data size).
4. Monitor Progress
GET /api/v3/custom_models/{id}
Poll the model status to track training progress (0-100%).
5. Use in Production
Once training completes successfully, reference your model in transcription requests:
POST /api/v3/stt
{
  "model": "custom_model_123",
  "language": "en",
  "audio_file": "..."
}
API Endpoints Summary
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v3/custom_models | Create new custom model |
| GET | /api/v3/custom_models | List all models with pagination |
| GET | /api/v3/custom_models/{id} | Get model details and status |
| POST | /api/v3/custom_models/{id} | Update model metadata |
| DELETE | /api/v3/custom_models/{id} | Delete model permanently |
| PUT | /api/v3/custom_models/{id}/data | Upload training or test data |
| POST | /api/v3/custom_models/{id}/run | Start training process |
| GET | /api/v3/custom_models/{id}/log | Get training logs |
Training Status Lifecycle
┌─────────────┐
│   1: Not    │ ← Model created, no data uploaded
│   Running   │
└──────┬──────┘
       │ Upload training data
       ▼
┌─────────────┐
│  2: Ready   │ ← Data uploaded, ready to train
│   to Run    │
└──────┬──────┘
       │ POST /run
       ▼
┌─────────────┐
│ 3: Running  │ ← Training in progress (0-100%)
│             │
└──────┬──────┘
       │
       ├─── Success ───→ ┌─────────────┐
       │                 │ 4: Success  │ ← Ready for production use
       │                 └─────────────┘
       │
       └─── Failure ───→ ┌─────────────┐
                         │ 5: Failed   │ ← Training error occurred
                         └─────────────┘
Status Codes:
- 1: Not Running (created, no data)
- 2: Ready to Run (data uploaded)
- 3: Running (training in progress)
- 4: Success (training completed, ready for use)
- 5: Failed (training error)
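The status codes lend themselves to a small polling helper. The following is a minimal sketch, not an official client: `fetch_model` is a hypothetical caller-supplied function that performs `GET /api/v3/custom_models/{id}` and returns the decoded JSON as a dict, and the default interval and poll cap are arbitrary assumptions. Only the `training_status` field and its five values come from this guide.

```python
import time
from enum import IntEnum


class TrainingStatus(IntEnum):
    NOT_RUNNING = 1
    READY_TO_RUN = 2
    RUNNING = 3
    SUCCESS = 4
    FAILED = 5


# Statuses after which further polling is pointless.
TERMINAL = {TrainingStatus.SUCCESS, TrainingStatus.FAILED}


def wait_for_training(fetch_model, interval_seconds=60.0, max_polls=1000,
                      sleep=time.sleep):
    """Poll until training reaches Success or Failed and return that status.

    fetch_model is a hypothetical callable returning the model JSON as a dict.
    Raises TimeoutError if no terminal status is seen within max_polls polls.
    """
    for _ in range(max_polls):
        model = fetch_model()
        status = TrainingStatus(model["training_status"])
        if status in TERMINAL:
            return status
        sleep(interval_seconds)
    raise TimeoutError("training did not finish within max_polls polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting. Since training typically takes hours, a polling interval of a minute or more is likely sufficient.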
Base Model Types
Choose a base model that matches your domain for best results:
| Base Model | Description | Best For |
|---|---|---|
| general | General speech, broadest vocabulary | Business meetings, interviews, general content |
| medical | Medical terminology and procedures | Clinical notes, radiology reports, medical consultations |
| legal | Legal vocabulary and case law | Depositions, legal consultations, court proceedings |
| technical | Technical/IT terminology | Tech support, engineering discussions, software development |
Training Data Requirements
Minimum Requirements
- Audio duration: 5+ hours total
- Audio quality: Clear speech, minimal background noise
- Transcript accuracy: 95% or higher
- Data split: 80% training, 20% test recommended
- Format: Supported audio formats (MP3, WAV, FLAC, M4A, AAC, OGG)
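The recommended 80/20 split can be produced with a seeded shuffle so that the same files land in the same set on every run. This is a sketch under one simplifying assumption: it splits by number of file pairs, not by audio duration, so the resulting hour totals are only approximately 80/20 when files differ in length. The `split_pairs` name is illustrative.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Split (audio, transcript) pairs into (train, test) lists.

    The shuffle is seeded, so the split is reproducible. With two or more
    pairs, at least one pair is always placed in the test set.
    """
    items = list(pairs)
    random.Random(seed).shuffle(items)
    if len(items) > 1:
        test_count = max(1, round(len(items) * test_fraction))
    else:
        test_count = 0
    return items[test_count:], items[:test_count]
```

Pairs in the first list would then be uploaded with `type=train` and pairs in the second with `type=test`.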
Quality Guidelines
Audio:
- Sample rate: 16kHz or higher
- Bit rate: 128kbps or higher
- Mono or stereo
- Low background noise
- Clear speech, minimal overlapping speakers
Transcripts:
- Plain text (.txt files)
- Verbatim transcription (include all words spoken)
- Correct punctuation and capitalization
- Match audio timing exactly
- Include all domain-specific terminology
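Some of the audio guidelines above can be checked locally before upload. The sketch below uses Python's standard `wave` module, so it only covers uncompressed WAV files; MP3, FLAC, and the other formats would need a third-party decoder. The `check_wav` helper and its report layout are assumptions for illustration, and bit rate and noise level are not checked.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # from the "16kHz or higher" guideline


def check_wav(source):
    """Inspect a WAV file (path or binary file object) against the guidelines.

    Returns a dict with the sample rate, channel count, duration in seconds,
    and a list of human-readable problems (empty if none were found).
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate

    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if channels not in (1, 2):
        problems.append(f"{channels} channels; expected mono or stereo")
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_seconds": duration,
        "problems": problems,
    }
```

Summing `duration_seconds` across all training files gives a quick check against the 5-hour minimum.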
Recommended Data Size
| Audio Hours | Expected Improvement | Use Case |
|---|---|---|
| 5-10 hours | 10-15% | Small vocabulary, limited domain |
| 10-30 hours | 15-25% | Standard domain customization |
| 30-50 hours | 25-30% | Comprehensive domain coverage |
| 50+ hours | 30%+ | Maximum accuracy, complex domains |
Authentication
All Custom Models API endpoints require authentication via API key:
curl https://api.scriptix.io/api/v3/custom_models \
  -H "Authorization: Bearer YOUR_API_KEY"
Get your API key from the Scriptix Dashboard.
Rate Limits
- Model creation: 10 requests/hour
- Training starts: 5 requests/hour
- Status checks: 100 requests/minute
- Data uploads: 50 requests/hour
See Rate Limits for details.
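A client can stay under these limits by tracking its own request timestamps. The following sliding-window limiter is a sketch of that idea. How the server actually counts requests (sliding or fixed windows, per key or per account) is not stated here, so treat the class as a conservative client-side guard, not a mirror of the server's logic.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds interval."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sent = deque()  # timestamps of recorded requests, oldest first

    def wait_time(self):
        """Return seconds to wait before the next request (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            return 0.0
        return self._window - (now - self._sent[0])

    def record(self):
        """Record that a request was just sent."""
        self._sent.append(self._clock())
```

For status checks, `SlidingWindowLimiter(100, 60)` matches the 100 requests/minute figure; before each request, sleep for `wait_time()` seconds and then call `record()`.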
Error Handling
All endpoints return standard error responses:
{
  "error": "Error Type",
  "message": "Human-readable description",
  "error_code": "MACHINE_READABLE_CODE",
  "details": {
    "field": "Additional context"
  }
}
Common error codes:
- MODEL_NOT_FOUND (404) - Model doesn't exist
- MODEL_NOT_READY (409) - Cannot train, upload data first
- INSUFFICIENT_DATA (400) - Not enough training data
- TRAINING_FAILED (500) - Training process encountered error
- MODEL_LIMIT_EXCEEDED (402) - Quota limit reached
See Error Codes for complete list.
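On the client side, the error body can be turned into an exception that carries the machine-readable code. This is a minimal sketch: the `CustomModelsError` class and `parse_error` function are hypothetical names, and the fallback to `UNKNOWN` for bodies that are not valid JSON (for example, an HTML page from a proxy) is an assumption about what a defensive client might want.

```python
import json


class CustomModelsError(Exception):
    """Error response from the API, keyed by its error_code field."""

    def __init__(self, status, error_code, message, details):
        super().__init__(f"{error_code} ({status}): {message}")
        self.status = status
        self.error_code = error_code
        self.message = message
        self.details = details


def parse_error(status, body):
    """Build a CustomModelsError from an HTTP status and response body text."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return CustomModelsError(
        status=status,
        error_code=payload.get("error_code", "UNKNOWN"),
        message=payload.get("message", ""),
        details=payload.get("details") or {},
    )
```

Callers can then branch on `error_code`, for example uploading data when they see `MODEL_NOT_READY` instead of retrying the training request.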
Pricing
Custom model training is available on Gold and Enterprise plans.
Limits by plan:
- Gold: 5 custom models
- Enterprise: Unlimited custom models
Training costs:
- First 10 hours of training data: Included
- Additional hours: $50/hour of audio data
Contact sales for Enterprise pricing.
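As a worked example of the training costs above, the sketch below estimates the charge for a given amount of audio. It assumes that partial hours are billed pro rata and that the 10 included hours apply to the total being estimated; neither detail is specified in this guide, so confirm with sales before budgeting.

```python
def training_cost_usd(audio_hours, included_hours=10.0, rate_per_hour=50.0):
    """Estimate training cost: hours beyond the included allowance times the rate."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    billable_hours = max(0.0, audio_hours - included_hours)
    return billable_hours * rate_per_hour
```

Under these assumptions, 25 hours of audio leaves 15 billable hours, or $750, while 8 hours falls within the included allowance.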
Quick Start Example
Complete workflow from creation to production use:
# 1. Create model
curl -X POST https://api.scriptix.io/api/v3/custom_models \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Medical Cardiology EN",
    "language": "en",
    "base_model": "medical",
    "description": "Cardiology consultations and reports"
  }'
# Response: {"id": 123, "training_status": 1, ...}
# 2. Upload training data (repeat for each file pair)
curl -X PUT https://api.scriptix.io/api/v3/custom_models/123/data \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "type=train" \
  -F "audio_file=@cardiology_01.mp3" \
  -F "transcript_file=@cardiology_01.txt"
# 3. Upload test data
curl -X PUT https://api.scriptix.io/api/v3/custom_models/123/data \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "type=test" \
  -F "audio_file=@cardiology_test_01.mp3" \
  -F "transcript_file=@cardiology_test_01.txt"
# 4. Start training
curl -X POST https://api.scriptix.io/api/v3/custom_models/123/run \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
# 5. Check status (poll until training_status = 4)
curl https://api.scriptix.io/api/v3/custom_models/123 \
  -H "Authorization: Bearer YOUR_API_KEY"
# Response: {"id": 123, "training_status": 3, "training_progress": 65, ...}
# 6. Use in production (after status = 4)
curl -X POST https://api.scriptix.io/api/v3/stt \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "model=custom_model_123" \
  -F "language=en" \
  -F "audio_file=@new_recording.mp3"
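For callers working in Python instead of curl, step 1 could be expressed with the standard library as in the sketch below. It only builds the request; the URL, headers, and JSON fields mirror the curl call above, while the `build_create_request` helper is a hypothetical name introduced for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.scriptix.io/api/v3"


def build_create_request(api_key, name, language, base_model, description=""):
    """Build (but do not send) the POST request that creates a custom model."""
    body = json.dumps({
        "name": name,
        "language": language,
        "base_model": base_model,
        "description": description,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/custom_models",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)` and decoding the JSON response, which should contain the new model's `id` and `training_status`.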
Best Practices
Data Preparation
- Diverse audio samples: Include variety of speakers, speaking styles, recording conditions
- Representative data: Training data should match production use case
- Clean transcripts: Ensure 95%+ accuracy before uploading
- Sufficient volume: More data = better results (aim for 20+ hours)
- Test set quality: Use high-quality test set to measure true performance
Training Process
- Monitor progress: Poll status endpoint during training
- Review metrics: Check word_error_rate and accuracy_improvement after completion
- Iterate: If results insufficient, add more data and retrain
- Test thoroughly: Validate on real production audio before deployment
- Keep improving: Periodically retrain with new production data
Production Use
- Gradual rollout: Test on subset of traffic before full deployment
- Monitor accuracy: Track real-world performance vs. base model
- Update regularly: Retrain every 3-6 months with new data
- Version models: Keep previous versions as backup
- Document changes: Track what data was used for each training run
Training Performance Metrics
After successful training (training_status = 4), review these metrics:
{
  "training_metrics": {
    "word_error_rate": 12.5,       // Test set WER in % (lower is better)
    "accuracy_improvement": 18.7,  // % improvement over base model
    "character_error_rate": 8.3,   // Character-level error rate in % (lower is better)
    "training_hours": 7.5          // Audio hours used in training
  }
}
What's a good WER?
- < 10%: Excellent (professional transcription quality)
- 10-15%: Very good (suitable for most use cases)
- 15-25%: Good (significant improvement over base)
- 25-35%: Fair (may need more training data)
- > 35%: Poor (review data quality or add more data)
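The bands above can be applied automatically to the `word_error_rate` metric. The sketch below is one reading of them: because the listed ranges share their endpoints (15% appears in two bands), it assumes each band's upper bound is exclusive, except that exactly 35% still counts as Fair since Poor is defined as above 35%. The `rate_wer` name is illustrative.

```python
def rate_wer(wer_percent):
    """Map a word error rate (in percent) to the quality bands in this guide."""
    if wer_percent < 0:
        raise ValueError("WER cannot be negative")
    if wer_percent < 10:
        return "Excellent"
    if wer_percent < 15:
        return "Very good"
    if wer_percent < 25:
        return "Good"
    if wer_percent <= 35:
        return "Fair"
    return "Poor"
```

With the sample metrics shown earlier, a WER of 12.5 falls in the "Very good" band.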
Common Issues & Troubleshooting
Training Failed
Symptom: training_status = 5 with error message
Common causes:
- Insufficient training data (< 5 hours)
- Poor transcript quality (too many errors)
- Audio/transcript mismatch
- Corrupted audio files
Solutions:
- Verify minimum 5 hours of audio
- Review transcript accuracy (aim for 95%+)
- Ensure audio and transcript align exactly
- Re-upload with corrected data
Low Accuracy Improvement
Symptom: accuracy_improvement < 10%
Common causes:
- Too little training data
- Training data not representative of production
- High-quality base model already performs well
- Insufficient domain-specific vocabulary
Solutions:
- Add more training data (aim for 20+ hours)
- Ensure training data matches production use case
- Include more domain-specific examples
- Consider different base model
Slow Training
Symptom: Training takes > 24 hours
Common causes:
- Very large dataset (50+ hours)
- System capacity constraints
Solutions:
- Expect longer training times for large datasets; this is normal
- Contact support if > 48 hours
- Consider splitting into smaller model updates
Next Steps
- Create Model - Create your first custom model
- Upload Training Data - Prepare and upload audio + transcripts
- Train Model - Start training and monitor progress
- Data Models - Complete schema reference
Support & Resources
- API Reference: See individual endpoint documentation
- Support: support@scriptix.io
- Status Page: status.scriptix.io
- Community: community.scriptix.io
Ready to get started? Create your first custom model and achieve higher accuracy for your domain!