Custom Models API - Overview

Complete guide to the Custom Models API for training specialized speech recognition models tailored to your domain, vocabulary, and audio environment.

What are Custom Models?

Custom models are specialized speech recognition models trained on your specific audio data and vocabulary to achieve higher accuracy than general-purpose models. They learn:

  • Domain-specific terminology (medical, legal, technical terms)
  • Speaker characteristics (accents, speaking patterns, pronunciation)
  • Audio environment (recording quality, background noise patterns)
  • Vocabulary patterns (company names, product names, jargon)

Accuracy improvements over the base model typically range from 15-30% for specialized domains, depending on the volume and quality of the training data.

Use Cases

Medical & Healthcare

  • Clinical documentation: Physician dictation, patient notes, diagnoses
  • Radiology reports: MRI, CT scan, X-ray descriptions
  • Medical consultations: Doctor-patient conversations
  • Specialized departments: Cardiology, oncology, pathology

Legal

  • Court proceedings: Depositions, hearings, trials
  • Legal consultations: Attorney-client meetings
  • Contract dictation: Legal document creation
  • Case notes: Legal research and case documentation

Technical & IT

  • Technical support: Customer support calls, troubleshooting
  • Engineering discussions: Design reviews, technical meetings
  • Product documentation: Technical writing, specifications
  • Software development: Code reviews, sprint meetings

Business & Corporate

  • Executive meetings: Board meetings, strategy sessions
  • Sales calls: Client meetings, product demos
  • Customer service: Support calls, feedback sessions
  • Training sessions: Employee training, onboarding

How It Works

1. Create Model

POST /api/v3/custom_models

Define your model with name, language, and base model type.

2. Upload Training Data

PUT /api/v3/custom_models/{id}/data

Upload matched pairs of audio files and transcripts (training + test sets).

Minimum requirements:

  • 5+ hours of audio
  • Accurate transcripts matching audio
  • Representative of production use case

3. Start Training

POST /api/v3/custom_models/{id}/run

Trigger the training process (typically 6-12 hours depending on data size).

4. Monitor Progress

GET /api/v3/custom_models/{id}

Poll the model status to track training progress (0-100%).
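A minimal polling sketch follows. It assumes the status response is JSON containing the `training_status` and `training_progress` fields shown in the Quick Start example. `fetch_status` is a hypothetical stand-in for the HTTP GET, so the loop can be exercised without network access. The 60-second interval and 48-hour timeout are arbitrary choices. The interval keeps well under the 100 requests/minute status-check limit, and the timeout matches the point at which the troubleshooting section suggests contacting support.

```python
import time
from typing import Callable


def wait_for_training(
    fetch_status: Callable[[], dict],
    interval_s: float = 60.0,
    timeout_s: float = 48 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until training_status is 4 (Success) or 5 (Failed).

    fetch_status is a hypothetical callable that performs
    GET /api/v3/custom_models/{id} and returns the decoded JSON body.
    """
    waited = 0.0
    while True:
        model = fetch_status()
        status = model.get("training_status")
        if status == 4:  # Success: ready for production use
            return model
        if status == 5:  # Failed: inspect GET /api/v3/custom_models/{id}/log
            raise RuntimeError(f"training failed: {model}")
        if waited >= timeout_s:
            raise TimeoutError(f"still in status {status} after {waited:.0f}s")
        print(f"status={status} progress={model.get('training_progress', 0)}%")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop testable. In production, leave the default `time.sleep`.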

5. Use in Production

Once training completes successfully, reference your model in transcription requests:

POST /api/v3/stt
{
  "model": "custom_model_123",
  "language": "en",
  "audio_file": "..."
}

API Endpoints Summary

Method   Endpoint                          Description
POST     /api/v3/custom_models             Create new custom model
GET      /api/v3/custom_models             List all models with pagination
GET      /api/v3/custom_models/{id}        Get model details and status
POST     /api/v3/custom_models/{id}        Update model metadata
DELETE   /api/v3/custom_models/{id}        Delete model permanently
PUT      /api/v3/custom_models/{id}/data   Upload training or test data
POST     /api/v3/custom_models/{id}/run    Start training process
GET      /api/v3/custom_models/{id}/log    Get training logs

Training Status Lifecycle

┌─────────────┐
│ 1: Not      │ ← Model created, no data uploaded
│    Running  │
└──────┬──────┘
       │ Upload training data
       ▼
┌─────────────┐
│ 2: Ready    │ ← Data uploaded, ready to train
│    to Run   │
└──────┬──────┘
       │ POST /run
       ▼
┌─────────────┐
│ 3: Running  │ ← Training in progress (0-100%)
│             │
└──────┬──────┘
       │
       ├─── Success ───→ ┌─────────────┐
       │                 │ 4: Success  │ ← Ready for production use
       │                 └─────────────┘
       │
       └─── Failure ───→ ┌─────────────┐
                         │ 5: Failed   │ ← Training error occurred
                         └─────────────┘

Status Codes:

  • 1 - Not Running (created, no data)
  • 2 - Ready to Run (data uploaded)
  • 3 - Running (training in progress)
  • 4 - Success (training completed, ready for use)
  • 5 - Failed (training error)
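The status codes map naturally onto an enumeration. The sketch below encodes only the forward transitions drawn in the lifecycle diagram for a single training run. How a retrain moves a Success or Failed model back into the cycle is not described here, so treat `TRANSITIONS` and `is_terminal` as illustrative client-side helpers rather than the server's actual state machine.

```python
from enum import IntEnum


class TrainingStatus(IntEnum):
    """training_status codes as listed in this documentation."""
    NOT_RUNNING = 1   # created, no data
    READY_TO_RUN = 2  # data uploaded
    RUNNING = 3       # training in progress
    SUCCESS = 4       # training completed, ready for use
    FAILED = 5        # training error


# Forward transitions shown in the lifecycle diagram (one training run only).
TRANSITIONS = {
    TrainingStatus.NOT_RUNNING: {TrainingStatus.READY_TO_RUN},
    TrainingStatus.READY_TO_RUN: {TrainingStatus.RUNNING},
    TrainingStatus.RUNNING: {TrainingStatus.SUCCESS, TrainingStatus.FAILED},
    TrainingStatus.SUCCESS: set(),
    TrainingStatus.FAILED: set(),
}


def is_terminal(status: int) -> bool:
    """True once the current training run has finished (Success or Failed)."""
    return not TRANSITIONS[TrainingStatus(status)]
```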

Base Model Types

Choose a base model that matches your domain for best results:

Base Model   Description                             Best For
general      General speech, broadest vocabulary     Business meetings, interviews, general content
medical      Medical terminology and procedures      Clinical notes, radiology reports, medical consultations
legal        Legal vocabulary and case law           Depositions, legal consultations, court proceedings
technical    Technical/IT terminology                Tech support, engineering discussions, software development
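The base model is chosen when the model is created. The sketch below builds the JSON body for POST /api/v3/custom_models, using the field names from the Quick Start example. The client-side check against the four documented base models is an assumption added for early feedback. The server remains the authority on which values it accepts.

```python
import json

# Base model identifiers listed in the table above.
BASE_MODELS = {"general", "medical", "legal", "technical"}


def create_model_payload(name: str, language: str, base_model: str,
                         description: str = "") -> str:
    """Build the JSON body for POST /api/v3/custom_models."""
    if base_model not in BASE_MODELS:
        raise ValueError(
            f"unknown base_model {base_model!r}; expected one of {sorted(BASE_MODELS)}"
        )
    if not name.strip():
        raise ValueError("name must not be empty")
    body = {"name": name, "language": language, "base_model": base_model}
    if description:  # optional, as in the Quick Start example
        body["description"] = description
    return json.dumps(body)
```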

Training Data Requirements

Minimum Requirements

  • Audio duration: 5+ hours total
  • Audio quality: Clear speech, minimal background noise
  • Transcript accuracy: 95% or higher
  • Data split: 80% training, 20% test (recommended)
  • Format: MP3, WAV, FLAC, M4A, AAC, or OGG audio
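One way to apply the recommended 80/20 split is to split by audio hours rather than by file count, because recordings often differ widely in length. The sketch assumes you have already measured each audio file's duration. The `durations` mapping and the stem naming (`cardiology_01` for `cardiology_01.mp3` + `cardiology_01.txt`) are hypothetical inputs, not something the API provides.

```python
import random

MIN_TOTAL_HOURS = 5.0  # documented minimum


def split_by_hours(durations, test_fraction=0.2, seed=0):
    """Split file-pair stems into (train, test) so the test set holds
    roughly test_fraction of the total audio hours.

    durations maps a stem such as "cardiology_01" to its length in hours.
    """
    total = sum(durations.values())
    if total < MIN_TOTAL_HOURS:
        raise ValueError(f"{total:.1f} h of audio is below the 5-hour minimum")
    stems = sorted(durations)           # stable order before shuffling
    random.Random(seed).shuffle(stems)  # reproducible shuffle
    test, test_hours = [], 0.0
    for stem in stems:
        if test_hours >= total * test_fraction:
            break
        test.append(stem)
        test_hours += durations[stem]
    test_set = set(test)
    train = [s for s in stems if s not in test_set]
    return train, test
```

Upload the `train` stems with `type=train` and the `test` stems with `type=test`. A fixed seed keeps the split reproducible across retraining runs.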

Quality Guidelines

Audio:

  • Sample rate: 16kHz or higher
  • Bit rate: 128kbps or higher
  • Mono or stereo
  • Low background noise
  • Clear speech, minimal overlapping speakers

Transcripts:

  • Plain text (.txt files)
  • Verbatim transcription (include all words spoken)
  • Correct punctuation and capitalization
  • Match the audio exactly (no omitted or extra passages)
  • Include all domain-specific terminology

Expected Improvement by Training Data Volume

Audio Hours    Expected Improvement    Use Case
5-10 hours     10-15%                  Small vocabulary, limited domain
10-30 hours    15-25%                  Standard domain customization
30-50 hours    25-30%                  Comprehensive domain coverage
50+ hours      30%+                    Maximum accuracy, complex domains
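The table can be encoded as a simple lookup, for example to show users what to expect before they upload. The ranges in the table share their endpoints. This sketch assumes a boundary value belongs to the higher band (10 hours counts as 10-30). The bands are typical outcomes, not guarantees.

```python
def expected_improvement(audio_hours: float) -> str:
    """Return the typical improvement band for a given amount of training audio."""
    if audio_hours < 5:
        raise ValueError("below the 5-hour minimum")
    # (exclusive upper bound in hours, band) from the table above
    bands = [(10, "10-15%"), (30, "15-25%"), (50, "25-30%")]
    for upper, band in bands:
        if audio_hours < upper:
            return band
    return "30%+"
```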

Authentication

All Custom Models API endpoints require authentication via API key:

curl https://api.scriptix.io/api/v3/custom_models \
-H "Authorization: Bearer YOUR_API_KEY"

Get your API key from the Scriptix Dashboard.
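The same Bearer header can be attached with Python's standard library. The sketch below only prepares a `urllib.request.Request`. The base URL and header come from the curl examples on this page, and `build_request` is a hypothetical helper name. Send the request with `urllib.request.urlopen(req)` and decode the body with `json.load`.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.scriptix.io/api/v3"  # from the curl examples


def build_request(method: str, path: str, api_key: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Prepare an authenticated request for one of the endpoints listed above."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:  # JSON bodies, e.g. POST /custom_models or POST /run
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{BASE_URL}{path}", data=data,
                                  headers=headers, method=method)
```

For example, `build_request("POST", "/custom_models/123/run", key, {})` mirrors step 4 of the Quick Start below.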

Rate Limits

  • Model creation: 10 requests/hour
  • Training starts: 5 requests/hour
  • Status checks: 100 requests/minute
  • Data uploads: 50 requests/hour

See Rate Limits for details.
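When uploading many file pairs, the 50 uploads/hour limit is easy to hit. The sketch below is a client-side sliding-window guard, one instance per documented quota. It assumes nothing about how the server counts requests. A sliding window never allows more than `limit` calls in any span of `window_s` seconds, so it stays within the limit whether the server uses sliding or fixed windows.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window_s`-second span."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


# One limiter per documented quota (hypothetical key names).
LIMITERS = {
    "create_model": SlidingWindowLimiter(10, 3600),
    "start_training": SlidingWindowLimiter(5, 3600),
    "status_check": SlidingWindowLimiter(100, 60),
    "data_upload": SlidingWindowLimiter(50, 3600),
}
```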

Error Handling

All endpoints return errors in a standard format:

{
  "error": "Error Type",
  "message": "Human-readable description",
  "error_code": "MACHINE_READABLE_CODE",
  "details": {
    "field": "Additional context"
  }
}

Common error codes:

  • MODEL_NOT_FOUND (404) - Model doesn't exist
  • MODEL_NOT_READY (409) - Cannot train, upload data first
  • INSUFFICIENT_DATA (400) - Not enough training data
  • TRAINING_FAILED (500) - Training process encountered error
  • MODEL_LIMIT_EXCEEDED (402) - Quota limit reached

See Error Codes for complete list.
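A sketch for turning the error body above into a Python exception. It tolerates non-JSON bodies, such as an HTML page from a proxy, by falling back to `UNKNOWN`. The `NEXT_STEP` hints paraphrase the error descriptions on this page and are an application-level choice, not API behavior. With `urllib`, non-2xx responses raise `urllib.error.HTTPError`, whose `.code` and `.read()` supply the two arguments.

```python
import json


class ScriptixAPIError(Exception):
    """Error response from the Custom Models API."""

    def __init__(self, status, error_code, message, details=None):
        super().__init__(f"{status} {error_code}: {message}")
        self.status = status
        self.error_code = error_code
        self.message = message
        self.details = details or {}


def parse_error(status: int, body: bytes) -> ScriptixAPIError:
    """Build an exception from an HTTP status and raw response body."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ScriptixAPIError(
        status,
        payload.get("error_code", "UNKNOWN"),
        payload.get("message", body.decode("utf-8", "replace")),
        payload.get("details"),
    )


# Suggested reaction per documented code (hypothetical application logic).
NEXT_STEP = {
    "MODEL_NOT_FOUND": "check the model id",
    "MODEL_NOT_READY": "upload training data before POST /run",
    "INSUFFICIENT_DATA": "upload more audio (5+ hours minimum)",
    "TRAINING_FAILED": "read GET /api/v3/custom_models/{id}/log",
    "MODEL_LIMIT_EXCEEDED": "delete a model or upgrade the plan",
}
```

Typical use: `except urllib.error.HTTPError as e: raise parse_error(e.code, e.read())`.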

Pricing

Custom model training is available on Gold and Enterprise plans.

Limits by plan:

  • Gold: 5 custom models
  • Enterprise: Unlimited custom models

Training costs:

  • First 10 hours of training data: Included
  • Additional hours: $50/hour of audio data

Contact sales for Enterprise pricing.
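Under the listed training costs, a worked example: a model trained on 25 hours of audio pays for 25 - 10 = 15 additional hours, or 15 × $50 = $750. The sketch below encodes that arithmetic. It assumes the 10 included hours apply per model and that partial hours are billed proportionally. Neither point is stated on this page, so confirm both with sales.

```python
INCLUDED_HOURS = 10.0  # first 10 hours of training data included
RATE_PER_HOUR = 50.0   # USD per additional hour of audio data


def training_cost(audio_hours: float) -> float:
    """Estimated training charge in USD (assumes per-model, prorated billing)."""
    return max(0.0, audio_hours - INCLUDED_HOURS) * RATE_PER_HOUR
```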

Quick Start Example

Complete workflow from creation to production use:

# 1. Create model
curl -X POST https://api.scriptix.io/api/v3/custom_models \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Medical Cardiology EN",
"language": "en",
"base_model": "medical",
"description": "Cardiology consultations and reports"
}'
# Response: {"id": 123, "training_status": 1, ...}

# 2. Upload training data (repeat for each file pair)
curl -X PUT https://api.scriptix.io/api/v3/custom_models/123/data \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "type=train" \
-F "audio_file=@cardiology_01.mp3" \
-F "transcript_file=@cardiology_01.txt"

# 3. Upload test data
curl -X PUT https://api.scriptix.io/api/v3/custom_models/123/data \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "type=test" \
-F "audio_file=@cardiology_test_01.mp3" \
-F "transcript_file=@cardiology_test_01.txt"

# 4. Start training
curl -X POST https://api.scriptix.io/api/v3/custom_models/123/run \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{}'

# 5. Check status (poll until training_status = 4)
curl https://api.scriptix.io/api/v3/custom_models/123 \
-H "Authorization: Bearer YOUR_API_KEY"
# Response: {"id": 123, "training_status": 3, "training_progress": 65, ...}

# 6. Use in production (after status = 4)
curl -X POST https://api.scriptix.io/api/v3/stt \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "model=custom_model_123" \
-F "language=en" \
-F "audio_file=@new_recording.mp3"
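Steps 2, 3, and 6 send `multipart/form-data`, as curl's `-F` does. A library such as `requests` handles this through its `files=` argument. With only the standard library and `urllib`, the body has to be assembled by hand, as sketched below. The field names (`type`, `audio_file`, `transcript_file`) come from the curl examples. The MIME types in the usage note are illustrative assumptions.

```python
import uuid


def encode_multipart(fields: dict, files: dict):
    """Build a multipart/form-data body like curl's -F flags.

    fields: {"type": "train"}
    files:  {"audio_file": ("cardiology_01.mp3", b"...", "audio/mpeg")}
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    for name, (filename, data, content_type) in files.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        parts.append(head + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

For step 2, pass `{"type": "train"}` plus the audio file (for example `"audio/mpeg"`) and transcript (`"text/plain"`) as `files`. Then send the body with a `PUT` to `/api/v3/custom_models/123/data`, setting the returned value as the `Content-Type` header alongside `Authorization`.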

Best Practices

Data Preparation

  1. Diverse audio samples: Include a variety of speakers, speaking styles, and recording conditions
  2. Representative data: Training data should match your production use case
  3. Clean transcripts: Ensure transcripts are at least 95% accurate before uploading
  4. Sufficient volume: More data generally yields better results (aim for 20+ hours)
  5. Test set quality: Use a high-quality test set to measure true performance

Training Process

  1. Monitor progress: Poll status endpoint during training
  2. Review metrics: Check word_error_rate and accuracy_improvement after completion
  3. Iterate: If results are insufficient, add more data and retrain
  4. Test thoroughly: Validate on real production audio before deployment
  5. Keep improving: Periodically retrain with new production data

Production Use

  1. Gradual rollout: Test on a subset of traffic before full deployment
  2. Monitor accuracy: Track real-world performance vs. base model
  3. Update regularly: Retrain every 3-6 months with new data
  4. Version models: Keep previous versions as backup
  5. Document changes: Track what data was used for each training run
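A gradual rollout can be driven by hashing a stable identifier into 100 buckets, so a given request or customer always lands on the same model while the percentage is raised. The sketch assumes you choose the identifier and the fallback model yourself. The fallback could be your current model or a previous custom model version kept as backup. This page does not document the `model` value for a non-custom transcription, so `fallback_model` is a hypothetical parameter.

```python
import hashlib


def pick_model(request_id: str, custom_model: str, fallback_model: str,
               rollout_percent: int) -> str:
    """Route a stable share of traffic (0-100%) to the custom model."""
    # SHA-256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return custom_model if bucket < rollout_percent else fallback_model
```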

Training Performance Metrics

After successful training (training_status = 4), review these metrics:

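{
  "training_metrics": {
    "word_error_rate": 12.5,
    "accuracy_improvement": 18.7,
    "character_error_rate": 8.3,
    "training_hours": 7.5
  }
}

  • word_error_rate: Word error rate on the test set, in percent (lower is better)
  • accuracy_improvement: Percentage improvement over the base model
  • character_error_rate: Character-level error rate, in percent (lower is better)
  • training_hours: Hours of audio used in training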
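To compare these numbers against your own audio, you can compute WER yourself. It is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. This page does not say how Scriptix normalizes case or punctuation before scoring. This sketch only splits on whitespace and compares tokens exactly, so normalize both transcripts the same way first. CER is the same computation over characters instead of words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent, via Levenshtein distance over whitespace-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```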

What's a good WER?

  • < 10%: Excellent (professional transcription quality)
  • 10-15%: Very good (suitable for most use cases)
  • 15-25%: Good (significant improvement over base)
  • 25-35%: Fair (may need more training data)
  • > 35%: Poor (review data quality or add more data)

Common Issues & Troubleshooting

Training Failed

Symptom: training_status = 5 with error message

Common causes:

  • Insufficient training data (< 5 hours)
  • Poor transcript quality (too many errors)
  • Audio/transcript mismatch
  • Corrupted audio files

Solutions:

  • Verify minimum 5 hours of audio
  • Review transcript accuracy (aim for 95%+)
  • Ensure audio and transcript align exactly
  • Re-upload with corrected data
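Several of these causes can be caught before uploading. The sketch checks file pairing by stem, supported extensions, and the 5-hour minimum. It assumes you measured audio durations separately; the `audio_hours` mapping is a hypothetical input, since the API does not provide it. It cannot judge transcript accuracy or alignment, which still needs human review.

```python
from pathlib import PurePath

AUDIO_EXTS = {".mp3", ".wav", ".flac", ".m4a", ".aac", ".ogg"}
MIN_HOURS = 5.0


def check_training_set(filenames, audio_hours):
    """Return a list of problems that commonly lead to training_status = 5.

    filenames: e.g. ["cardiology_01.mp3", "cardiology_01.txt", ...]
    audio_hours: duration in hours per audio stem (measured by you).
    """
    audio, transcripts, problems = set(), set(), []
    for name in filenames:
        p = PurePath(name)
        ext = p.suffix.lower()
        if ext in AUDIO_EXTS:
            audio.add(p.stem)
        elif ext == ".txt":
            transcripts.add(p.stem)
        else:
            problems.append(f"unsupported file type: {name}")
    for stem in sorted(audio - transcripts):
        problems.append(f"audio without transcript: {stem}")
    for stem in sorted(transcripts - audio):
        problems.append(f"transcript without audio: {stem}")
    # Only complete audio/transcript pairs count toward the minimum.
    total = sum(audio_hours.get(s, 0.0) for s in audio & transcripts)
    if total < MIN_HOURS:
        problems.append(f"only {total:.1f} h of paired audio; at least {MIN_HOURS:.0f} h required")
    return problems
```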

Low Accuracy Improvement

Symptom: accuracy_improvement < 10%

Common causes:

  • Too little training data
  • Training data not representative of production
  • The base model already performs well on your audio
  • Insufficient domain-specific vocabulary

Solutions:

  • Add more training data (aim for 20+ hours)
  • Ensure training data matches production use case
  • Include more domain-specific examples
  • Consider a different base model

Slow Training

Symptom: Training takes > 24 hours

Common causes:

  • Very large dataset (50+ hours)
  • System capacity constraints

Solutions:

  • Long training times are normal for large datasets
  • Contact support if > 48 hours
  • Consider splitting into smaller model updates

Support & Resources

  • API Reference: See individual endpoint documentation
  • Support: support@scriptix.io
  • Status Page: status.scriptix.io
  • Community: community.scriptix.io

Ready to get started? Create your first custom model and achieve higher accuracy for your domain!