
Custom Models

Custom models are specialized speech recognition models that can be trained to improve transcription accuracy for specific vocabulary and terminology.

Overview

Custom models adapt speech recognition to domain-specific terminology and industry vocabulary. They are available on the Business (Gold, €1,199/month) and Enterprise (Platinum, €10,000/month) plans.

Important: Custom models are organization-specific and cannot be shared with other organizations.

Requirements: 1 to 10 dataset files, each up to 10 GB. Supported formats: audio (.wav, .mp3, .m4a, .flac), transcripts (.vtt, .srt, .txt), and manifests (.jsonl).
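
Before uploading, you can check a batch of files against these limits locally. The Python sketch below is a hypothetical helper, not part of the product. `validate_dataset_files` takes (filename, size in bytes) pairs. Because the page does not say whether "10GB" is decimal or binary, the sketch assumes the stricter decimal reading of 10,000,000,000 bytes.

```python
from pathlib import Path

# Limits from the requirements above. Assumption: "10GB" is treated as
# 10 * 1000**3 bytes (decimal), the stricter of the two possible readings.
MAX_FILES = 10
MAX_FILE_BYTES = 10 * 1000**3
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".flac",  # audio
    ".vtt", ".srt", ".txt",           # transcript
    ".jsonl",                         # manifest
}


def validate_dataset_files(files: list[tuple[str, int]]) -> list[str]:
    """Hypothetical pre-upload check; returns a list of problems (empty = OK).

    `files` holds (filename, size_in_bytes) pairs for one upload.
    """
    problems = []
    if not 1 <= len(files) <= MAX_FILES:
        problems.append(f"expected 1-{MAX_FILES} files, got {len(files)}")
    for name, size in files:
        ext = Path(name).suffix.lower()  # case-insensitive extension match
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 10 GB limit")
    return problems


files = [("call_01.wav", 2_000_000_000), ("call_01.srt", 40_000), ("notes.docx", 10_000)]
print(validate_dataset_files(files))  # ['notes.docx: unsupported format .docx']
```

The check only looks at names and sizes. It cannot tell whether a file's contents are valid for training.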

Creating Custom Models

Navigate to Custom Models → "Create Custom Model" → fill in the form (Name and Language are required) → Create. The selected language determines the base model. A guide modal appears on your first visit to the page.

Training Custom Models

Click the model name → "Add Dataset" → select a dataset type (Auto, Audio, Transcript, or Manifest) → upload 1 to 10 files (10 GB max each) → submit. The dataset table shows each dataset's Name, Type, Duration, URL, and Start/End time.
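
The dataset table reports a Duration for each dataset. To sanity-check it for .wav uploads, you can compute durations locally with Python's standard `wave` module (frame count divided by sample rate). This is an illustrative sketch with hypothetical function names. It handles uncompressed PCM WAV only, because `wave` raises `wave.Error` for most compressed WAV encodings. The .mp3, .m4a, and .flac formats would need a third-party decoder. The platform may also compute Duration differently.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM .wav file: frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_wav_duration(paths: list[str]) -> float:
    """Sum of durations in seconds, for comparison with the Duration column."""
    return sum(wav_duration_seconds(p) for p in paths)
```

For example, `total_wav_duration(["call_01.wav", "call_02.wav"])` returns the combined length of both files in seconds.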

Train Model: Click "Train" → confirm → training begins.

Training Status: There are five states, each shown with a colored badge: Not Running (Zinc), Ready to Run (Amber), Running (Sky), Success (Emerald), and Failed (Red). The Train button is available when the status is Not Running or Failed.
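
The status rules above can be expressed as a small state check. This Python sketch is only an illustration. `TrainingStatus`, `BADGE_COLOR`, and `can_train` are hypothetical names, not a platform API.

```python
from enum import Enum


class TrainingStatus(Enum):
    """The five training states (hypothetical names for illustration)."""
    NOT_RUNNING = "Not Running"
    READY_TO_RUN = "Ready to Run"
    RUNNING = "Running"
    SUCCESS = "Success"
    FAILED = "Failed"


# Badge color shown in the UI for each state.
BADGE_COLOR = {
    TrainingStatus.NOT_RUNNING: "Zinc",
    TrainingStatus.READY_TO_RUN: "Amber",
    TrainingStatus.RUNNING: "Sky",
    TrainingStatus.SUCCESS: "Emerald",
    TrainingStatus.FAILED: "Red",
}


def can_train(status: TrainingStatus) -> bool:
    """The Train button is available in the Not Running and Failed states."""
    return status in (TrainingStatus.NOT_RUNNING, TrainingStatus.FAILED)


print([s.value for s in TrainingStatus if can_train(s)])  # ['Not Running', 'Failed']
```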

Managing Custom Models

Models List: The table shows each model's ID, Name, Base Language, Training Status, Last Modified date, and Type. You can filter by language or status, sort the table, and search by name.

Model Details: Click a model name to view or edit its details (name, language, last modified date, organization ID), then save your changes.

Delete: Only ORGADMIN and SYSOP users can delete models. Click Delete and confirm to remove the model.

Permissions: ORGADMIN (create, train, update, delete, view all), SYSOP (full access, back-office), Other Roles (view only, cannot delete).
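
The role rules can be summarized as a permission table. In this hypothetical Python sketch, any role other than ORGADMIN or SYSOP falls back to view-only. SYSOP back-office access is not modeled, and neither is restricting users to their own organization's models.

```python
from enum import Enum


class Action(Enum):
    VIEW = "view"
    CREATE = "create"
    TRAIN = "train"
    UPDATE = "update"
    DELETE = "delete"


# Role -> allowed actions, following the permissions listed above.
# Not modeled (assumptions left out): SYSOP back-office access and
# restricting users to their own organization's models.
ROLE_PERMISSIONS = {
    "ORGADMIN": {Action.VIEW, Action.CREATE, Action.TRAIN, Action.UPDATE, Action.DELETE},
    "SYSOP": set(Action),  # full access
}
VIEW_ONLY = {Action.VIEW}  # every other role


def is_allowed(role: str, action: Action) -> bool:
    """Hypothetical check: roles not listed above fall back to view-only."""
    return action in ROLE_PERMISSIONS.get(role, VIEW_ONLY)
```

For example, `is_allowed("ORGADMIN", Action.DELETE)` is `True`, while a view-only role gets `False` for everything except `Action.VIEW`.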

Organization Ownership: Each model belongs to one organization. No cross-organization sharing.

Using Custom Models

Select a trained custom model when uploading or setting up a transcription to improve accuracy on domain-specific content.

Best Practices

Training Data: Use clear, high-quality audio with minimal background noise. Provide accurate word-for-word transcripts matching audio files. Upload 1-10 dataset files (10GB max each).

Model Management: Use descriptive names (e.g., "Medical_EN_US"). Wait until training completes (Success status) before using a model. Test it on sample transcriptions and compare its accuracy with the base model.
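
One common way to compare transcription accuracy is word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into a reference transcript, divided by the number of reference words. The page does not say which metric the platform uses, so treat this Python sketch as an assumption-based illustration. It only lowercases and splits on whitespace. Punctuation and number formatting are not normalized. Lower values are better, so score the base-model and custom-model transcripts of the same audio against one reference and compare the results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: simple normalization (lowercase, whitespace split) only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = word edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "the patient has tachycardia"
print(word_error_rate(reference, "the patient has taky cardia"))  # 0.5
print(word_error_rate(reference, "the patient has tachycardia"))  # 0.0
```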

Troubleshooting

Training Fails: Check file formats and sizes, confirm that each transcript matches its audio file, review dataset quality, and retry training.
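
Part of the audio-transcript check can be done locally. The page does not say how the platform pairs audio files with transcripts. This Python sketch assumes they are paired by a shared base filename (for example, call_01.wav with call_01.srt) and lists files that have no partner. `unmatched_files` is a hypothetical helper, and .jsonl manifests are ignored. The sketch only catches missing pairs. It does not detect transcripts whose words differ from the audio.

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac"}
TRANSCRIPT_EXTS = {".vtt", ".srt", ".txt"}


def unmatched_files(filenames: list[str]) -> dict[str, list[str]]:
    """Report audio files with no same-stem transcript, and vice versa.

    Assumption: audio and transcript are paired by identical base filename.
    """
    audio: dict[str, str] = {}
    transcripts: dict[str, str] = {}
    for name in filenames:
        p = Path(name)
        ext = p.suffix.lower()
        if ext in AUDIO_EXTS:
            audio[p.stem] = name
        elif ext in TRANSCRIPT_EXTS:
            transcripts[p.stem] = name
    return {
        "audio_without_transcript": sorted(audio[s] for s in audio.keys() - transcripts.keys()),
        "transcript_without_audio": sorted(transcripts[s] for s in transcripts.keys() - audio.keys()),
    }


print(unmatched_files(["call1.wav", "call1.srt", "call2.mp3", "notes.txt"]))
# {'audio_without_transcript': ['call2.mp3'], 'transcript_without_audio': ['notes.txt']}
```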

Cannot Delete: Requires ORGADMIN or SYSOP role.

Upload Fails: Make sure each file is 10 GB or smaller and uses a supported format (Audio: .wav/.mp3/.m4a/.flac, Transcript: .vtt/.srt/.txt, Manifest: .jsonl).

Next Steps