How to Use Custom Models
Custom models let you train a speech‑recognition model on your own audio and transcripts so it gets better at the words, names, and phrases that matter to your work. Once trained, you pick the model from the language dropdown when you upload a new file, instead of the default language.
This page walks through the feature exactly as it behaves in the panel.
Before You Start
- Custom Models is only visible when your plan includes the Automatic Training feature. If it's enabled, you'll see a Custom Models entry in the left sidebar (brain icon). If it isn't, the page shows "Feature not available" with a link back to the dashboard, and (for non‑resellee accounts) an Upgrade Plan button.
- Only trainable languages can be used as the base of a custom model. The language dropdown when creating a model is filtered to those.
- Deleting a model requires an admin or sysop role. Everyone with access to the page can create, train, and edit models; only admins and sysops see the Delete row action.
The Custom Models Page
Open the Custom Models entry in the sidebar. The page has:
- A header with two buttons: Getting Started (opens the guide) and New custom model.
- An About Custom Models info banner with a short summary and a link to Force Alignment (used to add timestamps to existing transcripts — more on this below).
- A table of your models with these columns:
  - Name
  - Training status: a coloured badge (see status list below)
  - Base language: the language the model is being trained on
  - Last modified
The list auto‑refreshes every 20 seconds, so you can leave it open and watch a training job progress.
First Visit
The first time you open this page, the Getting Started with Custom Models guide opens automatically. It only opens once per browser — after that you can re‑open it any time from the Getting Started button in the header.
Empty State
If you have no models yet, you'll see a dashed circle in the middle of the page with "No custom models yet" and a Create your first custom model button.
Creating a Custom Model
- Click New custom model (top right) or the Create your first custom model button in the empty state.
- Fill in:
  - Name: anything that helps you recognise the model later (for example "Cardiology dictation EN"). Minimum 2 characters: the server rejects shorter names with a 422 error, even though the panel lets you submit a 1-character name.
  - Language: the base language. Only trainable languages appear in this list.
- Click Create. The Create button stays disabled until both fields are filled.
The new model appears in the table with the status Not running. It has no training data yet — that's the next step.
Possible Errors at Creation
- "Language already exists" (language-exists): a custom model with the same internal key already exists for your organisation. Pick a different name and try again.
- "The language is not published to your organization" (language-not-published): the base language hasn't been granted to your organisation yet. Contact your administrator (or, for resellers, go to language publications and grant access).
- "Invalid language" (invalid-language): the language you picked doesn't exist or is no longer trainable.
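If you script against the same backend, the three creation error codes above could be turned into next-step hints along these lines. This is a minimal sketch: the codes come from this page, but the helper name and the assumption that the code arrives as a plain string are illustrative only.

```python
# Creation error codes listed above, mapped to the suggested next step.
# explain_creation_error is a hypothetical helper, not part of the product.
CREATION_ERRORS = {
    "language-exists": "A model with the same internal key exists. Pick a different name.",
    "language-not-published": "The base language is not granted to your organisation. Contact your administrator.",
    "invalid-language": "The language does not exist or is no longer trainable.",
}


def explain_creation_error(code: str) -> str:
    """Return a hint for a known creation error code, or a generic fallback."""
    return CREATION_ERRORS.get(code, f"Unexpected error: {code}")
```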
Opening a Model
Click the model's Name or Base language cell, or use the row's Edit action. This opens the model details page (/custom-models/<id>), with breadcrumbs back to the list.
The details page has four summary cards at the top:
- Name — editable. The Update button in the header is only enabled after you change this and the field isn't empty.
- Training status — same badge as the list (see below).
- Base language model — full language name (e.g. "English (US)").
- Last modified — the last time anything changed on the model.
Below the cards, a coloured Training status banner shows a short help message for the current status, plus a "Need help preparing your data?" link to Force Alignment.
Renaming a Model
Edit the Name field on the details page and click Update in the top right. You'll get a "Custom model updated successfully" toast on success.
Only the name can be changed after creation. The base language is fixed.
Uploading Datasets
Training data is uploaded as datasets on the model details page, in the Datasets section below the status banner.
Add a New Dataset
- Click New dataset in the Datasets header.
- In the dialog, pick a Type:
  - Auto-detect: the server works out the type of each file from its extension. This is the easiest option when you're uploading a mixed batch of audio and transcripts.
  - TRANSCRIPT: restricts the file picker to transcript formats.
  - AUDIO: restricts the file picker to audio (and video, which is converted server-side; see below).
- Drag files in or click to browse. Up to 100 files per batch, each up to 10 GB. You can keep adding more batches afterwards — there's no total cap per model.
- Click Create to start the upload. The button shows Uploading… while it runs. While an upload is in progress you can't change the type selector or close the dialog with Cancel.
Uploads are resumable: if your connection drops, Uppy retries automatically (with delays of 0, 3, 5, 10, 20, 30, and 60 seconds) and picks up where it left off. Files upload in 50 MB chunks.
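To get a feel for the numbers above, the following sketch estimates how many chunks a file needs and how long the retry schedule waits in total. It assumes "50 MB" means 50 MiB and that each delay is used at most once; both are assumptions, and the function names are hypothetical.

```python
CHUNK_SIZE = 50 * 1024 * 1024  # assumption: "50 MB" is 50 MiB
RETRY_DELAYS_S = [0, 3, 5, 10, 20, 30, 60]  # retry delays quoted on this page


def chunk_count(file_size_bytes: int) -> int:
    """Number of chunks needed to upload a file of the given size."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE


def total_retry_wait() -> int:
    """Seconds spent waiting if every listed retry delay is used once."""
    return sum(RETRY_DELAYS_S)
```

Under these assumptions, a file at the 10 GB limit (taken as 10 GiB) would upload in 205 chunks.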
You'll get one of three toasts when the batch finishes:
- All files uploaded → success toast with the count.
- Some files failed → error toast saying how many succeeded and how many failed (the per‑file errors are visible inline in the Uppy panel).
- All files failed → error toast.
Supported Formats
- Audio: .wav, .mp3, .m4a, .flac
- Video (uploaded on the AUDIO slot): .mp4, .mov, .mkv, .webm, .avi. The server extracts the audio track with ffmpeg, downmixes to mono at 16 kHz AAC, and stores only the resulting .m4a; the original video is discarded.
- Transcripts: .vtt, .srt, .txt
- Manifest: .jsonl
Anything outside these extensions is rejected by the file picker.
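The extension lists above can be expressed as a small lookup, which is roughly what Auto-detect has to do per file. This is a sketch, not the server's code: the case-insensitive match and the "MANIFEST" label are assumptions, and detect_type is a hypothetical name.

```python
from pathlib import PurePath
from typing import Optional

AUDIO = {".wav", ".mp3", ".m4a", ".flac"}
VIDEO = {".mp4", ".mov", ".mkv", ".webm", ".avi"}  # uploaded on the AUDIO slot
TRANSCRIPT = {".vtt", ".srt", ".txt"}
MANIFEST = {".jsonl"}


def detect_type(filename: str) -> Optional[str]:
    """Classify a file by extension; None means the file would be rejected."""
    ext = PurePath(filename).suffix.lower()  # assumption: case-insensitive
    if ext in AUDIO or ext in VIDEO:
        return "AUDIO"
    if ext in TRANSCRIPT:
        return "TRANSCRIPT"
    if ext in MANIFEST:
        return "MANIFEST"  # assumption: label chosen for illustration
    return None
```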
If the server doesn't have ffmpeg installed, video uploads fail with ffmpeg_unavailable. On the hosted service this is always available; if you self-host, make sure ffmpeg is on the API container.
Filename Rules
The trainer runs each file through subprocesses (sox/ffmpeg), so the server rejects filenames containing shell‑unsafe characters at validate/train time:
Use only letters, digits, ., _, and -. Spaces, parentheses, brackets, and any of ; & | < > " ' $ \ ! * ? , ` are rejected with "Unsafe filename" errors.
A common gotcha: audiomass-output (1).mp3 will upload fine but fail validation. Rename to audiomass-output-1.mp3 first.
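You can check and clean filenames locally before uploading. The sketch below applies the allow-list described above (letters, digits, dot, underscore, hyphen), assuming "letters" means ASCII letters. Both function names are hypothetical, and the renaming strategy is only one reasonable choice.

```python
import re
from pathlib import PurePath

_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
_DISALLOWED_RUN = re.compile(r"[^A-Za-z0-9._-]+")


def is_safe_filename(name: str) -> bool:
    """True if the name uses only letters, digits, '.', '_' and '-'."""
    return _ALLOWED.fullmatch(name) is not None


def suggest_safe_name(name: str) -> str:
    """Replace each run of disallowed characters in the stem with one hyphen."""
    path = PurePath(name)
    stem = _DISALLOWED_RUN.sub("-", path.stem).strip("-")
    suffix = _DISALLOWED_RUN.sub("", path.suffix)
    return f"{stem}{suffix}"
```

For the example above, suggest_safe_name("audiomass-output (1).mp3") gives "audiomass-output-1.mp3".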
What Gets Paired with What
Audio files are paired with transcripts at train time by their basename stem, case‑insensitive. So Talk1.mp4 pairs with talk1.vtt, and interview_03.wav pairs with interview_03.srt. You don't have to upload them together or in any specific order — upload audio first, transcripts later, or mixed.
If at train time some audio has no matching transcript, those files are skipped (not deleted) and you'll see a confirmation before training continues. See "Training" below.
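To preview which files will pair up before you click Train, the stem matching can be approximated locally. This is a sketch of case-insensitive stem pairing as described above; how the server resolves two transcripts with the same stem is not documented here, so the "first one wins" rule below is an assumption.

```python
from pathlib import PurePath

AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac", ".mp4", ".mov", ".mkv", ".webm", ".avi"}
TRANSCRIPT_EXTS = {".vtt", ".srt", ".txt"}


def pair_by_stem(filenames):
    """Return (pairs, unmatched_audio) using case-insensitive basename stems."""
    transcripts = {}
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() in TRANSCRIPT_EXTS:
            # Assumption: if two transcripts share a stem, keep the first.
            transcripts.setdefault(path.stem.lower(), name)

    pairs, unmatched = [], []
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() in AUDIO_EXTS:
            match = transcripts.get(path.stem.lower())
            if match is not None:
                pairs.append((name, match))
            else:
                unmatched.append(name)
    return pairs, unmatched
```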
Duplicate Uploads
If you upload a file whose name matches a dataset file already on the model, the server silently skips it — the older file stays. To replace a file, delete the existing dataset row first, then re‑upload.
Transcript Segment Handling
For .vtt and .srt, each cue is treated as a training segment with two hard limits applied server‑side:
- Cues shorter than 0.5 seconds are dropped.
- Cues longer than 30 seconds are automatically split into smaller pieces.
Plain .txt transcripts are not segmented — the pairing happens on the filename stem only.
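The two cue limits can be illustrated on timings alone. This page does not say how the server splits long cues, so the equal-length split below is purely an assumption, and the sketch ignores how cue text would be divided.

```python
import math

MIN_CUE_S = 0.5   # cues shorter than this are dropped
MAX_CUE_S = 30.0  # cues longer than this are split


def normalise_cues(cues):
    """Apply the limits to a list of (start, end) times in seconds."""
    result = []
    for start, end in cues:
        duration = end - start
        if duration < MIN_CUE_S:
            continue  # too short to be a training segment
        if duration <= MAX_CUE_S:
            result.append((start, end))
            continue
        # Assumption: split into the fewest equal pieces of at most 30 s.
        pieces = math.ceil(duration / MAX_CUE_S)
        step = duration / pieces
        for i in range(pieces):
            result.append((start + i * step, start + (i + 1) * step))
    return result
```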
The Datasets Table
Each uploaded file shows up as a row with: Name, Type, Duration, URL, Start time, End time.
Each row has two actions:
- Download — downloads the file. If it fails you get a "Download failed" toast.
- Delete — confirmation dialog; the file is removed from the model.
Don't have timestamps in your transcripts? Use Force Alignment from the dashboard's upload menu (Upload → Force alignment) to add timestamps to existing transcripts before uploading them here. The "Need help preparing your data?" link on the model details page takes you to the dashboard, where Force Alignment sits in the upload menu.
Training
When you've uploaded enough data, click Train the language in the top right of the model details page.
The button is hidden while training is in Provisioning resources (status 2), Running (status 3), or Success (status 4) — you can only start (or restart) training from Not running or Failed.
Minimum Dataset Requirements
These are enforced server‑side. If you don't meet them, the validation step fails with a "Dataset too small" / "Insufficient audio" message and training never starts:
- At least one transcript file must be uploaded.
- At least 2 matched audio + transcript pairs. Unmatched audio is skipped during stem pairing, so the floor applies to matched pairs only.
- At least 10 minutes of total audio duration across the matched pairs, when the server has measured durations. If durations aren't available yet, the requirement falls back to at least 10 matched audio files.
The numbers come from the trainer pipeline: anything under 2 samples breaks the train/validation split, and under ~10 minutes the model has nothing meaningful to learn from.
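The three requirements above can be checked locally before you spend a training attempt. The thresholds come from this page. The function name is hypothetical and the messages for the pair count and file-count fallback are made up for illustration; the other two messages are quoted from the server examples on this page.

```python
from typing import Optional


def check_minimums(transcript_count: int,
                   matched_pairs: int,
                   total_minutes: Optional[float]) -> list[str]:
    """Return a list of problems; an empty list means the minimums are met.

    total_minutes is None when durations have not been measured yet.
    """
    errors = []
    if transcript_count < 1:
        errors.append("At least one transcript file is required")
    if matched_pairs < 2:
        errors.append("Need at least 2 matched audio + transcript pairs")
    if total_minutes is not None:
        if total_minutes < 10:
            errors.append(
                f"Need minimum 10 minutes total duration (found {total_minutes:.1f} minutes)"
            )
    elif matched_pairs < 10:
        # Fallback when durations are unknown: count matched files instead.
        errors.append("Need at least 10 matched audio files")
    return errors
```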
What Happens When You Click Train
- A confirmation dialog asks you to confirm.
- The server validates the dataset:
  - If validation fails, you get the first error message returned by the server (or a generic "Validation failed" if none).
  - If some audio has no matching transcript, those files are listed in a second dialog, "Train anyway?". If you confirm, training starts and those files are skipped.
  - If everything matches up, training starts immediately.
- On success you get a "Training started successfully" toast and the status changes.
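The branching in the steps above can be summarised as a small decision function. This is only a sketch of the flow as described here. The argument names and the shape of the validation result are assumptions, not the panel's actual data model.

```python
def after_validation(valid: bool,
                     errors: list[str],
                     unmatched_audio: list[str]) -> str:
    """Describe what the user sees once the server has validated the dataset."""
    if not valid:
        # First server error, or the generic fallback when none is returned.
        return errors[0] if errors else "Validation failed"
    if unmatched_audio:
        # Second dialog: the user must confirm before these files are skipped.
        return "Train anyway? Unmatched: " + ", ".join(unmatched_audio)
    return "Training started successfully"
```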
Possible Training Errors
- Validation failed (validation_failed): the dataset doesn't meet the minimum requirements above. The error text from the server lists what's missing (e.g. "Need minimum 10 minutes total duration (found 4.3 minutes)", "At least one transcript file is required", or "Unsafe filename: '<name>'…"). Fix the issue and try again.
- No claims available (no-claims-available): your plan's training quota for this billing period is exhausted. Either you're over the included training units and there's no pay-per-use price set, or you're out of budget. Contact your admin.
- No claims available for hosting (no-claims-available, returned later, when training reaches Success): separate from the training claim. Each trained model also consumes a monthly hosting claim when it goes live. If hosting is exhausted, training completes but the model can't be enabled. This is rare in practice; admins manage it via plan limits.
- No data (no_data): you clicked Train before uploading anything, or before uploading at least one transcript.
- Custom model not found / Model is not ready to run: usually fixed by reopening the model and trying again.
- Any other error message is shown verbatim from the server.
How Quotas Work
Each training run consumes two kinds of claim against your plan:
- A training claim consumed when you click Train.
- A monthly hosting claim consumed when training reaches Success and the model is enabled. Hosting renews each billing period.
If your plan has a "fair use" training limit, claims are free up to the limit and pay‑per‑use beyond it (only if a unit price is configured). Same model for hosting. There's no separate UI showing remaining claims — you'll only see the error if you run out.
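The fair-use rule can be read as a three-way decision per claim. The sketch below models only the limit and the unit price; the "out of budget" case mentioned earlier is not modelled, and the function and parameter names are hypothetical.

```python
from typing import Optional


def claim_outcome(used: int,
                  fair_use_limit: int,
                  unit_price: Optional[float]) -> str:
    """Outcome of requesting one more claim in the current billing period.

    used: claims already consumed this period.
    unit_price: pay-per-use price, or None if none is configured.
    """
    if used < fair_use_limit:
        return "free"
    if unit_price is not None:
        return "pay-per-use"
    return "no-claims-available"
```

The same function applies to hosting claims, since the page describes the same model for both.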
Training Statuses
The badge colour and meaning are:
| Status | Colour | What it means |
|---|---|---|
| Not running | grey | No training has started, or you can start a new one. Upload your datasets first. |
| Provisioning resources | amber | Server‑side this is "ready to run" — the training environment is being prepared. Training will start shortly. |
| Running | sky blue | Training is in progress. This can take several hours depending on dataset size. |
| Success | green | Training is done. The model is now usable for new transcriptions (see "Using your model" below). |
| Failed | red | Something went wrong. Review the datasets and start training again. |
The list page auto‑refreshes every 20 seconds, so you can leave it open while a model trains.
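For scripts or internal tooling, the status table can be captured as an enum that also encodes when training can be started (only from Not running or Failed) and when the model is usable. This is a sketch: the labels and colours come from the table, while the class and property names are hypothetical.

```python
from enum import Enum


class TrainingStatus(Enum):
    NOT_RUNNING = ("Not running", "grey")
    PROVISIONING = ("Provisioning resources", "amber")
    RUNNING = ("Running", "sky blue")
    SUCCESS = ("Success", "green")
    FAILED = ("Failed", "red")

    def __init__(self, label: str, colour: str):
        self.label = label
        self.colour = colour

    @property
    def can_start_training(self) -> bool:
        """The Train button is only shown for these two statuses."""
        return self in (TrainingStatus.NOT_RUNNING, TrainingStatus.FAILED)

    @property
    def usable_for_transcription(self) -> bool:
        """Only a successfully trained model appears in the language dropdown."""
        return self is TrainingStatus.SUCCESS
```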
Using Your Model
Once training reaches Success, the model is enabled server‑side and appears in the language dropdown when you upload a file from the dashboard. Pick it instead of the plain base language and your transcription will use the custom model.
If you don't see it right after training succeeds, refresh the upload dialog — the language list is loaded when the dialog opens.
Downloads
The row Download action returns a short‑lived signed URL (a 307 redirect) to the file in storage. The link only stays valid for a few minutes — if it expires, click Download again.
Deleting a Model
In the list, the row's Delete action (red trash icon) is only available to admin and sysop roles. Other roles see only Edit.
Click Delete, confirm in the dialog, and the model and its datasets are removed. You'll see a "Custom model deleted successfully" toast. This can't be undone — the server marks the model and all dataset files as deleted, they disappear from the panel and from the language dropdown immediately, and there's no "restore" flow.
Quick Reference
- Find it: sidebar → Custom Models (brain icon)
- Create: header → New custom model → name + trainable language
- Add data: open the model → New dataset → Auto / Transcript / Audio → drop up to 100 files (10 GB each) → Create
- Train: model details → Train the language → confirm
- Use it: after status is Success, pick it from the language dropdown when uploading
- Rename: edit the Name field → Update
- Delete (admin / sysop only): row action → Delete → confirm
Next Steps
- What are Custom Models? — concepts and how custom models compare to glossaries
- When to Use Custom Models — decide if a custom model is right for your project
- Glossaries — a faster alternative for simple term replacements