What Are Custom Models?

Custom models allow you to train specialized speech recognition models with your own data to improve transcription accuracy for specific terms, names, and vocabulary.

Overview

A custom model is a specialized version of Scriptix's base language models trained with your audio files and transcripts. This training helps the model better recognize:

Domain-specific terminology
Product and company names
Specific vocabulary
Your organization's unique language patterns

How Custom Models Work

Base Models vs Custom Models

Base Models:

Pre-trained language models
Available immediately
Support standard speech recognition
Cover common vocabulary

Custom Models:

Built on top of a base language model
Trained with your specific audio and transcripts
Optimized for your vocabulary
Require training time and data

The Training Process

Create Model - Name the model and select base language
Upload Datasets - Provide audio and/or transcript files
Train Model - System trains on your data
Use Model - Select trained model when creating transcripts

Training Data Requirements

Audio Files:

Supported formats: .wav, .mp3, .m4a, .flac
1-10 files per upload
Maximum 10GB per file
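The audio limits above can be checked before uploading. The sketch below is a hypothetical client-side helper, not a Scriptix API. It assumes each file is described by its name and size in bytes, and it reads "10GB" as 10 GiB, which is also an assumption.

```python
from pathlib import Path

# Limits copied from the Audio Files requirements above.
# This checker is a hypothetical client-side helper, not a Scriptix API.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}
MAX_FILES_PER_UPLOAD = 10
MAX_FILE_BYTES = 10 * 1024**3  # assumption: "10GB" read as 10 GiB


def check_audio_batch(files: list[tuple[str, int]]) -> list[str]:
    """Return problems found in a batch of (filename, size_in_bytes) pairs."""
    problems = []
    if not 1 <= len(files) <= MAX_FILES_PER_UPLOAD:
        problems.append(f"upload 1-{MAX_FILES_PER_UPLOAD} files, got {len(files)}")
    for name, size in files:
        ext = Path(name).suffix.lower()  # case-insensitive: ".FLAC" counts as ".flac"
        if ext not in AUDIO_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 10GB")
    return problems


print(check_audio_batch([("interview.wav", 2_000_000), ("notes.docx", 1_000)]))
# ['notes.docx: unsupported format .docx']
```

An empty list means the batch passes these local checks; the server may still apply its own validation.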

Transcript Files:

Supported formats: .vtt, .srt, .txt
Must match the audio content
Include accurate text and timestamps
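Timestamped transcripts (.srt and .vtt) mark each cue with a start and end time. As a rough pre-upload sanity check, the hypothetical helpers below extract cue timings and confirm that every cue ends after it starts and that cues appear in order. They check format only, not whether the text actually matches the audio. Plain .txt files carry no timestamps, which is what Force Alignment is for.

```python
import re

# Matches cue timing lines in SRT ("00:00:01,000 --> 00:00:04,500") and
# WebVTT ("00:01.000 --> 00:04.500"; the hours field is optional in WebVTT).
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*"
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})"
)


def _seconds(h, m, s, ms) -> float:
    # Hours may be missing (None) in WebVTT timings.
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def cue_times(text: str) -> list[tuple[float, float]]:
    """Return (start, end) in seconds for every cue timing line found."""
    times = []
    for match in TIMING.finditer(text):
        g = match.groups()
        times.append((_seconds(*g[:4]), _seconds(*g[4:])))
    return times


def timestamps_look_valid(text: str) -> bool:
    """True if there are cues, each ends after it starts, and starts are in order."""
    times = cue_times(text)
    if not times:
        return False
    in_order = all(a[0] <= b[0] for a, b in zip(times, times[1:]))
    return in_order and all(start < end for start, end in times)


srt = "1\n00:00:01,000 --> 00:00:04,500\nHello\n\n2\n00:00:05,000 --> 00:00:07,000\nWorld\n"
print(cue_times(srt), timestamps_look_valid(srt))
```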

Manifest Files:

Supported format: .jsonl
Organize datasets
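The .jsonl format stores one JSON object per line. The exact manifest schema Scriptix expects is not described on this page, so the "audio" and "transcript" keys below are hypothetical placeholders. The sketch only shows how a JSON Lines manifest pairing audio files with transcripts could be built and read back.

```python
import json

# Hypothetical manifest layout: one JSON object per line with "audio" and
# "transcript" keys. The real Scriptix manifest schema may use other fields.


def build_manifest(pairs: list[tuple[str, str]]) -> str:
    """Serialize (audio_file, transcript_file) pairs as JSON Lines text."""
    return "".join(
        json.dumps({"audio": audio, "transcript": transcript}) + "\n"
        for audio, transcript in pairs
    )


def parse_manifest(text: str) -> list[dict]:
    """Parse JSON Lines text back into dicts, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


manifest = build_manifest([("call1.wav", "call1.srt"), ("call2.wav", "call2.vtt")])
print(manifest, end="")
print(parse_manifest(manifest))
```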

Force Alignment:

Use the Force Alignment feature to add timestamps to existing transcripts
Creates properly formatted training data
Available in the workspace under STT Session

Training Requirements

To train a custom model you need:

Datasets uploaded (audio and/or transcripts)
Training credits available in organization
Model validation to pass
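The three prerequisites can be expressed as a simple checklist. The TrainingReadiness fields and the assumption that at least one credit is needed are illustrative, not Scriptix's actual data model.

```python
from dataclasses import dataclass


@dataclass
class TrainingReadiness:
    # Hypothetical fields mirroring the three prerequisites listed above.
    dataset_count: int
    org_training_credits: int
    validation_passed: bool


def missing_requirements(state: TrainingReadiness) -> list[str]:
    """List unmet prerequisites; an empty list means all three are satisfied."""
    missing = []
    if state.dataset_count < 1:
        missing.append("upload at least one dataset")
    if state.org_training_credits < 1:  # assumption: one credit is the minimum
        missing.append("add training credits to the organization")
    if not state.validation_passed:
        missing.append("resolve model validation errors")
    return missing


print(missing_requirements(TrainingReadiness(dataset_count=3, org_training_credits=0, validation_passed=True)))
# ['add training credits to the organization']
```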

Using Custom Models

After training completes successfully:

Navigate to the workspace (Home page)
Click "STT Session" to create a transcript
Select the language from the dropdown
Select your custom model from the model list
Upload an audio file
Start transcription

The trained model is applied to new transcriptions, improving accuracy for the vocabulary it was trained on.

Custom Models vs Glossaries

Both improve transcription accuracy but work differently:

Custom Models

What They Do:

Train speech recognition with your audio and transcripts
Learn vocabulary patterns from training data
Require training time

Best For:

Large volumes of specialized content
Comprehensive domain coverage
Ongoing regular use

Requirements:

Training data (audio + transcripts)
Training credits
Training time

Glossaries

What They Do:

Define term pairs (source → target)
Replace terms in transcripts
Apply during transcription

Best For:

Specific terms and names
Quick implementation
Limited vocabulary

Requirements:

Term pairs list
Source and target language
No training needed (available immediately)

Using Together:

Custom models improve base recognition
Glossaries handle specific term replacements
Both can be used in the same transcription
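Glossary replacement can be pictured as a find-and-replace over the transcript text. The sketch below is an assumption about the matching rules (whole words, case-sensitive, longer terms tried first so overlapping entries resolve predictably); Scriptix's actual glossary behavior may differ. It also assumes terms begin and end with letters or digits, since the word-boundary match relies on that.

```python
import re


def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Replace source terms with target terms (hypothetical matching rules).

    Assumptions: whole-word, case-sensitive matches; longer source terms are
    tried first so a multi-word entry wins over a shorter entry it contains.
    """
    if not glossary:
        return text
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    # The matched text is exactly one of the source terms, so it is a valid key.
    return pattern.sub(lambda m: glossary[m.group(0)], text)


glossary = {"script ix": "Scriptix", "acme": "ACME", "acme corp": "ACME Corporation"}
print(apply_glossary("acme corp uses script ix", glossary))
# ACME Corporation uses Scriptix
```

In this picture, a custom model reduces how often terms are misheard in the first place, and the glossary pass fixes the specific spellings that remain.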

Training Status

Models have five training statuses:

Not Running (gray) - Model created but not trained
Ready to Run (amber) - Datasets uploaded, ready to train
Running (blue) - Training in progress
Success (green) - Training completed successfully
Failed (red) - Training failed

Only models with "Success" status can be used for transcription.
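The five statuses and the rule that only Success models are usable can be modeled as a small enum. This is an illustrative sketch; the labels and colors come from the list above, while the class and function names are hypothetical.

```python
from enum import Enum


class TrainingStatus(Enum):
    # (label, badge color) pairs from the Training Status list; names are illustrative.
    NOT_RUNNING = ("Not Running", "gray")
    READY_TO_RUN = ("Ready to Run", "amber")
    RUNNING = ("Running", "blue")
    SUCCESS = ("Success", "green")
    FAILED = ("Failed", "red")

    def __init__(self, label: str, color: str):
        self.label = label
        self.color = color


def usable_for_transcription(status: TrainingStatus) -> bool:
    """Only models whose training succeeded can be selected for transcription."""
    return status is TrainingStatus.SUCCESS


status = TrainingStatus.RUNNING
print(status.label, status.color, usable_for_transcription(status))
# Running blue False
```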

Model Management

View Models:

Navigate to Custom Models page
See all models in list
View training status

Edit Models:

Click model name or Edit action
Update model name
Upload additional datasets
View training details

Delete Models:

Available to ORGADMIN and SYSOP only
Click three-dot menu on model
Select Delete
Confirm deletion

Dataset Management

Upload Datasets:

Open model details page
Click "New Dataset"
Select type (auto-detect, TRANSCRIPT, or AUDIO)
Upload 1-10 files
Uploaded files are added to the datasets list
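When the type is left on auto-detect, one plausible approach is to classify files by extension using the formats listed under Training Data Requirements. Whether Scriptix auto-detects this way is an assumption; the functions below are a hypothetical illustration and deliberately reject anything else, including .jsonl manifests, since only TRANSCRIPT and AUDIO types are listed.

```python
from pathlib import Path

# Extensions copied from "Training Data Requirements"; classifying by
# extension is an assumption about how auto-detect behaves.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}
TRANSCRIPT_EXTENSIONS = {".vtt", ".srt", ".txt"}


def detect_dataset_type(filename: str) -> str:
    """Return "AUDIO" or "TRANSCRIPT" for a filename, or raise ValueError."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTENSIONS:
        return "AUDIO"
    if ext in TRANSCRIPT_EXTENSIONS:
        return "TRANSCRIPT"
    raise ValueError(f"cannot auto-detect dataset type for {filename!r}")


def group_upload(filenames: list[str]) -> dict[str, list[str]]:
    """Group a 1-10 file upload by detected type."""
    if not 1 <= len(filenames) <= 10:
        raise ValueError("upload between 1 and 10 files")
    groups: dict[str, list[str]] = {"AUDIO": [], "TRANSCRIPT": []}
    for name in filenames:
        groups[detect_dataset_type(name)].append(name)
    return groups


print(group_upload(["call1.wav", "call1.srt", "call2.M4A"]))
# {'AUDIO': ['call1.wav', 'call2.M4A'], 'TRANSCRIPT': ['call1.srt']}
```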

Delete Datasets:

Click three-dot menu on dataset row
Select Delete
Confirm deletion

Force Alignment Integration

Use Force Alignment to prepare training data:

What It Does:

Adds timestamps to existing transcripts
Creates properly formatted training files
Uses audio + plain text transcript

How to Use:

Navigate to the workspace (Home)
Click "STT Session"
Select Force Alignment option
Upload audio file
Upload or paste transcript text
Process alignment
Download timestamped transcript
Use it as training data for your custom model

Limitations

Custom Models Cannot:

Transcribe languages they weren't trained for
Work without training data
Be used before training succeeds
Compensate for poor audio quality

Training Limitations:

Requires quality training data
Takes time to complete
Needs training credits
Cannot start training while a training run is already in progress
Cannot retrain a model whose training has already succeeded
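The two status-related limitations amount to a guard that runs before training starts. In this hypothetical sketch, statuses are the labels from Training Status; allowing Failed models to be retrained is an assumption, since the limitations above only rule out Running and Success.

```python
# Statuses that block a new training run, per the limitations above.
# Letting "Failed" (or any other status) proceed is an assumption.
BLOCKING_STATUSES = {
    "Running": "training is already in progress",
    "Success": "the model has already been trained successfully",
}


def ensure_can_start_training(status: str) -> None:
    """Raise RuntimeError if a model with this status cannot start training."""
    reason = BLOCKING_STATUSES.get(status)
    if reason is not None:
        raise RuntimeError(f"cannot start training: {reason}")


for status in ("Ready to Run", "Running", "Success"):
    try:
        ensure_can_start_training(status)
        print(status, "-> can start")
    except RuntimeError as err:
        print(status, "->", err)
```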

Access Control

Who Can Create Models:

All authenticated users can create custom models
Model creation available on Custom Models page

Who Can Delete Models:

ORGADMIN role users
SYSOP role users
Other roles cannot delete models

Who Can Train Models:

All users with access to the model
Training requires credits in the organization
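The access rules above reduce to a few permission checks. The function names are hypothetical, and the sketch assumes "credits in the organization" means a positive credit balance; only the ORGADMIN and SYSOP role names come from this page.

```python
# Only these two role names are documented; everything else here is illustrative.
DELETE_ROLES = frozenset({"ORGADMIN", "SYSOP"})


def can_create_model(authenticated: bool) -> bool:
    """Any authenticated user may create a custom model."""
    return authenticated


def can_delete_model(role: str) -> bool:
    """Deletion is limited to ORGADMIN and SYSOP."""
    return role in DELETE_ROLES


def can_train_model(has_access: bool, org_credits: int) -> bool:
    """Training needs access to the model and (assumed) a positive credit balance."""
    return has_access and org_credits > 0


print(can_delete_model("SYSOP"), can_train_model(True, 0))
# True False
```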

Getting Started Guide

A Getting Started modal appears the first time you open the Custom Models page:

Guide Sections:

What are custom models
What you need (training data requirements)
Only have audio files (use Force Alignment)
After training (how to use trained model)

Access Guide:

Appears automatically on first visit
Click "Getting Started" button to reopen
Whether the guide has already been shown is stored in browser localStorage

Transform your transcription accuracy! Custom models bring domain expertise to speech recognition.

Next Steps

When to Use Custom Models - Determine if they're right for you
Create Custom Model - Step-by-step creation guide
Glossaries - Alternative accuracy option
