What Are Custom Models?

Custom models allow you to train specialized speech recognition models with your own data to improve transcription accuracy for specific terms, names, and vocabulary.

Overview

A custom model is a specialized version of Scriptix's base language models trained with your audio files and transcripts. This training helps the model better recognize:

  • Domain-specific terminology
  • Product and company names
  • Specific vocabulary
  • Your organization's unique language patterns

How Custom Models Work

Base Models vs Custom Models

Base Models:

  • Pre-trained language models
  • Available immediately
  • Support standard speech recognition
  • Cover common vocabulary

Custom Models:

  • Built on top of a base language model
  • Trained with your specific audio and transcripts
  • Optimized for your vocabulary
  • Require training time and data

The Training Process

  1. Create Model - Name the model and select its base language
  2. Upload Datasets - Provide audio and/or transcript files
  3. Train Model - The system trains the model on your data
  4. Use Model - Select the trained model when creating transcripts

Training Data Requirements

Audio Files:

  • Supported formats: .wav, .mp3, .m4a, .flac
  • 1-10 files per upload
  • Maximum 10 GB per file

Transcript Files:

  • Supported formats: .vtt, .srt, .txt
  • Must match audio content
  • Include accurate text and timestamps

Manifest Files:

  • Supported format: .jsonl
  • Organize datasets

Force Alignment:

  • Use the Force Alignment feature to add timestamps to existing transcripts
  • Creates properly formatted training data
  • Available in the workspace STT Session
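The file rules above can be checked locally before uploading. The following is a minimal sketch, not Scriptix's own validation: the function and constant names are hypothetical, the size limit is applied only to audio files because that is where it is listed, and "10GB" is assumed to mean 10 GiB.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Limits copied from the requirements above; all names here are illustrative.
AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac"}
TRANSCRIPT_EXTS = {".vtt", ".srt", ".txt"}
MANIFEST_EXTS = {".jsonl"}
MAX_FILES_PER_UPLOAD = 10
MAX_AUDIO_BYTES = 10 * 1024**3  # assumption: "10GB" is read as 10 GiB


def classify(filename: str) -> Optional[str]:
    """Return 'AUDIO', 'TRANSCRIPT', 'MANIFEST', or None if unsupported."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTS:
        return "AUDIO"
    if ext in TRANSCRIPT_EXTS:
        return "TRANSCRIPT"
    if ext in MANIFEST_EXTS:
        return "MANIFEST"
    return None


def check_upload(files: Dict[str, int]) -> List[str]:
    """Check one upload batch given as {filename: size_in_bytes}.

    Returns a list of human-readable problems; an empty list means the
    batch satisfies the rules listed above.
    """
    problems = []
    if not 1 <= len(files) <= MAX_FILES_PER_UPLOAD:
        problems.append(f"expected 1-{MAX_FILES_PER_UPLOAD} files, got {len(files)}")
    for name, size in files.items():
        kind = classify(name)
        if kind is None:
            problems.append(f"{name}: unsupported file type")
        elif kind == "AUDIO" and size > MAX_AUDIO_BYTES:
            problems.append(f"{name}: audio file exceeds 10 GB")
    return problems
```

Calling `check_upload({"call.wav": 52_000_000, "call.srt": 4_096})` returns an empty list, while a batch containing a `.pdf` file or more than ten files returns one message per problem.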

Training Requirements

To train a custom model you need:

  • Uploaded datasets (audio and/or transcripts)
  • Training credits available in your organization
  • A model that passes validation

Using Custom Models

After training completes successfully:

  1. Navigate to the workspace (Home page)
  2. Click "STT Session" to create a transcript
  3. Select the language from the dropdown
  4. Select your custom model from the model list
  5. Upload an audio file
  6. Start transcription

The trained model is then applied to the new transcription to improve accuracy.

Custom Models vs Glossaries

Both improve transcription accuracy but work differently:

Custom Models

What They Do:

  • Train speech recognition with your audio and transcripts
  • Learn vocabulary patterns from training data
  • Require training time

Best For:

  • Large volumes of specialized content
  • Comprehensive domain coverage
  • Ongoing regular use

Requirements:

  • Training data (audio + transcripts)
  • Training credits
  • Training time

Glossaries

What They Do:

  • Define term pairs (source → target)
  • Replace terms in transcripts
  • Apply during transcription

Best For:

  • Specific terms and names
  • Quick implementation
  • Limited vocabulary

Requirements:

  • Term pairs list
  • Source and target language
  • No training needed; available immediately
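To illustrate the idea of term pairs, here is a minimal sketch of source → target replacement applied to transcript text. It is an illustration only: how Scriptix applies glossaries internally is not described here, and the function name, the whole-word matching, and the case-insensitive matching are all assumptions.

```python
import re
from typing import Dict


def apply_glossary(text: str, term_pairs: Dict[str, str]) -> str:
    """Replace each source term with its target term, matching whole words only."""
    if not term_pairs:
        return text
    # Try longer terms first so a multi-word term wins over its own prefix.
    ordered = sorted(term_pairs, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup from the matched text back to the target term.
    lookup = {source.lower(): target for source, target in term_pairs.items()}
    return pattern.sub(lambda match: lookup[match.group(1).lower()], text)


pairs = {"scriptics": "Scriptix", "scriptics api": "Scriptix API"}
corrected = apply_glossary("we call the scriptics api daily", pairs)
# corrected == "we call the Scriptix API daily"
```

A lookup like this needs no training, which is why glossaries suit a short list of known terms, while a custom model changes what the recognizer hears in the first place.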

Using Together:

  • Custom models improve base recognition
  • Glossaries handle specific term replacements
  • Both can be used in the same transcription

Training Status

Models have five training statuses:

  1. Not Running (gray) - Model created but not trained
  2. Ready to Run (amber) - Datasets uploaded, ready to train
  3. Running (blue) - Training in progress
  4. Success (green) - Training completed successfully
  5. Failed (red) - Training failed

Only models with "Success" status can be used for transcription.
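The five statuses and the "Success only" rule can be modeled as a small enumeration. This is a sketch for illustration: the class and function names are hypothetical, and only the labels and colors come from the list above.

```python
from enum import Enum
from typing import Dict, List


class TrainingStatus(Enum):
    # Each value is (label, indicator color) as listed above.
    NOT_RUNNING = ("Not Running", "gray")
    READY_TO_RUN = ("Ready to Run", "amber")
    RUNNING = ("Running", "blue")
    SUCCESS = ("Success", "green")
    FAILED = ("Failed", "red")

    @property
    def label(self) -> str:
        return self.value[0]

    @property
    def color(self) -> str:
        return self.value[1]

    @property
    def usable_for_transcription(self) -> bool:
        # Only successfully trained models can be selected for transcription.
        return self is TrainingStatus.SUCCESS


def usable_models(models: Dict[str, TrainingStatus]) -> List[str]:
    """Return the names of models that can be used for transcription."""
    return [name for name, status in models.items() if status.usable_for_transcription]
```

For example, given one model in `RUNNING` and one in `SUCCESS`, `usable_models` returns only the name of the second.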

Model Management

View Models:

  • Navigate to Custom Models page
  • See all models in list
  • View training status

Edit Models:

  • Click model name or Edit action
  • Update model name
  • Upload additional datasets
  • View training details

Delete Models:

  • Available to ORGADMIN and SYSOP only
  • Click three-dot menu on model
  • Select Delete
  • Confirm deletion

Dataset Management

Upload Datasets:

  • Open model details page
  • Click "New Dataset"
  • Select type (auto-detect, TRANSCRIPT, or AUDIO)
  • Upload 1-10 files
  • Files added to datasets list

Delete Datasets:

  • Click three-dot menu on dataset row
  • Select Delete
  • Confirm deletion

Force Alignment Integration

Use Force Alignment to prepare training data:

What It Does:

  • Adds timestamps to existing transcripts
  • Creates properly formatted training files
  • Uses audio + plain text transcript

How to Use:

  1. Navigate to workspace (Home)
  2. Click "STT Session"
  3. Select Force Alignment option
  4. Upload audio file
  5. Upload or paste transcript text
  6. Process alignment
  7. Download timestamped transcript
  8. Use as training data for custom model
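For readers unfamiliar with timestamped transcripts, the sketch below formats aligned segments in the standard SubRip (.srt) layout, one of the supported transcript formats. The exact file that Force Alignment produces is not described here; the segment tuples and function names are assumptions made for illustration.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS,mmm (SubRip style)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) segments as SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: sequence number, time range, then the cue text.
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


example = to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "world")])
```

Each cue pairs a span of audio with the text spoken in it, which is the audio-to-text correspondence that training data with timestamps provides.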

Limitations

Custom Models Cannot:

  • Transcribe languages they weren't trained for
  • Work without training data
  • Be used before training succeeds
  • Improve audio quality issues

Training Limitations:

  • Requires quality training data
  • Takes time to complete
  • Needs training credits
  • A model cannot be trained while its training is already running
  • A model cannot be trained again once training has succeeded
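The training preconditions from this page can be combined into one check. This is a sketch under stated assumptions: the function name and parameters are hypothetical, model validation is not modeled, and since the page does not say whether a Failed model can be retrained, the sketch does not block it.

```python
from typing import List


def training_blockers(status: str, dataset_count: int, credits: int) -> List[str]:
    """Return the reasons training cannot start; an empty list means it can.

    `status` is one of the labels shown in the UI, for example "Running".
    """
    blockers = []
    if status == "Running":
        blockers.append("training is already in progress")
    if status == "Success":
        blockers.append("model has already been trained successfully")
    if dataset_count < 1:
        blockers.append("no datasets uploaded")
    if credits <= 0:
        blockers.append("no training credits available")
    return blockers
```

Returning every blocker at once, rather than stopping at the first, lets a caller show the user everything that needs fixing before training can start.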

Access Control

Who Can Create Models:

  • All authenticated users can create custom models
  • Model creation available on Custom Models page

Who Can Delete Models:

  • ORGADMIN role users
  • SYSOP role users
  • Other roles cannot delete models

Who Can Train Models:

  • All users with access to model
  • Training requires credits in organization
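The access rules above reduce to three simple checks. The sketch below is illustrative only: the function names are hypothetical, and any role name other than ORGADMIN and SYSOP (such as "USER" in the example) is an assumption.

```python
# Roles allowed to delete models, as named above.
DELETE_ROLES = frozenset({"ORGADMIN", "SYSOP"})


def can_create_model(is_authenticated: bool) -> bool:
    """Any authenticated user may create a custom model."""
    return is_authenticated


def can_delete_model(role: str) -> bool:
    """Only ORGADMIN and SYSOP users may delete a model."""
    return role in DELETE_ROLES


def can_train_model(has_model_access: bool, org_credits: int) -> bool:
    """Training needs access to the model and credits in the organization."""
    return has_model_access and org_credits > 0


# Example: a hypothetical "USER" role can create and train but not delete.
user_can_delete = can_delete_model("USER")
```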

Getting Started Guide

A getting-started modal appears the first time you visit the Custom Models page:

Guide Sections:

  • What are custom models
  • What you need (training data requirements)
  • Only have audio files (use Force Alignment)
  • After training (how to use trained model)

Access Guide:

  • Appears automatically on first visit
  • Click "Getting Started" button to reopen
  • Whether the guide has been shown is stored in browser localStorage

Transform your transcription accuracy! Custom models bring domain expertise to speech recognition.