Automatic Transcription
After uploading your media file, Scriptix automatically processes it to create a timestamped transcript. This guide explains what happens during processing and what to expect.
Overview
Automatic transcription converts your uploaded audio or video into editable text using speech recognition. Processing runs entirely on the server, so no action is required from you after upload.
Starting Transcription
Transcription begins automatically after you complete the upload configuration and submit.
What Happens
- Upload completes - File successfully uploaded via TUS protocol
- Session created - Speech-to-text session created with your settings
- Processing begins - Server processes your file automatically
- Session available - You can view status in "Speech-to-Text Sessions"
Configuration Applied
Your upload settings are applied to the transcription:
{
  language: string,            // Selected language
  diarization: boolean,        // Speaker identification
  keep_source: boolean,        // Retain media file
  media_source: string,        // File location
  document: {...},             // Document details
  punctuation?: boolean,       // Auto punctuation
  multichannel?: boolean,      // Multi-channel processing
  folder_id?: number | null    // Target folder
}
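As an illustration, the settings above could be typed and assembled as follows. This is a minimal sketch, not the application's actual code: the `buildSettings` helper, the default values for `diarization`, `keep_source`, and `folder_id`, and the sample values (`"en"`, `"uploads/interview.mp3"`) are assumptions, and `document` is left opaque because its fields are not listed in this guide.

```typescript
// Field names come from the settings listing above.
interface TranscriptionSettings {
  language: string;
  diarization: boolean;
  keep_source: boolean;
  media_source: string;
  document: Record<string, unknown>; // contents not specified in this guide
  punctuation?: boolean;
  multichannel?: boolean;
  folder_id?: number | null;
}

// Hypothetical helper: applies assumed defaults, then the caller's values.
function buildSettings(
  required: Pick<TranscriptionSettings, "language" | "media_source" | "document">,
  overrides: Partial<TranscriptionSettings> = {}
): TranscriptionSettings {
  return {
    diarization: false, // assumed default
    keep_source: false, // assumed default
    folder_id: null,    // null places the session in the workspace root
    ...required,
    ...overrides,
  };
}

// Placeholder values for illustration only.
const settings = buildSettings(
  { language: "en", media_source: "uploads/interview.mp3", document: {} },
  { diarization: true }
);
```

Keeping the required fields in a separate argument makes the compiler reject a payload that omits `language`, which this guide marks as a required setting.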
Monitoring Progress
Track your transcription through the Workspace interface.
Viewing Sessions
- Navigate to "Workspace" in the sidebar
- Your sessions appear in a paginated list
- Default: 25 sessions per page
Session Information
Each session displays:
- Session details
- Processing status
- Duration
The interface supports:
- Pagination - Navigate through pages of sessions
- Sorting - Sort by various fields
- Searching - Search sessions (query parameter: q)
- Filtering - Filter by status or other criteria
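The list controls above map naturally onto a query string. The sketch below shows one way such a request could be assembled; only `q` is named in this guide, so `page`, `limit`, `sort`, and `status` are assumed parameter names, and `buildSessionQuery` is a hypothetical helper.

```typescript
interface SessionListParams {
  page?: number;   // assumed name; 1-based page index
  limit?: number;  // assumed name; the guide states a default of 25 per page
  q?: string;      // search text (the only parameter name given in this guide)
  sort?: string;   // assumed name
  status?: string; // assumed name
}

// Hypothetical helper: builds the query string for a session list request.
function buildSessionQuery(params: SessionListParams = {}): string {
  const search = new URLSearchParams();
  search.set("page", String(params.page ?? 1));
  search.set("limit", String(params.limit ?? 25));

  // Skip empty or whitespace-only search text.
  const q = params.q?.trim();
  if (q) search.set("q", q);

  if (params.sort) search.set("sort", params.sort);
  if (params.status) search.set("status", params.status);
  return search.toString();
}

console.log(buildSessionQuery({ q: " board meeting " }));
// prints "page=1&limit=25&q=board+meeting"
```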
Data Caching
Session data is cached for performance:
- Stale time: 5 minutes
- Cache time: 30 minutes
- Refetch behavior: Manual only (not on window focus/reconnect/mount)
- Previous data kept during pagination
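Expressed as configuration, the caching behavior above might look like the object below. The option names follow those used by TanStack Query (React Query) v3/v4; this guide does not name the caching library, so treat the names as an assumption. The `isStale` helper is hypothetical and only illustrates what the stale time means.

```typescript
const MINUTE_MS = 60 * 1000;

// Assumed option names; the values are the ones stated in this guide.
const sessionListQueryOptions = {
  staleTime: 5 * MINUTE_MS,    // data is considered fresh for 5 minutes
  cacheTime: 30 * MINUTE_MS,   // unused data is kept for 30 minutes
  refetchOnWindowFocus: false, // manual refetch only
  refetchOnReconnect: false,
  refetchOnMount: false,
  keepPreviousData: true,      // keep showing the old page while the next one loads
} as const;

// Hypothetical helper: true once the stale time has elapsed since the last fetch.
function isStale(fetchedAtMs: number, nowMs: number): boolean {
  return nowMs - fetchedAtMs >= sessionListQueryOptions.staleTime;
}
```

Because automatic refetching is disabled, a session created elsewhere will not appear in the list until the data is refetched manually.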
Understanding the Process
The processing pipeline itself runs server-side. The frontend application handles the following:
Session Management
The application:
- Creates STT sessions via API
- Fetches session list with pagination
- Displays session status
- Handles errors with toast notifications
Error Handling
Errors are displayed as toast messages. The message text is taken from one of the following fields of the error response:
- error_description
- detail
- error
- message
- Default: "An error has occurred" (localized)
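A minimal sketch of how a message could be picked from these fields is shown below. The lookup order simply follows the order of the list and is an assumption, as are the names `resolveErrorMessage` and `DEFAULT_ERROR_MESSAGE`; localization of the default text is omitted.

```typescript
type ErrorPayload = {
  error_description?: unknown;
  detail?: unknown;
  error?: unknown;
  message?: unknown;
};

const DEFAULT_ERROR_MESSAGE = "An error has occurred";

// Hypothetical helper: returns the first non-empty string field, else the default.
function resolveErrorMessage(payload: unknown): string {
  if (typeof payload !== "object" || payload === null) {
    return DEFAULT_ERROR_MESSAGE;
  }
  const p = payload as ErrorPayload;
  // Assumed priority: same order as the field list.
  for (const candidate of [p.error_description, p.detail, p.error, p.message]) {
    if (typeof candidate === "string" && candidate.trim() !== "") {
      return candidate;
    }
  }
  return DEFAULT_ERROR_MESSAGE;
}
```

The returned string is what would be passed to the toast, for example `toast.error(resolveErrorMessage(responseBody))` with react-hot-toast.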
Transcription Settings
Language
Required setting - The language code determines which speech recognition model processes your audio.
Supported: 37+ languages
See Language Selection for the complete list.
Speaker Diarization
Optional - When enabled, the system attempts to identify and separate different speakers.
Default value can be set at organization level:
setDiarization(organization?.diarization || false);
When to use:
- Multi-speaker content (interviews, meetings, panels)
- When speaker attribution is important
When to skip:
- Single speaker (lectures, presentations)
- When speaker identity doesn't matter
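The organization-level default shown above resolves to `false` whenever the organization is missing or has no diarization preference. The sketch below isolates that logic; the `Organization` shape and the `defaultDiarization` name are assumptions made for illustration.

```typescript
// Assumed shape: only the field used by the default is modeled.
interface Organization {
  diarization?: boolean | null;
}

// Mirrors `organization?.diarization || false` from the snippet above.
function defaultDiarization(organization?: Organization | null): boolean {
  return organization?.diarization || false;
}

console.log(defaultDiarization(undefined));             // prints false
console.log(defaultDiarization({ diarization: true })); // prints true
```

This value only sets the initial state of the option; you can still switch diarization on or off for each upload.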
Additional Options
Folder Assignment (folder_id?: number | null)
- Places the session in a specific folder
- null = workspace root
What You Receive
When transcription completes, you get an editable document containing:
Transcript Content
- Complete text transcription
- Timestamps
- Speaker labels (if diarization enabled)
- Paragraph structure
Document Access
Access your completed transcript:
- From Sessions list - Click the completed session
- From Workspace - Navigate to the document in folders
- From Dashboard - Recent documents
Best Practices
Before Processing
- Verify Language - Double-check language selection matches your audio
- Enable Diarization - For multi-speaker content
- Choose Folder - Organize before uploading
- Test First - Try a short sample if unsure about settings
During Processing
- No Action Needed - Processing is automatic
- You Can Close Browser - Processing continues server-side
- Check Back Later - View status in Sessions list
After Processing
- Open Transcript - Click completed session to view
- Review Quality - Skim for overall accuracy
- Plan Editing - Identify sections needing attention
Common Questions
How long does processing take?
Processing time varies based on:
- File duration
- Audio quality
- Server load
- Enabled features (diarization adds time)
The frontend doesn't track or display processing time estimates.
Can I see real-time progress?
The frontend shows session status but doesn't provide real-time progress percentages. Processing happens server-side.
What if processing fails?
Errors are displayed via toast notifications with the error message. You can retry transcription from the session interface.
Can I cancel processing?
You can delete the session to stop processing. Specific pause/resume functionality depends on server implementation.
Do I get notified when complete?
The application uses react-hot-toast for error notifications. Completion notifications depend on server-side implementation.
Technical Details
API Endpoints
Troubleshooting
Session Not Appearing
Check:
- Pagination - Session may be on another page
- Search filters - Clear any active filters
- Cache - Session data is only refetched manually, so refresh the list if it may be stale
Error Messages
All errors display via toast notifications. Common error sources:
- Upload issues
- Invalid settings
- Server errors
- Network problems
See Troubleshooting for detailed solutions.
Session Status
If a session's status is unclear:
- Check the session list
- Look for error toast messages
- Try refreshing the page
- Check server status
Next Steps
After transcription completes:
- Transcript Editor - Edit your transcript
- Speaker Management - Manage speaker labels
- Export - Export your finished transcript
Related Documentation:
- Getting Started - Complete transcription workflow
- Upload - Upload methods and configuration
- Troubleshooting - Common issues