Automatic Transcription

After uploading your media file, Scriptix automatically processes it to create a timestamped transcript. This guide explains what happens during processing and what to expect.

Overview

Automatic transcription converts your uploaded audio or video into editable text using speech recognition. Processing runs entirely on the server, so no action is required from you after the upload.

Starting Transcription

Transcription begins automatically after you complete the upload configuration and submit.

What Happens

  1. Upload completes - The file is uploaded via the TUS protocol
  2. Session created - A speech-to-text session is created with your settings
  3. Processing begins - The server processes your file automatically
  4. Session available - You can view the session's status in "Speech-to-Text Sessions"

Configuration Applied

Your upload settings are applied to the transcription:

{
  language: string,            // Selected language
  diarization: boolean,        // Speaker identification
  keep_source: boolean,        // Retain media file
  media_source: string,        // File location
  document: {...},             // Document details
  punctuation?: boolean,       // Auto punctuation
  multichannel?: boolean,      // Multi-channel processing
  folder_id?: number | null    // Target folder
}
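
The settings shape above can also be written as a TypeScript interface, which lets the compiler catch missing required fields. This is a minimal sketch: the name `SttSessionConfig`, the language code `"en"`, and the `media_source` value are illustrative assumptions, and because the `document` shape is elided in this guide it is typed loosely here.

```typescript
// Hypothetical type name; the fields mirror the settings listed above.
interface SttSessionConfig {
  language: string;          // Selected language
  diarization: boolean;      // Speaker identification
  keep_source: boolean;      // Retain media file
  media_source: string;      // File location
  // The document shape is not specified in this guide, so it is left open.
  document: Record<string, unknown>;
  punctuation?: boolean;     // Auto punctuation
  multichannel?: boolean;    // Multi-channel processing
  folder_id?: number | null; // Target folder (null = workspace root)
}

// Example values are placeholders, not real Scriptix data.
const exampleConfig: SttSessionConfig = {
  language: "en",
  diarization: true,
  keep_source: false,
  media_source: "uploads/example.mp3",
  document: {},
  folder_id: null,
};
```

Optional fields that are left out (here `punctuation` and `multichannel`) are simply absent from the object.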

Monitoring Progress

Track your transcription through the Workspace interface.

Viewing Sessions

  1. Navigate to "Workspace" in the sidebar
  2. Your sessions appear in a paginated list
  3. Default: 25 sessions per page

Session Information

Each session displays:

  • Session details
  • Processing status
  • Duration
  • Status

The interface supports:

  • Pagination - Navigate through pages of sessions
  • Sorting - Sort by various fields
  • Searching - Search sessions (query parameter: q)
  • Filtering - Filter by status or other criteria
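
As an illustration of how these list options might map onto a request, the sketch below builds a query string. Only the `q` parameter is named in this guide; `page`, `per_page`, `sort`, and `status` are hypothetical parameter names, and both helper functions are invented for the example.

```typescript
// Hypothetical parameter names, except `q`, which this guide names for search.
interface SessionListQuery {
  page?: number;    // 1-based page number (assumed)
  perPage?: number; // defaults to 25, the documented page size
  sort?: string;    // field to sort by (assumed)
  q?: string;       // search term
  status?: string;  // status filter (assumed)
}

function buildSessionListQuery(query: SessionListQuery = {}): string {
  const params = new URLSearchParams();
  params.set("page", String(query.page ?? 1));
  params.set("per_page", String(query.perPage ?? 25));
  if (query.sort) params.set("sort", query.sort);
  if (query.q) params.set("q", query.q);
  if (query.status) params.set("status", query.status);
  return params.toString();
}

// Number of pages needed to show a given number of sessions.
function pageCount(totalSessions: number, perPage = 25): number {
  return Math.max(1, Math.ceil(totalSessions / perPage));
}
```

With the default page size of 25, `pageCount(60)` is 3, so a session that seems to be missing from page 1 may simply be on page 2 or 3.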

Data Caching

Session data is cached for performance:

  • Stale time: 5 minutes
  • Cache time: 30 minutes
  • Refetch behavior: Manual only (not on window focus/reconnect/mount)
  • Previous data kept during pagination
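
These values read like TanStack Query (React Query) options, although this guide does not name the caching library. The sketch below expresses them as a plain object using v4-style option names; the option names and the constant `sessionListQueryOptions` are assumptions, while the durations and flags come from the list above.

```typescript
const MINUTE_MS = 60 * 1000;

// Assumed option names (TanStack Query v4 style); values are the ones listed above.
const sessionListQueryOptions = {
  staleTime: 5 * MINUTE_MS,    // data counts as fresh for 5 minutes
  cacheTime: 30 * MINUTE_MS,   // unused data stays in the cache for 30 minutes
  refetchOnWindowFocus: false, // no automatic refetch when the tab regains focus
  refetchOnReconnect: false,   // no automatic refetch after a network reconnect
  refetchOnMount: false,       // no automatic refetch when the list remounts
  keepPreviousData: true,      // keep the old page visible while the next one loads
} as const;
```

Because automatic refetching is disabled, a newly created session or a status change may not show up until the list is refreshed manually.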

Understanding the Process

The processing pipeline itself runs server-side. The frontend application is responsible for the following:

Session Management

The application:

  • Creates STT sessions via API
  • Fetches session list with pagination
  • Displays session status
  • Handles errors with toast notifications

Error Handling

Errors are displayed as toast messages. The message text is read from one of the following fields of the error response, with a fallback when none is present:

  • error_description
  • detail
  • error
  • message
  • Default: "An error has occurred" (localized)
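
One way to pick the toast text from an error response is to check the fields above in turn and fall back to the default. This is a sketch under assumptions: the function name `extractErrorMessage` is invented, the lookup order simply follows the list above, and the default string is shown unlocalized.

```typescript
const DEFAULT_ERROR_MESSAGE = "An error has occurred";

// Field names come from the list above; checking them in this order is an assumption.
const ERROR_FIELDS = ["error_description", "detail", "error", "message"] as const;

function extractErrorMessage(payload: unknown): string {
  if (typeof payload === "object" && payload !== null) {
    const record = payload as Record<string, unknown>;
    for (const field of ERROR_FIELDS) {
      const value = record[field];
      // Use the first field that holds a non-empty string.
      if (typeof value === "string" && value.trim() !== "") {
        return value;
      }
    }
  }
  // Nothing usable in the payload: fall back to the generic message.
  return DEFAULT_ERROR_MESSAGE;
}
```

The returned string could then be passed to `toast.error(...)` from react-hot-toast, which this application uses for error notifications.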

Transcription Settings

Language

Required setting - The language code determines which speech recognition model processes your audio.

Supported: 37+ languages

See Language Selection for the complete list.

Speaker Diarization

Optional - When enabled, the system attempts to identify and separate different speakers.

The default value can be set at the organization level:

setDiarization(organization?.diarization || false);
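
The fallback in that line can be isolated into a small function to make its behavior explicit. In this sketch the `Organization` interface and the `defaultDiarization` name are hypothetical; only the `diarization` field and the `|| false` fallback are taken from the snippet above.

```typescript
// Hypothetical shape: only the `diarization` field is taken from the snippet above.
interface Organization {
  diarization?: boolean | null;
}

// Returns the organization's default, or false when no organization or no default is set.
function defaultDiarization(organization?: Organization | null): boolean {
  return organization?.diarization || false;
}
```

Diarization therefore starts disabled unless the organization explicitly turns it on.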

When to use:

  • Multi-speaker content (interviews, meetings, panels)
  • When speaker attribution is important

When to skip:

  • Single speaker (lectures, presentations)
  • When speaker identity doesn't matter

Additional Options

Folder Assignment (folder_id?: number | null)

  • Places the session in a specific folder
  • null places the session in the workspace root

What You Receive

When transcription completes, you get an editable document containing:

Transcript Content

  • Complete text transcription
  • Timestamps
  • Speaker labels (if diarization enabled)
  • Paragraph structure

Document Access

Access your completed transcript:

  1. From Sessions list - Click the completed session
  2. From Workspace - Navigate to the document in folders
  3. From Dashboard - Recent documents

Best Practices

Before Processing

  1. Verify Language - Double-check language selection matches your audio
  2. Enable Diarization - For multi-speaker content
  3. Choose Folder - Organize before uploading
  4. Test First - Try a short sample if unsure about settings

During Processing

  1. No Action Needed - Processing is automatic
  2. You Can Close Browser - Processing continues server-side
  3. Check Back Later - View status in Sessions list

After Processing

  1. Open Transcript - Click completed session to view
  2. Review Quality - Skim for overall accuracy
  3. Plan Editing - Identify sections needing attention

Common Questions

How long does processing take?

Processing time varies based on:

  • File duration
  • Audio quality
  • Server load
  • Enabled features (diarization adds time)

The frontend doesn't track or display processing time estimates.

Can I see real-time progress?

The frontend shows session status but doesn't provide real-time progress percentages. Processing happens server-side.

What if processing fails?

Errors are displayed via toast notifications with the error message. You can retry transcription from the session interface.

Can I cancel processing?

You can delete the session to stop processing. Specific pause/resume functionality depends on server implementation.

Do I get notified when complete?

The application uses react-hot-toast for error notifications. Completion notifications depend on server-side implementation.

Troubleshooting

Session Not Appearing

Check:

  • Pagination - Session may be on another page
  • Search filters - Clear any active filters
  • Cache - Refresh if cache is stale

Error Messages

All errors display via toast notifications. Common error sources:

  • Upload issues
  • Invalid settings
  • Server errors
  • Network problems

See Troubleshooting for detailed solutions.

Session Status

If a session's status is unclear:

  1. Check the session list
  2. Look for error toast messages
  3. Try refreshing the page
  4. Check server status
