Document Management Overview
Documents provide a flexible way to view, edit, and manage the results of transcription sessions. They are designed to give you more control than static result downloads, especially for use cases like captioning, editing, and reviewing transcripts.
What Is a Document?
A document is a structured, segment-based representation of a transcription result. Unlike raw session results, documents are designed to support modification and post-processing.
We currently support two types of documents:
Document Type | Description |
---|---|
Transcript | Plain transcript, optimized for readability and review |
Caption | Timed segments (e.g. SRT/VTT) optimized for subtitling and video display |
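To make the segment-based idea concrete, here is a minimal sketch of how timed caption segments might be modeled and rendered as SRT. The `Segment` class and its field names are illustrative assumptions, not the actual document schema. Only the SRT cue layout (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the common convention.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical caption segment; field names are assumptions."""
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start_ms)} --> {format_timestamp(seg.end_ms)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)
```

Because each segment carries its own timing and text, editing one block (for example, fixing a misheard word) does not disturb the others, which is what makes this shape convenient for caption review.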
We offer an online caption editor today, with a visual document editor coming soon.
Relation to Transcript Sessions
Documents are always associated with a transcript session. A session may have zero or more documents derived from its results.
- A document is created after a batch session completes.
- At this time, real-time transcription sessions do not generate documents, since no data is stored post-session.
- This behavior may change as we roll out recording and retention support.
Why Use Documents?
Unlike downloaded results, documents are:
- Segmented: Organized into human-editable blocks
- Modifiable: Designed for editing and collaboration
- Persistent: Stored and accessible via the API
- Version-aware: Easily integrated into review workflows
Common Use Cases
- Subtitling (SRT/VTT)
- Transcription review and editing
- Translation workflows
- Platform integrations (e.g. embedding captioned videos)
How to Use
To manage documents via the API, refer to:
- CRUD Operations
- Webhooks
- Magic Links: secure, shareable editing access
- Translation
Limitations
- Documents can only be created from completed batch sessions
- Not available for real-time sessions
- Only supported for sessions where transcription results are retained
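The limitations above can be expressed as a simple pre-flight check before requesting a document. This is a sketch under stated assumptions: the `Session` class, its field names, and the string values `"batch"` and `"completed"` are hypothetical and are not taken from the API.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session record; all field names and values are assumptions."""
    mode: str               # assumed values: "batch" or "realtime"
    status: str             # assumed values: e.g. "processing", "completed"
    results_retained: bool  # whether transcription results are still stored


def document_ineligibility_reasons(session: Session) -> list[str]:
    """Return the reasons a document cannot be created (empty list if eligible)."""
    reasons = []
    if session.mode != "batch":
        reasons.append("documents are not available for real-time sessions")
    elif session.status != "completed":
        reasons.append("the batch session has not completed")
    if not session.results_retained:
        reasons.append("transcription results are not retained")
    return reasons
```

Returning a list of reasons rather than a single boolean lets a caller report every blocking condition at once.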
We plan to expand this functionality with editing, translations, and audio-linked visual editing in upcoming releases.