Document Management Overview
Documents provide a flexible way to view, edit, and manage the results of transcription sessions. They are designed to give you more control than static result downloads, especially for use cases like captioning, editing, and reviewing transcripts.
What Is a Document?
A document is a structured, segment-based representation of a transcription result. Unlike raw session results, documents are designed to support modification and post-processing.
We currently support two types of documents:
| Document Type | Description |
|---|---|
| Transcript | Plain transcript, optimized for readability and review |
| Caption | Timed segments (e.g. SRT/VTT) optimized for subtitling and video display |
We offer an online caption editor today, with a visual document editor coming soon.
Relation to Transcript Sessions
Documents are always associated with a transcript session. A session may have zero or more documents derived from its results.
- A document is created after a batch session completes.
- At this time, real-time transcription sessions do not generate documents, since no data is stored post-session.
- This behavior may change as we roll out recording and retention support.
Why Use Documents?
Unlike downloaded results, documents are:
- Segmented – Organized into human-editable blocks
- Modifiable – Designed for editing and collaboration
- Persistent – Stored and accessible via the API
- Version-aware – Easily integrated into review workflows
Common Use Cases
- Subtitling (SRT/VTT)
- Transcription review and editing
- Translation workflows
- Platform integrations (e.g. embedding captioned videos)
How to Use
To manage documents via the API, refer to:
- CRUD Operations
- Webhooks
- Magic Links – Secure, shareable editing access
- Translation
Downloading Documents
Documents can be retrieved in several formats, including JSON, PDF, DOCX, SRT, and VTT. As of December 2024, we recommend URL-based downloads for better performance.
Recommended: URL Download Method
GET /api/v3/speech-to-text/session/{sessionId}/document/{documentId}?format=pdf&direct_download=true
Benefits:
- Faster downloads using pre-generated files
- Reduced API server load
- Better support for large files
- Browser-native download features
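As a minimal sketch, the request URL for the recommended method can be assembled as follows. The path and the format and direct_download query parameters come from the endpoint above; the base URL and the function name are hypothetical placeholders, not part of the documented API.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL (assumption): replace with the host of your API environment.
BASE_URL = "https://api.example.com"


def build_download_url(session_id: str, document_id: str, fmt: str = "pdf") -> str:
    """Build the URL-based download request for one document."""
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    path = (
        f"/api/v3/speech-to-text/session/{quote(session_id, safe='')}"
        f"/document/{quote(document_id, safe='')}"
    )
    # direct_download=true selects the URL download method instead of streaming.
    query = urlencode({"format": fmt, "direct_download": "true"})
    return f"{BASE_URL}{path}?{query}"
```

Send the resulting URL as a GET request with whatever HTTP client and authentication your integration already uses; neither is specified on this page.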
The request returns a unified response containing a download URL and document metadata. The download URL is valid for 1 hour, which the expires_in field reports in seconds:
{
  "result": {
    "download_url": "https://scriptixbox.blob.core.windows.net/...",
    "expires_in": 3600,
    "filename": "transcript.pdf",
    "content_type": "application/pdf",
    "size_bytes": null,
    "document": null,
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "created": "2024-01-15T10:30:00Z",
    "last_modified": "2024-01-15T11:00:00Z",
    "language": "en",
    "type": "document",
    "timecode_offset": "00:00:00.000",
    "finished": true,
    "use_plain_document": false,
    "plain_document_changed": false
  },
  "count": 0,
  "total_results": 0
}
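A client will typically pull the download details out of this response and track when the URL stops working. The sketch below uses the field names from the sample response; the function name is hypothetical, and it assumes that expires_in counts seconds from the moment the response is received.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_download_response(body: str, received_at: datetime) -> dict:
    """Extract download details from a URL-download response body."""
    result = json.loads(body)["result"]
    return {
        "url": result["download_url"],
        "filename": result["filename"],
        "content_type": result["content_type"],
        # size_bytes is null in the sample response, so treat it as optional.
        "size_bytes": result.get("size_bytes"),
        # Assumption: expires_in is in seconds, counted from receipt of the response.
        "expires_at": received_at + timedelta(seconds=result["expires_in"]),
    }


# Trimmed copy of the sample response; the URL is elided as in the sample.
SAMPLE_BODY = json.dumps({
    "result": {
        "download_url": "https://scriptixbox.blob.core.windows.net/...",
        "expires_in": 3600,
        "filename": "transcript.pdf",
        "content_type": "application/pdf",
        "size_bytes": None,
    },
    "count": 0,
    "total_results": 0,
})

received = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
info = parse_download_response(SAMPLE_BODY, received)
```

If a download starts after expires_at, requesting the document again should yield a fresh URL rather than retrying the stale one.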
Legacy: Direct Streaming (Deprecated)
The original method of streaming document content directly through the API is still supported but deprecated. See Retrieve Document for migration details.
Limitations
- Documents can only be created from completed batch sessions
- Not available for real-time sessions
- Only supported for sessions where transcription results are retained
We plan to expand this functionality with editing, translations, and audio-linked visual editing in upcoming releases.