Document Management Overview

Documents provide a flexible way to view, edit, and manage the results of transcription sessions. They are designed to give you more control than static result downloads, especially for use cases like captioning, editing, and reviewing transcripts.


What Is a Document?

A document is a structured, segment-based representation of a transcription result. Unlike raw session results, documents are designed to support modification and post-processing.

We currently support two types of documents:

Document Type    Description
Transcript       Plain transcript, optimized for readability and review
Caption          Timed segments (e.g. SRT/VTT) optimized for subtitling and video display

We offer an online caption editor today, with a visual document editor coming soon.


Relation to Transcript Sessions

Documents are always associated with a transcript session. A session may have zero or more documents derived from its results.

  • A document is created after a batch session completes.
  • At this time, real-time transcription sessions do not generate documents, since no data is stored post-session.
  • This behavior may change as we roll out recording and retention support.

Why Use Documents?

Unlike downloaded results, documents are:

  • Segmented – Organized into human-editable blocks
  • Modifiable – Designed for editing and collaboration
  • Persistent – Stored and accessible via the API
  • Version-aware – Easily integrated into review workflows

Common Use Cases

  • Subtitling (SRT/VTT)
  • Transcription review and editing
  • Translation workflows
  • Platform integrations (e.g. embedding captioned videos)

How to Use

To manage documents via the API, refer to:


Downloading Documents

Documents can be retrieved in multiple formats (JSON, PDF, DOCX, SRT, VTT, etc.). As of December 2024, we recommend using URL-based downloads for better performance:

GET /api/v3/speech-to-text/session/{sessionId}/document/{documentId}?format=pdf&direct_download=true

Benefits:

  • Faster downloads using pre-generated files
  • Reduced API server load
  • Better support for large files
  • Browser-native download features
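
As a rough illustration, the request URL above could be assembled like this in Python. The host `https://api.example.com` is a placeholder rather than a documented value, and `build_document_url` is a hypothetical helper name; only the path and the `format` and `direct_download` query parameters come from this page.

```python
from urllib.parse import quote, urlencode

# Placeholder host (assumption): substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_document_url(session_id: str, document_id: str, fmt: str = "pdf") -> str:
    """Build the URL-based download request for one document (hypothetical helper)."""
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    path = (
        "/api/v3/speech-to-text/session/"
        f"{quote(session_id, safe='')}/document/{quote(document_id, safe='')}"
    )
    # direct_download=true asks for a download URL instead of streamed content.
    query = urlencode({"format": fmt, "direct_download": "true"})
    return f"{BASE_URL}{path}?{query}"
```

Any authentication headers the API requires are not covered here and would need to be added when the request is sent.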

The request returns a unified response containing a download URL and document metadata. The URL is valid for 1 hour (expires_in is given in seconds):

{
  "result": {
    "download_url": "https://scriptixbox.blob.core.windows.net/...",
    "expires_in": 3600,
    "filename": "transcript.pdf",
    "content_type": "application/pdf",
    "size_bytes": null,
    "document": null,
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "created": "2024-01-15T10:30:00Z",
    "last_modified": "2024-01-15T11:00:00Z",
    "language": "en",
    "type": "document",
    "timecode_offset": "00:00:00.000",
    "finished": true,
    "use_plain_document": false,
    "plain_document_changed": false
  },
  "count": 0,
  "total_results": 0
}
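
A client might handle the response above roughly as follows. This is a sketch under stated assumptions, not an official client: the function names are hypothetical, and it assumes the returned `download_url` can be fetched with a plain GET and no extra headers, which this page does not confirm. The field names (`download_url`, `expires_in`, `filename`, `content_type`) are taken from the sample response.

```python
import json
import urllib.request
from datetime import datetime, timedelta


def parse_download_info(body: str, requested_at: datetime) -> dict:
    """Extract the download URL, filename, and expiry time from a response body."""
    result = json.loads(body)["result"]
    return {
        "url": result["download_url"],
        "filename": result["filename"],
        "content_type": result["content_type"],
        # expires_in is in seconds (3600 in the sample, i.e. 1 hour).
        "expires_at": requested_at + timedelta(seconds=result["expires_in"]),
    }


def fetch_file(info: dict, now: datetime) -> bytes:
    """Download the file, refusing to use a URL that has already expired."""
    if now >= info["expires_at"]:
        raise ValueError("download URL expired; request a new one")
    # Assumption: the download URL needs no additional authentication headers.
    with urllib.request.urlopen(info["url"]) as response:
        return response.read()
```

Tracking the expiry time on the client side avoids retrying a stale URL; once it has passed, the document endpoint can be called again to obtain a fresh one.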

Legacy: Direct Streaming (Deprecated)

The original method of streaming document content directly through the API is still supported but deprecated. See Retrieve Document for migration details.


Limitations

  • Documents can only be created from completed batch sessions
  • Not available for real-time sessions
  • Only supported for sessions where transcription results are retained

We plan to expand this functionality with editing, translations, and audio-linked visual editing in upcoming releases.


Next Steps