🔄 Migration Guide (V1 → V2)
This guide explains the differences between Batch API v1 and v2 and what to update when migrating, focusing on session initialization and file upload behavior.
📌 Key Changes
| Feature | V1 | V2 |
|---|---|---|
| Upload Method | File sent directly via POST request | TUS protocol with resumable chunked uploads |
| Workflow | 1. Create session 2. Upload file or give URL 3. Check status | 1. Upload file with language & other properties 2. Check status |
| Protocol Requirements | Content-Type: application/json header required | Must comply with TUS protocol |
| Token Requirement | Batch token from scriptix.app | Same – batch token from scriptix.app |
🌐 Batch API – Upload & Workflow Differences
🧾 Session Initialization
| Aspect | V1 | V2 |
|---|---|---|
| Endpoint | POST /api/v3/speech-to-text/session | Not Needed (handled automatically) |
| Header Key | Content-Type: application/json | Not Needed |
| Auth Header | x-zoom-s2t-key: <batch_token> | Same |
| Body | JSON with "language" and optional "media_source" | JSON with "language" only. File upload handled separately |
| Media Upload | Optional in request via "media_source" | Not part of session request – handled via TUS |
| Webhook Options | Supported (webhook_url, webhook_method, webhook_headers) | Still supported, now typically passed during file upload via metadata |
If you used media_source in v1 to start transcription by URL, this step must now be handled separately using metadata during the TUS upload step.
📤 File Upload Changes
| Aspect | V1 | V2 (TUS Required) |
|---|---|---|
| Upload Method | Direct file upload or via URL in session body | Separate file upload using the TUS protocol |
| Upload Endpoint | Combined with session init | POST https://api.scriptix.io/api/v3/files/ |
| Headers | x-zoom-s2t-key | Same (x-zoom-s2t-key) or Authorization (API) |
| Client Requirement | None (standard HTTP POST) | Requires TUS client (e.g., Uppy, tus-js-client, tus-py-client) |
| Metadata | Set in JSON body of session | Set via TUS metadata or .setMeta() depending on client |
| Chunked Upload | ❌ Not supported | ✅ Required — resumable, chunked uploads |
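To make the chunked upload requirement concrete, here is a minimal sketch of how a resumable upload is split into PATCH requests under the TUS protocol. A TUS client such as tus-py-client normally does this for you. The names `plan_chunks` and `patch_headers` and the 5 MiB default chunk size are illustrative assumptions, not Scriptix requirements.

```python
from typing import Iterator, Tuple


def plan_chunks(file_size: int, chunk_size: int = 5 * 1024 * 1024) -> Iterator[Tuple[int, int]]:
    """Yield (upload_offset, length) pairs covering a file of file_size bytes.

    In TUS, each chunk is sent as a PATCH request whose Upload-Offset header
    equals the number of bytes the server has already received.
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


def patch_headers(offset: int) -> dict:
    """Headers the TUS protocol defines for one PATCH chunk request."""
    return {
        "Tus-Resumable": "1.0.0",
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }
```

If an upload is interrupted, a TUS client asks the server for the current offset with a HEAD request and resumes from that offset instead of starting over.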
🧪 Migration Example
Before (V1) — POST session with media_source:
{
  "language": "en",
  "media_source": "https://example.com/audio.mp3",
  "keep_source": true,
  "meta_data": {
    "duration": "100"
  }
}
After (V2) — Separate session + TUS upload:
1. Session init (optional; v2 creates the session automatically during the TUS upload if you skip this step):
{
  "language": "en"
}
2. Upload with TUS (metadata example):
{
  "language": "en",
  "keep_source": "true",
  "document": "{\"webhook_url\": \"https://example.com/webhook\"}"
}
TUS clients pass this metadata via .setMeta() or the Upload-Metadata header, depending on the implementation.
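As a rough illustration of the metadata step, the sketch below maps a v1 session body to the flat string metadata shown above and encodes it as an Upload-Metadata header value. The header format (comma-separated `key base64(value)` pairs) comes from the TUS protocol. The function names are hypothetical, and placing webhook options as a JSON string under "document" is an assumption taken from the example above rather than a confirmed rule.

```python
import base64
import json


def v1_body_to_v2_metadata(body: dict) -> dict:
    """Map a v1 session body to flat string metadata for a TUS upload."""
    meta = {"language": body["language"]}
    if "keep_source" in body:
        # TUS metadata values are strings, so booleans become "true"/"false".
        meta["keep_source"] = "true" if body["keep_source"] else "false"
    webhook = {k: v for k, v in body.items() if k.startswith("webhook_")}
    if webhook:
        # Assumption: webhook options travel as a JSON string under "document".
        meta["document"] = json.dumps(webhook)
    return meta


def encode_upload_metadata(meta: dict) -> str:
    """Encode metadata as a TUS Upload-Metadata header value."""
    return ",".join(
        f"{key} {base64.b64encode(value.encode('utf-8')).decode('ascii')}"
        for key, value in meta.items()
    )
```

Most TUS clients accept the plain dictionary and perform the base64 encoding themselves, so `encode_upload_metadata` is only needed when building the header by hand.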
🔐 Token Usage
You must continue using a Batch token, created at scriptix.app.
No token change is needed between v1 and v2.
📥 Retrieving Results Differences
Retrieving results works the same in both v1 and v2, but how you get the session_id differs if you're using automatic session creation in v2.
🔄 V1 vs V2: Session ID Behavior
| Behavior | V1 | V2 |
|---|---|---|
| Session creation | Manual — always returns session ID | Optional — session created automatically by TUS |
| Retrieving session ID | From session creation response | From TUS upload response metadata (if not created manually) |
| Step required | None | Use the session_id from your manual session creation or from the TUS upload metadata, then call: GET /api/v3/speech-to-text/session/{session_id}/result with the same Batch API token used for the upload |
✅ Migration Action (Only if skipping manual session creation in v2)
If you're not initializing a session manually in v2:
- After uploading a file via TUS, extract the file ID from the Location header.
- Call:
  GET /api/v3/files/{file_id}
- Use the returned session_id to retrieve results as usual:
  GET /api/v3/speech-to-text/session/{session_id}/result
No other changes are required — headers, response structure, and status codes remain the same.
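The lookup steps above can be sketched as small helpers. This assumes the file ID is the last path segment of the Location header and that all endpoints share the host used by the upload endpoint (https://api.scriptix.io); both are assumptions, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: same host as the upload endpoint listed in this guide.
API_BASE = "https://api.scriptix.io"


def file_id_from_location(location: str) -> str:
    """Return the last path segment of a TUS Location header value."""
    path = urlparse(location).path.rstrip("/")
    file_id = path.rsplit("/", 1)[-1]
    if not file_id:
        raise ValueError(f"no file ID found in Location: {location!r}")
    return file_id


def file_url(file_id: str) -> str:
    """URL of the file resource whose response carries the session_id."""
    return f"{API_BASE}/api/v3/files/{file_id}"


def result_url(session_id: str) -> str:
    """URL for retrieving the transcript result of a session."""
    return f"{API_BASE}/api/v3/speech-to-text/session/{session_id}/result"
```

Both GET requests are sent with the same batch token header used for the upload.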
🗑️ Removing Results
There are no changes in how results are removed between Batch API v1 and v2.
The same endpoint, headers, and behavior apply:
DELETE /api/v3/speech-to-text/session/{session_id}
- Must use the same batch token that created the session
- Deletes both the transcript result and filename reference
✅ No migration action required for result removal.
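For reference, a minimal sketch of the removal request using only the Python standard library. The host is assumed to match the upload endpoint (https://api.scriptix.io), and `build_delete_request` is a hypothetical helper name.

```python
import urllib.request

# Assumption: same host as the upload endpoint listed in this guide.
API_BASE = "https://api.scriptix.io"


def build_delete_request(session_id: str, batch_token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for a session's results."""
    url = f"{API_BASE}/api/v3/speech-to-text/session/{session_id}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Must be the same batch token that created the session.
        headers={"x-zoom-s2t-key": batch_token},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; that call is left out so the sketch has no side effects.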
🌐 Realtime API – Connection Differences
The method for connecting via WebSocket remains the same in v1 and v2, but there are small differences in the endpoint format and language parameter handling.
🔄 What's Different
| Behavior | v1 Endpoint | v2 Endpoint |
|---|---|---|
| Host | api.scriptix.io | realtime.scriptix.io |
| Path | /realtime | /v2/realtime |
| Language Parameter | ?language=xx-yy (required) | ?language=xx (optional, ISO 639-1 format) |
| Token Parameter | Not required in URL (auth handled via headers) | ?token=<your_realtime_token> (required) |
| Language Fallback | Not supported | ✅ Auto-detection if no language provided |
| Parameter Placement | Query parameter (language required) | Query parameter(s): language optional, token required |
✅ Migration Action
- Update the WebSocket URL from:
wss://api.scriptix.io/realtime?language=nl-nl
to:
wss://realtime.scriptix.io/v2/realtime?token=<your_realtime_token>&language=nl
Use ISO 639-1 codes such as nl, en, or fr instead of full regional tags like nl-nl.
Language is now optional — if not provided, automatic detection will be used.
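The URL change can be expressed as a small helper, shown here as a sketch. It assumes that a v1 regional tag can be reduced to its ISO 639-1 code by taking the part before the hyphen; the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

REALTIME_V2_URL = "wss://realtime.scriptix.io/v2/realtime"


def to_iso_639_1(language_tag: str) -> str:
    """Reduce a regional tag such as 'nl-nl' to its two-letter code 'nl'."""
    return language_tag.split("-", 1)[0].lower()


def build_realtime_url(token: str, language: Optional[str] = None) -> str:
    """Build the v2 WebSocket URL; omit language to use auto-detection."""
    params = {"token": token}
    if language:
        params["language"] = to_iso_639_1(language)
    return f"{REALTIME_V2_URL}?{urlencode(params)}"
```

For example, `build_realtime_url("<your_realtime_token>")` leaves the language out, so the service falls back to automatic detection.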
🔄 Realtime API Protocol – Differences (V1 → V2)
While the message structure and WebSocket flow remain largely the same between versions, the following minor differences exist in protocol properties:
🧾 Start Properties – Differences
| Property | V1 | V2 |
|---|---|---|
| actual_numbers | ✅ Available (default: false) | ❌ Not available |
| partial | ✅ Available (default: true) | ✅ Available (default: true) |
The actual_numbers property was supported in v1 during the "start" action but has been removed in v2 for simplification and standardization.
The partial property remains available in v2 and works the same way as in v1.
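When porting client code, the removed property can be filtered out before sending the "start" action. The sketch below treats the start properties as a plain dictionary, which is an assumption because this guide does not show the full message shape; `migrate_start_properties` is a hypothetical helper.

```python
# Properties that v1 accepted in the "start" action but v2 no longer supports.
REMOVED_IN_V2 = {"actual_numbers"}


def migrate_start_properties(v1_properties: dict) -> dict:
    """Return a copy of the v1 start properties without keys removed in v2."""
    return {
        key: value
        for key, value in v1_properties.items()
        if key not in REMOVED_IN_V2
    }
```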