Batch Transcription API

Upload audio and video files for asynchronous speech-to-text transcription.

What is Batch Transcription?

Batch transcription processes pre-recorded audio/video files asynchronously:

  1. Upload audio or video file
  2. Wait for processing (minutes to hours depending on file size)
  3. Retrieve transcript with timestamps, speakers, and formatting

Best for: Podcasts, interviews, meetings, video subtitles, recorded content

Quick Start

1. Upload File

curl -X POST https://api.scriptix.io/api/v3/stt \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "language=en" \
-F "audio_file=@meeting.mp3"

Response:

{
  "id": "stt_abc123",
  "status": "processing",
  "language": "en",
  "created_at": "2025-01-17T10:00:00Z"
}

2. Check Status

curl https://api.scriptix.io/api/v3/stt/stt_abc123/status \
-H "Authorization: Bearer YOUR_API_KEY"

Response:

{
  "id": "stt_abc123",
  "status": "completed",
  "progress": 100,
  "document_id": 456
}

3. Get Transcript

curl https://api.scriptix.io/api/v3/stt/stt_abc123/result \
-H "Authorization: Bearer YOUR_API_KEY"

Response:

{
  "id": "stt_abc123",
  "document_id": 456,
  "transcript": "Hello, welcome to today's meeting...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, welcome to today's meeting",
      "speaker": "Speaker 1"
    }
  ]
}

Supported File Formats

Audio Formats

  • MP3 (.mp3)
  • WAV (.wav)
  • FLAC (.flac)
  • M4A (.m4a)
  • AAC (.aac)
  • OGG (.ogg)

Video Formats

  • MP4 (.mp4)
  • MOV (.mov)
  • AVI (.avi)
  • MKV (.mkv)
  • WEBM (.webm)

Note: Audio is extracted automatically from video files.

File Size Limits

Upload Method      Max File Size   Best For
Standard Upload    500 MB          Most files
TUS Upload         5 GB            Large files, unstable connections

For files larger than 500 MB, use TUS Upload.
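
As a rough illustration of the format lists and size limits above, a client could decide which upload method to use before sending anything. This is a minimal sketch, not part of the Scriptix API: the function name `choose_upload_method` is hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes), which the limits table does not specify.

```python
from pathlib import Path

# Extensions taken from the Supported File Formats lists.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".flac", ".m4a", ".aac", ".ogg",  # audio
    ".mp4", ".mov", ".avi", ".mkv", ".webm",          # video
}

# Assumption: decimal units (1 MB = 10**6 bytes); the docs do not say which.
STANDARD_LIMIT_BYTES = 500 * 10**6
TUS_LIMIT_BYTES = 5 * 10**9


def choose_upload_method(filename: str, size_bytes: int) -> str:
    """Return 'standard' or 'tus', or raise ValueError if the file cannot be uploaded."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {extension or '(none)'}")
    if size_bytes > TUS_LIMIT_BYTES:
        raise ValueError("File exceeds the 5 GB TUS upload limit")
    if size_bytes > STANDARD_LIMIT_BYTES:
        return "tus"
    return "standard"
```

In practice `size_bytes` would come from something like `os.path.getsize(path)`, so the check runs locally before any upload begins.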

Processing Time

Typical processing time: 10-20% of audio duration

Audio Duration   Processing Time
10 minutes       1-2 minutes
1 hour           6-12 minutes
3 hours          18-36 minutes

Factors affecting speed:

  • File size and format
  • Audio quality
  • Features enabled (speaker diarization, etc.)
  • Current system load
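
The 10-20% rule of thumb above can be turned into a quick estimate, for example to choose a sensible timeout. The helper below is a hypothetical sketch; it ignores the factors just listed, so treat its output as a rough range rather than a guarantee.

```python
def estimate_processing_minutes(audio_minutes: float) -> tuple[float, float]:
    """Return a (low, high) estimate based on the 10-20% rule of thumb."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    # 10% of the duration is the optimistic bound, 20% the pessimistic one.
    return audio_minutes / 10, audio_minutes / 5


low, high = estimate_processing_minutes(60)
print(f"1 hour of audio: roughly {low:g}-{high:g} minutes")
```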

Features

Speaker Diarization

Automatically detect and label different speakers:

curl -X POST https://api.scriptix.io/api/v3/stt \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "language=en" \
-F "diarization=true" \
-F "audio_file=@meeting.mp3"

Timestamps

Get word-level timestamps:

curl -X POST https://api.scriptix.io/api/v3/stt \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "language=en" \
-F "timestamps=word" \
-F "audio_file=@audio.mp3"

Custom Models

Use custom models for domain-specific accuracy:

curl -X POST https://api.scriptix.io/api/v3/stt \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "model=custom_model_123" \
-F "language=en" \
-F "audio_file=@medical.mp3"

Glossaries

Apply custom glossaries for terminology:

curl -X POST https://api.scriptix.io/api/v3/stt \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "language=en" \
-F "glossary_id=789" \
-F "audio_file=@audio.mp3"

Webhooks

Receive notifications when transcription completes:

curl -X POST https://api.scriptix.io/api/v3/stt \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "language=en" \
-F "webhook_url=https://yourapp.com/webhook" \
-F "audio_file=@audio.mp3"

See the Webhooks Guide for details.
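
On the receiving side, the webhook endpoint needs to decide what to do with each notification. The sketch below shows one way to structure that decision. The payload shape is an assumption: this page does not document it, so the `id` and `status` fields are borrowed from the status response shown in the Quick Start, and the function name `handle_webhook` is hypothetical. Check the Webhooks Guide for the actual payload.

```python
import json


def handle_webhook(body: bytes) -> str:
    """Parse a webhook request body and return the action to take.

    Assumption: the body is JSON with "id" and "status" fields, mirroring
    the status endpoint response. The real payload may differ.
    """
    event = json.loads(body)
    status = event.get("status")
    if "id" not in event or status is None:
        return "ignore"          # not a payload this sketch understands
    if status == "completed":
        return "fetch_result"    # call GET /api/v3/stt/{id}/result next
    if status == "failed":
        return "report_failure"  # surface the error or schedule a retry
    return "ignore"              # intermediate states need no action
```

Keeping the decision in a plain function makes it easy to call from whichever web framework serves `https://yourapp.com/webhook`.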

Status Lifecycle

uploaded → queued → processing → completed
                               → failed

Status       Description
uploaded     File uploaded successfully
queued       Waiting in processing queue
processing   Transcription in progress
completed    Transcription finished
failed       Error occurred
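
Client code usually only needs to know whether a status is final. The sketch below encodes the lifecycle as a small transition table. It assumes a job can move to `failed` from any non-terminal state, which the diagram does not state explicitly; the helper names are hypothetical.

```python
# Statuses after which a job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}

# Assumption: "failed" is reachable from every non-terminal state.
ALLOWED_TRANSITIONS = {
    "uploaded": {"queued", "failed"},
    "queued": {"processing", "failed"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def is_terminal(status: str) -> bool:
    """Return True when no further status changes are expected."""
    return status in TERMINAL_STATUSES


def is_valid_transition(current: str, new: str) -> bool:
    """Return True when moving from `current` to `new` follows the lifecycle."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```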

API Endpoints

Method   Endpoint                   Description
POST     /api/v3/stt                Upload file for transcription
POST     /api/v3/stt/tus            Initialize TUS upload
GET      /api/v3/stt/{id}/status    Check transcription status
GET      /api/v3/stt/{id}/result    Get transcript result

Pricing

Batch transcription is charged per audio minute:

Plan         Price per Minute   Included Minutes
Free         -                  60 min/month
Bronze       $0.006             500 min/month
Silver       $0.005             2,000 min/month
Gold         $0.004             10,000 min/month
Enterprise   Custom             Custom

Additional features:

  • Speaker diarization: +$0.002/min
  • Custom models: Included (Gold+)
  • Priority processing: Included (Gold+)
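
For budgeting, the per-minute arithmetic can be sketched as below. This is an illustration only: it reads the plan rates as $0.006, $0.005, and $0.004 per minute, and it leaves the question of how included minutes are applied to the caller, who passes in the number of billable minutes. The function name `transcription_cost` is hypothetical.

```python
from decimal import Decimal

# Per-minute rates as read from the pricing table (Free and Enterprise omitted).
PRICE_PER_MINUTE = {
    "Bronze": Decimal("0.006"),
    "Silver": Decimal("0.005"),
    "Gold": Decimal("0.004"),
}
DIARIZATION_SURCHARGE = Decimal("0.002")  # per minute, from "Additional features"


def transcription_cost(plan: str, billable_minutes: int, diarization: bool = False) -> Decimal:
    """Return the cost in dollars for `billable_minutes` on the given plan."""
    if plan not in PRICE_PER_MINUTE:
        raise ValueError(f"No fixed per-minute price for plan: {plan}")
    if billable_minutes < 0:
        raise ValueError("billable_minutes must be non-negative")
    rate = PRICE_PER_MINUTE[plan]
    if diarization:
        rate += DIARIZATION_SURCHARGE
    return rate * billable_minutes
```

`Decimal` is used instead of `float` so that small per-minute rates do not accumulate rounding error.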

Best Practices

1. Use Webhooks

Don't poll for status - use webhooks:

# ❌ Polling (wastes API calls)
while True:
    status = check_status(job_id)
    if status == 'completed':
        break
    time.sleep(10)

# ✅ Webhooks (efficient)
upload_file(audio, webhook_url='https://yourapp.com/webhook')

2. Specify Language

Always specify language for best accuracy:

# ✅ Specify language
-F "language=en"

# ⚠️ Auto-detection (slower)
-F "language=auto"

3. Optimize Audio Files

  • Use compressed formats (MP3, AAC) for faster upload
  • Minimum sample rate: 16 kHz
  • Mono audio is sufficient for most use cases
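
For WAV files, the sample rate and channel count can be checked locally before uploading. The sketch below uses Python's standard `wave` module, so it only works for uncompressed WAV input; other formats would need a third-party tool. The function name `inspect_wav` is hypothetical.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # minimum sample rate recommended above


def inspect_wav(path: str) -> dict:
    """Read a WAV header and report whether it meets the recommended minimum rate."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "meets_minimum_rate": sample_rate >= MIN_SAMPLE_RATE_HZ,
    }
```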

4. Handle Errors

Implement retry logic for failed transcriptions:

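import time

def transcribe_with_retry(file_path, max_retries=3):
    """Retry a failed transcription with exponential backoff (1 s, 2 s, 4 s, ...)."""
    for attempt in range(max_retries):
        try:
            # transcribe_file is the function defined under Complete Example
            result = transcribe_file(file_path)
            if result.get('status') != 'failed':
                return result
        except Exception:
            if attempt == max_retries - 1:
                raise
        # Back off before the next attempt, whether the job failed or raised
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Transcription failed after {max_retries} attempts")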

Complete Example

import requests
import time

API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.scriptix.io/api/v3'

def transcribe_file(file_path, language='en'):
    # 1. Upload file
    with open(file_path, 'rb') as f:
        response = requests.post(
            f'{BASE_URL}/stt',
            headers={'Authorization': f'Bearer {API_KEY}'},
            files={'audio_file': f},
            data={'language': language, 'diarization': 'true'}
        )
    response.raise_for_status()  # surface HTTP errors early

    job = response.json()
    job_id = job['id']
    print(f"Uploaded: {job_id}")

    # 2. Poll status (fine for a short script; prefer webhooks in production)
    while True:
        status_response = requests.get(
            f'{BASE_URL}/stt/{job_id}/status',
            headers={'Authorization': f'Bearer {API_KEY}'}
        )
        status_response.raise_for_status()
        status_data = status_response.json()

        if status_data['status'] == 'completed':
            break
        elif status_data['status'] == 'failed':
            raise RuntimeError("Transcription failed")

        print(f"Progress: {status_data.get('progress', 0)}%")
        time.sleep(10)

    # 3. Get result
    result_response = requests.get(
        f'{BASE_URL}/stt/{job_id}/result',
        headers={'Authorization': f'Bearer {API_KEY}'}
    )
    result_response.raise_for_status()

    return result_response.json()

# Usage
result = transcribe_file('meeting.mp3', language='en')
print(result['transcript'])
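
The polling loop in the example runs until the job reaches a terminal state, so a stuck job would block the script indefinitely. A bounded variant might look like the sketch below. It is an illustration, not part of the API: `wait_for_completion` is a hypothetical helper, and `fetch_status` stands for any callable that returns the current status string, for instance a wrapper around the status request.

```python
import time
from typing import Callable


def wait_for_completion(
    fetch_status: Callable[[], str],
    timeout_seconds: float = 3600,
    interval_seconds: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is 'completed' or 'failed', or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays; the default behaves like the 10-second pause in the example.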

Next Steps


Ready to transcribe? Start with Upload Files.