# Real-time Streaming API

Stream live audio for real-time speech-to-text transcription via WebSocket.

## What is Real-time Transcription?

Stream audio in real-time and receive transcripts as speech occurs:

- Initialize a WebSocket session
- Stream audio chunks continuously
- Receive transcripts in real-time

Best for: live captions, voice assistants, call centers, and live events.

## Quick Start

### 1. Initialize Session
```bash
curl -X POST https://api.scriptix.io/api/v4/realtime/session \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "en",
    "sample_rate": 16000
  }'
```
Response:

```json
{
  "session_id": "session_abc123",
  "websocket_url": "wss://api.scriptix.io/realtime/session_abc123",
  "expires_at": "2025-01-17T14:00:00Z"
}
```
### 2. Connect WebSocket

```javascript
// Node.js: uses the 'ws' package (in browsers, use addEventListener instead of .on)
const WebSocket = require('ws');

const ws = new WebSocket('wss://api.scriptix.io/realtime/session_abc123');

ws.on('open', () => {
  console.log('Connected!');
  // Start streaming audio
});

ws.on('message', (data) => {
  const message = JSON.parse(data);
  if (message.type === 'partial' || message.type === 'final') {
    console.log(message.text);
  }
});
```
### 3. Stream Audio

```javascript
// Send audio chunks (binary)
function streamAudio(audioBuffer) {
  ws.send(audioBuffer);
}

// Or send base64-encoded JSON
function streamAudioBase64(audioData) {
  ws.send(JSON.stringify({
    type: 'audio',
    data: audioData // base64 string
  }));
}
```
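The same JSON framing can be built in Python for use with the websocket-client example later on this page. A minimal sketch; `make_audio_message` is an illustrative helper name, not part of the API:

```python
import base64
import json

def make_audio_message(chunk: bytes) -> str:
    """Wrap a raw PCM chunk in the JSON audio envelope (base64 payload)."""
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(chunk).decode("ascii"),
    })

# Example: frame 4 bytes of silence (two 16-bit samples)
msg = make_audio_message(b"\x00\x00\x00\x00")
```

Binary frames avoid the ~33% base64 overhead, so prefer them when your client supports binary WebSocket messages.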
### 4. Receive Transcripts

```javascript
ws.on('message', (data) => {
  const message = JSON.parse(data);
  if (message.type === 'partial') {
    // Interim result (may change)
    console.log('Partial:', message.text);
  } else if (message.type === 'final') {
    // Final result (won't change)
    console.log('Final:', message.text);
  }
});
```
## Supported Audio Formats

### Required Specifications

| Parameter | Value |
|---|---|
| Sample Rate | 8000, 16000, 22050, 44100, 48000 Hz |
| Bit Depth | 16-bit |
| Channels | Mono (1 channel) |
| Encoding | PCM, μ-law, A-law |
| Format | Raw audio (PCM) or base64-encoded |

### Recommended Settings

- Sample Rate: 16000 Hz
- Bit Depth: 16-bit
- Encoding: PCM
- Chunk Size: 20-50ms of audio
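To see what these settings imply in bytes, here is a stdlib-only sketch that generates one 20 ms chunk of 16 kHz, 16-bit mono PCM (a sine tone stands in for microphone audio; `make_tone_chunk` is an illustrative helper, not part of the API):

```python
import math
import struct

SAMPLE_RATE = 16000   # Hz (recommended)
CHUNK_MS = 20         # 20 ms chunks

def make_tone_chunk(freq_hz: float = 440.0) -> bytes:
    """Generate one 20 ms chunk of a sine tone as little-endian 16-bit mono PCM."""
    n_samples = SAMPLE_RATE * CHUNK_MS // 1000          # 320 samples
    samples = (
        int(32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n_samples)
    )
    return struct.pack(f"<{n_samples}h", *samples)      # '<h' = little-endian int16

chunk = make_tone_chunk()
# 320 samples * 2 bytes = 640 bytes per 20 ms chunk
```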
## Message Protocol

### Client → Server

#### 1. Audio Data (Binary)

Send raw audio bytes:

```javascript
ws.send(audioBuffer); // ArrayBuffer or Buffer
```
#### 2. Audio Data (JSON)

```json
{
  "type": "audio",
  "data": "base64_encoded_audio_data"
}
```
#### 3. Control Messages

```json
{
  "type": "configure",
  "sample_rate": 16000,
  "language": "en"
}
```

```json
{
  "type": "end_of_speech"
}
```
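Building these control messages in Python is straightforward; a minimal sketch with illustrative helper names:

```python
import json

def configure_message(sample_rate: int = 16000, language: str = "en") -> str:
    """Build the 'configure' control message shown above."""
    return json.dumps({
        "type": "configure",
        "sample_rate": sample_rate,
        "language": language,
    })

def end_of_speech_message() -> str:
    """Build the 'end_of_speech' control message."""
    return json.dumps({"type": "end_of_speech"})
```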
### Server → Client

#### 1. Partial Transcript

```json
{
  "type": "partial",
  "text": "Hello how are",
  "confidence": 0.85,
  "timestamp": 1642089600.5
}
```
#### 2. Final Transcript

```json
{
  "type": "final",
  "text": "Hello, how are you doing today?",
  "confidence": 0.95,
  "start": 0.0,
  "end": 2.5,
  "timestamp": 1642089603.0
}
```
#### 3. Error

```json
{
  "type": "error",
  "error": "Invalid audio format",
  "error_code": "INVALID_AUDIO_FORMAT"
}
```
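A client needs to branch on the `type` field when handling these messages. A sketch using the field names from the payloads above; `handle_server_message` is an illustrative name, not part of any SDK:

```python
import json

def handle_server_message(raw: str) -> str:
    """Route a server message by its 'type' field and return a display line."""
    msg = json.loads(raw)
    if msg["type"] == "partial":
        return f"Partial: {msg['text']}"        # interim, may still change
    if msg["type"] == "final":
        return f"Final: {msg['text']} [{msg['start']:.1f}-{msg['end']:.1f}s]"
    if msg["type"] == "error":
        return f"Error {msg['error_code']}: {msg['error']}"
    return f"Unknown message type: {msg['type']}"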
## Complete Example

### Python with websocket-client

```python
import json
import threading

import pyaudio
import requests
import websocket

# 1. Initialize session
response = requests.post(
    'https://api.scriptix.io/api/v4/realtime/session',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={'language': 'en', 'sample_rate': 16000}
)
session = response.json()
ws_url = session['websocket_url']

# 2. Connect WebSocket
def on_message(ws, message):
    data = json.loads(message)
    if data['type'] == 'final':
        print(f"Final: {data['text']}")
    elif data['type'] == 'partial':
        print(f"Partial: {data['text']}")

def on_error(ws, error):
    print(f"Error: {error}")

ws = websocket.WebSocketApp(
    ws_url,
    on_message=on_message,
    on_error=on_error
)

# 3. Stream audio from microphone
def stream_microphone():
    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=16000,
        input=True,
        frames_per_buffer=1024
    )
    while True:
        data = stream.read(1024)
        ws.send(data, opcode=websocket.ABNF.OPCODE_BINARY)

# Run in separate threads
# (in production, wait for the socket to open before streaming)
threading.Thread(target=ws.run_forever).start()
threading.Thread(target=stream_microphone).start()
```
### JavaScript/Node.js

```javascript
const WebSocket = require('ws');
const mic = require('mic');

// Async wrapper (CommonJS has no top-level await)
(async () => {
  // 1. Initialize session
  const response = await fetch(
    'https://api.scriptix.io/api/v4/realtime/session',
    {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        language: 'en',
        sample_rate: 16000
      })
    }
  );
  const session = await response.json();

  // 2. Connect WebSocket
  const ws = new WebSocket(session.websocket_url);

  ws.on('open', () => {
    console.log('Connected!');

    // 3. Stream microphone audio
    const micInstance = mic({
      rate: '16000',
      channels: '1',
      encoding: 'signed-integer'
    });
    const micInputStream = micInstance.getAudioStream();
    micInputStream.on('data', (data) => {
      ws.send(data);
    });
    micInstance.start();
  });

  // 4. Receive transcripts
  ws.on('message', (data) => {
    const message = JSON.parse(data);
    if (message.type === 'final') {
      console.log('Final:', message.text);
    } else if (message.type === 'partial') {
      process.stdout.write(`\rPartial: ${message.text}`);
    }
  });
})();
```
## Features

### Interim Results

Get partial transcripts while speaking:

- Partial: may change as more audio is received
- Final: confirmed result that won't change

### Low Latency

Typical latency: 200-500 ms from speech to transcript.

### Continuous Sessions

Keep a session open for up to 4 hours of continuous streaming.
## Session Management

### Session Timeout

- Idle timeout: 30 seconds of silence closes the session
- Max duration: 4 hours
- Reconnection: create a new session if disconnected
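One way to act on these limits client-side is to track the session's `expires_at` (from the initialize response) and its start time, and create a new session when either bound is hit. A sketch under those assumptions; `needs_new_session` is an illustrative helper, not part of the API:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_SESSION = timedelta(hours=4)

def needs_new_session(expires_at: str, started_at: datetime,
                      now: Optional[datetime] = None) -> bool:
    """True if the session has expired or streaming hit the 4-hour maximum."""
    now = now or datetime.now(timezone.utc)
    # 'expires_at' is the ISO-8601 timestamp from the session response
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return now >= expiry or (now - started_at) >= MAX_SESSION
```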
### Close Session

```javascript
// Graceful close
ws.send(JSON.stringify({ type: 'end_of_session' }));
ws.close();
```
## Pricing

Real-time transcription is charged per audio minute:

| Plan | Price per Minute | Concurrent Sessions |
|---|---|---|
| Free | - | 0 (not available) |
| Bronze | $0.012 | 2 |
| Silver | $0.010 | 5 |
| Gold | $0.008 | 10 |
| Enterprise | Custom | Custom |
Note: Real-time transcription costs 2x the batch rate due to compute requirements.
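As a rough guide, the per-minute rates above can be turned into a cost estimate. This sketch assumes simple per-minute proration; actual billing granularity and rounding may differ:

```python
# Per-minute real-time rates from the pricing table above (USD)
RATES = {"bronze": 0.012, "silver": 0.010, "gold": 0.008}

def estimate_cost(plan: str, audio_seconds: float) -> float:
    """Estimate real-time transcription cost for a plan, rounded to cents."""
    minutes = audio_seconds / 60
    return round(RATES[plan] * minutes, 2)

# e.g. a 90-minute live event on the Silver plan
cost = estimate_cost("silver", 90 * 60)
```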
## Best Practices

### 1. Optimal Audio Chunks

Send 20-50 ms chunks for the best latency/quality balance:

```python
# 16 kHz, 16-bit, mono
chunk_size = int(16000 * 2 * 0.02)  # 20 ms chunks = 640 bytes
```
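Once captured, a PCM buffer can be sliced into chunks of that size before sending. A minimal sketch (`iter_chunks` is an illustrative helper; pass each yielded chunk to your WebSocket client's send call):

```python
CHUNK_BYTES = 640  # 20 ms at 16 kHz, 16-bit mono

def iter_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield fixed-size PCM chunks; the final partial chunk is kept."""
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]

chunks = list(iter_chunks(b"\x00" * 1600))  # 50 ms of silence
```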
### 2. Handle Network Issues

Implement reconnection logic with backoff:

```javascript
let reconnectAttempts = 0;

function connect() {
  const ws = new WebSocket(websocketUrl);

  ws.on('open', () => {
    reconnectAttempts = 0; // reset backoff after a successful connect
  });

  ws.on('close', () => {
    if (reconnectAttempts < 5) {
      reconnectAttempts++;
      setTimeout(connect, 1000 * reconnectAttempts); // 1s, 2s, ... backoff
    }
  });
}
```
### 3. Buffer Management

Don't buffer too much audio client-side:

```javascript
// ❌ Batching for seconds adds that much latency
let buffer = [];
setInterval(() => {
  ws.send(Buffer.concat(buffer));
  buffer = [];
}, 5000); // 5-second delay!

// ✅ Send each chunk immediately
micInputStream.on('data', (chunk) => {
  ws.send(chunk);
});
```
## Limitations

- Max session: 4 hours
- Idle timeout: 30 seconds of silence
- Audio encoding: PCM, μ-law, or A-law only (no compressed formats such as MP3/AAC)
- Sample rates: fixed set (8, 16, 22.05, 44.1, 48 kHz)
- Channels: mono only
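These limits can be checked client-side before opening a session. A sketch; the encoding identifiers (`pcm`, `ulaw`, `alaw`) are illustrative labels, not official API values:

```python
ALLOWED_RATES = {8000, 16000, 22050, 44100, 48000}
ALLOWED_ENCODINGS = {"pcm", "ulaw", "alaw"}

def validate_audio_config(sample_rate: int, channels: int, encoding: str) -> list:
    """Return a list of limitation violations (empty list = OK)."""
    problems = []
    if sample_rate not in ALLOWED_RATES:
        problems.append(f"unsupported sample rate: {sample_rate}")
    if channels != 1:
        problems.append("audio must be mono (1 channel)")
    if encoding.lower() not in ALLOWED_ENCODINGS:
        problems.append(f"unsupported encoding: {encoding}")
    return problems
```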
## Next Steps

- Initialize Session - Session creation
- WebSocket Connection - Connection guide
- Message Protocol - Complete protocol reference
- Audio Formats - Audio specifications
- Examples - Code examples
Ready to stream? Start with Initialize Session.