Message Protocol
Complete WebSocket message protocol reference.
Client → Server Messages
1. Audio Data (Binary)
Send raw PCM audio bytes:
ws.send(audioBuffer); // ArrayBuffer or Buffer
Format: raw 16-bit PCM samples with no file header (no WAV or other container).
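Capture APIs such as the Web Audio API usually deliver 32-bit float samples in the range [-1, 1], so they must be converted to 16-bit integers before sending. Below is a minimal sketch of that conversion. floatTo16BitPcm is a hypothetical helper. Mono audio and little-endian byte order are assumptions, because this reference states neither, so confirm both against the server.

```javascript
// Hypothetical helper: convert float samples in [-1, 1] to raw 16-bit PCM.
// ASSUMPTION: mono audio, little-endian byte order (not specified by the protocol).
function floatTo16BitPcm(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range input saturates instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically: -1 maps to -32768, +1 maps to 32767.
    const value = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return buffer;
}

// Usage: ws.send(floatTo16BitPcm(float32Samples));
```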
2. Audio Data (JSON)
Alternatively, send the same raw PCM audio base64-encoded inside a JSON message:
{
"type": "audio",
"data": "base64_encoded_pcm_audio"
}
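A sketch of building this message from raw PCM bytes, assuming a Node.js runtime (Buffer) or a browser (btoa). makeAudioMessage is a hypothetical name. Keep in mind that base64 inflates the payload by about a third compared with sending the same bytes as a binary frame.

```javascript
// Hypothetical helper: wrap raw PCM bytes in the JSON "audio" message.
// Accepts an ArrayBuffer or a typed-array view (Uint8Array, Int16Array, Buffer, ...).
function makeAudioMessage(pcm) {
  // View exactly the bytes the caller passed, respecting any offset/length.
  const bytes = ArrayBuffer.isView(pcm)
    ? new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength)
    : new Uint8Array(pcm);
  let base64;
  if (typeof Buffer !== "undefined") {
    // Node.js: Buffer has a built-in base64 encoder.
    base64 = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength).toString("base64");
  } else {
    // Browser: build a binary string, then encode it with btoa().
    let binary = "";
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    base64 = btoa(binary);
  }
  return JSON.stringify({ type: "audio", data: base64 });
}
```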
3. Configuration
Set the session's sample rate (in Hz), language, and whether to receive interim results:
{
"type": "configure",
"sample_rate": 16000,
"language": "en",
"interim_results": true
}
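A hypothetical helper that maps camelCase options onto the snake_case wire fields. The defaults copy the example above, and the type checks are inferred from its values. Neither comes from a published schema, so the server may accept or reject inputs that this sketch handles differently.

```javascript
// Hypothetical helper: build a "configure" message from camelCase options.
// Defaults mirror the example values; the server's real defaults may differ.
function makeConfigureMessage({ sampleRate = 16000, language = "en", interimResults = true } = {}) {
  // Basic type checks inferred from the example, not from a published schema.
  if (!Number.isInteger(sampleRate) || sampleRate <= 0) {
    throw new TypeError("sampleRate must be a positive integer (Hz)");
  }
  if (typeof language !== "string" || language.length === 0) {
    throw new TypeError('language must be a non-empty string such as "en"');
  }
  if (typeof interimResults !== "boolean") {
    throw new TypeError("interimResults must be a boolean");
  }
  // Field names on the wire are snake_case.
  return JSON.stringify({
    type: "configure",
    sample_rate: sampleRate,
    language,
    interim_results: interimResults,
  });
}
```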
4. End of Speech
{
"type": "end_of_speech"
}
Signals the end of an utterance so that the server returns the final transcript immediately.
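Because end_of_speech asks the server for the final result right away, a client can wrap it in a promise that settles on the next final (or error) message. This is a sketch. finishUtterance is a hypothetical name, and it assumes two things: the first final after end_of_speech belongs to the utterance that just ended, and server messages arrive as text frames through addEventListener, as they do with a browser WebSocket.

```javascript
// Hypothetical helper: send end_of_speech and resolve with the next "final".
// ASSUMPTION: the first final after end_of_speech belongs to this utterance.
function finishUtterance(ws) {
  return new Promise((resolve, reject) => {
    function onMessage(event) {
      // Server messages in this protocol are JSON text; skip anything else.
      if (typeof event.data !== "string") return;
      const msg = JSON.parse(event.data);
      if (msg.type === "final") {
        ws.removeEventListener("message", onMessage);
        resolve(msg);
      } else if (msg.type === "error") {
        ws.removeEventListener("message", onMessage);
        reject(new Error(`${msg.error_code}: ${msg.error}`));
      }
      // "partial" and other messages are ignored while waiting.
    }
    ws.addEventListener("message", onMessage);
    ws.send(JSON.stringify({ type: "end_of_speech" }));
  });
}
```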
5. End of Session
{
"type": "end_of_session"
}
Gracefully ends the session. The client then closes the WebSocket.
Server → Client Messages
1. Partial Transcript
{
"type": "partial",
"text": "Hello how are",
"confidence": 0.85,
"timestamp": 1642089600.5,
"is_final": false
}
Interim result; its text may still change as more audio arrives.
2. Final Transcript
{
"type": "final",
"text": "Hello, how are you doing today?",
"confidence": 0.95,
"start": 0.0,
"end": 2.5,
"timestamp": 1642089603.0,
"is_final": true
}
Confirmed result that won't change.
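In the flow example below, each partial repeats the whole utterance so far ("Hello", "Hello how", "Hello how are"). A client should therefore replace the previous partial rather than append to it, and commit text only once a final arrives. TranscriptBuffer is a hypothetical helper that sketches this bookkeeping. It keys on type, although is_final carries the same information in these examples.

```javascript
// Hypothetical helper: keep committed finals separate from the live partial.
class TranscriptBuffer {
  constructor() {
    this.finals = [];  // confirmed segments; never rewritten
    this.partial = ""; // latest interim hypothesis; replaced, not appended
  }

  // Apply one parsed server message and return the current display text.
  update(msg) {
    if (msg.type === "partial") {
      // Each partial supersedes the previous one for the current utterance.
      this.partial = msg.text;
    } else if (msg.type === "final") {
      this.finals.push(msg.text);
      this.partial = ""; // the final replaces the interim text it confirms
    }
    return this.text();
  }

  text() {
    return [...this.finals, this.partial].filter((s) => s.length > 0).join(" ");
  }
}
```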
3. Error
{
"type": "error",
"error": "Invalid audio format",
"error_code": "INVALID_AUDIO_FORMAT"
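}
Reports a problem with a client message, for example an unsupported audio format. "error" is a human-readable message; "error_code" is a machine-readable identifier.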
4. Session Info
{
"type": "session_info",
"session_id": "session_abc123",
"language": "en",
"sample_rate": 16000
}
Sent by the server as soon as the WebSocket connection opens, before any transcripts.
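A sketch of routing incoming server messages by their type field. dispatchServerMessage and the shape of the handlers object are hypothetical. In this sketch, unknown types are ignored, and an error message with no handler is thrown as an Error whose code property holds error_code.

```javascript
// Hypothetical dispatcher: parse one server text frame and route it by "type".
// handlers maps type names to callbacks, e.g. { partial(msg) {}, final(msg) {} }.
function dispatchServerMessage(raw, handlers) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    throw new Error("Server sent a frame that is not valid JSON");
  }
  if (msg === null || typeof msg !== "object" || typeof msg.type !== "string") {
    throw new Error('Server message has no string "type" field');
  }
  // Only use the handler object's own keys, never inherited ones like toString.
  const handler = Object.prototype.hasOwnProperty.call(handlers, msg.type)
    ? handlers[msg.type]
    : undefined;

  if (msg.type === "error") {
    // Turn the protocol error into an Error carrying the machine-readable code.
    const err = new Error(msg.error);
    err.code = msg.error_code;
    if (!handler) throw err; // don't silently drop unhandled errors
    handler(err, msg);
  } else if (handler) {
    handler(msg);
  }
  // Any other type without a handler is ignored.
  return msg;
}
```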
Message Flow Example
Client                      Server
|                                |
|-- Connect WebSocket ---------->|
|<-- session_info ---------------|
|                                |
|-- audio chunk (binary) ------->|
|-- audio chunk (binary) ------->|
|<-- partial: "Hello" -----------|
|-- audio chunk (binary) ------->|
|<-- partial: "Hello how" -------|
|-- audio chunk (binary) ------->|
|<-- partial: "Hello how are" ---|
|-- end_of_speech -------------->|
|<-- final: "Hello, how are..." -|
|                                |
|-- end_of_session ------------->|
|-- Close WebSocket ------------>|
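The sequence above can be expressed as a small message handler. This sketch assumes that the client waits for session_info before streaming, as the diagram shows, and that it handles a single utterance per session. createSessionClient is a hypothetical name. For brevity, the sketch sends all chunks at once, whereas a live client would send each chunk as it is captured.

```javascript
// Hypothetical client following the diagram: session_info -> audio -> end_of_speech
// -> final -> end_of_session -> close. `ws` needs send() and close() (e.g. a
// browser WebSocket); `chunks` are ArrayBuffers of 16-bit PCM audio.
function createSessionClient(ws, chunks, onTranscript) {
  let state = "waiting_for_session"; // -> "streaming" -> "closed"

  return function onMessage(raw) {
    const msg = JSON.parse(raw);
    if (state === "waiting_for_session" && msg.type === "session_info") {
      state = "streaming";
      // Stream every chunk as a binary frame, then ask for the final result.
      for (const chunk of chunks) ws.send(chunk);
      ws.send(JSON.stringify({ type: "end_of_speech" }));
    } else if (msg.type === "partial") {
      onTranscript(msg.text, false);
    } else if (msg.type === "final" && state === "streaming") {
      onTranscript(msg.text, true);
      // One utterance per session in this sketch: end gracefully, then close.
      ws.send(JSON.stringify({ type: "end_of_session" }));
      ws.close(1000); // 1000 = normal closure
      state = "closed";
    }
  };
}
```

With a browser WebSocket, the handler can be wired up as ws.addEventListener("message", (e) => onMessage(e.data)).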