Message Protocol

Complete WebSocket message protocol reference.

Client → Server Messages

1. Audio Data (Binary)

Send raw PCM audio bytes:

ws.send(audioBuffer);  // ArrayBuffer or Buffer

Format: Raw 16-bit PCM audio data
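
Audio captured in a browser usually arrives as floating-point samples, so it has to be converted before it matches the format above. The following is a minimal sketch of that conversion. The function name `floatTo16BitPcm` is hypothetical, and little-endian byte order is an assumption, since this page only specifies "16-bit PCM".

```javascript
// Convert float samples in the range -1.0..1.0 to 16-bit signed PCM.
// ASSUMPTION: little-endian byte order; the protocol text does not state it.
function floatTo16BitPcm(samples) {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range samples before scaling.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale toward -32768, positive values toward 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Usage, with ws being an already-open WebSocket:
// ws.send(floatTo16BitPcm(float32Samples));
```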

2. Audio Data (JSON)

Send base64-encoded PCM audio as a JSON text message:

{
"type": "audio",
"data": "base64_encoded_pcm_audio"
}
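
As a rough illustration, the JSON form can be built from raw PCM bytes as shown here. The helper name `buildAudioMessage` is hypothetical, and the sketch relies on the Node.js `Buffer` class; a browser would need a different base64 routine.

```javascript
// Wrap raw PCM bytes (Uint8Array, ArrayBuffer, or Buffer) in the JSON audio message.
function buildAudioMessage(pcmBytes) {
  // Buffer is Node.js-specific; it handles the base64 encoding.
  const base64 = Buffer.from(pcmBytes).toString("base64");
  return JSON.stringify({ type: "audio", data: base64 });
}

// Usage, with ws being an already-open WebSocket:
// ws.send(buildAudioMessage(pcmChunk));
```

Base64 makes the payload about a third larger than the binary form, so the binary message is likely the better choice when the client can send binary frames.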

3. Configuration

Set the sample rate, language, and whether interim results are sent:

{
"type": "configure",
"sample_rate": 16000,
"language": "en",
"interim_results": true
}
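
One possible way to build this message with basic validation is sketched below. The function name `buildConfigureMessage` and its camelCase option names are hypothetical, and the defaults simply mirror the example values above; they are not documented server defaults.

```javascript
// Build a "configure" message from camelCase options.
// ASSUMPTION: defaults copy the example on this page, not the server's own defaults.
function buildConfigureMessage(options = {}) {
  const { sampleRate = 16000, language = "en", interimResults = true } = options;
  if (!Number.isInteger(sampleRate) || sampleRate <= 0) {
    throw new RangeError("sampleRate must be a positive integer");
  }
  return JSON.stringify({
    type: "configure",
    sample_rate: sampleRate,
    language: language,
    interim_results: Boolean(interimResults),
  });
}

// Usage, with ws being an already-open WebSocket:
// ws.send(buildConfigureMessage({ sampleRate: 16000, language: "en" }));
```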

4. End of Speech

{
"type": "end_of_speech"
}

Signals the end of an utterance so the server returns the final result immediately.

5. End of Session

{
"type": "end_of_session"
}

Closes the session gracefully.
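
The two control messages could be wrapped in small helpers like the ones below. This is only a sketch: the helper names are hypothetical, and `socket` is assumed to be any object with the standard WebSocket `send` and `close` methods. Closing the socket right after `end_of_session` follows the order shown in the message flow example on this page.

```javascript
// Ask the server to finalize the current utterance.
function sendEndOfSpeech(socket) {
  socket.send(JSON.stringify({ type: "end_of_speech" }));
}

// Announce the end of the session, then close the connection.
function endSession(socket) {
  socket.send(JSON.stringify({ type: "end_of_session" }));
  socket.close();
}
```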

Server → Client Messages

1. Partial Transcript

{
"type": "partial",
"text": "Hello how are",
"confidence": 0.85,
"timestamp": 1642089600.5,
"is_final": false
}

Interim result that may change.

2. Final Transcript

{
"type": "final",
"text": "Hello, how are you doing today?",
"confidence": 0.95,
"start": 0.0,
"end": 2.5,
"timestamp": 1642089603.0,
"is_final": true
}

Confirmed result that won't change.
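
Because partials can be revised and finals cannot, a client typically keeps the two apart when rendering text. The class below is a minimal sketch of that idea; `TranscriptBuffer` is a hypothetical name, and joining utterances with a single space is an assumption about how the text should be displayed.

```javascript
// Tracks confirmed text plus the latest interim text for display.
class TranscriptBuffer {
  constructor() {
    this.finals = [];   // confirmed utterances, never modified once added
    this.partial = "";  // latest interim text, replaced on every partial
  }

  // Apply a parsed "partial" or "final" message and return the display text.
  apply(message) {
    if (message.type === "partial") {
      this.partial = message.text;
    } else if (message.type === "final") {
      this.finals.push(message.text);
      this.partial = ""; // the final supersedes the pending partial
    }
    return this.text();
  }

  text() {
    return [...this.finals, this.partial].filter(Boolean).join(" ");
  }
}
```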

3. Error

Reports a failure with a human-readable message and a machine-readable code:

{
"type": "error",
"error": "Invalid audio format",
"error_code": "INVALID_AUDIO_FORMAT"
}

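4. Session Info

Sent by the server after the connection opens, reporting the session ID and the session's language and sample rate: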

{
"type": "session_info",
"session_id": "session_abc123",
"language": "en",
"sample_rate": 16000
}
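
Since every server message carries a `type` field, incoming messages can be routed with a small dispatcher. The sketch below assumes server messages arrive as JSON text; `dispatchServerMessage` and the shape of the `handlers` object are hypothetical.

```javascript
// Parse a raw server message and call the handler registered for its type.
// Returns true when a handler ran, false when the type had no handler.
function dispatchServerMessage(raw, handlers) {
  const message = JSON.parse(raw);
  // Only accept handlers defined directly on the object, not inherited ones.
  const known = Object.prototype.hasOwnProperty.call(handlers, message.type);
  if (!known || typeof handlers[message.type] !== "function") {
    return false;
  }
  handlers[message.type](message);
  return true;
}

// Usage, with ws being an already-open WebSocket:
// ws.onmessage = (event) => dispatchServerMessage(event.data, {
//   partial: (m) => console.log("interim:", m.text),
//   final: (m) => console.log("final:", m.text),
//   error: (m) => console.error(m.error_code, m.error),
//   session_info: (m) => console.log("session:", m.session_id),
// });
```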

Message Flow Example

Client                          Server
|                                |
|-- Connect WebSocket ---------->|
|<-- session_info ---------------|
|                                |
|-- audio chunk (binary) ------->|
|-- audio chunk (binary) ------->|
|<-- partial: "Hello" -----------|
|-- audio chunk (binary) ------->|
|<-- partial: "Hello how" -------|
|-- audio chunk (binary) ------->|
|<-- partial: "Hello how are" ---|
|-- end_of_speech -------------->|
|<-- final: "Hello, how are..." -|
|                                |
|-- end_of_session ------------->|
|-- Close WebSocket ------------>|
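
The middle of this flow (stream chunks, signal end of speech, wait for the final transcript) might look roughly like the sketch below. The function name `transcribeUtterance` and its callback are hypothetical. The code assumes that `socket` is already open, that it exposes the standard `addEventListener`, `removeEventListener`, and `send` methods, and that server messages arrive as JSON text.

```javascript
// Send one utterance and report its final transcript through onFinal.
function transcribeUtterance(socket, chunks, onFinal) {
  const onMessage = (event) => {
    const message = JSON.parse(event.data);
    if (message.type === "final") {
      // Stop listening once the confirmed result arrives.
      socket.removeEventListener("message", onMessage);
      onFinal(message.text);
    }
  };
  // Register the listener before sending so no reply is missed.
  socket.addEventListener("message", onMessage);

  for (const chunk of chunks) {
    socket.send(chunk); // binary audio chunk
  }
  socket.send(JSON.stringify({ type: "end_of_speech" }));
}
```

Partials are ignored here for brevity; a real client would probably surface them as well and handle `error` messages instead of waiting indefinitely.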