WebSocket Message Protocol

Real-time WebSocket message protocol for audio streaming and transcription.

Client to Server Messages

Start Command

Microphone mode only. Send after WebSocket connection opens:

{
  "action": "start"
}

Audio Data

Send raw audio data as binary (ArrayBuffer) directly through the WebSocket.
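
The two client messages can be sketched as a pair of small helpers. This is a minimal illustration, not the service's official client: `SocketLike`, `sendStart`, and `sendAudioChunk` are hypothetical names, and the audio encoding (sample rate, bit depth, chunk size) is not specified on this page, so the chunk is passed through untouched.

```typescript
// Minimal structural type so the sketch works with a browser WebSocket
// or with a test double; only `send` is needed here.
interface SocketLike {
  send(data: string | ArrayBuffer): void;
}

// Sends the JSON start command (microphone mode only).
function sendStart(socket: SocketLike): void {
  socket.send(JSON.stringify({ action: "start" }));
}

// Sends one chunk of raw audio as a binary frame.
// Assumption: the caller already holds audio in whatever encoding the
// server expects; this sketch does no conversion.
function sendAudioChunk(socket: SocketLike, chunk: ArrayBuffer): void {
  socket.send(chunk);
}
```

In a browser, `sendStart(ws)` would typically be called from the socket's `onopen` handler, with `sendAudioChunk(ws, buffer)` called for each captured buffer afterwards.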

Server to Client Messages

State Messages

Loading State:

{
  "state": "loading"
}

Listening State:

{
  "state": "listening"
}
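
One way a client might react to these state messages is to hold audio back until the service reports `listening`. The sketch below assumes that behaviour is desirable; this page does not say whether audio sent during `loading` is buffered or dropped by the server. `AudioGate` and its methods are hypothetical names.

```typescript
type ServiceState = "loading" | "listening";

// Queues audio chunks until the server reports the "listening" state.
// Assumption: audio should not be sent before "listening" arrives.
class AudioGate {
  private state: ServiceState | null = null;
  private queued: ArrayBuffer[] = [];
  private sendBinary: (chunk: ArrayBuffer) => void;

  constructor(sendBinary: (chunk: ArrayBuffer) => void) {
    this.sendBinary = sendBinary;
  }

  // Call with the `state` field of each state message.
  onState(state: ServiceState): void {
    this.state = state;
    if (state === "listening") {
      // Flush everything captured while the service was loading.
      for (const chunk of this.queued) this.sendBinary(chunk);
      this.queued = [];
    }
  }

  // Call with each captured audio chunk.
  push(chunk: ArrayBuffer): void {
    if (this.state === "listening") this.sendBinary(chunk);
    else this.queued.push(chunk);
  }
}
```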

Transcription Messages

Partial Result:

{
  "text": "string",
  "is_final": false,
  "offset_ms": 0,
  "stability": 0
}

Field      Type     Description
text       string   Partial transcription text
is_final   boolean  Always false for partial results
offset_ms  number   Time offset in milliseconds
stability  number   Stability score

Final Result:

{
  "text": "string",
  "is_final": true,
  "offset_ms": 0,
  "words": [],
  "speaker": "string"
}

Field      Type     Description
text       string   Final transcription text
is_final   boolean  Always true for final results
offset_ms  number   Time offset in milliseconds
words      array    Array of word segments
speaker    string   Speaker identifier (optional)

Words Array Format:

Each entry in the words array is a four-element array: [word, start_ms, end_ms, confidence]

  • word (string) - The word text
  • start_ms (number) - Start time in milliseconds
  • end_ms (number) - End time in milliseconds
  • confidence (number) - Confidence score
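
Because each word arrives as a positional array, it can be convenient to convert it to a named object before use. The following is a minimal sketch; `WordTuple`, `WordSegment`, and `toWordSegment` are hypothetical names, and the sample values are made up for illustration.

```typescript
// One entry of the `words` array: [word, start_ms, end_ms, confidence].
type WordTuple = [string, number, number, number];

interface WordSegment {
  word: string;
  startMs: number;
  endMs: number;
  confidence: number;
}

// Converts a positional word entry into a named object.
function toWordSegment([word, startMs, endMs, confidence]: WordTuple): WordSegment {
  return { word, startMs, endMs, confidence };
}

// Illustrative input only; real values come from a final result message.
const sampleWords: WordTuple[] = [
  ["hello", 0, 320, 0.98],
  ["world", 350, 700, 0.91],
];
const segments: WordSegment[] = sampleWords.map((w) => toWordSegment(w));
```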

Error Messages

{
  "error": "string"
}

TypeScript Types

WebSocketMessage

type WebSocketMessage =
  | { state: 'loading' | 'listening' }
  | { error: string }
  | RealtimePartialResult
  | RealtimeFinalResult;

RealtimePartialResult

type RealtimePartialResult = {
  text: string;
  is_final: false;
  offset_ms: number;
  stability: number;
};

RealtimeFinalResult

type RealtimeFinalResult = {
  text: string;
  is_final: true;
  offset_ms: number;
  words: [string, number, number, number][];
  speaker?: string;
};

Message Handling

The client receives JSON messages from the server:

  1. State messages - Indicate service status (loading/listening)
  2. Transcription messages - Partial or final results with text
  3. Error messages - Error descriptions
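
The three kinds of server message can be told apart by which fields are present. The sketch below repeats the types from this page so that it stands alone; `MessageKind` and `classifyMessage` are hypothetical names. It assumes the payload is well-formed JSON matching one of the documented shapes, and `JSON.parse` will throw otherwise.

```typescript
type RealtimePartialResult = {
  text: string;
  is_final: false;
  offset_ms: number;
  stability: number;
};

type RealtimeFinalResult = {
  text: string;
  is_final: true;
  offset_ms: number;
  words: [string, number, number, number][];
  speaker?: string;
};

type WebSocketMessage =
  | { state: "loading" | "listening" }
  | { error: string }
  | RealtimePartialResult
  | RealtimeFinalResult;

type MessageKind = "state" | "error" | "partial" | "final";

// Parses a raw text frame and reports which kind of message it is.
// Assumption: the frame matches one of the shapes documented above.
function classifyMessage(raw: string): { kind: MessageKind; message: WebSocketMessage } {
  const message = JSON.parse(raw) as WebSocketMessage;
  if ("state" in message) return { kind: "state", message };
  if ("error" in message) return { kind: "error", message };
  // Only transcription results remain; is_final separates the two.
  return { kind: message.is_final ? "final" : "partial", message };
}
```

A handler would typically call `classifyMessage(event.data)` inside the socket's `onmessage` callback and switch on `kind`.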

The client sends:

  1. Start command - JSON message to begin (microphone mode only)
  2. Audio data - Binary ArrayBuffer containing audio samples

Notes

  • All server messages are JSON; only the audio data sent by the client is binary
  • Partial results may change as more audio is processed
  • Final results are confirmed and won't change
  • Speaker field is optional in final results
  • Connection timeout is 15 seconds
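
The difference between partial and final results suggests a simple display strategy: keep confirmed text, and overwrite a single pending slot with each new partial. The sketch below assumes each partial replaces the previous partial wholesale and that a final result supersedes the pending partial; this page only states that partials may change and finals will not. `TranscriptBuffer` and `TranscriptUpdate` are hypothetical names.

```typescript
// The subset of a transcription message this sketch needs.
type TranscriptUpdate = { text: string; is_final: boolean };

class TranscriptBuffer {
  private finals: string[] = [];
  private pending = "";

  // Applies one partial or final result.
  apply(update: TranscriptUpdate): void {
    if (update.is_final) {
      // Confirmed text is kept; the pending partial is discarded.
      this.finals.push(update.text);
      this.pending = "";
    } else {
      // Assumption: a new partial fully replaces the previous one.
      this.pending = update.text;
    }
  }

  // Returns confirmed text followed by the current partial, if any.
  render(): string {
    return this.finals
      .concat(this.pending)
      .filter((s) => s.length > 0)
      .join(" ");
  }
}
```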