WebSocket Message Protocol

Real-time WebSocket message protocol for audio streaming and transcription.

Client to Server Messages

Start Command

Microphone mode only. Send this message after the WebSocket connection opens:

{
  "action": "start"
}
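A minimal TypeScript sketch of sending the start command. `TextSender`, `buildStartCommand`, and `sendStart` are hypothetical names. The only protocol detail the sketch encodes is the `{"action": "start"}` payload.

```typescript
// Minimal slice of the WebSocket interface this sketch needs (hypothetical).
// A browser WebSocket fits it because its send() accepts strings.
interface TextSender {
  send(data: string): void;
}

// Serialize the start command exactly as the protocol shows it.
function buildStartCommand(): string {
  return JSON.stringify({ action: "start" });
}

// Microphone mode only: call this once the connection is open.
function sendStart(socket: TextSender): void {
  socket.send(buildStartCommand());
}
```

In a browser this could be wired as `ws.onopen = () => sendStart(ws);`, where `ws` is a `WebSocket` opened to your service's endpoint (the URL is not specified here).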

Audio Data

Send raw audio data as binary (ArrayBuffer) directly through the WebSocket.
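The sketch below streams a buffer of audio as a series of binary frames. The protocol only says the audio is sent as raw binary; the `sendAudio` helper and its 4096-byte default chunk size are assumptions for illustration, and the bytes themselves must already be in whatever encoding (sample rate, sample format) the server expects.

```typescript
// Minimal slice of the WebSocket interface (hypothetical). A browser
// WebSocket fits it because its send() accepts ArrayBuffer.
interface BinarySender {
  send(data: ArrayBuffer): void;
}

// Split raw audio into fixed-size chunks and send each as a binary frame.
// chunkBytes is an illustrative default; the protocol does not fix a frame size.
// Returns the number of frames sent.
function sendAudio(socket: BinarySender, audio: ArrayBuffer, chunkBytes = 4096): number {
  if (chunkBytes <= 0) throw new RangeError("chunkBytes must be positive");
  let frames = 0;
  for (let offset = 0; offset < audio.byteLength; offset += chunkBytes) {
    // slice() copies the bytes, so each frame owns an independent buffer.
    const end = Math.min(offset + chunkBytes, audio.byteLength);
    socket.send(audio.slice(offset, end));
    frames++;
  }
  return frames;
}
```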

Server to Client Messages

State Messages

Loading State:

{
  "state": "loading"
}

Listening State:

{
  "state": "listening"
}
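State messages carry only a `state` field, so a client can pick them out of the incoming stream with a small check, for example to drive a status indicator. A minimal sketch; `parseState` and `ServiceState` are hypothetical names.

```typescript
type ServiceState = "loading" | "listening";

// Return the state carried by a state message, or null for any other
// message (transcription results, errors).
function parseState(raw: string): ServiceState | null {
  const msg: unknown = JSON.parse(raw);
  if (typeof msg !== "object" || msg === null || !("state" in msg)) {
    return null;
  }
  const state = (msg as { state: unknown }).state;
  // Only the two documented values are accepted.
  return state === "loading" ? "loading" : state === "listening" ? "listening" : null;
}
```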

Transcription Messages

Partial Result:

{
  "text": "string",
  "is_final": false,
  "offset_ms": 0,
  "stability": 0
}

Field	Type	Description
`text`	string	Partial transcription text
`is_final`	boolean	Always false for partial results
`offset_ms`	number	Time offset in milliseconds
`stability`	number	Stability score

Final Result:

{
  "text": "string",
  "is_final": true,
  "offset_ms": 0,
  "words": [],
  "speaker": "string"
}

Field	Type	Description
`text`	string	Final transcription text
`is_final`	boolean	Always true for final results
`offset_ms`	number	Time offset in milliseconds
`words`	array	Array of word segments
`speaker`	string	Speaker identifier (optional)
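Because `speaker` may be absent, code that groups final results by speaker needs a fallback key. A sketch assuming the client collects final results into an array; `groupBySpeaker` and the `"unknown"` fallback key are hypothetical, not part of the protocol.

```typescript
type RealtimeFinalResult = {
  text: string;
  is_final: true;
  offset_ms: number;
  words: [string, number, number, number][];
  speaker?: string;
};

// Hypothetical placeholder for results that carry no speaker.
const UNKNOWN_SPEAKER = "unknown";

// Collect final-result text per speaker, preserving arrival order.
function groupBySpeaker(results: RealtimeFinalResult[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const r of results) {
    const key = r.speaker ?? UNKNOWN_SPEAKER;
    const list = groups.get(key) ?? [];
    list.push(r.text);
    groups.set(key, list);
  }
  return groups;
}
```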

Words Array Format:

Each entry in the `words` array is a four-element array: [word, start_ms, end_ms, confidence]

word (string) - The word text
start_ms (number) - Start time in milliseconds
end_ms (number) - End time in milliseconds
confidence (number) - Confidence score
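Positional tuples are compact on the wire but easy to misread in code. A sketch that converts them to named objects; the `Word` shape and its camelCase field names are hypothetical, and `durationMs` is simply `end_ms - start_ms`.

```typescript
// Tuple layout from the protocol: [word, start_ms, end_ms, confidence].
type WordTuple = [string, number, number, number];

// Hypothetical named form of one word segment.
interface Word {
  word: string;
  startMs: number;
  endMs: number;
  durationMs: number;
  confidence: number;
}

// Convert positional tuples into named objects so callers don't rely on indices.
function toWords(words: WordTuple[]): Word[] {
  return words.map(([word, startMs, endMs, confidence]) => ({
    word,
    startMs,
    endMs,
    durationMs: endMs - startMs,
    confidence,
  }));
}
```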

Error Messages

{
  "error": "string"
}

TypeScript Types

WebSocketMessage

type WebSocketMessage =
  | { state: 'loading' | 'listening' }
  | { error: string }
  | RealtimePartialResult
  | RealtimeFinalResult;

RealtimePartialResult

type RealtimePartialResult = {
  text: string;
  is_final: false;
  offset_ms: number;
  stability: number;
};

RealtimeFinalResult

type RealtimeFinalResult = {
  text: string;
  is_final: true;
  offset_ms: number;
  words: [string, number, number, number][];
  speaker?: string;
};
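These types exist only at compile time: `JSON.parse` returns untyped data, so a client that wants to trust incoming messages has to check them at runtime. A sketch of type guards for the two transcription types; the guard function names are hypothetical, and the checks follow the field lists above.

```typescript
type RealtimePartialResult = { text: string; is_final: false; offset_ms: number; stability: number };
type RealtimeFinalResult = {
  text: string;
  is_final: true;
  offset_ms: number;
  words: [string, number, number, number][];
  speaker?: string;
};

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isPartialResult(v: unknown): v is RealtimePartialResult {
  return isRecord(v) && v.is_final === false && typeof v.text === "string" &&
    typeof v.offset_ms === "number" && typeof v.stability === "number";
}

// One [word, start_ms, end_ms, confidence] entry.
function isWordTuple(v: unknown): boolean {
  return Array.isArray(v) && v.length === 4 && typeof v[0] === "string" &&
    typeof v[1] === "number" && typeof v[2] === "number" && typeof v[3] === "number";
}

function isFinalResult(v: unknown): v is RealtimeFinalResult {
  return isRecord(v) && v.is_final === true && typeof v.text === "string" &&
    typeof v.offset_ms === "number" && Array.isArray(v.words) && v.words.every(isWordTuple) &&
    (v.speaker === undefined || typeof v.speaker === "string"); // speaker is optional
}
```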

Message Handling

The client receives JSON messages from the server:

State messages - Indicate service status (loading/listening)
Transcription messages - Partial or final results with text
Error messages - Error descriptions

The client sends:

Start command - JSON message to begin (microphone mode only)
Audio data - Binary ArrayBuffer containing audio samples
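A sketch of a receive-side dispatcher that routes each server message to a callback. The `Handlers` interface and `dispatch` function are hypothetical; the routing relies only on the fields defined above (`state`, `error`, `is_final`). It casts parsed JSON without validating it, so pair it with runtime checks if the input is untrusted.

```typescript
type RealtimePartialResult = { text: string; is_final: false; offset_ms: number; stability: number };
type RealtimeFinalResult = {
  text: string;
  is_final: true;
  offset_ms: number;
  words: [string, number, number, number][];
  speaker?: string;
};
type WebSocketMessage =
  | { state: "loading" | "listening" }
  | { error: string }
  | RealtimePartialResult
  | RealtimeFinalResult;

// Hypothetical callback set; supply only the handlers you need.
interface Handlers {
  onState?(state: "loading" | "listening"): void;
  onError?(error: string): void;
  onPartial?(result: RealtimePartialResult): void;
  onFinal?(result: RealtimeFinalResult): void;
}

// Route one incoming frame. Server messages are JSON text, so non-string
// frames are ignored.
function dispatch(data: unknown, handlers: Handlers): void {
  if (typeof data !== "string") return;
  // Unchecked cast: trusts the server to follow the protocol.
  const msg = JSON.parse(data) as WebSocketMessage;
  if ("state" in msg) handlers.onState?.(msg.state);
  else if ("error" in msg) handlers.onError?.(msg.error);
  else if (msg.is_final) handlers.onFinal?.(msg);
  else handlers.onPartial?.(msg);
}
```

In a browser: `ws.onmessage = (event) => dispatch(event.data, handlers);`.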

Notes

All server messages are JSON text; the only binary data on the connection is the audio the client sends
Partial results may change as more audio is processed
Final results are confirmed and won't change
Speaker field is optional in final results
Connection timeout is 15 seconds
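Because partial results may change and final results do not, a transcript display usually keeps the two apart. A minimal sketch, assuming each final result supersedes the partial that preceded it and that segments are joined with a single space; `TranscriptBuffer` is a hypothetical helper.

```typescript
// Keeps committed text (finals) separate from the latest, still-changing partial.
class TranscriptBuffer {
  private finals: string[] = [];
  private partial = "";

  // A new partial replaces the previous one rather than appending to it.
  addPartial(text: string): void {
    this.partial = text;
  }

  // Assumption: a final result covers the speech the pending partial tracked,
  // so committing it clears that partial.
  addFinal(text: string): void {
    this.finals.push(text);
    this.partial = "";
  }

  // Committed text followed by the current partial, if any, space-separated.
  toString(): string {
    return [...this.finals, this.partial].filter((t) => t.length > 0).join(" ");
  }
}
```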
