WebSocket Message Protocol
Real-time WebSocket message protocol for audio streaming and transcription.
Client to Server Messages
Start Command
Microphone mode only. Send this message once the WebSocket connection opens:
{
  "action": "start"
}
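A minimal sketch of sending the start command, assuming a browser-style socket. `SocketLike` and `sendStart` are hypothetical names introduced for illustration; the interface lists only the `send` method, which the standard `WebSocket` object also provides.

```typescript
// Minimal shape of the socket used here; a browser WebSocket satisfies it.
interface SocketLike {
  send(data: string | ArrayBuffer): void;
}

// Serialize and send the start command (microphone mode only).
function sendStart(socket: SocketLike): void {
  socket.send(JSON.stringify({ action: "start" }));
}

// Possible usage once a socket `ws` exists (not executed here):
// ws.addEventListener("open", () => sendStart(ws));
```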
Audio Data
Send raw audio data as binary (ArrayBuffer) directly through the WebSocket.
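This section does not state the expected sample format, so the sketch below rests on an assumption: that the server accepts 16-bit little-endian PCM. It converts `Float32Array` samples (as produced by the Web Audio API) into an `ArrayBuffer` and sends it as a binary frame. `floatToPcm16` and `sendAudioChunk` are hypothetical helper names; check the format the server actually expects before reusing this.

```typescript
// ASSUMPTION: 16-bit little-endian PCM. The protocol text only says "raw audio data".
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp to [-1, 1], then scale each half to the int16 limits.
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, scaled, true); // true = little-endian
  });
  return buffer;
}

// Send one chunk as a binary frame; no JSON wrapper is used for audio.
function sendAudioChunk(
  socket: { send(data: ArrayBuffer): void },
  samples: Float32Array
): void {
  socket.send(floatToPcm16(samples));
}
```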
Server to Client Messages
State Messages
Loading State:
{
  "state": "loading"
}
Listening State:
{
  "state": "listening"
}
Transcription Messages
Partial Result:
{
  "text": "string",
  "is_final": false,
  "offset_ms": 0,
  "stability": 0
}
| Field | Type | Description |
|---|---|---|
| text | string | Partial transcription text |
| is_final | boolean | Always false for partial results |
| offset_ms | number | Time offset in milliseconds |
| stability | number | Stability score |
Final Result:
{
  "text": "string",
  "is_final": true,
  "offset_ms": 0,
  "words": [],
  "speaker": "string"
}
| Field | Type | Description |
|---|---|---|
| text | string | Final transcription text |
| is_final | boolean | Always true for final results |
| offset_ms | number | Time offset in milliseconds |
| words | array | Array of word segments |
| speaker | string | Speaker identifier (optional) |
Words Array Format:
Each word in the words array: [word, start_ms, end_ms, confidence]
- word (string) - The word text
- start_ms (number) - Start time in milliseconds
- end_ms (number) - End time in milliseconds
- confidence (number) - Confidence score
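Positional tuples are compact on the wire but easy to misread in application code. The sketch below unpacks one word entry into a named object; `WordSegment` and `toWordSegment` are hypothetical names, and the camelCase field names are a local choice rather than part of the protocol.

```typescript
// Tuple layout from the protocol: [word, start_ms, end_ms, confidence].
type WordTuple = [string, number, number, number];

// Hypothetical named form for use inside the client.
interface WordSegment {
  word: string;
  startMs: number;
  endMs: number;
  confidence: number;
}

// Destructure by position and return the named fields.
function toWordSegment([word, startMs, endMs, confidence]: WordTuple): WordSegment {
  return { word, startMs, endMs, confidence };
}
```

A whole final result can then be converted with `result.words.map(toWordSegment)`.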
Error Messages
{
  "error": "string"
}
TypeScript Types
WebSocketMessage
type WebSocketMessage =
  | { state: 'loading' | 'listening' }
  | { error: string }
  | RealtimePartialResult
  | RealtimeFinalResult;
RealtimePartialResult
type RealtimePartialResult = {
  text: string;
  is_final: false;
  offset_ms: number;
  stability: number;
};
RealtimeFinalResult
type RealtimeFinalResult = {
  text: string;
  is_final: true;
  offset_ms: number;
  words: [string, number, number, number][];
  speaker?: string;
};
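Because each variant of the union has a distinguishing key (`error`, `state`, or the literal type of `is_final`), TypeScript can narrow a `WebSocketMessage` without casts. The sketch below repeats the three types so it stands alone; `describeMessage` is a hypothetical helper and its output strings are illustrative only.

```typescript
type RealtimePartialResult = {
  text: string; is_final: false; offset_ms: number; stability: number;
};
type RealtimeFinalResult = {
  text: string; is_final: true; offset_ms: number;
  words: [string, number, number, number][]; speaker?: string;
};
type WebSocketMessage =
  | { state: "loading" | "listening" }
  | { error: string }
  | RealtimePartialResult
  | RealtimeFinalResult;

// Each check removes one variant from the union, so the property
// accesses below are all verified by the compiler.
function describeMessage(msg: WebSocketMessage): string {
  if ("error" in msg) return `error: ${msg.error}`;
  if ("state" in msg) return `state: ${msg.state}`;
  if (msg.is_final === true) {
    return `final (${msg.words.length} words): ${msg.text}`;
  }
  return `partial (stability ${msg.stability}): ${msg.text}`;
}
```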
Message Handling
The client receives JSON messages from the server:
- State messages - Indicate service status (loading/listening)
- Transcription messages - Partial or final results with text
- Error messages - Error descriptions
The client sends:
- Start command - JSON message to begin (microphone mode only)
- Audio data - Binary ArrayBuffer containing audio samples
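Incoming frames arrive as raw text, so a client typically validates the JSON at runtime before dispatching on it. The following is a sketch of one way to do that; `MessageKind` and `classifyServerMessage` are hypothetical names, and returning `"unknown"` for anything unrecognized is a design choice of this example, not behavior defined by the protocol.

```typescript
type MessageKind = "state" | "partial" | "final" | "error" | "unknown";

// Parse one text frame and decide which of the documented shapes it matches.
function classifyServerMessage(raw: string): MessageKind {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return "unknown"; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return "unknown";
  const msg = parsed as Record<string, unknown>;

  if (typeof msg["error"] === "string") return "error";
  if (msg["state"] === "loading" || msg["state"] === "listening") return "state";
  if (typeof msg["text"] === "string" && typeof msg["is_final"] === "boolean") {
    return msg["is_final"] ? "final" : "partial";
  }
  return "unknown";
}
```

In a browser this could be called from the socket's `message` event listener with `event.data`, after checking that the payload is a string.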
Notes
- All server messages are JSON; the only binary frames are the audio data sent by the client
- Partial results may change as more audio is processed
- Final results are confirmed and won't change
- Speaker field is optional in final results
- Connection timeout is 15 seconds
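The distinction between revisable partial results and fixed final results suggests keeping two pieces of state on the client. The sketch below assumes that each partial result replaces the previous one and that a final result supersedes the pending partial; the protocol text does not spell this out, so treat it as an assumption. `TranscriptBuffer` is a hypothetical class name.

```typescript
// Only the two fields needed here; both result types carry them.
interface TranscriptUpdate {
  text: string;
  is_final: boolean;
}

class TranscriptBuffer {
  private committed: string[] = []; // final results, never rewritten
  private pending = "";             // latest partial result, may still change

  apply(update: TranscriptUpdate): void {
    if (update.is_final) {
      this.committed.push(update.text);
      this.pending = ""; // ASSUMPTION: the final supersedes the pending partial
    } else {
      this.pending = update.text; // ASSUMPTION: each partial replaces the last
    }
  }

  // Committed text followed by the current partial, if there is one.
  render(): string {
    return [...this.committed, this.pending]
      .filter((part) => part.length > 0)
      .join(" ");
  }
}
```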