
Messages

All control messages are exchanged in JSON format; only the audio stream is sent as binary data. The following control messages are supported.

Client Messages

The following messages can be sent to the server:

| Action | Message | Description |
| --- | --- | --- |
| Start Session | {"action": "start"} | Initializes the connection with the Speech-to-Text engine. If the connection is successful and the WebSocket is ready to process data, you will receive a "state": "listening" message. Optional properties are described below. |
| Stop Session | {"action": "stop"} | Stops the connection with the Speech-to-Text engine. The service returns the results for any remaining audio that still has to be processed. Once the connection is closed, you will receive a "state": "stopped" message. You cannot start a new session afterward. |
| Send audio data | <binary> | Binary data formatted as PCM WAVE, mono, 16 kHz. |
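
As a rough illustration of the table above, the sketch below builds the three kinds of client frames: JSON text for control messages and raw bytes for audio. The helper names and the 3200-byte chunk size are assumptions made for this example and are not part of the API; a real client would also wait for the "state": "listening" message before sending any audio.

```python
import json
from typing import Iterator, Union

# str -> sent as a WebSocket text frame, bytes -> sent as a binary frame
Frame = Union[str, bytes]


def control_message(action: str) -> str:
    """Serialise a control message such as {"action": "start"} as JSON text."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return json.dumps({"action": action})


def audio_chunks(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Split raw PCM audio into binary frames (chunk_size is an assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]


def session_frames(pcm: bytes, chunk_size: int = 3200) -> Iterator[Frame]:
    """Yield frames in the order start, audio, stop.

    This only shows the ordering; it does not wait for the server's
    "listening" status before the audio frames.
    """
    yield control_message("start")
    yield from audio_chunks(pcm, chunk_size)
    yield control_message("stop")
```

Keeping frame construction separate from the transport makes it possible to use any WebSocket library for the actual sending.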

Start Properties

Below are the configuration options available when initializing a session:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| actual_numbers | Bool | False | Enable or disable conversion between actual numbers and textual numbers. |
| key | String | null | Provide the API key directly instead of using HTTP headers. |
| partial | Bool | True | Enable or disable partial results. |
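
The following sketch shows one way a start message with these properties might be assembled. It assumes the properties are sent as top-level keys next to "action", which this page does not state explicitly; the function name and the placeholder key are hypothetical.

```python
import json
from typing import Optional


def build_start_message(
    actual_numbers: bool = False,
    partial: bool = True,
    key: Optional[str] = None,
) -> str:
    """Build a start message using the documented defaults.

    Assumption: properties sit at the top level of the start message.
    """
    message = {
        "action": "start",
        "actual_numbers": actual_numbers,
        "partial": partial,
    }
    # Only include the API key when it is not supplied via HTTP headers.
    if key is not None:
        message["key"] = key
    return json.dumps(message)
```

For example, `build_start_message(actual_numbers=True, key="YOUR_API_KEY")` would produce a start message that enables number conversion and carries the key in the message body.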

Server Messages

The following messages can be received from the server:


Status Changes

| Action | Message | Description |
| --- | --- | --- |
| Server listening | {"state": "listening"} | The connection with the Speech-to-Text engine is initialized and the WebSocket is ready to process data. After receiving this message, you can send audio. |
| Server stopped listening | {"state": "stopped"} | The connection with the Speech-to-Text engine is stopped. The service returns the results for any remaining audio that still has to be processed. Once the connection is closed, you will receive this message. It is not possible to start a new session once you have stopped it. |
| Engine shutting down | {"state": "shutting_down", "at": 1234567890} | The real-time engine sends this message to announce that it will shut down at a specific time in the future. The message is sent one hour before shutdown, which gives you enough time to finish the session gracefully. |
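
A client might track these status changes with a small state holder like the sketch below. It rests on two assumptions that this page does not confirm: that "at" is a Unix timestamp in seconds, and that a session can keep sending audio after a shutting_down notice until it stops. The class and its "idle" label are hypothetical, client-side names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionStatus:
    state: str = "idle"  # client-side label used before any server status arrives
    shutdown_at: Optional[int] = None

    def update(self, raw: str) -> bool:
        """Apply a server message; return True if it was a status change."""
        message = json.loads(raw)
        state = message.get("state") if isinstance(message, dict) else None
        if state is None:
            return False
        if state == "shutting_down":
            # Assumption: the session stays usable, so only record the time.
            self.shutdown_at = message.get("at")
        else:
            self.state = state
        return True

    @property
    def can_send_audio(self) -> bool:
        return self.state == "listening"

    def seconds_until_shutdown(self, now: float) -> Optional[float]:
        """Time left before shutdown, assuming "at" is Unix epoch seconds."""
        if self.shutdown_at is None:
            return None
        return max(0.0, self.shutdown_at - now)
```

Passing `now` explicitly (for example `time.time()`) keeps the calculation easy to test.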

Partial results

The first result sent after receiving a binary blob is a partial result. It contains the spoken text detected so far and may still change.

{
"partial": "Je hoort natuurlijk zeker"
}

The result is an object with two keys: result and text. The result key lists the spoken words with their metadata. Each entry is an array of the form [word, time_start, time_stop, confidence]:

{
"result": [
[ "Je", 12046, 12286, 1 ],
[ "hoort", 12286, 12526, 1 ],
[ "natuurlijk", 12526, 12796, 1 ],
[ "zeker", 12796, 13096, 1 ],
[ "in", 13096, 13156, 1 ],
[ "’s-gravenhage", 13156, 13666, 1 ],
[ "verhalen", 13666, 14055, 0.999713 ]
],
"text": "Je hoort natuurlijk zeker in ’s-gravenhage verhalen"
}

The time_start and time_stop values are time points measured in the amount of audio processed (including silence); they do not correspond to the actual session duration. This makes it possible to send audio at up to twice real-time speed. The confidence is a float that expresses how confident the engine is that the word is accurate. The text key contains the full text of all detected words.
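
To make the word metadata easier to work with, the sketch below unpacks each entry of result into named fields and estimates the shortest interval between audio chunks when sending at twice real-time speed. The names are hypothetical, the time values are kept in whatever unit the service uses, and the pacing helper assumes 16-bit samples, which this page does not specify.

```python
import json
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    time_start: float
    time_stop: float
    confidence: float


def parse_result(raw: str) -> List[Word]:
    """Turn the "result" arrays of a result message into Word tuples."""
    message = json.loads(raw)
    return [Word(*entry) for entry in message["result"]]


SAMPLE_RATE_HZ = 16_000  # from the audio format described on this page
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM samples


def min_send_interval(chunk_bytes: int, speedup: float = 2.0) -> float:
    """Seconds to wait between chunks so audio is sent at `speedup` x real time."""
    audio_seconds = chunk_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    return audio_seconds / speedup
```

Because the time points count processed audio rather than wall-clock time, they stay the same regardless of how fast the audio was sent.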

Error messages

| Message | Description |
| --- | --- |
| {"error": "Session not started"} | The client sent binary data (audio stream) before starting a session. The data will not be processed. |
| {"error": "backend Client tried to start a new session while there is already listening"} | The client tried to start a new session while a backend session is already running. |
| {"error": "restarting of sessions is not supported"} | The client tried to start a new session after finishing a previous one. This is not supported; the connection should be closed. |
| {"error": "unable to start backend"} | The real-time engine was unable to connect to a backend system. Contact support if the problem persists. |
| {"error": "engine_not_responding"} | The backend system did not respond to a session request. Contact support if the problem persists. |
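
One possible way to handle the server messages on this page is to classify each incoming JSON message by its top-level key and surface errors as exceptions, as sketched below. The function and exception names are hypothetical, and the sketch assumes each message carries only one of the keys error, state, partial, or result.

```python
import json
from typing import Any, Tuple


class SpeechToTextError(RuntimeError):
    """Raised when the server sends an {"error": ...} message."""


def classify(raw: str) -> Tuple[str, Any]:
    """Return (kind, payload) for a server message, or raise on errors."""
    message = json.loads(raw)
    if "error" in message:
        raise SpeechToTextError(message["error"])
    if "state" in message:
        return ("state", message["state"])
    if "partial" in message:
        return ("partial", message["partial"])
    if "result" in message:
        # Hand back the full text; word metadata stays in message["result"].
        return ("result", message.get("text", ""))
    return ("unknown", message)
```

Raising on errors keeps the normal receive loop short, while the caller decides whether to close the connection or contact support.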