Audio Encoding

To get accurate transcription and good performance from the Scriptix Real-time API, your audio must meet specific encoding requirements. This guide covers the supported format, conversion tips, and examples using FFmpeg.


Supported Format

All audio streamed via WebSocket must meet the following criteria:

| Parameter | Required Value |
|---|---|
| Encoding | Linear PCM (pcm_s16le) |
| Sample Rate | 16,000 Hz |
| Bit Depth | 16-bit |
| Channels | Mono (1 channel) |
| Container | WAV or raw PCM (no headers) |

Submitting audio that does not match these requirements may result in degraded transcription quality or rejected input.
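As a rough pre-flight check, the requirements above can be verified locally before streaming. The sketch below uses Python's standard `wave` module. The function name `check_wav` is hypothetical and not part of any Scriptix SDK, and it only applies to WAV files, not raw PCM. Note that `wave.open` raises `wave.Error` for WAV files that are not uncompressed PCM.

```python
import wave

# Values taken from the Supported Format table.
REQUIRED_SAMPLE_RATE = 16_000   # Hz
REQUIRED_SAMPLE_WIDTH = 2       # bytes per sample (16-bit)
REQUIRED_CHANNELS = 1           # mono


def check_wav(path: str) -> list:
    """Return a list of human-readable problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != REQUIRED_SAMPLE_RATE:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(f"bit depth is {wav.getsampwidth() * 8}-bit, expected 16-bit")
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels found, expected mono")
    return problems
```

Calling `check_wav("recording.wav")` on a conforming file returns an empty list; any entries describe what needs to be converted.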


Converting Audio Using FFmpeg

Use the following command to convert any audio file into the correct format using FFmpeg:

ffmpeg -y -i "${input_file}" \
-ac 1 -acodec pcm_s16le -ar 16000 \
-f wav "${output_file}"

This will:

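Convert audio to mono (-ac 1)

Set the sample rate to 16 kHz (-ar 16000)

Encode the audio using signed 16-bit little-endian PCM (-acodec pcm_s16le)

Write a WAV container (-f wav), overwriting any existing output file (-y)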


Using FFmpeg in a Streaming Pipeline

You can also use FFmpeg to pipe audio directly from stdin to stdout:

ffmpeg -i - \
-ac 1 -acodec pcm_s16le -ar 16000 \
-f wav -

This is useful for real-time microphone input or live stream conversion. To emit raw PCM without a WAV header instead, replace -f wav with -f s16le.
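The same stdin-to-stdout pipeline can be driven from application code. The following is a minimal sketch, assuming `ffmpeg` is installed and on the PATH. The helper name `convert_to_pcm_wav` is hypothetical, and the arguments are copied from the command above.

```python
import shutil
import subprocess

# Same arguments as the streaming command: read from stdin, write WAV to stdout.
FFMPEG_ARGS = [
    "ffmpeg", "-i", "-",
    "-ac", "1", "-acodec", "pcm_s16le", "-ar", "16000",
    "-f", "wav", "-",
]


def convert_to_pcm_wav(data: bytes) -> bytes:
    """Pipe encoded audio bytes through FFmpeg and return 16 kHz mono 16-bit WAV bytes."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    result = subprocess.run(
        FFMPEG_ARGS,
        input=data,               # sent to FFmpeg's stdin
        stdout=subprocess.PIPE,   # converted audio
        stderr=subprocess.PIPE,   # FFmpeg log output
        check=True,               # raise CalledProcessError on failure
    )
    return result.stdout
```

This version buffers the whole input in memory, so it suits short clips. For a live source, a long-running `subprocess.Popen` with incremental reads from its stdout would be the more likely choice.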

Recommendations

Test the audio quality before streaming: poor input produces poor transcription.

Do not send compressed formats (e.g., MP3, AAC) directly; decode them to PCM first.

Keep each WebSocket message to roughly 100 ms of audio (about 3,200 bytes at 16 kHz, 16-bit mono).
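The chunk-size recommendation translates into a byte count as follows. This is an illustrative sketch only: the helper names are hypothetical, and sending each chunk over the WebSocket is left to whichever client library you use.

```python
SAMPLE_RATE_HZ = 16_000   # from the Supported Format table
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
CHUNK_MS = 100            # recommended audio duration per message


def chunk_size_bytes(chunk_ms: int = CHUNK_MS) -> int:
    """Number of raw PCM bytes that hold chunk_ms milliseconds of audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def iter_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS):
    """Yield consecutive slices of raw PCM, each about chunk_ms long (the last may be shorter)."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
chunks = list(iter_chunks(one_second_of_silence))
print(chunk_size_bytes(), len(chunks))  # 3200 10
```

Each yielded slice would be sent as one binary WebSocket message. When streaming a WAV file rather than raw PCM, the header bytes precede the audio data, so the first chunk holds slightly less than 100 ms of audio.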

Need Help?

If you experience issues with audio encoding or setup, reach out to our support team at info@scriptix.io.


Next Steps