
Audio Encoding

To get the best accuracy and performance from the Scriptix Real-time API, your audio must meet the encoding requirements below. This guide covers the supported format, how to convert audio with FFmpeg, and how to use FFmpeg in a streaming pipeline.


✅ Supported Format

All audio streamed via WebSocket must meet the following criteria:

Parameter      Required Value
Encoding       Linear PCM (pcm_s16le)
Sample Rate    16,000 Hz
Bit Depth      16-bit
Channels       Mono (1 channel)
Container      WAV or raw PCM (no headers)

❗ Submitting audio that does not match these requirements may result in degraded transcription quality or rejected input.
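
As a quick pre-flight check, the criteria above can be verified locally before any audio is streamed. The sketch below uses Python's standard `wave` module. The helper name `check_wav` is hypothetical and not part of the Scriptix API, and the check only applies to WAV files, since raw PCM has no header to inspect.

```python
import wave

# Required values taken from the table above.
REQUIRED_CHANNELS = 1          # mono
REQUIRED_SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
REQUIRED_SAMPLE_RATE = 16000   # Hz


def check_wav(path):
    """Return a list of problems found in a WAV header (empty if none).

    Hypothetical helper for illustration; not part of any Scriptix SDK.
    wave.open() raises wave.Error for files that are not PCM WAV.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(f"expected 1 channel, got {wav.getnchannels()}")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(f"expected 16-bit samples, got {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() != REQUIRED_SAMPLE_RATE:
            problems.append(f"expected 16000 Hz, got {wav.getframerate()} Hz")
    return problems
```

An empty list means the header matches the required format. Any entries describe what needs to be converted first.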


🔄 Converting Audio Using FFmpeg

Use the following command to convert any audio file into the correct format using FFmpeg:

ffmpeg -y -i "${input_file}" \
-ac 1 -acodec pcm_s16le -ar 16000 \
-f wav "${output_file}"

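This will:

Overwrite the output file without prompting (-y)

Convert the audio to mono (-ac 1)

Set the sample rate to 16 kHz (-ar 16000)

Encode the audio as signed 16-bit little-endian PCM (-acodec pcm_s16le)

Write the result in a WAV container (-f wav)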


🔁 Using FFmpeg in a Streaming Pipeline

You can also use FFmpeg to pipe audio directly from stdin to stdout:

ffmpeg -i - \
-ac 1 -acodec pcm_s16le -ar 16000 \
-f wav -

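This is useful for real-time microphone input or live stream conversion. Because stdout is not seekable, FFmpeg cannot go back and fill in the final length fields of the WAV header. To emit headerless raw PCM instead, replace -f wav with -f s16le.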
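
The same pipeline can be driven from application code. The following is a minimal sketch, assuming `ffmpeg` is on the PATH and that `source` is an open binary file or pipe with a real file descriptor. The function names `ffmpeg_pipe_args` and `convert_stream` are hypothetical, and sending the chunks over the WebSocket is left out because it depends on your client library.

```python
import subprocess


def ffmpeg_pipe_args(output_format="wav"):
    """Build the argument list for the stdin-to-stdout conversion shown above."""
    return [
        "ffmpeg",
        "-i", "-",               # read input from stdin
        "-ac", "1",              # mono
        "-acodec", "pcm_s16le",  # signed 16-bit little-endian PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-f", output_format,     # "wav", or "s16le" for headerless raw PCM
        "-",                     # write output to stdout
    ]


def convert_stream(source, chunk_bytes=3200, output_format="wav"):
    """Yield converted audio from ffmpeg's stdout in fixed-size chunks.

    `source` must expose a real file descriptor (an opened file or a pipe).
    3200 bytes is 100 ms of 16 kHz, 16-bit mono audio. With WAV output the
    first chunk also contains the WAV header.
    """
    proc = subprocess.Popen(
        ffmpeg_pipe_args(output_format),
        stdin=source,
        stdout=subprocess.PIPE,
    )
    try:
        while True:
            chunk = proc.stdout.read(chunk_bytes)
            if not chunk:  # ffmpeg closed stdout: end of stream
                break
            yield chunk
    finally:
        proc.stdout.close()
        if proc.poll() is None:
            proc.terminate()  # stop ffmpeg if the caller stopped early
        proc.wait()
```

For example, `for chunk in convert_stream(open("input.mp3", "rb")): ...` would iterate over the converted audio, where `input.mp3` is a placeholder file name.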

🎧 Recommendations

Test the audio quality before streaming: poor input produces poor transcription.

Avoid sending compressed formats (e.g., MP3, AAC) directly. They must be decoded to PCM first.

Keep audio chunks at around 100 ms per WebSocket message (about 3,200 bytes at 16 kHz, 16-bit mono).
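
The chunk-size recommendation follows directly from the required format: 16,000 samples per second × 2 bytes per sample × 1 channel × 0.1 s = 3,200 bytes. The sketch below turns that arithmetic into a small splitter for raw PCM bytes. The function names are hypothetical, and the actual WebSocket send call is omitted because it depends on the client library you use.

```python
SAMPLE_RATE = 16000    # Hz
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHANNELS = 1           # mono


def chunk_size_bytes(duration_ms=100):
    """Number of PCM bytes that cover `duration_ms` of audio."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * duration_ms // 1000


def iter_chunks(pcm, duration_ms=100):
    """Yield consecutive slices of raw PCM bytes, each up to `duration_ms` long.

    The final chunk may be shorter if the data does not divide evenly.
    """
    size = chunk_size_bytes(duration_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

One second of audio (32,000 bytes) therefore splits into ten chunks of 3,200 bytes, each of which would be sent as one WebSocket message.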

🛠 Need Help?

If you experience issues with audio encoding or setup, reach out to our support team at info@scriptix.io.

