Audio Encoding
For the best accuracy and performance with the Scriptix Real-time API, your audio must meet specific encoding requirements. This guide describes the supported format, explains how to convert audio with FFmpeg, and shows how to use FFmpeg in a streaming pipeline.
Supported Format
All audio streamed via WebSocket must meet the following criteria:
| Parameter | Required Value |
|---|---|
| Encoding | Linear PCM (pcm_s16le) |
| Sample Rate | 16,000 Hz |
| Bit Depth | 16-bit |
| Channels | Mono (1 channel) |
| Container | WAV, or raw PCM with no header |
Submitting audio that does not match these requirements may result in degraded transcription quality or rejected input.
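If you send WAV files, you can check the header against the table before streaming. The following is a minimal sketch, assuming Python 3 and the standard-library `wave` module. `check_wav_format` is a hypothetical helper, not part of any Scriptix SDK. It applies only to WAV input, because raw PCM has no header to inspect.

```python
import wave
from typing import BinaryIO, List, Union

# Required values from the "Supported Format" table.
REQUIRED_SAMPLE_RATE_HZ = 16_000
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_wav_format(source: Union[str, BinaryIO]) -> List[str]:
    """Return the mismatches between a WAV header and the required format.

    `source` is a file path or a binary file-like object. An empty list means
    the header matches. `wave.open` raises `wave.Error` for WAV files that are
    not uncompressed PCM.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()

    problems: List[str] = []
    if rate != REQUIRED_SAMPLE_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_SAMPLE_RATE_HZ} Hz")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels, expected {REQUIRED_CHANNELS} (mono)")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{width * 8}-bit samples, expected {REQUIRED_SAMPLE_WIDTH_BYTES * 8}-bit")
    return problems
```

For example, if `check_wav_format("converted.wav")` returns an empty list, the header matches the required format. The helper inspects only the header, not the quality of the audio itself.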
Converting Audio Using FFmpeg
Use the following FFmpeg command to convert an audio file to the required format:
```bash
ffmpeg -y -i "${input_file}" \
  -ac 1 -acodec pcm_s16le -ar 16000 \
  -f wav "${output_file}"
```
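This command:
- Converts the audio to mono (`-ac 1`)
- Sets the sample rate to 16 kHz (`-ar 16000`)
- Encodes the audio as signed 16-bit little-endian PCM (`-acodec pcm_s16le`)
- Writes the result in a WAV container (`-f wav`), overwriting the output file if it already exists (`-y`)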
Using FFmpeg in a Streaming Pipeline
FFmpeg can also read audio from stdin and write the converted stream to stdout:
```bash
ffmpeg -i - \
  -ac 1 -acodec pcm_s16le -ar 16000 \
  -f wav -
```
This is useful for real-time microphone input or live stream conversion.
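You can drive the streaming command from a program as well. The following is a rough sketch, assuming Python 3.9+ with `ffmpeg` on the `PATH`. It starts the same FFmpeg command with `subprocess`, optionally reading from a file path instead of stdin, and reads FFmpeg's stdout in fixed-size chunks. `ffmpeg_command`, `start_ffmpeg`, and `iter_chunks` are hypothetical helpers. How each chunk is sent depends on the WebSocket client you use, so that step is left out.

```python
import subprocess
from typing import BinaryIO, Iterator

# 3,200 bytes = 100 ms of 16 kHz, 16-bit, mono PCM (1,600 samples x 2 bytes).
CHUNK_BYTES = 3200


def ffmpeg_command(input_path: str = "-") -> list[str]:
    """Build the streaming FFmpeg command; "-" as input means read from stdin."""
    return [
        "ffmpeg", "-i", input_path,
        "-ac", "1", "-acodec", "pcm_s16le", "-ar", "16000",
        "-f", "wav", "-",  # write a WAV stream to stdout
    ]


def start_ffmpeg(input_path: str = "-") -> subprocess.Popen:
    """Start FFmpeg with its stdout connected to a pipe this process reads.

    With the default "-" input, FFmpeg inherits this process's stdin.
    """
    return subprocess.Popen(ffmpeg_command(input_path), stdout=subprocess.PIPE)


def iter_chunks(stream: BinaryIO, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until EOF.

    On a blocking buffered stream (such as Popen.stdout), every chunk except
    possibly the last is exactly `chunk_bytes` long.
    """
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:  # b"" signals end of stream
            return
        yield chunk
```

For example, call `proc = start_ffmpeg("meeting.mp3")`, where the file name is hypothetical. Then loop over `iter_chunks(proc.stdout)` and pass each chunk to your WebSocket client's send call. Finally, call `proc.wait()`. Because this command writes a WAV container, the first chunk begins with the WAV header. If you prefer headerless raw PCM, which the format table also accepts, replace `-f wav` with FFmpeg's `-f s16le` output format.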
Recommendations
- Check the quality of your audio before streaming; poor input produces poor transcriptions.
- Do not stream compressed formats such as MP3 or AAC; decode them to PCM first.
- Send roughly 100 ms of audio per WebSocket message.
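The 100 ms guideline translates directly into a byte count for the required format. At 16,000 samples per second, 2 bytes per sample, and 1 channel, the stream carries 32,000 bytes per second, so 100 ms of audio is 3,200 bytes. The following is a small sketch of that arithmetic. `chunk_size_bytes` and `chunk_duration_ms` are hypothetical helper names.

```python
# Format constants from the "Supported Format" table.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1          # mono

BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS  # 32,000


def chunk_size_bytes(duration_ms: int) -> int:
    """Bytes of PCM audio covering `duration_ms` milliseconds (whole samples only)."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def chunk_duration_ms(num_bytes: int) -> float:
    """Milliseconds of audio contained in `num_bytes` of PCM data."""
    return num_bytes * 1000 / BYTES_PER_SECOND


if __name__ == "__main__":
    print(chunk_size_bytes(100))    # 3200
    print(chunk_duration_ms(3200))  # 100.0
```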
Need Help?
If you experience issues with audio encoding or setup, reach out to our support team at info@scriptix.io.