Audio Encoding
To ensure optimal accuracy and performance with the Scriptix Real-time API, your audio must meet specific encoding standards. This guide outlines the supported format, conversion tips, and an example using FFmpeg.
✅ Supported Format
All audio streamed via WebSocket must meet the following criteria:
Parameter | Required Value |
---|---|
Encoding | Linear PCM (pcm_s16le) |
Sample Rate | 16,000 Hz |
Bit Depth | 16-bit |
Channels | Mono (1 channel) |
Container | WAV, or raw PCM with no header |
❗ Submitting audio that does not match these requirements may result in degraded transcription quality or rejected input.
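Before streaming a WAV file, you can confirm that its header matches the table above. If FFmpeg is installed, ffprobe can report these fields. The sketch below avoids that dependency and reads the header directly with POSIX od. It is a minimal sketch under two assumptions: the "fmt " chunk immediately follows the 12-byte RIFF/WAVE header (the usual layout for FFmpeg's WAV output), and the host is little-endian. The helper names read_u and check_wav are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: check a WAV header against the required format.
# Assumption: the "fmt " chunk starts at byte 12, right after RIFF/WAVE.
# Assumption: little-endian host (od decodes multi-byte values in host
# byte order, and WAV header fields are little-endian).

# read_u NBYTES OFFSET FILE: print one unsigned integer from the file
read_u() {
  od -An -t "u$1" -j "$2" -N "$1" "$3" | tr -d ' \n'
}

check_wav() {
  file=$1
  # Bytes 8-15 should read "WAVEfmt " if the assumed layout holds
  if [ "$(dd if="$file" bs=1 skip=8 count=8 2>/dev/null)" != "WAVEfmt " ]; then
    echo "Unexpected header layout: no fmt chunk at byte 12" >&2
    return 1
  fi
  fmt=$(read_u 2 20 "$file")       # audio format: 1 = integer PCM
  channels=$(read_u 2 22 "$file")  # number of channels
  rate=$(read_u 4 24 "$file")      # sample rate in Hz
  bits=$(read_u 2 34 "$file")      # bits per sample
  if [ "$fmt" = 1 ] && [ "$channels" = 1 ] && [ "$rate" = 16000 ] && [ "$bits" = 16 ]; then
    echo "OK: PCM, mono, 16000 Hz, 16-bit"
  else
    echo "Mismatch: format=$fmt channels=$channels rate=$rate bits=$bits" >&2
    return 1
  fi
}

# Example usage:
# check_wav output.wav
```

A non-zero exit status means the file should be converted before it is sent.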
🔄 Converting Audio Using FFmpeg
Use the following FFmpeg command to convert any audio file into the required format, replacing ${input_file} and ${output_file} with your own file paths:
ffmpeg -y -i "${input_file}" \
-ac 1 -acodec pcm_s16le -ar 16000 \
-f wav "${output_file}"
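This will:
Convert audio to mono (-ac 1)
Encode the audio as signed 16-bit little-endian PCM (-acodec pcm_s16le)
Set the sample rate to 16 kHz (-ar 16000)
Write the result in a WAV container (-f wav), overwriting any existing output file (-y)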
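The format table also accepts raw PCM with no header. FFmpeg's s16le output format writes the same signed 16-bit little-endian samples without a WAV container. The sketch below assumes you want a headerless file; the function name convert_to_raw_pcm is hypothetical, and apart from -f s16le the flags are the same as in the WAV command.

```shell
#!/bin/sh
# Minimal sketch: convert any input to headerless 16 kHz mono PCM.
# convert_to_raw_pcm is a hypothetical helper name.
convert_to_raw_pcm() {
  # $1 = input file, $2 = output file (e.g. audio.raw)
  # -f s16le writes raw samples instead of a WAV container
  ffmpeg -y -i "$1" \
    -ac 1 -acodec pcm_s16le -ar 16000 \
    -f s16le "$2"
}

# Example usage:
# convert_to_raw_pcm interview.mp3 interview.raw
```

A raw file carries no metadata, so the receiver has to know the format in advance; at 16 kHz, 16-bit mono, every second of audio is exactly 32,000 bytes.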
🔁 Using FFmpeg in a Streaming Pipeline
You can also run FFmpeg inside a pipe, reading audio from stdin (-i -) and writing the converted stream to stdout (the final -):
ffmpeg -i - \
-ac 1 -acodec pcm_s16le -ar 16000 \
-f wav -
This is useful for real-time microphone input or live stream conversion.
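To test a streaming setup without a live source, FFmpeg's -re input option reads a file at its native rate, so the output arrives at roughly real-time speed. The sketch below assumes headerless PCM on stdout (-f s16le -), which avoids sending a WAV header whose length fields FFmpeg cannot go back and finalize on a pipe. The function name stream_pcm and the downstream client are hypothetical placeholders for whatever sends bytes over your WebSocket connection.

```shell
#!/bin/sh
# Minimal sketch: emulate a live source from a file.
# -re must come before -i: it throttles reading to the input's native rate.
# stream_pcm is a hypothetical helper; FFmpeg's logs go to stderr,
# so stdout carries only PCM samples.
stream_pcm() {
  ffmpeg -re -i "$1" \
    -ac 1 -acodec pcm_s16le -ar 16000 \
    -f s16le -
}

# Example usage (your_ws_client is a placeholder, not a real tool):
# stream_pcm meeting.mp3 | your_ws_client
```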
🎧 Recommendations
Check the audio quality before streaming: poor input produces poor transcriptions.
Do not stream compressed formats (e.g., MP3, AAC) directly; decode them to PCM first.
Send about 100 ms of audio per WebSocket message (3,200 bytes at 16 kHz, 16-bit mono).
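Because the required format is fixed, a chunk duration maps to a fixed byte count: sample rate × bytes per sample × channels × duration. The sketch below computes that value and uses POSIX split to cut a headerless PCM file into roughly 100 ms pieces. It assumes raw PCM input (a WAV header would end up in the first chunk); the helper names and the chunk_ prefix are hypothetical.

```shell
#!/bin/sh
# Bytes per chunk = 16000 Hz * 2 bytes (16-bit) * 1 channel * ms / 1000
chunk_bytes() {
  ms=$1
  echo $((16000 * 2 * 1 * ms / 1000))
}

# Split a headerless PCM file into ~100 ms chunks (the last may be shorter).
# Output files sort in send order: chunk_aaaaaa, chunk_aaaaab, ...
# -a 6 gives room for up to 26^6 chunk files.
split_into_chunks() {
  # $1 = raw PCM input, $2 = optional output prefix (default: chunk_)
  split -b "$(chunk_bytes 100)" -a 6 "$1" "${2:-chunk_}"
}

# Example usage:
# split_into_chunks interview.raw
```

Sending each file in name order as one binary WebSocket message keeps every message at about 100 ms of audio.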
🛠 Need Help?
If you experience issues with audio encoding or setup, reach out to our support team at info@scriptix.io.