Audio Encoding

To ensure optimal accuracy and performance with the Scriptix Real-time API, your audio must meet specific encoding requirements. This guide describes the supported format, shows how to convert audio with FFmpeg, and lists recommendations for streaming.

✅ Supported Format

All audio streamed via WebSocket must meet the following criteria:

Parameter	Required Value
Encoding	Linear PCM (`pcm_s16le`)
Sample Rate	16,000 Hz
Bit Depth	16-bit
Channels	Mono (1 channel)
Container	WAV, or raw PCM without headers

❗ Submitting audio that does not match these requirements may result in degraded transcription quality or rejected input.
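The requirements in the table can be checked locally before any audio is sent. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the constant names are illustrative assumptions and are not part of the Scriptix API. It only applies to WAV input, since raw PCM carries no header to inspect.

```python
import wave

# Required values, taken from the table above.
REQUIRED_SAMPLE_RATE = 16_000   # Hz
REQUIRED_SAMPLE_WIDTH = 2       # bytes per sample (16-bit)
REQUIRED_CHANNELS = 1           # mono


def check_wav(source):
    """Return a list of mismatches for a WAV file (path or binary file object).

    An empty list means the header matches the required format.
    """
    problems = []
    # wave.open raises wave.Error if the input is not a WAV file it can parse.
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != REQUIRED_SAMPLE_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {REQUIRED_SAMPLE_RATE} Hz"
            )
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8}-bit, expected 16-bit"
            )
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected {REQUIRED_CHANNELS}"
            )
    return problems
```

Calling `check_wav("recording.wav")` on a conforming file should return an empty list; any returned entries describe which parameter needs converting.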

🔄 Converting Audio Using FFmpeg

Use the following FFmpeg command to convert an audio file into the required format:

ffmpeg -y -i "${input_file}" \
  -ac 1 -acodec pcm_s16le -ar 16000 \
  -f wav "${output_file}"

This will:

Convert audio to mono

Set the sample rate to 16 kHz

Encode the audio using signed 16-bit little-endian PCM
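If the conversion needs to run from application code rather than a shell, the same command can be invoked through Python's `subprocess` module. This is a sketch only: the helper names `build_convert_command` and `convert` are hypothetical, and it assumes an `ffmpeg` binary is available on the `PATH`.

```python
import subprocess


def build_convert_command(input_file, output_file):
    """Build the argument list for the FFmpeg conversion shown above."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", input_file,
        "-ac", "1",                # mono
        "-acodec", "pcm_s16le",    # signed 16-bit little-endian PCM
        "-ar", "16000",            # 16 kHz sample rate
        "-f", "wav",               # WAV container
        output_file,
    ]


def convert(input_file, output_file):
    """Run FFmpeg; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(build_convert_command(input_file, output_file), check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.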

🔁 Using FFmpeg in a Streaming Pipeline

FFmpeg can also read audio from stdin and write the converted audio to stdout, so it can sit in the middle of a pipeline:

ffmpeg -i - \
  -ac 1 -acodec pcm_s16le -ar 16000 \
  -f wav -

This is useful for real-time microphone input or live stream conversion.
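As an illustration of how such a pipeline might be consumed from code, the sketch below starts FFmpeg with the stdin/stdout arguments shown above and reads the converted audio in fixed-size blocks. The names `iter_chunks` and `transcode_stream` are hypothetical, an `ffmpeg` binary on the `PATH` is assumed, and sending the blocks over the WebSocket is left out because it depends on your client code.

```python
import subprocess

# Same arguments as the streaming command above.
FFMPEG_PIPE_ARGS = [
    "ffmpeg", "-i", "-",
    "-ac", "1", "-acodec", "pcm_s16le", "-ar", "16000",
    "-f", "wav", "-",
]


def iter_chunks(stream, chunk_bytes):
    """Yield blocks of up to chunk_bytes from a binary stream until EOF."""
    while True:
        block = stream.read(chunk_bytes)
        if not block:
            break
        yield block


def transcode_stream(source, chunk_bytes=3200):
    """Feed `source` to FFmpeg and yield blocks of converted audio.

    `source` must be a binary file object backed by a real file descriptor
    (for example an open file or a pipe). The default of 3200 bytes is
    100 ms of 16 kHz, 16-bit mono audio.
    """
    proc = subprocess.Popen(FFMPEG_PIPE_ARGS, stdin=source, stdout=subprocess.PIPE)
    try:
        yield from iter_chunks(proc.stdout, chunk_bytes)
    finally:
        proc.stdout.close()
        proc.wait()
```

Because the command uses `-f wav`, the first bytes read from FFmpeg are a WAV header rather than samples, and FFmpeg cannot go back to fill in the final length when writing to a pipe.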

🎧 Recommendations

Test the audio quality before streaming: poor input produces poor transcription.

Avoid streaming compressed formats (e.g., MP3, AAC) directly; they must be decoded to PCM first.

Keep audio chunk sizes around 100 ms per WebSocket message (about 3,200 bytes of raw PCM at 16 kHz, 16-bit mono).
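The chunk-size recommendation translates into a byte count through simple arithmetic: sample rate × bytes per sample × channels × duration. The sketch below works this out for raw PCM in the required format; the function names `chunk_size_bytes` and `split_pcm` are illustrative and not part of any Scriptix library.

```python
SAMPLE_RATE = 16_000      # Hz
BYTES_PER_SAMPLE = 2      # 16-bit
CHANNELS = 1              # mono


def chunk_size_bytes(duration_ms=100):
    """Number of raw PCM bytes covering duration_ms of audio in the required format."""
    frames = SAMPLE_RATE * duration_ms // 1000
    return frames * BYTES_PER_SAMPLE * CHANNELS


def split_pcm(pcm, duration_ms=100):
    """Split raw PCM bytes into chunks of duration_ms; the last chunk may be shorter."""
    size = chunk_size_bytes(duration_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

For example, `chunk_size_bytes(100)` evaluates to 3200, so one second of audio (32,000 bytes) splits into ten messages.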

🛠 Need Help?

If you experience issues with audio encoding or setup, reach out to our support team at info@scriptix.io.
