Audio Encoding

For accurate transcription and good performance with the Scriptix Real-time API, your audio must meet the encoding requirements below. This guide covers the supported format, conversion with FFmpeg, and recommendations for streaming.

Supported Format

All audio streamed via WebSocket must meet the following criteria:

Parameter	Required Value
Encoding	Linear PCM (`pcm_s16le`)
Sample Rate	16,000 Hz
Bit Depth	16-bit
Channels	Mono (1 channel)
Container	WAV, or raw PCM (headerless)

Submitting audio that does not match these requirements may result in degraded transcription quality or rejected input.
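As a quick pre-flight check, the requirements in the table can be verified against a WAV file's header before streaming. The sketch below uses only Python's standard `wave` module. The function name `check_wav_format` and the constant names are illustrative and not part of the Scriptix API. It inspects the header only and does not judge audio quality.

```python
import io
import wave

# Required values taken from the table above.
REQUIRED_RATE_HZ = 16_000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
REQUIRED_CHANNELS = 1            # mono


def check_wav_format(source):
    """Return a list of problems; an empty list means the header matches.

    `source` is a file path or a binary file-like object. The `wave`
    module reads uncompressed PCM WAV only and raises wave.Error otherwise.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != REQUIRED_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {REQUIRED_RATE_HZ}"
            )
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"bit depth is {wav.getsampwidth() * 8}, expected 16"
            )
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected 1"
            )
    return problems


if __name__ == "__main__":
    # Build a tiny 44.1 kHz stereo WAV in memory to demonstrate a failing check.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(44_100)
        w.writeframes(b"\x00" * 4)
    buf.seek(0)
    # Reports two problems: the sample rate and the channel count.
    print(check_wav_format(buf))
```

Raw PCM has no header to inspect, so this check applies only to WAV input.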

Converting Audio Using FFmpeg

The following FFmpeg command converts an audio file to the required format:

ffmpeg -y -i "${input_file}" \
  -ac 1 -acodec pcm_s16le -ar 16000 \
  -f wav "${output_file}"

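This command:

Converts the audio to mono (-ac 1)

Sets the sample rate to 16 kHz (-ar 16000)

Encodes the audio as signed 16-bit little-endian PCM (-acodec pcm_s16le)

Writes a WAV file (-f wav), overwriting any existing output file (-y)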
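If the conversion needs to run from application code rather than a shell, the same command can be assembled as an argument list. This is a minimal sketch that assumes `ffmpeg` is installed and on the PATH. The helper names `build_ffmpeg_args` and `convert` are hypothetical.

```python
import subprocess


def build_ffmpeg_args(input_file, output_file):
    """Build the argument list for the conversion command shown above."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", input_file,
        "-ac", "1",              # mono
        "-acodec", "pcm_s16le",  # signed 16-bit little-endian PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-f", "wav",
        output_file,
    ]


def convert(input_file, output_file):
    """Run FFmpeg; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_ffmpeg_args(input_file, output_file), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.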

Using FFmpeg in a Streaming Pipeline

You can also use FFmpeg to pipe audio directly from stdin to stdout:

ffmpeg -i - \
  -ac 1 -acodec pcm_s16le -ar 16000 \
  -f wav -

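This is useful when another process feeds live audio, such as microphone capture or a live stream, into FFmpeg. Because FFmpeg cannot seek back on a pipe, the WAV header it writes there carries placeholder size fields; to emit headerless raw PCM instead, replace -f wav with -f s16le.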

Recommendations

Test the audio quality before streaming: poor input produces poor transcription.

Avoid compressed formats (e.g., MP3, AAC); they must be decoded to PCM first.

Keep each WebSocket message to roughly 100 ms of audio.
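To make the 100 ms recommendation concrete: at 16,000 Hz, 16-bit, mono, one second of audio is 16,000 × 2 × 1 = 32,000 bytes, so 100 ms is 3,200 bytes. The sketch below splits a raw PCM buffer into chunks of that size. The names are illustrative, and the actual WebSocket send call is left out because the message format is not described in this guide.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit
CHANNELS = 1          # mono
CHUNK_MS = 100

# 16,000 samples/s * 2 bytes * 1 channel * 0.1 s = 3,200 bytes
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * CHUNK_MS // 1000


def iter_chunks(pcm, chunk_bytes=CHUNK_BYTES):
    """Yield successive slices of raw PCM; the final slice may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
    chunks = list(iter_chunks(one_second_of_silence))
    print(CHUNK_BYTES, len(chunks))  # 3200 10
```

The chunk size is an even number of bytes, so no 16-bit sample is split across two messages. This assumes raw PCM; a WAV stream also carries header bytes at the start.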

Need Help?

If you experience issues with audio encoding or setup, reach out to our support team at info@scriptix.io.
