Performance Tips

This guide outlines best practices for optimizing responsiveness and transcription quality when using the Scriptix Real-time API.

Performance is influenced by how you encode and stream audio. The size, frequency, and format of your audio chunks all affect both latency and accuracy.


To achieve the best balance between speed and quality:

| Audio Format | Recommended Chunk Size | Reason |
|---|---|---|
| PCM 16kHz (default) | 8 KB – 64 KB | Maintains low latency while preserving context |
| PCM 8kHz (call center) | Adjust accordingly | Smaller bandwidth, slower response; requires tuning |

With 16kHz 16-bit mono PCM audio, 1 second of audio is 16,000 samples × 2 bytes = 32,000 bytes (≈ 32 KB). Sending 256 ms of audio therefore takes ≈ 8 KB (8,192 bytes).
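The arithmetic above can be sketched as a small helper. This is only an illustration: the function name `pcm_chunk_bytes` and its parameters are hypothetical and not part of the Scriptix API, and it assumes raw, uncompressed PCM with one channel by default.

```python
def pcm_chunk_bytes(duration_ms: int,
                    sample_rate_hz: int = 16_000,
                    bytes_per_sample: int = 2,
                    channels: int = 1) -> int:
    """Return the number of raw PCM bytes that cover duration_ms of audio."""
    # One frame holds one sample per channel.
    frame_bytes = bytes_per_sample * channels
    # Round down to whole frames so a chunk never ends mid-sample.
    frames = sample_rate_hz * duration_ms // 1000
    return frames * frame_bytes


print(pcm_chunk_bytes(1000))  # 32000 bytes for 1 second at 16kHz, 16-bit mono
print(pcm_chunk_bytes(256))   # 8192 bytes for 256 ms
```

Working in milliseconds and converting to bytes in one place makes it easier to reason about latency when the sample rate changes.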


Chunk Size vs. Performance

| Chunk Size | Latency | Accuracy |
|---|---|---|
| ~4 KB | Very fast | May reduce contextual quality |
| 8–32 KB | Fast | Good balance |
| 64 KB+ | Slower | High contextual accuracy |

Tip: Test with your actual audio source. Some streams benefit more from context than others.
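To compare chunk sizes on your own audio, it helps to have a splitter whose chunk size is a single parameter. The sketch below is one way to do that; `iter_chunks` is a hypothetical helper, not a Scriptix function, and it assumes the source is a file-like object containing raw 16-bit PCM.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_bytes: int,
                frame_bytes: int = 2) -> Iterator[bytes]:
    """Yield successive chunks of up to chunk_bytes from a raw PCM stream."""
    # Keep chunk boundaries on whole samples (2 bytes for 16-bit mono).
    if chunk_bytes <= 0 or chunk_bytes % frame_bytes:
        raise ValueError("chunk_bytes must be a positive multiple of frame_bytes")
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


# One second of silence at 16kHz, 16-bit mono, split into 8 KB chunks.
audio = io.BytesIO(bytes(32_000))
print([len(c) for c in iter_chunks(audio, 8192)])  # [8192, 8192, 8192, 7424]
```

The final chunk is usually shorter than the rest. With pipes or sockets, `read` may also return fewer bytes than requested, so a production reader would likely need its own buffering.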


Why Size Matters

Smaller chunks result in:

  • Faster responses
  • Less contextual information for the model

Larger chunks result in:

  • Slower responses
  • More accurate transcriptions due to richer context

Special Case: 8kHz Audio Models

Scriptix offers 8kHz private models for specific use cases like call center transcriptions.

If you're using an 8kHz model:

  • Use 16-bit little-endian PCM audio
  • Adjust chunk size to match the lower sample rate (e.g., 1 second ≈ 16 KB)
  • Expect slightly higher latency; these models are optimized for narrow-band audio

Contact Scriptix support if you're interested in using 8kHz models.
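If your capture pipeline produces floating-point samples, they need to be converted to 16-bit little-endian PCM before streaming. The following is a minimal sketch using only the Python standard library; `floats_to_pcm16le` is a hypothetical name, and the input is assumed to be normalized to the range -1.0 to 1.0.

```python
import struct
from typing import Iterable


def floats_to_pcm16le(samples: Iterable[float]) -> bytes:
    """Convert normalized float samples to 16-bit little-endian PCM bytes."""
    ints = []
    for sample in samples:
        # Clip out-of-range values instead of letting them wrap around.
        clipped = max(-1.0, min(1.0, sample))
        ints.append(int(round(clipped * 32767)))
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = floats_to_pcm16le([0.0, 1.0, -1.0])
print(pcm.hex())  # 0000ff7f0180
print(len(pcm))   # 6 bytes: 2 bytes per sample
```

At 8kHz, one second of audio converted this way is 8,000 samples × 2 bytes = 16,000 bytes, which matches the 16 KB figure above.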


Final Best Practices

  • Stream regularly – Avoid sending large bursts or long gaps
  • Maintain a consistent audio format – Keep the sample rate and encoding the same throughout the stream; consistent format = consistent results
  • Monitor round-trip time (RTT) – Latency spikes may indicate buffer or network issues
  • Test different chunk sizes – Depending on your use case, smaller or larger blocks may yield better trade-offs