
⚡ Performance Tips

This guide outlines best practices for optimizing responsiveness and transcription quality when using the Scriptix Real-time API.

Performance is influenced by how you encode and stream audio. The size, frequency, and format of your audio chunks all affect both latency and accuracy.


To achieve the best balance between speed and quality:

| Audio Format | Recommended Chunk Size | Reason |
| --- | --- | --- |
| PCM 16kHz (default) | 8 KB – 64 KB | Maintains low latency while preserving context |
| PCM 8kHz (call center) | Adjust accordingly | Smaller bandwidth, slower response; requires tuning |

With 16kHz 16-bit PCM audio (single channel), 1 second of audio ≈ 32 KB of data (16,000 samples × 2 bytes), so 256 ms of audio ≈ 8 KB.
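The arithmetic above can be sketched as a small helper. This is a minimal illustration, assuming uncompressed mono PCM; the function name `pcm_chunk_bytes` and its parameters are hypothetical and not part of the Scriptix API.

```python
def pcm_chunk_bytes(
    sample_rate_hz: int,
    duration_ms: int,
    bytes_per_sample: int = 2,  # 16-bit PCM
    channels: int = 1,          # assumption: mono audio
) -> int:
    """Return the number of bytes needed to hold duration_ms of raw PCM."""
    # Count whole frames first so the result always ends on a sample boundary.
    frames = sample_rate_hz * duration_ms // 1000
    return frames * bytes_per_sample * channels


print(pcm_chunk_bytes(16_000, 1000))  # 32000 bytes, roughly 32 KB
print(pcm_chunk_bytes(16_000, 256))   # 8192 bytes, exactly 8 KiB
```

Working backwards from a target chunk duration like this makes it easier to compare settings across sample rates.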


📈 Chunk Size vs. Performance

| Chunk Size | Latency | Accuracy |
| --- | --- | --- |
| ~4 KB | ✅ Very fast | ⚠️ May reduce contextual quality |
| 8–32 KB | ✅ Fast | ✅ Good balance |
| 64 KB+ | ⚠️ Slower | ✅ High contextual accuracy |

✅ Tip: Test with your actual audio source. Some streams benefit more from context than others.
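To compare chunk sizes on your own audio, a recording can be split into fixed-size pieces before streaming. The following is a rough sketch, assuming the audio is already available as raw 16-bit PCM bytes; `iter_chunks` is a hypothetical helper, not a Scriptix function.

```python
from typing import Iterator


def iter_chunks(pcm: bytes, chunk_size: int, bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive chunk_size-byte slices of a raw PCM buffer."""
    # Keep chunks aligned to whole samples so no 16-bit value is split in two.
    if chunk_size <= 0 or chunk_size % bytes_per_sample:
        raise ValueError("chunk_size must be a positive multiple of the sample width")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]  # the last chunk may be shorter


# Placeholder input: 5 seconds of silence at 16 kHz, 16-bit mono.
audio = b"\x00" * 160_000
for size in (4096, 8192, 32768, 65536):
    count = sum(1 for _ in iter_chunks(audio, size))
    print(f"{size} bytes per chunk -> {count} chunks")
```

Running the same recording through several sizes and comparing latency and transcript quality gives a direct view of the trade-off for your source.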


🧠 Why Size Matters

Smaller chunks result in:

  • Faster responses
  • Less contextual information for the model

Larger chunks result in:

  • Slower responses
  • More accurate transcriptions due to richer context

📞 Special Case: 8kHz Audio Models

Scriptix offers 8kHz private models for specific use cases like call center transcriptions.

If you're using an 8kHz model:

  • Use 16-bit little-endian PCM audio
  • Adjust chunk size to match the lower sample rate (e.g., 1 second ≈ 16 KB)
  • Expect slightly higher latency; these models are optimized for narrow-band audio
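As an illustration of the format requirements in the list above, the sketch below packs samples as 16-bit little-endian PCM using Python's standard `struct` module. It assumes the source samples are mono floats in the range [-1.0, 1.0]; `float_to_pcm16le` is a hypothetical helper name.

```python
import struct
from typing import Sequence


def float_to_pcm16le(samples: Sequence[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Clamp out-of-range values, then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    # "<" forces little-endian byte order; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


one_second = float_to_pcm16le([0.0] * 8000)  # 1 s of silence at 8 kHz
print(len(one_second))  # 16000 bytes, roughly 16 KB
```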

📩 Contact Scriptix support if you're interested in using 8kHz models.


✅ Final Best Practices

  • Stream regularly – Avoid sending large bursts or long gaps
  • Maintain a constant sample rate and format – Consistent format = consistent results
  • Monitor round-trip time (RTT) – Latency spikes may indicate buffer or network issues
  • Test different chunk sizes – Depending on your use case, smaller or larger blocks may yield better trade-offs
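The first and third practices above can be sketched as follows. This is a minimal illustration rather than a Scriptix client: `send` stands in for whatever function your own client uses to transmit a chunk, and `stream_paced`, `RttMonitor`, the window of 20 samples, and the spike factor of 3 are all assumptions chosen for the example.

```python
import time
from collections import deque
from statistics import median
from typing import Callable, Iterable


def stream_paced(
    chunks: Iterable[bytes],
    send: Callable[[bytes], None],  # hypothetical: your client's send function
    sample_rate_hz: int = 16_000,
    bytes_per_sample: int = 2,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send chunks at roughly real-time pace, avoiding bursts and long gaps."""
    bytes_per_second = sample_rate_hz * bytes_per_sample
    next_send_at = clock()
    for chunk in chunks:
        delay = next_send_at - clock()
        if delay > 0:
            sleep(delay)
        send(chunk)
        # Schedule the next send one chunk-duration after this one was due.
        next_send_at += len(chunk) / bytes_per_second


class RttMonitor:
    """Flag round-trip times that are well above the recent median."""

    def __init__(self, window: int = 20, spike_factor: float = 3.0) -> None:
        self.samples = deque(maxlen=window)
        self.spike_factor = spike_factor

    def record(self, rtt_s: float) -> bool:
        """Store one RTT sample; return True if it looks like a spike."""
        is_spike = (
            len(self.samples) >= 5  # wait for a small baseline first
            and rtt_s > self.spike_factor * median(self.samples)
        )
        self.samples.append(rtt_s)
        return is_spike
```

Scheduling against a running deadline, rather than sleeping a fixed amount after each send, keeps the average rate steady even when an individual send takes longer than expected. How RTT is measured depends on your client; one option is to time the gap between sending a chunk and receiving the next response.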