
Audio Formats - Real-time API

Audio format specifications for real-time streaming.

Required Specifications

Parameter     Value
Sample Rate   8000, 16000, 22050, 44100, or 48000 Hz
Bit Depth     16-bit
Channels      Mono (1 channel)
Encoding      PCM (signed-integer)
Endianness    Little-endian

Recommended Settings

  • Sample Rate: 16000 Hz (best quality/bandwidth balance)
  • Chunk Size: 20-50 ms of audio
  • Buffer Size: 640-1600 bytes (at 16 kHz, 20-50 ms per chunk)
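
When the audio comes from a WAV file rather than a live microphone, it can help to check the file against the required specifications before streaming it. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of the API.

```python
import wave

# Sample rates listed in the Required Specifications table.
SUPPORTED_RATES = {8000, 16000, 22050, 44100, 48000}

def check_wav_format(wav_file):
    """Return a list of problems; an empty list means the file matches the spec.

    wav_file may be a path or a binary file-like object.
    """
    problems = []
    with wave.open(wav_file, "rb") as wf:
        if wf.getframerate() not in SUPPORTED_RATES:
            problems.append(f"unsupported sample rate: {wf.getframerate()} Hz")
        if wf.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"expected 16-bit samples, got {8 * wf.getsampwidth()}-bit")
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
    return problems
```

The `wave` module only reads uncompressed PCM WAV files, whose 16-bit samples are already signed and little-endian, so encoding and endianness need no separate check here.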

Chunk Size Calculation

# The factor 2 is the size in bytes of one 16-bit mono sample
bytes_per_chunk = sample_rate * duration_seconds * 2

# Examples:
# 16kHz, 20ms: 16000 * 0.02 * 2 = 640 bytes
# 16kHz, 50ms: 16000 * 0.05 * 2 = 1600 bytes
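
The calculation above can be sketched as a small helper, together with a generator that slices an existing PCM buffer into chunks of that size. Both function names are hypothetical and chosen for illustration only; integer arithmetic is used so the result is always a whole number of bytes.

```python
def chunk_size_bytes(sample_rate, duration_ms, bytes_per_sample=2, channels=1):
    """Bytes needed for duration_ms of audio (defaults: 16-bit mono PCM)."""
    return sample_rate * duration_ms * bytes_per_sample * channels // 1000

def split_into_chunks(pcm_bytes, chunk_bytes):
    """Yield consecutive slices of chunk_bytes bytes; the last may be shorter."""
    for start in range(0, len(pcm_bytes), chunk_bytes):
        yield pcm_bytes[start:start + chunk_bytes]

print(chunk_size_bytes(16000, 20))  # 640
print(chunk_size_bytes(16000, 50))  # 1600
```

Each yielded slice can then be sent as one binary message.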

Audio Capture Examples

Python (PyAudio)

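import pyaudio
import websocket  # from the websocket-client package

# 20 ms at 16 kHz is 320 frames; at 2 bytes per frame that is 640 bytes
FRAMES_PER_CHUNK = 320

p = pyaudio.PyAudio()

stream = p.open(
    format=pyaudio.paInt16,  # 16-bit
    channels=1,  # Mono
    rate=16000,  # 16 kHz
    input=True,
    frames_per_buffer=FRAMES_PER_CHUNK,
)

# `ws` is assumed to be an already-connected websocket-client connection
while True:
    chunk = stream.read(FRAMES_PER_CHUNK)  # read() counts frames, not bytes
    ws.send(chunk, opcode=websocket.ABNF.OPCODE_BINARY)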

JavaScript (Node.js with mic)

const mic = require('mic');

const micInstance = mic({
  rate: '16000',
  channels: '1',
  encoding: 'signed-integer',
  bitwidth: '16'
});

const micInputStream = micInstance.getAudioStream();

// `ws` is assumed to be an already-open WebSocket connection
micInputStream.on('data', (data) => {
  ws.send(data); // `data` is a Buffer of raw PCM, sent as a binary message
});

micInstance.start();

Browser (Web Audio API)

// `ws` is assumed to be an already-open WebSocket connection
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);
    // 1024 frames is 64 ms at 16 kHz; a buffer of 512 frames (32 ms)
    // stays within the recommended 20-50 ms chunk size
    const processor = audioContext.createScriptProcessor(1024, 1, 1);

    source.connect(processor);
    processor.connect(audioContext.destination);

    processor.onaudioprocess = (e) => {
      const audioData = e.inputBuffer.getChannelData(0);
      const pcm = convertFloat32ToPCM16(audioData);
      ws.send(pcm);
    };
  });

// Clamp each float sample to [-1, 1] and scale it to a signed 16-bit integer
function convertFloat32ToPCM16(float32Array) {
  const pcm16 = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return pcm16.buffer;
}

Unsupported Formats

  • ❌ MP3, AAC, OGG (compressed formats)
  • ❌ Stereo audio (must be mono)
  • ❌ 8-bit audio (must be 16-bit)
  • ❌ Floating-point PCM (must be integer)
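
Floating-point samples can be converted to the required integer format before sending. The following is a minimal Python sketch that mirrors the browser's `convertFloat32ToPCM16` using only the standard library; the name `float_to_pcm16` is hypothetical, and the input is assumed to be floats nominally in the range [-1.0, 1.0].

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range values
        # Negative samples scale to -32768, positive samples to 32767
        ints.append(int(s * 32768) if s < 0 else int(s * 32767))
    # "<" forces little-endian, "h" is a signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)
```

The explicit `<` in the format string keeps the output little-endian regardless of the host machine's byte order.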

Format Conversion

Stereo to Mono

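import numpy as np

def stereo_to_mono(stereo_audio):
    # stereo_audio: int16 array of shape (num_samples, 2)
    # Average left and right channels; np.mean returns floats,
    # so cast back to 16-bit integers to keep the required format
    return np.mean(stereo_audio, axis=1).astype(np.int16)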

Resample Audio

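import librosa
import numpy as np

# `audio` is assumed to be a mono floating-point array sampled at 44.1 kHz
audio_16k = librosa.resample(audio, orig_sr=44100, target_sr=16000)

# librosa returns floats; convert back to 16-bit integer PCM before sending
pcm16 = (np.clip(audio_16k, -1.0, 1.0) * 32767).astype(np.int16)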