Audio Formats - Real-time API
Audio format specifications for real-time streaming.
Required Specifications
| Parameter | Value |
|---|---|
| Sample Rate | 8000, 16000, 22050, 44100, or 48000 Hz |
| Bit Depth | 16-bit |
| Channels | Mono (1 channel) |
| Encoding | PCM (signed-integer) |
| Endianness | Little-endian |
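This standard-library sketch shows the byte layout: it packs integer sample values into 16-bit signed little-endian PCM. `pack_pcm16` is a hypothetical helper, not part of the API. Clipping out-of-range values is one reasonable choice among several.

```python
import struct


def pack_pcm16(samples):
    """Pack integer samples into mono 16-bit signed little-endian PCM bytes.

    Values outside the int16 range are clipped to [-32768, 32767].
    """
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    # '<' = little-endian, 'h' = signed 16-bit integer (2 bytes per sample)
    return struct.pack(f"<{len(clipped)}h", *clipped)


# Each sample becomes two bytes, least significant byte first
print(pack_pcm16([1, -2, 40000]).hex())  # 0100feffff7f
```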
Recommended Settings
- Sample Rate: 16000 Hz (best balance of audio quality and bandwidth)
- Chunk Size: 20-50 ms of audio
- Buffer Size: 640-1600 bytes (20-50 ms at 16 kHz, 16-bit mono)
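Capture sources do not always deliver audio in the recommended chunk size. For example, the Node.js `mic` stream used later on this page emits buffers sized by the recorder. The sketch below is a re-chunking buffer that turns arbitrary byte blocks into fixed-size chunks, defaulting to 640 bytes (20 ms at 16 kHz, 16-bit mono). `PcmChunker` is a hypothetical name.

```python
class PcmChunker:
    """Accumulate raw PCM bytes and emit fixed-size chunks.

    chunk_bytes must be even so a chunk never splits a 16-bit sample.
    """

    def __init__(self, chunk_bytes=640):  # 640 bytes = 20 ms at 16 kHz, 16-bit mono
        if chunk_bytes <= 0 or chunk_bytes % 2:
            raise ValueError("chunk_bytes must be a positive even number")
        self.chunk_bytes = chunk_bytes
        self._buf = bytearray()

    def push(self, data):
        """Add bytes and return the list of complete chunks now available."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[: self.chunk_bytes]))
            del self._buf[: self.chunk_bytes]
        return chunks

    def flush(self):
        """Return leftover bytes (shorter than one chunk) and reset the buffer."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest
```

Call `push()` from each capture callback and send every returned chunk. `flush()` handles the final partial chunk when capture stops.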
Chunk Size Calculation
```python
# 2 = bytes per sample (16-bit samples, 1 channel)
bytes_per_chunk = sample_rate * duration_seconds * 2

# Examples:
# 16 kHz, 20 ms: 16000 * 0.02 * 2 = 640 bytes
# 16 kHz, 50 ms: 16000 * 0.05 * 2 = 1600 bytes
```
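At some supported rates, a chunk duration does not map to a whole number of samples. For example, 22050 Hz × 50 ms = 1102.5 samples. This sketch therefore rounds down to whole samples and uses integer milliseconds to avoid floating-point error. `chunk_bytes` is a hypothetical helper name.

```python
SUPPORTED_RATES = (8000, 16000, 22050, 44100, 48000)
BYTES_PER_SAMPLE = 2  # 16-bit, mono


def chunk_bytes(sample_rate, duration_ms):
    """Bytes in one chunk of 16-bit mono PCM, rounded down to whole samples."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


for rate in SUPPORTED_RATES:
    print(rate, chunk_bytes(rate, 20), chunk_bytes(rate, 50))
# 8000 320 800
# 16000 640 1600
# 22050 882 2204
# 44100 1764 4410
# 48000 1920 4800
```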
Audio Capture Examples
Python (PyAudio)
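```python
import pyaudio
import websocket  # websocket-client package

# Assumes `ws` is an already-connected websocket.WebSocket to the real-time endpoint.
FRAMES_PER_CHUNK = 320  # 320 frames at 16 kHz = 20 ms = 640 bytes (16-bit mono)

p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,  # 16-bit signed integer
    channels=1,              # Mono
    rate=16000,              # 16 kHz
    input=True,
    frames_per_buffer=FRAMES_PER_CHUNK,
)

try:
    while True:
        # read() takes a frame count, not a byte count; 320 frames = 640 bytes
        chunk = stream.read(FRAMES_PER_CHUNK)
        ws.send(chunk, opcode=websocket.ABNF.OPCODE_BINARY)
finally:
    stream.stop_stream()
    stream.close()
    p.terminate()
```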
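When testing without a microphone, a pre-recorded buffer of raw 16-bit mono PCM can be sent in the same 20 ms chunks. This minimal sketch makes three assumptions:

- `stream_pcm` is a hypothetical helper.
- `send` is any callable that transmits one binary message. With websocket-client, that could be `lambda c: ws.send(c, opcode=websocket.ABNF.OPCODE_BINARY)`.
- Sleeping one chunk duration per send is a simple approximation of live pacing that ignores the time spent sending.

```python
import time


def stream_pcm(pcm, send, sample_rate=16000, chunk_ms=20, realtime=True):
    """Send 16-bit mono PCM bytes in fixed-duration chunks.

    pcm: raw little-endian 16-bit mono samples (no WAV header).
    send: callable that transmits one binary chunk.
    Returns the number of chunks sent.
    """
    chunk_size = sample_rate * chunk_ms // 1000 * 2  # bytes per chunk
    count = 0
    for start in range(0, len(pcm), chunk_size):
        send(pcm[start:start + chunk_size])  # the last chunk may be shorter
        count += 1
        if realtime:
            # Approximate live capture by waiting one chunk duration per send
            time.sleep(chunk_ms / 1000)
    return count


# Usage without pacing: 1600 bytes at 16 kHz -> chunks of 640, 640, 320 bytes
sent = []
print(stream_pcm(b"\x00" * 1600, sent.append, realtime=False))  # 3
```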
JavaScript (Node.js with mic)
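```javascript
// Assumes `ws` is an open WebSocket from the `ws` package.
const mic = require('mic');

const micInstance = mic({
  rate: '16000',              // 16 kHz
  channels: '1',              // Mono
  encoding: 'signed-integer', // Signed-integer PCM
  bitwidth: '16',             // 16-bit
  endian: 'little'            // Little-endian
});

const micInputStream = micInstance.getAudioStream();

// Each 'data' event is a Buffer of raw PCM. Its size is chosen by the
// recorder, so it may not match the recommended 20-50 ms chunk size.
micInputStream.on('data', (data) => {
  ws.send(data); // Buffers are sent as binary frames
});
micInputStream.on('error', (err) => console.error('Microphone error:', err));

micInstance.start();
```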
Browser (Web Audio API)
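```javascript
// Assumes `ws` is an already-open browser WebSocket.
navigator.mediaDevices.getUserMedia({ audio: true })
  .then((stream) => {
    // Request a 16 kHz context so the browser resamples the microphone input.
    // Some browsers may refuse to connect a stream whose rate differs from the context's.
    const audioContext = new AudioContext({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(stream);

    // 512 frames at 16 kHz = 32 ms per callback, within the 20-50 ms range.
    // createScriptProcessor is deprecated in favor of AudioWorklet but still widely supported.
    const processor = audioContext.createScriptProcessor(512, 1, 1);
    source.connect(processor);
    // Some browsers only fire onaudioprocess when the node reaches the destination.
    processor.connect(audioContext.destination);

    processor.onaudioprocess = (e) => {
      const audioData = e.inputBuffer.getChannelData(0); // Float32Array in [-1, 1]
      ws.send(convertFloat32ToPCM16(audioData));         // ArrayBuffer is sent as binary
    };
  })
  .catch((err) => console.error('Microphone access failed:', err));

// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM.
function convertFloat32ToPCM16(float32Array) {
  const buffer = new ArrayBuffer(float32Array.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i])); // clip to [-1, 1]
    // setInt16(..., true) writes little-endian regardless of platform byte order
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
  return buffer;
}
```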
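When floating-point audio is produced on the Python side (for example, by librosa), NumPy can perform the same conversion as `convertFloat32ToPCM16`. This sketch mirrors that function's asymmetric scaling: negative values are multiplied by 32768 and positive values by 32767. `float_to_pcm16` is a hypothetical name.

```python
import numpy as np


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Compute in float64, as JavaScript numbers do
    s = np.clip(np.asarray(samples, dtype=np.float64), -1.0, 1.0)
    # Same scaling as convertFloat32ToPCM16: -1.0 -> -32768, 1.0 -> 32767
    scaled = np.where(s < 0, s * 32768.0, s * 32767.0)
    # astype truncates toward zero, as the JavaScript conversion does
    return scaled.astype("<i2").tobytes()
```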
Unsupported Formats
- ❌ MP3, AAC, OGG (compressed formats)
- ❌ Stereo audio (must be mono)
- ❌ 8-bit audio (must be 16-bit)
- ❌ Floating-point PCM (must be integer)
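Recordings often arrive as WAV files, which wrap PCM samples in a header. This standard-library sketch checks a WAV file against the required specifications and returns only the raw sample bytes. `load_pcm_from_wav` is a hypothetical helper. It relies on Python's `wave` module, which reads uncompressed PCM WAV and raises `wave.Error` for encodings it cannot read. It does not attempt any conversion.

```python
import io
import wave

SUPPORTED_RATES = {8000, 16000, 22050, 44100, 48000}


def load_pcm_from_wav(source):
    """Check a WAV file against the required format and return (pcm_bytes, rate).

    source: a filename or a binary file-like object.
    Raises ValueError if the audio is not 16-bit mono at a supported rate.
    """
    with wave.open(source, "rb") as wf:
        channels, width, rate = wf.getnchannels(), wf.getsampwidth(), wf.getframerate()
        if channels != 1:
            raise ValueError(f"expected mono, got {channels} channels")
        if width != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * width}-bit")
        if rate not in SUPPORTED_RATES:
            raise ValueError(f"unsupported sample rate: {rate} Hz")
        # 16-bit WAV data is signed little-endian, so the frames can be sent
        # as-is; only the header is dropped.
        return wf.readframes(wf.getnframes()), rate


# Usage: write 20 ms of 16 kHz mono silence to an in-memory WAV, then read it back
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 320)
buf.seek(0)
pcm, rate = load_pcm_from_wav(buf)
print(len(pcm), rate)  # 640 16000
```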
Format Conversion
Stereo to Mono
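```python
import numpy as np

def stereo_to_mono(stereo_audio):
    # stereo_audio: int16 array of shape (n_samples, 2), one column per channel
    # Average left and right channels (NumPy computes the mean in float64)
    mono = np.mean(stereo_audio, axis=1)
    # Round back to 16-bit integer PCM, since floating-point PCM is not accepted
    return np.round(mono).astype(np.int16)
```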
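Raw stereo PCM is usually interleaved (L0, R0, L1, R1, ...), while the averaging above operates on a 2-D array with one column per channel. This sketch goes directly from interleaved 16-bit little-endian stereo bytes to mono bytes ready to send. `stereo_bytes_to_mono_bytes` is a hypothetical name. `np.round` rounds halves to the nearest even integer, so 1.5 becomes 2.

```python
import numpy as np


def stereo_bytes_to_mono_bytes(raw):
    """Downmix interleaved 16-bit little-endian stereo PCM to mono PCM bytes."""
    # Interleaved layout is L0 R0 L1 R1 ...; reshape to one row per frame
    frames = np.frombuffer(raw, dtype="<i2").reshape(-1, 2)
    # Average the channels (float64), round, and return 16-bit little-endian bytes
    mono = np.round(frames.mean(axis=1)).astype("<i2")
    return mono.tobytes()
```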
Resample Audio
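```python
import librosa
import numpy as np
# `audio` is a float array at 44.1 kHz, e.g. from librosa.load(path, sr=None)
audio_16k = librosa.resample(audio, orig_sr=44100, target_sr=16000)
# librosa returns floats; convert to 16-bit little-endian PCM before sending
pcm_bytes = (np.clip(audio_16k, -1.0, 1.0) * 32767).astype("<i2").tobytes()
```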