# Migrate from Realtime API V1 to V2
This guide helps you upgrade from the Realtime API V1 to V2. V2 offers improved performance, better connection stability, and new features for live transcription applications.
## Overview

### Why Upgrade to V2?

**Performance Improvements:**
- 40% lower latency for transcription results
- Better handling of network interruptions
- Improved audio quality detection
- More efficient WebSocket connection management
**New Features:**
- Partial results for real-time feedback
- Confidence scores per word
- Speaker diarization in real-time
- Support for more audio formats
- Better punctuation and capitalization
- Custom vocabulary support
**Reliability:**
- Automatic reconnection with state recovery
- Better error handling and reporting
- Connection health monitoring
- Graceful degradation
### Timeline
- V1 Support: Continues through June 2025
- V2 Recommended: For all new implementations
- Migration Window: 6 months to migrate existing apps
## Key Differences

### 1. WebSocket URL

**V1:**

```
wss://api.scriptix.io/v1/realtime
```

**V2:**

```
wss://api.scriptix.io/v2/realtime
```
### 2. Connection Initialization

**V1:** Simple connection with auth token

```javascript
const ws = new WebSocket('wss://api.scriptix.io/v1/realtime');

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'YOUR_API_KEY'
  }));
};
```
**V2:** Session-based with initialization

```javascript
// Step 1: Create a session via the REST API
const response = await fetch('https://api.scriptix.io/api/v2/realtime/sessions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    language: 'en',
    enable_diarization: true,
    enable_partial_results: true
  })
});

const { session_id, ws_url } = await response.json();

// Step 2: Connect to the WebSocket
const ws = new WebSocket(`${ws_url}?session_id=${session_id}`);
```
### 3. Message Protocol

**V1:** Simple message types

```json
{
  "type": "audio",
  "data": "base64_encoded_audio"
}

{
  "type": "result",
  "text": "transcribed text"
}
```
**V2:** Structured, more detailed message protocol

```json
{
  "type": "audio",
  "audio": {
    "data": "base64_encoded_audio",
    "format": "pcm16",
    "sample_rate": 16000
  }
}

{
  "type": "partial_result",
  "result": {
    "text": "transcribed",
    "confidence": 0.95,
    "is_final": false
  }
}

{
  "type": "final_result",
  "result": {
    "text": "transcribed text",
    "confidence": 0.98,
    "words": [
      {
        "word": "transcribed",
        "start_time": 0.0,
        "end_time": 0.5,
        "confidence": 0.97
      },
      {
        "word": "text",
        "start_time": 0.5,
        "end_time": 0.8,
        "confidence": 0.99
      }
    ],
    "speaker_id": 1
  }
}
```
### 4. Audio Formats

**V1:** Only PCM16 at 16 kHz was supported.

**V2:** Multiple formats supported

```javascript
// Supported formats in V2:
// - PCM16 (16kHz, 8kHz)
// - MULAW (8kHz)
// - OPUS (16kHz, 48kHz)

const config = {
  audio_format: "opus",
  sample_rate: 48000
};
```
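Browser microphone capture (via the Web Audio API) typically yields Float32 samples, while the `pcm16` format above expects 16-bit signed integers. A minimal conversion sketch; this is common client-side plumbing, not part of the official API:

```javascript
// Convert Float32 samples in [-1, 1] (as produced by the Web Audio API)
// to 16-bit signed PCM bytes.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp, then scale to the 16-bit signed range
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return new Uint8Array(out.buffer);
}
```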
### 5. Partial Results

**V1:** No partial results; only final transcripts.

**V2:** Real-time partial results

```javascript
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.type === 'partial_result') {
    // Update the UI with the partial transcription
    updateTranscript(message.result.text, false);
  } else if (message.type === 'final_result') {
    // Update the UI with the final transcription
    updateTranscript(message.result.text, true);
  }
};
```
## Migration Steps

### Step 1: Update WebSocket URL

Before (V1):

```javascript
const WS_URL = 'wss://api.scriptix.io/v1/realtime';
```

After (V2):

```javascript
const API_URL = 'https://api.scriptix.io/api/v2/realtime';
const WS_BASE_URL = 'wss://api.scriptix.io/v2/realtime';
```
### Step 2: Implement Session Creation

V2 requires creating a session first:

```javascript
async function createRealtimeSession(config) {
  const response = await fetch(`${API_URL}/sessions`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      language: config.language || 'en',
      enable_diarization: config.enableDiarization || false,
      // Default to true unless explicitly disabled
      enable_partial_results: config.enablePartialResults !== false,
      audio_format: config.audioFormat || 'pcm16',
      sample_rate: config.sampleRate || 16000
    })
  });

  if (!response.ok) {
    throw new Error(`Session creation failed: ${response.statusText}`);
  }

  return await response.json();
}
```
### Step 3: Update WebSocket Connection

```javascript
async function connectRealtimeV2(config) {
  // Create the session
  const session = await createRealtimeSession(config);

  // Connect to the WebSocket
  const ws = new WebSocket(`${WS_BASE_URL}?session_id=${session.session_id}`);

  ws.onopen = () => {
    console.log('Connected to V2 Realtime API');
    // V2 doesn't need a separate auth message
  };

  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    handleRealtimeMessage(message);
  };

  ws.onerror = (error) => {
    console.error('WebSocket error:', error);
  };

  ws.onclose = (event) => {
    console.log('WebSocket closed:', event.code, event.reason);
    // Implement reconnection logic here
  };

  return { ws, sessionId: session.session_id };
}
```
### Step 4: Update Message Handlers

```javascript
function handleRealtimeMessage(message) {
  switch (message.type) {
    case 'session_started':
      console.log('Session started:', message.session_id);
      break;

    case 'partial_result':
      // V2 feature: real-time updates
      onPartialResult(message.result);
      break;

    case 'final_result':
      // Final transcription
      onFinalResult(message.result);
      break;

    case 'error':
      console.error('Error:', message.error);
      onError(message.error);
      break;

    case 'session_ended':
      console.log('Session ended');
      onSessionEnd();
      break;

    default:
      console.warn('Unknown message type:', message.type);
  }
}

function onPartialResult(result) {
  // Update the UI with the partial transcription
  document.getElementById('transcript').textContent = result.text;
}

function onFinalResult(result) {
  // Add the final transcription to the history
  appendToTranscript(result.text, result.confidence);

  // V2 provides word-level details
  if (result.words) {
    displayWordTimings(result.words);
  }

  // V2 provides speaker information
  if (result.speaker_id !== undefined) {
    updateSpeaker(result.speaker_id);
  }
}
```
### Step 5: Update Audio Sending

**V1:**

```javascript
function sendAudio(audioData) {
  ws.send(JSON.stringify({
    type: 'audio',
    data: btoa(String.fromCharCode(...audioData))
  }));
}
```

**V2:**

```javascript
function sendAudioV2(audioData, format = 'pcm16', sampleRate = 16000) {
  const message = {
    type: 'audio',
    audio: {
      data: btoa(String.fromCharCode(...audioData)),
      format: format,
      sample_rate: sampleRate
    }
  };
  ws.send(JSON.stringify(message));
}
```
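One caveat that applies to both versions: spreading a large audio buffer into `String.fromCharCode` can exceed the JavaScript engine's argument limit. A chunked encoder avoids this for larger buffers; this is a defensive sketch, not something the API docs prescribe:

```javascript
// Encode audio bytes to base64 in chunks so large buffers don't
// overflow the engine's maximum argument count.
function encodeAudioBase64(audioData, chunkSize = 0x8000) {
  let binary = '';
  for (let i = 0; i < audioData.length; i += chunkSize) {
    // subarray() expects a typed array such as Uint8Array
    binary += String.fromCharCode(...audioData.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}
```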
### Step 6: Implement Reconnection Logic

V2 supports session recovery:

```javascript
let reconnectAttempts = 0;
const MAX_RECONNECT_ATTEMPTS = 5;

function reconnect(sessionId, config) {
  if (reconnectAttempts >= MAX_RECONNECT_ATTEMPTS) {
    console.error('Max reconnection attempts reached');
    return;
  }

  reconnectAttempts++;
  console.log(`Reconnecting... (Attempt ${reconnectAttempts})`);

  // V2 allows reconnecting to an existing session
  const ws = new WebSocket(
    `${WS_BASE_URL}?session_id=${sessionId}&reconnect=true`
  );

  ws.onopen = () => {
    console.log('Reconnected successfully');
    reconnectAttempts = 0;
  };

  ws.onclose = () => {
    // The WebSocket constructor doesn't throw on connection failure,
    // so schedule the retry from onclose with a growing delay
    setTimeout(() => reconnect(sessionId, config), 2000 * reconnectAttempts);
  };

  // ... rest of handlers
}
```
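The linear `2000 * reconnectAttempts` delay works, but exponential backoff with jitter spreads reconnect storms more evenly when many clients drop at once. A drop-in replacement for the delay calculation; this is a sketch, not something the API requires:

```javascript
// Exponential backoff with "equal jitter": half the delay is fixed,
// half is random, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  const exp = Math.min(maxMs, baseMs * 2 ** (attempt - 1));
  return exp / 2 + Math.random() * (exp / 2);
}
```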
## Complete Migration Example

### V1 Implementation

```javascript
// V1 Realtime Client
class RealtimeClientV1 {
  constructor(apiKey, language = 'en') {
    this.apiKey = apiKey;
    this.language = language;
    this.ws = null;
  }

  connect() {
    this.ws = new WebSocket('wss://api.scriptix.io/v1/realtime');

    this.ws.onopen = () => {
      // Authenticate
      this.ws.send(JSON.stringify({
        type: 'auth',
        token: this.apiKey,
        language: this.language
      }));
    };

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === 'result') {
        this.onResult(message.text);
      }
    };
  }

  sendAudio(audioData) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({
        type: 'audio',
        data: btoa(String.fromCharCode(...audioData))
      }));
    }
  }

  disconnect() {
    if (this.ws) {
      this.ws.close();
    }
  }

  onResult(text) {
    // Override this method
    console.log('Result:', text);
  }
}
```
### V2 Implementation

```javascript
// V2 Realtime Client
class RealtimeClientV2 {
  constructor(apiKey, config = {}) {
    this.apiKey = apiKey;
    this.config = {
      language: config.language || 'en',
      enableDiarization: config.enableDiarization || false,
      enablePartialResults: config.enablePartialResults !== false,
      audioFormat: config.audioFormat || 'pcm16',
      sampleRate: config.sampleRate || 16000
    };
    this.ws = null;
    this.sessionId = null;
  }

  async connect() {
    try {
      // Create the session (the API expects snake_case keys)
      const response = await fetch(
        'https://api.scriptix.io/api/v2/realtime/sessions',
        {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${this.apiKey}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({
            language: this.config.language,
            enable_diarization: this.config.enableDiarization,
            enable_partial_results: this.config.enablePartialResults,
            audio_format: this.config.audioFormat,
            sample_rate: this.config.sampleRate
          })
        }
      );

      if (!response.ok) {
        throw new Error(`Session creation failed: ${response.statusText}`);
      }

      const session = await response.json();
      this.sessionId = session.session_id;

      // Connect the WebSocket
      this.ws = new WebSocket(
        `wss://api.scriptix.io/v2/realtime?session_id=${this.sessionId}`
      );

      this.ws.onopen = () => {
        console.log('Connected to Realtime V2');
        this.onConnect();
      };

      this.ws.onmessage = (event) => {
        const message = JSON.parse(event.data);
        this.handleMessage(message);
      };

      this.ws.onerror = (error) => {
        console.error('WebSocket error:', error);
        this.onError(error);
      };

      this.ws.onclose = (event) => {
        console.log('WebSocket closed:', event.code, event.reason);
        this.onDisconnect();
      };
    } catch (error) {
      console.error('Connection failed:', error);
      throw error;
    }
  }

  handleMessage(message) {
    switch (message.type) {
      case 'session_started':
        this.onSessionStarted(message);
        break;
      case 'partial_result':
        this.onPartialResult(message.result);
        break;
      case 'final_result':
        this.onFinalResult(message.result);
        break;
      case 'error':
        this.onError(message.error);
        break;
      case 'session_ended':
        this.onSessionEnded();
        break;
    }
  }

  sendAudio(audioData) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      const message = {
        type: 'audio',
        audio: {
          data: btoa(String.fromCharCode(...audioData)),
          format: this.config.audioFormat,
          sample_rate: this.config.sampleRate
        }
      };
      this.ws.send(JSON.stringify(message));
    }
  }

  async disconnect() {
    if (this.ws) {
      this.ws.close();
    }

    if (this.sessionId) {
      // End the session via the REST API
      await fetch(
        `https://api.scriptix.io/api/v2/realtime/sessions/${this.sessionId}`,
        {
          method: 'DELETE',
          headers: {
            'Authorization': `Bearer ${this.apiKey}`
          }
        }
      );
    }
  }

  // Override these methods
  onConnect() {}
  onDisconnect() {}
  onSessionStarted(data) {}
  onPartialResult(result) {
    console.log('Partial:', result.text);
  }
  onFinalResult(result) {
    console.log('Final:', result.text, `(${result.confidence})`);
  }
  onError(error) {
    console.error('Error:', error);
  }
  onSessionEnded() {}
}

// Usage
const client = new RealtimeClientV2('YOUR_API_KEY', {
  language: 'en',
  enablePartialResults: true,
  enableDiarization: true
});

client.onFinalResult = (result) => {
  console.log('Transcription:', result.text);
  console.log('Confidence:', result.confidence);
  console.log('Speaker:', result.speaker_id);
};

await client.connect();
```
## Feature Mapping

### V1 → V2 Feature Comparison
| Feature | V1 | V2 | Notes |
|---|---|---|---|
| Basic transcription | ✅ | ✅ | Same |
| Partial results | ❌ | ✅ | New in V2 |
| Word-level timestamps | ❌ | ✅ | New in V2 |
| Confidence scores | ❌ | ✅ | New in V2 |
| Speaker diarization | ❌ | ✅ | New in V2 |
| Multiple audio formats | ❌ | ✅ | V1 only PCM16 |
| Session recovery | ❌ | ✅ | New in V2 |
| Custom vocabulary | ❌ | ✅ | New in V2 |
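Custom vocabulary appears in the table above but is not demonstrated elsewhere in this guide. The snippet below is a hypothetical sketch: the `custom_vocabulary` field name is an assumption, so check the V2 session-creation reference for the actual parameter:

```javascript
// Hypothetical: the `custom_vocabulary` field name is assumed here,
// not confirmed by this guide. Verify against the V2 session reference.
const sessionConfig = {
  language: 'en',
  custom_vocabulary: ['Scriptix', 'diarization', 'MULAW']
};
```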
## Best Practices for V2

### 1. Handle Partial Results

```javascript
let currentPartialText = '';

client.onPartialResult = (result) => {
  // Show the partial result in real time
  currentPartialText = result.text;
  updateLiveTranscript(currentPartialText);
};

client.onFinalResult = (result) => {
  // Replace the partial text with the final result
  appendFinalTranscript(result.text);
  currentPartialText = '';
};
```
### 2. Monitor Connection Health

```javascript
// Ping every 30 seconds; guard against a missing or not-yet-open socket
setInterval(() => {
  if (client.ws && client.ws.readyState === WebSocket.OPEN) {
    client.ws.send(JSON.stringify({ type: 'ping' }));
  }
}, 30000);
```
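Pings only help if you also notice when replies stop. Assuming the server answers with a `{ type: 'pong' }` message (an assumption; verify against the V2 protocol), a small staleness check can drive the reconnection logic:

```javascript
// Returns true when the last pong is older than the timeout.
function isConnectionStale(lastPongMs, nowMs, timeoutMs = 60000) {
  return nowMs - lastPongMs > timeoutMs;
}

// Usage sketch: record pong arrival times, then close the socket when
// stale so the onclose handler can trigger reconnection:
// if (isConnectionStale(lastPong, Date.now())) client.ws.close();
```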
### 3. Handle Errors Gracefully

```javascript
client.onError = (error) => {
  if (error.code === 'INSUFFICIENT_CREDITS') {
    showWarning('Low balance - please add credits');
  } else if (error.code === 'UNSUPPORTED_LANGUAGE') {
    showError('Selected language not supported');
  } else {
    showError(`Error: ${error.message}`);
  }
};
```
## Testing Your Migration

### Test Checklist

- [ ] Session creation works
- [ ] WebSocket connects successfully
- [ ] Audio streaming works
- [ ] Partial results display correctly
- [ ] Final results are accurate
- [ ] Reconnection logic works
- [ ] Error handling implemented
- [ ] Session cleanup on disconnect
- [ ] Performance is acceptable
- [ ] All supported languages tested
## Performance Comparison

### Latency Improvements
| Metric | V1 | V2 | Improvement |
|---|---|---|---|
| First result | 1.5s | 0.8s | 47% faster |
| Avg latency | 800ms | 450ms | 44% faster |
| Connection time | 500ms | 300ms | 40% faster |
## Common Issues

### Issue 1: Session Creation Fails

**Cause:** Invalid API key or configuration

**Solution:**

```javascript
try {
  await client.connect();
} catch (error) {
  console.error('Connection failed:', error.message);
  // Check your API key and session configuration
}
```
### Issue 2: No Partial Results

**Cause:** Partial results not enabled

**Solution:**

```javascript
const client = new RealtimeClientV2(apiKey, {
  enablePartialResults: true // Make sure this is true
});
```
### Issue 3: Audio Not Transcribed

**Cause:** Incorrect audio format

**Solution:**

```javascript
// Ensure the format matches your audio source
const client = new RealtimeClientV2(apiKey, {
  audioFormat: 'pcm16', // Match your audio format
  sampleRate: 16000     // Match your sample rate
});
```
## Support
Need help migrating?
- Realtime API V2 Documentation
- WebSocket Protocol
- Audio Formats Guide
- Contact support via dashboard