
Migrate from Realtime API V1 to V2

This guide helps you upgrade from the Realtime API V1 to V2. V2 offers improved performance, better connection stability, and new features for live transcription applications.

Overview

Why Upgrade to V2?

Performance Improvements:

  • 40% lower latency for transcription results
  • Better handling of network interruptions
  • Improved audio quality detection
  • More efficient WebSocket connection management

New Features:

  • Partial results for real-time feedback
  • Confidence scores per word
  • Speaker diarization in real-time
  • Support for more audio formats
  • Better punctuation and capitalization
  • Custom vocabulary support

Reliability:

  • Automatic reconnection with state recovery
  • Better error handling and reporting
  • Connection health monitoring
  • Graceful degradation

Timeline

  • V1 Support: Continues through June 2025
  • V2 Recommended: For all new implementations
  • Migration Window: 6 months to migrate existing apps

Key Differences

1. WebSocket URL

V1:

wss://api.scriptix.io/v1/realtime

V2:

wss://api.scriptix.io/v2/realtime

2. Connection Initialization

V1: Simple connection with auth token

const ws = new WebSocket('wss://api.scriptix.io/v1/realtime');

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'YOUR_API_KEY'
  }));
};

V2: Session-based with initialization

// Step 1: Create session via REST API
const response = await fetch('https://api.scriptix.io/api/v2/realtime/sessions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    language: 'en',
    enable_diarization: true,
    enable_partial_results: true
  })
});

const { session_id, ws_url } = await response.json();

// Step 2: Connect to WebSocket
const ws = new WebSocket(`${ws_url}?session_id=${session_id}`);

3. Message Protocol

V1: Simple message types

// V1 Messages
{
  "type": "audio",
  "data": "base64_encoded_audio"
}

{
  "type": "result",
  "text": "transcribed text"
}

V2: Structured message protocol

// V2 Messages - More detailed
{
  "type": "audio",
  "audio": {
    "data": "base64_encoded_audio",
    "format": "pcm16",
    "sample_rate": 16000
  }
}

{
  "type": "partial_result",
  "result": {
    "text": "transcribed",
    "confidence": 0.95,
    "is_final": false
  }
}

{
  "type": "final_result",
  "result": {
    "text": "transcribed text",
    "confidence": 0.98,
    "words": [
      {
        "word": "transcribed",
        "start_time": 0.0,
        "end_time": 0.5,
        "confidence": 0.97
      },
      {
        "word": "text",
        "start_time": 0.5,
        "end_time": 0.8,
        "confidence": 0.99
      }
    ],
    "speaker_id": 1
  }
}

4. Audio Formats

V1: Limited to PCM16

// V1 only supported PCM16 at 16kHz

V2: Multiple formats supported

// Supported formats in V2:
// - PCM16 (16kHz, 8kHz)
// - MULAW (8kHz)
// - OPUS (16kHz, 48kHz)

const config = {
  audio_format: "opus",
  sample_rate: 48000
};

5. Partial Results

V1: No partial results - only final transcripts

V2: Real-time partial results

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.type === 'partial_result') {
    // Update UI with partial transcription
    updateTranscript(message.result.text, false);
  } else if (message.type === 'final_result') {
    // Update UI with final transcription
    updateTranscript(message.result.text, true);
  }
};

Migration Steps

Step 1: Update WebSocket URL

Before (V1):

const WS_URL = 'wss://api.scriptix.io/v1/realtime';

After (V2):

const API_URL = 'https://api.scriptix.io/api/v2/realtime';
const WS_BASE_URL = 'wss://api.scriptix.io/v2/realtime';

Step 2: Implement Session Creation

V2 requires creating a session first:

async function createRealtimeSession(config) {
  const response = await fetch(`${API_URL}/sessions`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      language: config.language || 'en',
      enable_diarization: config.enableDiarization || false,
      enable_partial_results: config.enablePartialResults !== false, // default to true
      audio_format: config.audioFormat || 'pcm16',
      sample_rate: config.sampleRate || 16000
    })
  });

  if (!response.ok) {
    throw new Error(`Session creation failed: ${response.statusText}`);
  }

  return await response.json();
}

Step 3: Update WebSocket Connection

async function connectRealtimeV2(config) {
  // Create session
  const session = await createRealtimeSession(config);

  // Connect to WebSocket
  const ws = new WebSocket(`${WS_BASE_URL}?session_id=${session.session_id}`);

  ws.onopen = () => {
    console.log('Connected to V2 Realtime API');
    // V2 doesn't need separate auth message
  };

  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    handleRealtimeMessage(message);
  };

  ws.onerror = (error) => {
    console.error('WebSocket error:', error);
  };

  ws.onclose = (event) => {
    console.log('WebSocket closed:', event.code, event.reason);
    // Implement reconnection logic
  };

  return { ws, sessionId: session.session_id };
}

Step 4: Update Message Handlers

function handleRealtimeMessage(message) {
  switch (message.type) {
    case 'session_started':
      console.log('Session started:', message.session_id);
      break;

    case 'partial_result':
      // V2 feature - real-time updates
      onPartialResult(message.result);
      break;

    case 'final_result':
      // Final transcription
      onFinalResult(message.result);
      break;

    case 'error':
      console.error('Error:', message.error);
      onError(message.error);
      break;

    case 'session_ended':
      console.log('Session ended');
      onSessionEnd();
      break;

    default:
      console.warn('Unknown message type:', message.type);
  }
}

function onPartialResult(result) {
  // Update UI with partial transcription
  document.getElementById('transcript').textContent = result.text;
}

function onFinalResult(result) {
  // Add final transcription to history
  const finalText = result.text;
  const confidence = result.confidence;

  appendToTranscript(finalText, confidence);

  // V2 provides word-level details
  if (result.words) {
    displayWordTimings(result.words);
  }

  // V2 provides speaker information
  if (result.speaker_id !== undefined) {
    updateSpeaker(result.speaker_id);
  }
}
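
The helper functions referenced above (appendToTranscript, displayWordTimings, updateSpeaker) are application-specific. As one possible sketch, displayWordTimings could render the word-level details from the final_result message into a list; the element ID used here is an assumption:

function displayWordTimings(words) {
  // Render each word with its start/end time and confidence
  const list = document.getElementById('word-timings'); // assumed element ID
  list.innerHTML = '';

  for (const word of words) {
    const item = document.createElement('li');
    item.textContent =
      `${word.word} [${word.start_time.toFixed(2)}s - ${word.end_time.toFixed(2)}s] ` +
      `(confidence ${word.confidence})`;
    list.appendChild(item);
  }
}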

Step 5: Update Audio Sending

V1:

function sendAudio(audioData) {
  ws.send(JSON.stringify({
    type: 'audio',
    data: btoa(String.fromCharCode(...audioData))
  }));
}

V2:

function sendAudioV2(audioData, format = 'pcm16', sampleRate = 16000) {
  const message = {
    type: 'audio',
    audio: {
      data: btoa(String.fromCharCode(...audioData)),
      format: format,
      sample_rate: sampleRate
    }
  };

  ws.send(JSON.stringify(message));
}
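
If your audio source is the browser microphone, the captured samples still need to be converted to the format you declared for the session. A minimal sketch using the Web Audio API, assuming PCM16 at 16 kHz (ScriptProcessorNode is deprecated but keeps the example short):

async function streamMicrophone() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const source = audioContext.createMediaStreamSource(stream);
  const processor = audioContext.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (event) => {
    const float32 = event.inputBuffer.getChannelData(0);

    // Convert 32-bit float samples (-1..1) to 16-bit PCM
    const pcm16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      const s = Math.max(-1, Math.min(1, float32[i]));
      pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }

    sendAudioV2(new Uint8Array(pcm16.buffer), 'pcm16', 16000);
  };

  source.connect(processor);
  processor.connect(audioContext.destination);
}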

Step 6: Implement Reconnection Logic

V2 supports session recovery:

let reconnectAttempts = 0;
const MAX_RECONNECT_ATTEMPTS = 5;

async function reconnect(sessionId, config) {
  if (reconnectAttempts >= MAX_RECONNECT_ATTEMPTS) {
    console.error('Max reconnection attempts reached');
    return;
  }

  reconnectAttempts++;
  console.log(`Reconnecting... (Attempt ${reconnectAttempts})`);

  try {
    // V2 allows reconnecting to existing session
    const ws = new WebSocket(
      `${WS_BASE_URL}?session_id=${sessionId}&reconnect=true`
    );

    ws.onopen = () => {
      console.log('Reconnected successfully');
      reconnectAttempts = 0;
    };

    // ... rest of handlers
  } catch (error) {
    console.error('Reconnection failed:', error);
    setTimeout(() => reconnect(sessionId, config), 2000 * reconnectAttempts);
  }
}
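
One way to trigger this from the connection created in Step 3 is to call reconnect from the onclose handler when the socket closes abnormally; sessionId and config here are the values used in connectRealtimeV2, and close code 1000 means a normal closure:

ws.onclose = (event) => {
  console.log('WebSocket closed:', event.code, event.reason);

  if (event.code !== 1000) {
    // Abnormal closure - try to resume the same session
    reconnect(sessionId, config);
  }
};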

Complete Migration Example

V1 Implementation

// V1 Realtime Client
class RealtimeClientV1 {
  constructor(apiKey, language = 'en') {
    this.apiKey = apiKey;
    this.language = language;
    this.ws = null;
  }

  connect() {
    this.ws = new WebSocket('wss://api.scriptix.io/v1/realtime');

    this.ws.onopen = () => {
      // Authenticate
      this.ws.send(JSON.stringify({
        type: 'auth',
        token: this.apiKey,
        language: this.language
      }));
    };

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);

      if (message.type === 'result') {
        this.onResult(message.text);
      }
    };
  }

  sendAudio(audioData) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({
        type: 'audio',
        data: btoa(String.fromCharCode(...audioData))
      }));
    }
  }

  disconnect() {
    if (this.ws) {
      this.ws.close();
    }
  }

  onResult(text) {
    // Override this method
    console.log('Result:', text);
  }
}

V2 Implementation

// V2 Realtime Client
class RealtimeClientV2 {
  constructor(apiKey, config = {}) {
    this.apiKey = apiKey;
    this.config = {
      language: config.language || 'en',
      enableDiarization: config.enableDiarization || false,
      enablePartialResults: config.enablePartialResults !== false,
      audioFormat: config.audioFormat || 'pcm16',
      sampleRate: config.sampleRate || 16000
    };
    this.ws = null;
    this.sessionId = null;
  }

  async connect() {
    try {
      // Create session (the API expects snake_case field names)
      const response = await fetch(
        'https://api.scriptix.io/api/v2/realtime/sessions',
        {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${this.apiKey}`,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({
            language: this.config.language,
            enable_diarization: this.config.enableDiarization,
            enable_partial_results: this.config.enablePartialResults,
            audio_format: this.config.audioFormat,
            sample_rate: this.config.sampleRate
          })
        }
      );

      if (!response.ok) {
        throw new Error(`Session creation failed: ${response.statusText}`);
      }

      const session = await response.json();
      this.sessionId = session.session_id;

      // Connect WebSocket
      this.ws = new WebSocket(
        `wss://api.scriptix.io/v2/realtime?session_id=${this.sessionId}`
      );

      this.ws.onopen = () => {
        console.log('Connected to Realtime V2');
        this.onConnect();
      };

      this.ws.onmessage = (event) => {
        const message = JSON.parse(event.data);
        this.handleMessage(message);
      };

      this.ws.onerror = (error) => {
        console.error('WebSocket error:', error);
        this.onError(error);
      };

      this.ws.onclose = (event) => {
        console.log('WebSocket closed:', event.code, event.reason);
        this.onDisconnect();
      };

    } catch (error) {
      console.error('Connection failed:', error);
      throw error;
    }
  }

  handleMessage(message) {
    switch (message.type) {
      case 'session_started':
        this.onSessionStarted(message);
        break;

      case 'partial_result':
        this.onPartialResult(message.result);
        break;

      case 'final_result':
        this.onFinalResult(message.result);
        break;

      case 'error':
        this.onError(message.error);
        break;

      case 'session_ended':
        this.onSessionEnded();
        break;
    }
  }

  sendAudio(audioData) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      const message = {
        type: 'audio',
        audio: {
          data: btoa(String.fromCharCode(...audioData)),
          format: this.config.audioFormat,
          sample_rate: this.config.sampleRate
        }
      };

      this.ws.send(JSON.stringify(message));
    }
  }

  async disconnect() {
    if (this.ws) {
      this.ws.close();
    }

    if (this.sessionId) {
      // End session via API
      await fetch(
        `https://api.scriptix.io/api/v2/realtime/sessions/${this.sessionId}`,
        {
          method: 'DELETE',
          headers: {
            'Authorization': `Bearer ${this.apiKey}`
          }
        }
      );
    }
  }

  // Override these methods
  onConnect() {}
  onDisconnect() {}
  onSessionStarted(data) {}
  onPartialResult(result) {
    console.log('Partial:', result.text);
  }
  onFinalResult(result) {
    console.log('Final:', result.text, `(${result.confidence})`);
  }
  onError(error) {
    console.error('Error:', error);
  }
  onSessionEnded() {}
}

// Usage
const client = new RealtimeClientV2('YOUR_API_KEY', {
  language: 'en',
  enablePartialResults: true,
  enableDiarization: true
});

client.onFinalResult = (result) => {
  console.log('Transcription:', result.text);
  console.log('Confidence:', result.confidence);
  console.log('Speaker:', result.speaker_id);
};

await client.connect();

Feature Mapping

V1 → V2 Feature Comparison

Feature                  V1    V2    Notes
Basic transcription      Yes   Yes   Same in both versions
Partial results          No    Yes   New in V2
Word-level timestamps    No    Yes   New in V2
Confidence scores        No    Yes   New in V2
Speaker diarization      No    Yes   New in V2
Multiple audio formats   No    Yes   V1 only supports PCM16
Session recovery         No    Yes   New in V2
Custom vocabulary        No    Yes   New in V2
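
Custom vocabulary is listed as new in V2 but is not shown elsewhere in this guide. As a rough sketch only, it would most likely be passed when creating the session; the custom_vocabulary field name below is an assumption, so check the V2 session reference for the actual parameter:

// Hypothetical request body - confirm the exact field name in the V2 reference
const response = await fetch('https://api.scriptix.io/api/v2/realtime/sessions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    language: 'en',
    custom_vocabulary: ['Scriptix', 'diarization', 'WebSocket'] // assumed parameter
  })
});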

Best Practices for V2

1. Handle Partial Results

let currentPartialText = '';

client.onPartialResult = (result) => {
  // Show partial result in real-time
  currentPartialText = result.text;
  updateLiveTranscript(currentPartialText);
};

client.onFinalResult = (result) => {
  // Replace partial with final
  appendFinalTranscript(result.text);
  currentPartialText = '';
};

2. Monitor Connection Health

setInterval(() => {
  if (client.ws.readyState === WebSocket.OPEN) {
    client.ws.send(JSON.stringify({ type: 'ping' }));
  }
}, 30000); // Ping every 30 seconds
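
If the server replies to pings, you can also treat a missing reply as a dead connection. A sketch, assuming the reply arrives as a pong message (the reply format is not documented in this guide):

let lastPong = Date.now();

client.ws.addEventListener('message', (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'pong') {
    lastPong = Date.now();
  }
});

setInterval(() => {
  // No reply for three ping intervals - assume the connection is dead
  if (Date.now() - lastPong > 90000) {
    client.ws.close();
  }
}, 30000);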

3. Handle Errors Gracefully

client.onError = (error) => {
  if (error.code === 'INSUFFICIENT_CREDITS') {
    showWarning('Low balance - please add credits');
  } else if (error.code === 'UNSUPPORTED_LANGUAGE') {
    showError('Selected language not supported');
  } else {
    showError(`Error: ${error.message}`);
  }
};

Testing Your Migration

Test Checklist

  • Session creation works
  • WebSocket connects successfully
  • Audio streaming works
  • Partial results display correctly
  • Final results are accurate
  • Reconnection logic works
  • Error handling implemented
  • Session cleanup on disconnect
  • Performance is acceptable
  • All supported languages tested
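
A quick smoke test covering the first few items, using the RealtimeClientV2 class from this guide (the audio chunk is silence, so expect empty or low-confidence results):

async function smokeTest(apiKey) {
  const client = new RealtimeClientV2(apiKey, { enablePartialResults: true });

  client.onPartialResult = (result) => console.log('partial:', result.text);
  client.onFinalResult = (result) => console.log('final:', result.text);
  client.onError = (error) => console.error('error:', error);

  client.onConnect = () => {
    // Audio streaming: 100 ms of 16 kHz PCM16 silence
    client.sendAudio(new Uint8Array(3200));
  };

  // Session creation + WebSocket connection
  await client.connect();

  // Session cleanup after a few seconds
  setTimeout(() => client.disconnect(), 5000);
}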

Performance Comparison

Latency Improvements

Metric            V1      V2      Improvement
First result      1.5s    0.8s    47% faster
Avg latency       800ms   450ms   44% faster
Connection time   500ms   300ms   40% faster

Common Issues

Issue 1: Session Creation Fails

Cause: Invalid API key or configuration

Solution:

try {
  await client.connect();
} catch (error) {
  console.error('Connection failed:', error.message);
  // Check API key and config
}

Issue 2: No Partial Results

Cause: Partial results not enabled

Solution:

const client = new RealtimeClientV2(apiKey, {
  enablePartialResults: true // Make sure this is true
});

Issue 3: Audio Not Transcribed

Cause: Incorrect audio format

Solution:

// Ensure format matches your audio source
const client = new RealtimeClientV2(apiKey, {
  audioFormat: 'pcm16', // Match your audio format
  sampleRate: 16000     // Match your sample rate
});

Support

Need help migrating?
