Real-time Speech-to-Text

The Scriptix real-time API provides live speech-to-text transcription over a WebSocket interface. The interface is optimized for low-latency audio streaming, so you can transcribe audio in real time as it is captured or transmitted.

The API uses a lightweight binary protocol over WebSocket and is best suited for interactive or live-streaming applications such as meetings, broadcasts, or voice interfaces.


How It Works

Once a WebSocket connection is established, you can begin streaming audio data directly to the Scriptix engine. In return, the API will emit structured JSON messages containing transcribed text segments, metadata, and optional speaker information.
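As an illustration of handling the JSON messages described above, here is a minimal sketch of a parser for incoming text frames. The field names (`text`, `final`, `speaker`, `confidence`) are assumptions made for this example and are not taken from the Scriptix message schema; check the API reference for the actual structure.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One transcribed segment. All field names are hypothetical."""
    text: str
    is_final: bool
    speaker: Optional[str] = None       # only present if diarization is enabled
    confidence: Optional[float] = None  # only present if the server reports it


def parse_message(raw: str) -> Optional[Segment]:
    """Decode one JSON text frame.

    Returns None for frames that are not valid JSON or that do not
    carry a transcript (for example, status messages).
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict) or "text" not in msg:
        return None
    return Segment(
        text=str(msg["text"]),
        is_final=bool(msg.get("final", False)),  # assumed key name
        speaker=msg.get("speaker"),              # assumed key name
        confidence=msg.get("confidence"),        # assumed key name
    )
```

Returning `None` for unrecognized frames lets the receive loop skip status or keep-alive messages without raising.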

The session is asynchronous: results arrive while you are still sending audio. Clients therefore typically need threads or coroutines so that audio input and incoming responses can be handled concurrently.
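The concurrency pattern can be sketched with `asyncio`: one coroutine sends audio chunks while another reads results. To keep the example self-contained, it uses an in-memory `FakeSocket` in place of a real WebSocket connection. The fake server behavior, the empty-frame end-of-stream marker, and the 3200-byte chunk size are all assumptions for illustration, not documented Scriptix behavior.

```python
import asyncio
from typing import Iterator, List, Optional


def chunks(audio: bytes, size: int) -> Iterator[bytes]:
    """Split raw audio bytes into fixed-size pieces (the last may be shorter)."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


class FakeSocket:
    """In-memory stand-in for a WebSocket connection (NOT the Scriptix API)."""

    def __init__(self) -> None:
        self._inbox: asyncio.Queue = asyncio.Queue()

    async def send(self, data: bytes) -> None:
        # Pretend the server answers every audio chunk with one message.
        # An empty frame marks the end of the stream (assumption).
        await self._inbox.put(f"received {len(data)} bytes" if data else None)

    async def recv(self) -> Optional[str]:
        return await self._inbox.get()


async def send_audio(sock: FakeSocket, audio: bytes, chunk_size: int) -> None:
    for chunk in chunks(audio, chunk_size):
        await sock.send(chunk)
        await asyncio.sleep(0)  # yield control so the receiver can run
    await sock.send(b"")        # hypothetical end-of-stream marker


async def receive_results(sock: FakeSocket, results: List[str]) -> None:
    while True:
        msg = await sock.recv()
        if msg is None:  # fake server signalled end of stream
            break
        results.append(msg)


async def run_session(audio: bytes, chunk_size: int = 3200) -> List[str]:
    # 3200 bytes would be 100 ms of 16 kHz, 16-bit mono PCM (assumed format).
    sock = FakeSocket()
    results: List[str] = []
    # Run sender and receiver concurrently on the same connection.
    await asyncio.gather(
        send_audio(sock, audio, chunk_size),
        receive_results(sock, results),
    )
    return results
```

With a real connection, `FakeSocket` would be replaced by the client object of whichever WebSocket library you use; the two-coroutine structure stays the same. A thread-based client would follow the same split, with one thread per direction.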


Key Features

  • 🔄 Real-time audio streaming over WebSocket
  • ⚙️ Low-latency transcription with near-instant results
  • 🧠 Optional speaker diarization and confidence scores
  • 📦 Lightweight and binary-efficient communication protocol

Before You Start

To use the real-time API, you need a valid API token with real-time access. Visit the Authentication Guide for instructions on generating and managing your tokens.


Need a quick setup?

Visit the Quick Start Guide for a full example using WebSocket and real-time transcription.