Research Brief · LiveKit · Realtime Voice & Video Infrastructure
Realtime Infrastructure · Voice AI · WebRTC

LiveKit: the plumbing behind realtime voice AI.

An open-source WebRTC stack used by OpenAI's ChatGPT voice mode and a long list of voice-AI products. This is a plain-English brief on what LiveKit is, how its Agents framework works, and where it fits for an SME operator who wants to ship a voice assistant without reinventing the audio pipeline.

Filed 23 May 2026 · Research Brief
WebRTC · Transport layer
Open · Apache 2.0 source
Cloud + Self-host · Two ways to run it
Agents · Voice-AI framework

01 · What it is

The short version.

LiveKit is the realtime audio and video layer that many modern voice-AI products sit on top of. Instead of stitching together WebRTC, STUN/TURN, codecs, jitter buffers and reconnection logic yourself, you point a client SDK at a LiveKit server (cloud or self-hosted) and get a low-latency audio pipe between a user's microphone and your backend. Client SDKs cover web, iOS, Android, React Native, Flutter and Unity; server SDKs cover Node, Python, Go, Rust and Ruby.

It is the same stack OpenAI uses for ChatGPT's voice mode. That single fact tells you most of what you need to know about its production posture.

02 · Building blocks

Four nouns to remember.

Room

A session. Everyone who joins the same room can publish and subscribe to each other's tracks.

Participant

A user, a bot, or an AI agent inside a room. Each has an identity and joins with a JWT access token issued by your backend.

Track

An audio or video stream a participant publishes. Other participants subscribe to consume it.

Agent

A server-side participant that listens (STT), thinks (LLM) and speaks back (TTS), built with the Agents SDK.
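As a rough mental model of how these four nouns relate, the sketch below uses plain Python dataclasses. It is a toy in-memory illustration, not the LiveKit SDK: the class names mirror the concepts, but every method here (join, publish, subscribers_of) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    kind: str          # "audio" or "video"
    publisher: str     # identity of the participant who published it


@dataclass
class Participant:
    identity: str
    is_agent: bool = False   # an agent is just a server-side participant
    tracks: list = field(default_factory=list)


@dataclass
class Room:
    name: str
    participants: dict = field(default_factory=dict)

    def join(self, participant: Participant) -> None:
        self.participants[participant.identity] = participant

    def publish(self, identity: str, kind: str) -> Track:
        track = Track(kind=kind, publisher=identity)
        self.participants[identity].tracks.append(track)
        return track

    def subscribers_of(self, track: Track) -> list:
        # Everyone in the room except the publisher can subscribe.
        return [p for p in self.participants if p != track.publisher]


room = Room("support-call")
room.join(Participant("caller"))
room.join(Participant("concierge-bot", is_agent=True))
mic = room.publish("caller", "audio")
print(room.subscribers_of(mic))  # ['concierge-bot']
```

The point of the model is that an agent needs no special channel: it joins the room like anyone else and subscribes to the caller's audio track.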

Authentication is JWT-based: your backend mints a short-lived access token naming the participant's identity and the room they are allowed to join, the client connects over wss:// with that token, and LiveKit validates it before admitting the participant to the room.
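To make the token step concrete, here is a standard-library sketch of what minting such a token involves. It assumes an HS256-signed JWT whose issuer is the API key, whose subject is the participant identity, and which carries a "video" grant naming the room; those claim names are an assumption to check against the official docs. The mint_token helper, key and secret are placeholders. In a real backend you would normally use the official server SDK rather than hand-rolling this.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_token(api_key: str, api_secret: str, identity: str, room: str,
               ttl_seconds: int = 600) -> str:
    """Hypothetical helper: build a short-lived HS256 JWT for one room."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,             # assumed: issuer is the API key
        "sub": identity,            # assumed: subject is the participant identity
        "nbf": now,
        "exp": now + ttl_seconds,   # short-lived by design
        "video": {"room": room, "roomJoin": True},  # assumed grant shape
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(api_secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


token = mint_token("demo-key", "demo-secret", identity="caller-42", room="support-call")
print(token.count("."))  # 2: header.payload.signature
```

The secret never leaves the backend. The client only ever sees the signed token, which expires on its own after ttl_seconds.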

03 · The Agents framework

Voice AI, without the duct tape.

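LiveKit Agents is the part most operators actually want. It is a server-side framework (Python and Node) that wires together the four pieces a voice assistant needs: voice activity detection (VAD), speech-to-text (STT), a language model (LLM) and text-to-speech (TTS).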

The framework handles turn-taking, interruptions ("barge-in"), partial transcripts, tool calls, and reconnection. You write the agent's behaviour; you don't write the audio loop.
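To make "barge-in" concrete, the toy sketch below shows the core idea with asyncio: the agent's speech runs as a cancellable task, and a detected user utterance cancels it mid-sentence. This illustrates the concept only and is not how LiveKit Agents implements it; the function names and simulated timings are invented for the example.

```python
import asyncio


async def speak(words, spoken):
    """Simulate TTS playback, one word at a time."""
    for word in words:
        spoken.append(word)
        await asyncio.sleep(0.05)   # stand-in for audio playback time


async def run_turn(words, user_interrupts_after=None):
    spoken = []
    playback = asyncio.create_task(speak(words, spoken))
    if user_interrupts_after is not None:
        # Stand-in for a VAD event: the user starts talking mid-reply.
        await asyncio.sleep(user_interrupts_after)
        playback.cancel()           # barge-in: stop speaking immediately
    try:
        await playback
    except asyncio.CancelledError:
        pass                        # playback was cut short on purpose
    return spoken


reply = "Your table is booked for eight tonight".split()
full = asyncio.run(run_turn(reply))
cut = asyncio.run(run_turn(reply, user_interrupts_after=0.12))
print(len(full), len(cut) < len(full))
```

Without an interruption the whole reply is spoken. With one, playback stops partway, which is the behaviour a caller expects when they talk over the assistant.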

A minimal Python agent

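from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import deepgram, elevenlabs, openai, silero


async def entrypoint(ctx: agents.JobContext):
    # Each plugin fills one slot in the voice pipeline and can be swapped.
    session = AgentSession(
        vad=silero.VAD.load(),                # voice activity detection
        stt=deepgram.STT(),                   # speech-to-text
        llm=openai.LLM(model="gpt-4o-mini"),  # language model
        tts=elevenlabs.TTS(),                 # text-to-speech
    )
    # Attach the agent to the room this job was dispatched to.
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly UAE concierge."),
    )


if __name__ == "__main__":
    # Run as a worker that waits for jobs from the LiveKit server.
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))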

That's the whole shape of a voice agent. The plugins are swappable; the loop is not your problem.

04 · Where it fits for an SME

Use cases worth shipping.

Operator note

The unlock is not the technology — STT, LLMs and TTS are commodities now. The unlock is the round trip: getting a user's audio to the LLM and back as speech with sub-second latency, on any network, without the call breaking. That round trip is what LiveKit sells.

05 · Cloud or self-host

Two ways to run it.

LiveKit Cloud is the managed offering: you get a wss:// URL, an API key and secret, and a generous free tier. Its media servers are globally distributed, so a user in Dubai and an agent process in Frankfurt still get low latency.
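As a sketch of how those three credentials are typically handled, the snippet below loads them from environment variables and fails fast if any is missing. The variable names LIVEKIT_URL, LIVEKIT_API_KEY and LIVEKIT_API_SECRET follow the convention seen in LiveKit's examples; treat them as an assumption and confirm against the docs. The load_livekit_config helper and the sample values are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveKitConfig:
    url: str
    api_key: str
    api_secret: str


def load_livekit_config(env=os.environ) -> LiveKitConfig:
    """Hypothetical helper: read connection settings and validate them."""
    names = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing settings: {', '.join(missing)}")
    url = env["LIVEKIT_URL"]
    if not url.startswith(("wss://", "ws://")):
        raise ValueError("LIVEKIT_URL should be a WebSocket URL (wss://...)")
    return LiveKitConfig(url, env["LIVEKIT_API_KEY"], env["LIVEKIT_API_SECRET"])


# Placeholder values for illustration only.
config = load_livekit_config({
    "LIVEKIT_URL": "wss://example-project.livekit.cloud",
    "LIVEKIT_API_KEY": "demo-key",
    "LIVEKIT_API_SECRET": "demo-secret",
})
print(config.url)
```

Keeping the secret in the environment rather than in source means the same code can point at Cloud or at a self-hosted server by changing configuration only.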

Self-hosted is the open-source server (livekit-server) run on your own infrastructure: one Docker container plus a TURN setup for users behind strict NATs. It suits data-residency requirements, but you own the operations and reliability work.

For an SME shipping its first voice agent, Cloud is almost always the right answer. Move to self-host only when a customer contract or a regulator forces the question.

06 · The first hour

How a build usually starts.

Most teams get to a working prototype in a long afternoon. The expensive part — the realtime audio plumbing — is solved.

07 · Resources

Where to read next.

Caveat on facts

This brief is a snapshot. Plugin lists, free-tier limits, SDK names and pricing change. Treat the official docs as the source of truth and use this page as the orientation map.