meetbot.dev

05 · transcription api

shipping Q3 2026 · BYOK works today

Transcripts, shipping Q3.

We do not ship transcription today. What we ship today: per-speaker audio tracks you can pipe straight into Whisper, Deepgram, AssemblyAI, Gladia, AWS Transcribe, or ElevenLabs on your own key — zero meetbot fee on that leg. See the action-items-bot sample for a working Whisper integration. Hosted Whisper-large-v3 lands Q3 2026 at $0.10/hr.

overview

Why this exists.

Honest scope. We do not generate transcripts today. The recording side ships per-speaker audio (one Opus track per participant, name-tagged), and we surface the meeting platform's native captions verbatim — Meet/Teams/Zoom each have their own captioner and we pass it through as captions.jsonl. We do not run ASR ourselves. If you need a transcript today, the path is BYOK: pipe each per-speaker track into your provider of choice, get a per-speaker transcript back. The action-items-bot sample at github.com/meetbot/samples shows the Whisper integration end-to-end.

Q3 2026: hosted Whisper. When the hosted path ships it'll run on a Hetzner GPU box (RTX 4090), serve about twenty concurrent realtime streams, and support mid-meeting language switching. Speaker tagging comes straight from the bot's existing per-speaker audio mapping: we already know who said what, so the transcript inherits speaker names for free. Pricing at GA: $0.10/hr add-on. Default for new accounts will stay "no transcription," because the cheapest API call is the one you don't make.

BYOK today, async or realtime. The shape we'll ship at GA: async is one POST after the meeting ends; realtime opens a WebSocket on wss://api.meetbot.dev/v1/transcripts/:bot_id and streams partial + finalized utterances as they're produced. Today, you build the equivalent on your end — the per-speaker audio is in your S3 bucket the second the meeting ends; route to your provider, then to your downstream consumer. The JSONL shape we'll use at GA matches the captions JSONL we already emit today, so the migration is a one-line consumer change.
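The BYOK merge step above can be sketched in a few lines. This is a hypothetical helper, not shipped code: it assumes your provider hands back per-speaker utterances with `text`, `tStart`, and `tEnd`, and it emits rows in the same JSONL shape as the captions file, so the GA migration really is a one-line consumer change.

```python
import json

def merge_speaker_transcripts(per_speaker):
    """Merge per-speaker utterance lists into one chronological transcript.

    `per_speaker` maps (speakerId, name) -> list of provider results, each a
    dict with at least `text`, `tStart`, `tEnd` in seconds. Field names match
    the captions JSONL shape; adapt the mapping to your provider's output.
    """
    rows = []
    for (speaker_id, name), utterances in per_speaker.items():
        for u in utterances:
            rows.append({
                "speakerId": speaker_id,
                "name": name,
                "text": u["text"],
                "tStart": u["tStart"],
                "tEnd": u["tEnd"],
            })
    # Chronological order across all speakers, ties broken by end time.
    rows.sort(key=lambda r: (r["tStart"], r["tEnd"]))
    return "\n".join(json.dumps(r) for r in rows)
```

Route the merged JSONL to the same consumer you plan to point at `transcript.jsonl` later.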

planned surface

Spec, in the open.


transcript: { mode, provider }

Per-bot config on dispatch. mode ∈ {async, realtime}. provider ∈ {hosted-whisper, deepgram, assemblyai, gladia, aws-transcribe, elevenlabs}.


transcript.jsonl

Newline-delimited JSON. One row per finalized utterance, with speakerId, name, text, tStart, tEnd. Same shape as captions.
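A minimal reader for that shape, assuming exactly the five specced fields per row; the strictness (raising on missing fields) is a choice for this sketch, not part of the spec.

```python
import json

def read_transcript_jsonl(text):
    """Parse newline-delimited utterance rows into dicts, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        # Each finalized utterance carries these five fields per the spec.
        missing = {"speakerId", "name", "text", "tStart", "tEnd"} - row.keys()
        if missing:
            raise ValueError(f"row missing fields: {sorted(missing)}")
        rows.append(row)
    return rows
```

Because captions use the same shape, the same reader works on `captions.jsonl` today.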


wss://api.meetbot.dev/v1/transcripts/:id

Realtime WebSocket. Emits {type: partial|final, ...} frames as utterances are produced. Per-speaker.
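A consumer for those frames usually keeps one in-flight partial per speaker and appends finals. The class below is a sketch under that assumption: the frame fields beyond `type`, `speakerId`, and `text` are not specced yet, so it ignores them.

```python
class TranscriptStream:
    """Track partial/final frames from the planned realtime socket."""

    def __init__(self):
        self.partials = {}   # speakerId -> latest in-flight text
        self.finals = []     # finalized utterances, in arrival order

    def on_frame(self, frame):
        sid = frame["speakerId"]
        if frame["type"] == "partial":
            self.partials[sid] = frame["text"]   # overwrite the previous partial
        elif frame["type"] == "final":
            self.partials.pop(sid, None)         # the final supersedes it
            self.finals.append(frame)
```

Wire `on_frame` to whatever WebSocket client you already use; the state logic is the part worth getting right.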


POST /v1/recordings/:id/transcript

Async transcript on a previously completed recording. Useful when you decide you want a transcript only after the call has ended.
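As a shape reference, here is how that call might be constructed. The path is from the spec above; the JSON body and bearer-token header are assumptions about what the GA API will accept, and this helper only builds the request rather than sending it.

```python
import json
import urllib.request

def build_transcript_request(recording_id, provider, api_key,
                             base="https://api.meetbot.dev"):
    """Build (not send) the planned async-transcript request."""
    body = json.dumps({"provider": provider}).encode()
    return urllib.request.Request(
        f"{base}/v1/recordings/{recording_id}/transcript",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
```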


Multilingual + language switching

Whisper-large-v3 detects mid-meeting language switches. Per-utterance lang tag in the JSONL. No need to declare a language upfront.
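One thing the per-utterance tag enables is splitting a transcript into contiguous same-language runs, for example to route each run to a different summarizer. A hypothetical helper, assuming each row carries the `lang` field described above:

```python
def language_runs(utterances):
    """Collapse a transcript into contiguous same-language runs.

    Returns [(lang, [utterances...]), ...] in original order.
    """
    runs = []
    for u in utterances:
        if runs and runs[-1][0] == u["lang"]:
            runs[-1][1].append(u)   # extend the current run
        else:
            runs.append((u["lang"], [u]))
    return runs
```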


BYOK key vault

Provider keys stored encrypted with per-tenant KMS-derived keys. Rotation through /account/keys without redeploys.
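To show the shape of per-tenant derivation, here is an HKDF-style sketch in stdlib Python. This is illustrative only: the real vault uses KMS-derived keys, and the salt, info layout, and function name here are all invented for the example.

```python
import hashlib
import hmac

def derive_tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a per-tenant data key from a master secret (HKDF-style sketch)."""
    # Extract: mix the master secret with a fixed application salt.
    prk = hmac.new(b"meetbot-key-vault", master_key, hashlib.sha256).digest()
    # Expand: bind the output key to the tenant id (single-block HKDF expand).
    return hmac.new(prk, tenant_id.encode() + b"\x01", hashlib.sha256).digest()
```

The point of the pattern: a leaked tenant key never exposes the master secret or any other tenant's key, and rotation means re-deriving, not re-encrypting everything at once.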