
Why not local Whisper

The case for server-side transcription in a native app.

Apr 5, 2026 · 3 min

Local Whisper is appealing. No network dependency, no API costs, no latency from a round trip. In theory, it is the obvious choice for a voice transcription app.

In practice, the large-v3 model is 1.5 GB. On Android, loading it competes with every other app for memory. On Linux laptops, it pins a CPU core and drains the battery. On low-end Windows machines, it simply does not run fast enough to feel responsive.

Groq Whisper returns results in under two seconds on a standard connection. The model runs on dedicated inference hardware that stays cool, stays fast, and does not drain anything on the user's device.
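The server-side call is small enough to sketch. The snippet below assumes Groq's OpenAI-compatible transcription endpoint and the `whisper-large-v3` model name; the helper functions are illustrative, not RouteSignal's actual client code:

```python
# Sketch of a server-side transcription call against Groq's
# OpenAI-compatible endpoint. URL, model name, and helper names
# are assumptions for illustration.
GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"

def build_transcription_request(api_key: str) -> dict:
    """Assemble the static pieces of the multipart request."""
    return {
        "url": GROQ_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": "whisper-large-v3"},
    }

def transcribe(audio_path: str, api_key: str) -> str:
    """POST an audio file and return the transcribed text."""
    import requests  # third-party; pip install requests

    req = build_transcription_request(api_key)
    with open(audio_path, "rb") as f:
        resp = requests.post(
            req["url"],
            headers=req["headers"],
            data=req["data"],
            files={"file": f},
            timeout=10,  # fail fast; the whole pitch is sub-two-second results
        )
    resp.raise_for_status()
    return resp.json()["text"]
```

A short timeout matters here: if the network is slow enough that the request stalls, the app should surface an error quickly rather than leave the user waiting on a text field.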

The tradeoff is real: you need an internet connection. But for the target use case — speaking into a text field while working — you already have one. The network is not the bottleneck. The user experience is.