Open-Source Conversational AI Flow — Whisper + Llama Pipeline
A flow-based conversational AI application built entirely on open-source STT and LLM models — zero vendor lock-in, full data sovereignty, production-ready orchestration.

What the conversational AI platform was up against
Most production conversational AI stacks depend on proprietary APIs that get expensive fast, send data outside client infrastructure, and leave the product exposed to upstream pricing and capability changes. The client wanted a stack where every link (STT, intent, LLM, TTS) ran on open-source models hosted on their own hardware or in their own cloud, with clean orchestration around it.
What we built
We designed the flow around three components: Whisper Large v3 Turbo for open STT (~100 ms latency, language identification across 50+ languages), Llama-family and Mistral open LLMs served behind vLLM for throughput, and our own multilingual TTS (40+ languages, <300 ms time-to-first-audio, trained on 8× H100 GPUs). A flow-editor UI lets ops teams design conversations visually, and every node is versioned. The whole stack runs in the client's VPC with per-tenant isolation. A lightweight eval harness scores response quality against labeled samples, so a model upgrade can be checked for regressions before it goes live.
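As a rough illustration of the shape described above, here is a minimal Python sketch of one conversational turn (STT, then LLM, then TTS) together with a toy eval gate for model upgrades. Everything in it is an assumption made for illustration: the names Pipeline, LabeledSample, pass_rate, and safe_to_upgrade are hypothetical, the stages are plain callables standing in for the self-hosted services, and the substring-match scoring is a placeholder for whatever quality metric the real harness uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stage signatures. In a real deployment each callable would
# wrap a self-hosted service (STT, LLM, TTS); here they are plain functions.
Transcribe = Callable[[bytes], str]
Generate = Callable[[str], str]
Synthesize = Callable[[str], bytes]


@dataclass(frozen=True)
class Pipeline:
    """One conversational turn: audio in, audio out."""
    transcribe: Transcribe
    generate: Generate
    synthesize: Synthesize

    def run_turn(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)    # speech -> text
        reply = self.generate(text)      # text -> reply text
        return self.synthesize(reply)    # reply text -> speech


@dataclass(frozen=True)
class LabeledSample:
    prompt: str
    expected: str  # text a good reply is assumed to contain


def pass_rate(generate: Generate, samples: Sequence[LabeledSample]) -> float:
    """Fraction of samples whose reply contains the expected text (case-insensitive)."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    hits = sum(s.expected.lower() in generate(s.prompt).lower() for s in samples)
    return hits / len(samples)


def safe_to_upgrade(current: Generate, candidate: Generate,
                    samples: Sequence[LabeledSample],
                    tolerance: float = 0.0) -> bool:
    """Allow the swap only if the candidate does not regress beyond tolerance."""
    return pass_rate(candidate, samples) >= pass_rate(current, samples) - tolerance
```

Because each stage is just a callable, swapping one model for another means constructing a new Pipeline with a different generate function, and safe_to_upgrade can be run against the labeled set before that swap is made.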
What shipped
Want something similar?
Tell us what you want to automate. We'll reply in one business day.
Describe the problem, the constraint, and the deadline. We'll send back a scoped plan and a senior engineer to kick it off, with no sales theater.