Case study · Voice AI / Media

Multilingual TTS Model — 40+ Languages, 100+ Dialects, <300ms Latency

A production multilingual TTS model with voice cloning: 40+ languages, 100+ dialects, and sub-300ms time-to-first-audio. ElevenLabs-class quality, self-hosted, trained on 8× H100 GPUs.

Client: Voice-first SaaS client · Industry: Voice AI / Media · Duration: 7 months · Team: 5 engineers
01 / The Challenge
What Voice-first SaaS client was up against

ElevenLabs-class TTS was a near-perfect fit for the client's product, but per-character pricing and vendor dependency made it unworkable at their target scale. Off-the-shelf open TTS models each fell short on at least one count: insufficient voice quality, narrow language coverage, or no practical way to clone a new voice from a short sample. The client wanted ElevenLabs-level quality across 40+ languages and 100+ regional dialects, with sub-300ms time-to-first-audio, all running inside their own infrastructure.

02 / The Solution
What we built

We started from open TTS architectures (XTTS-style, VITS-derived), curated and cleaned a multilingual training corpus spanning 40+ languages and 100+ dialects, and fine-tuned the model on an 8× H100 GPU cluster with prosody-preserving augmentation. Voice cloning works from a 30-second reference sample. The model ships with a streaming inference server optimized for real-time use, a language-expansion pipeline so the client can onboard new dialects without our help, and built-in safety filters plus voice-consent gating.
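
To make the voice-consent gating concrete, here is a minimal Python sketch of the kind of check that could sit in front of a cloning request. It is illustrative only and not the shipped implementation: the names CloneRequest, gate_clone_request, and consent_registry are hypothetical, and treating the 30-second reference sample as a hard minimum is an assumption.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumption: the 30-second reference sample is enforced as a minimum length.
MIN_REFERENCE_SECONDS = 30.0


@dataclass(frozen=True)
class CloneRequest:
    """Hypothetical request to clone a voice from a reference recording."""
    speaker_id: str
    reference_seconds: float
    consent_token: Optional[str] = None


def gate_clone_request(
    request: CloneRequest, consent_registry: Dict[str, str]
) -> Tuple[bool, str]:
    """Return (allowed, reason). Consent is checked before audio length."""
    expected = consent_registry.get(request.speaker_id)
    if expected is None:
        return False, "no consent on record for this speaker"
    if request.consent_token is None or not hmac.compare_digest(
        expected.encode("utf-8"), request.consent_token.encode("utf-8")
    ):
        # Constant-time comparison avoids leaking token contents via timing.
        return False, "consent token missing or invalid"
    if request.reference_seconds < MIN_REFERENCE_SECONDS:
        return False, "reference audio shorter than 30 seconds"
    return True, "ok"


if __name__ == "__main__":
    registry = {"spk-001": "tok-abc"}  # hypothetical consent store
    print(gate_clone_request(CloneRequest("spk-001", 31.5, "tok-abc"), registry))
```

Checking consent before anything else means a request without a valid token is rejected without the reference audio ever being processed.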

03 / Outcomes

What shipped

Languages supported: 40+
Regional dialects: 100+
Time-to-first-audio (ElevenLabs-class): <300ms
Reference audio for voice cloning: 30s
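
For readers who want to check a time-to-first-audio figure against their own streaming endpoint, the sketch below shows one way to measure it in Python. It is a hedged illustration, not the client's benchmark harness: time_to_first_audio_ms and fake_tts_stream are hypothetical names, and the fake stream merely simulates a server with a sleep.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_audio_ms(
    open_stream: Callable[[], Iterable[bytes]]
) -> Tuple[float, bytes]:
    """Time from issuing the request until the first non-empty audio chunk.

    The clock starts before open_stream() is called, so connection and
    request setup count toward the measurement.
    """
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:
            return (time.perf_counter() - start) * 1000.0, chunk
    raise ValueError("stream ended without producing audio")


def fake_tts_stream(
    text: str, first_chunk_delay_s: float = 0.05, chunks: int = 3
) -> Iterator[bytes]:
    """Stand-in for a real streaming TTS client (assumption: yields raw bytes)."""
    time.sleep(first_chunk_delay_s)  # simulated model and network latency
    for _ in range(chunks):
        yield b"\x00" * 320  # placeholder audio payload


if __name__ == "__main__":
    ms, first = time_to_first_audio_ms(lambda: fake_tts_stream("Hello"))
    print(f"time-to-first-audio: {ms:.1f} ms, first chunk: {len(first)} bytes")
```

In practice the measurement would be repeated across many requests and reported as a percentile rather than a single run.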
Stack we used
XTTS / VITS architectures · PyTorch · Coqui toolkit · Custom training pipeline · 8× H100 GPU training cluster · Triton inference server · Python · CUDA · S3