Inflect-Nano: Ultra-tiny 4.63M parameter TTS model released

🔑 Enhanced Key Takeaways

• Inflect-Nano is explicitly designed as an experimental model to push the boundaries of ultra-lightweight speech synthesis, rather than aiming for state-of-the-art quality.
• It is notable for including its vocoder within the 4.63M parameter count, making it a complete text-to-waveform stack under 5M parameters, which differentiates it from many other small TTS projects that rely on larger external vocoders.
• The model's creator is open to training a v2 with a larger budget if Inflect-Nano-v1 gains sufficient interest and utility.
• Inflect-Nano-v1 is considered the second smallest publicly released TTS model after TinyTTS, and is significantly smaller than competitors like Kokoro (~17x smaller) and Fish Audio S2 Pro (~1000x smaller).
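
The size ratios in the last takeaway follow directly from the parameter counts quoted in this article. A minimal sketch of that arithmetic, using the article's figures as-is (they are not independently verified here) and a hypothetical helper named `size_ratio`:

```python
# Parameter counts as quoted in this article (not independently verified).
PARAMS = {
    "Inflect-Nano": 4.63e6,
    "Kokoro": 82e6,
    "Fish Audio S2 Pro": 5e9,
}


def size_ratio(other: str, base: str = "Inflect-Nano") -> float:
    """Return how many times larger `other` is than `base`, by parameter count."""
    return PARAMS[other] / PARAMS[base]


for name in ("Kokoro", "Fish Audio S2 Pro"):
    print(f"{name}: {size_ratio(name):.1f}x the size of Inflect-Nano")
```

With these inputs the ratios come out at roughly 17.7x and 1,080x, consistent with the rounded "~17x" and "~1000x" figures.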

📊 Competitor Analysis

Feature/Metric	Inflect-Nano	Kokoro TTS	Fish Audio S2 Pro	MOSS-TTS-Nano
Total Parameters	4.63M	82M	5 Billion (4B Slow AR + 400M Fast AR)	~100M (0.1B)
Languages Supported	English-only	Multilingual (English, French, Korean, Japanese, Mandarin, etc.)	Multilingual (80+ languages)	Multilingual (20+ languages including Chinese, English, Japanese, Korean, Spanish, French)
Voice Styles/Cloning	Single English male voice	Multiple voice styles (19 distinct voices), no arbitrary voice cloning	Zero-shot voice cloning (10-30s audio)	Voice cloning with short reference clip
Key Features	Ultra-tiny, local PyTorch inference, includes vocoder in parameter count, experimental	High efficiency, low data requirement (<100 hrs), ONNX support, browser-first (WebGPU/WASM for some versions), streaming	Fine-grained inline control of prosody/emotion (15,000+ tags), dual-autoregressive architecture, trained on 10M+ hrs audio, SGLang streaming	Deployment-first, CPU-friendly, 48 kHz stereo output, pure autoregressive (Audio Tokenizer + LLM), streaming, long-text auto-chunking
Quality/Benchmarks	Can sound robotic, buzzy, or unstable; vocoder is a bottleneck; not SOTA	Achieved #1 ranking in TTS Spaces Arena (Elo rating), RTF 0.03 on GPU	Lowest WER in Seed-TTS Eval (0.54% Chinese, 0.99% English); RTF 0.195 on NVIDIA H200 GPU; time-to-first-audio ~100ms	Designed for "good enough quality for real-time products"
Licensing/Pricing	Open-source (Hugging Face)	Apache 2.0; $0.02/1,000 characters for some versions	FISH AUDIO RESEARCH LICENSE	Apache 2.0
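
The RTF figures in the table are real-time factors: synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below shows what the quoted values would imply for a clip of a given length. The function names and the 10-second clip are illustrative assumptions, and because the two figures were measured on different hardware they are not directly comparable.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: seconds of compute needed for a clip."""
    return rtf * audio_seconds


# RTF values quoted in the table; the 10-second clip length is an arbitrary example.
for name, rtf in (("Kokoro TTS (GPU)", 0.03), ("Fish Audio S2 Pro (H200)", 0.195)):
    print(f"{name}: {synthesis_time(rtf, 10.0):.2f} s to synthesize 10 s of audio")
```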

🛠️ Technical Deep Dive

The acoustic model is a compact non-autoregressive FastSpeech-style network.
The vocoder is a small Snake-activation HiFi-GAN-style generator.
The model predicts duration, pitch, energy, and brightness, then decodes an 80-bin mel spectrogram.
It outputs audio at a 24 kHz sample rate.
The acoustic model has a hidden size of 168 and 5 encoder layers.
The full inference pipeline is: text -> English text frontend -> compact FastSpeech-style acoustic model -> 80-bin mel spectrogram -> small Snake HiFi-GAN-style vocoder -> 24 kHz waveform.
It uses a vendored text frontend (third_party/tiny_tts_frontend/) for English grapheme-to-phoneme (G2P) conversion and token IDs.
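
Two of the building blocks named above can be illustrated in a few lines of plain Python: FastSpeech-style length regulation, which repeats each token's encoding for its predicted number of frames, and the Snake activation, x + sin²(αx)/α. This is a toy sketch, not the model's actual code. The function names, the hop length of 256, the duration values, and the 2-dimensional encodings are all assumptions made for illustration; only the 24 kHz sample rate and the hidden size of 168 come from the article.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 24_000  # from the article
HOP_LENGTH = 256      # ASSUMPTION: not stated in the source; illustrative only


def length_regulate(encodings: Sequence[Sequence[float]],
                    durations: Sequence[int]) -> List[List[float]]:
    """Repeat each token encoding durations[i] times to get frame-level features."""
    if len(encodings) != len(durations):
        raise ValueError("one duration per token is required")
    frames: List[List[float]] = []
    for vec, dur in zip(encodings, durations):
        # Negative predictions are clamped to zero frames.
        frames.extend(list(vec) for _ in range(max(dur, 0)))
    return frames


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + sin^2(alpha * x) / alpha."""
    return x + math.sin(alpha * x) ** 2 / alpha


def waveform_samples(n_frames: int, hop_length: int = HOP_LENGTH) -> int:
    """Output samples if the vocoder upsamples each mel frame by hop_length."""
    return n_frames * hop_length


# Toy input: 3 tokens with 2-dim encodings (the real model uses hidden size 168).
tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 1, 3]  # made-up frames-per-token predictions
frames = length_regulate(tokens, durations)
n_samples = waveform_samples(len(frames))
print(len(frames), "frames ->", n_samples, "samples",
      f"({n_samples / SAMPLE_RATE:.3f} s at {SAMPLE_RATE} Hz)")
```

In a real FastSpeech-style model the expanded frames would then pass through a decoder to produce the 80-bin mel spectrogram, and Snake would be applied elementwise inside the vocoder's convolutional layers, typically with a learned α per channel.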

🔮 Future Implications

AI analysis grounded in cited sources

Ultra-small TTS models like Inflect-Nano will accelerate the development of fully offline and embedded voice AI applications.

Their minimal parameter count and local execution capability remove dependencies on cloud services, enabling privacy-focused and low-latency voice agents on resource-constrained devices.
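
To put the parameter count in concrete terms, the sketch below estimates weight storage for 4.63M parameters at a few common precisions. It counts weights only and ignores file-format overhead and runtime activation memory. The source does not say which precision the released checkpoint uses, and the int8 row is a hypothetical quantized variant, not something the project is known to ship.

```python
PARAM_COUNT = 4_630_000  # total parameters, as quoted in the article

# Bytes per parameter for common storage formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(n_params: int, dtype: str) -> float:
    """Approximate weight storage in MB (10^6 bytes), ignoring file overhead."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_megabytes(PARAM_COUNT, dtype):.2f} MB")
```

Even at full float32 precision the weights come to under 20 MB, which is what makes offline and embedded deployment plausible.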

The focus on extreme parameter efficiency will drive innovation in model compression and specialized architectures for edge AI.

Demonstrating usable speech synthesis at such a small scale encourages further research into highly optimized models that can run on 'potato computers' and browser/WASM environments.

⏳ Timeline

2026-06-17

Inflect-Nano-v1 is released on Hugging Face and announced on Reddit.
