Minimax
Multilingual scale voice at $60 per 1M characters and 90ms latency.
Minimax.io occupies the middle ground in the voice market: 90ms model latency and flat pricing of $60 per 1M characters. Its standout strength is multilingual scale: broad language coverage at predictable PAYG pricing, which fits global content and customer-service workloads where a per-language voice subscription model would multiply costs.
Models
Minimax TTS
Multilingual
90ms latency, broad language coverage
Input Price
$60 / 1M characters
Input Types
Text (multilingual)
Output Types
Audio
Use Cases
Global Customer Service
Voice agents across many languages on a single PAYG meter rather than per-language subscriptions.
Multilingual Content Audio
Convert content into many languages for podcasts, training, or product audio at predictable cost.
Scale-Out Voice Workflows
High-volume TTS where the character-based meter beats per-second voice subscriptions on cost.
Why use Minimax on Nagent?
Nagent adds enterprise orchestration, observability, and workflow automation on top of Minimax's raw model capabilities.
PAYG meter fits global rollouts where per-language pricing would explode
90ms latency is competitive for conversational use that is not strictly latency-critical
Drop-in alongside ElevenLabs / Cartesia / Smallest AI in our voice routing stack
KARMIC scoring helps pick the right voice provider per language
How to access Minimax on Nagent
Open Agent Studio
Navigate to Agent Studio in your Nagent workspace.
Select Minimax
Choose Minimax TTS under Audio / Voice in Model Configuration.
Pick Languages
Select target languages and voice profiles. The agent then routes each piece of content automatically.
Common questions about Minimax
Real buyer and developer questions, answered.
When does Minimax beat ElevenLabs and Cartesia?
When you need broad multilingual coverage at predictable PAYG pricing without per-language subscriptions. For single-language English work, ElevenLabs (quality) or Cartesia (latency) win.
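As a rough illustration, the rule of thumb in this answer can be written as a small selection function. This is only a sketch of the guidance above, not how Nagent's voice routing or KARMIC scoring actually works. The function name, its parameters, and the choice to send every non-English or multi-language job to Minimax are assumptions made for the example.

```python
def pick_voice_provider(languages: list[str], latency_critical: bool = False) -> str:
    """Hypothetical helper encoding the rule of thumb from the answer above.

    `languages` holds language tags such as "en", "en-US", or "ja".
    """
    # Reduce tags like "en-US" and "en-GB" to their primary language ("en").
    distinct = {lang.strip().lower().split("-")[0] for lang in languages}
    if not distinct:
        raise ValueError("at least one target language is required")

    # Single-language English work: Cartesia for latency, ElevenLabs for quality.
    if distinct == {"en"}:
        return "Cartesia" if latency_critical else "ElevenLabs"

    # Assumption: anything multilingual or non-English goes to Minimax.
    return "Minimax"
```

Calling `pick_voice_provider(["en", "es", "ja"])` selects Minimax, while `pick_voice_provider(["en-US"], latency_critical=True)` selects Cartesia.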
What's the deal with character-based pricing?
$60 per 1M characters is a flat meter, which is easier to forecast than per-second voice subscriptions when traffic patterns are unpredictable. It is particularly useful for content TTS, where cost tracks the length of the input text rather than the duration of the audio stream.
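To make the forecasting point concrete, here is a minimal cost estimate using the flat $60 per 1M characters rate. The helper name is hypothetical, and the sketch assumes every character counts toward the meter. How whitespace, punctuation, or markup are actually billed is not stated here.

```python
from decimal import Decimal

# Flat rate quoted on this page: $60 per 1M characters.
PRICE_PER_MILLION_CHARS = Decimal("60")


def estimate_tts_cost(char_count: int) -> Decimal:
    """Estimate the USD cost of synthesizing `char_count` characters.

    Assumption: every character is billed at the same flat rate.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return Decimal(char_count) * PRICE_PER_MILLION_CHARS / Decimal(1_000_000)
```

Under these assumptions, a 250,000-character batch works out to $15, and the estimate scales linearly with text volume regardless of how long the resulting audio runs.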
How does the 90ms latency feel in practice?
Fine for most conversational use, though noticeably slower than Cartesia's 40ms when the two are compared side by side in latency-critical real-time scenarios. It is acceptable for content TTS, voice notifications, and async voice work where the user is not actively waiting.
Can I use Minimax for podcast generation?
Yes — particularly good for multilingual podcast localisation at predictable cost. For premium-quality original podcast production in a single language, ElevenLabs remains the standard.
Ready to use Minimax inside your agents?
Get started in minutes — no API key management required.
