Minimax

Audio & Voice

Multilingual voice at scale: $60 per 1M characters and 90ms latency.

Minimax.io occupies the middle ground in the voice market, with 90ms model latency and flat pricing of $60 per 1M characters. Its standout strength is multilingual scale: broad language coverage at predictable pay-as-you-go (PAYG) pricing, which suits global content and customer-service workloads where per-language voice subscriptions would multiply costs.

Access on Nagent

Book a Demo Try Free

Models available: 1

Modalities: 1

Available on Nagent

Models

Minimax TTS

Multilingual

90ms latency, broad language coverage

Input Price

$60 / 1M characters

Input Types

Text (multilingual)

Output Types

Audio

What You Can Build

Use Cases

Global Customer Service

Voice agents across many languages on a single PAYG meter rather than per-language subscriptions.

Multilingual Content Audio

Convert content into many languages for podcasts, training, or product audio at predictable cost.

Scale-Out Voice Workflows

High-volume TTS where the character-based meter beats per-second voice subscriptions on cost.

Platform Advantage

Why use Minimax on Nagent?

Nagent adds enterprise orchestration, observability, and workflow automation on top of Minimax's raw model capabilities.

PAYG meter fits global rollouts where per-language pricing would explode

90ms latency is competitive for conversational use that does not demand strict real-time response

Drop-in alongside ElevenLabs / Cartesia / Smallest AI in our voice routing stack

KARMIC scoring helps pick the right voice provider per language

Getting Started

How to access Minimax on Nagent

Open Agent Studio

Navigate to Agent Studio in your Nagent workspace.

Select Minimax

Choose Minimax TTS under Audio / Voice in Model Configuration.

Pick Languages

Select target languages and voice profiles. The agent then routes each piece of content automatically.

FAQs

Common questions about Minimax

Real buyer and developer questions, answered.

When does Minimax beat ElevenLabs and Cartesia?

When you need broad multilingual coverage at predictable PAYG pricing without per-language subscriptions. For single-language English work, ElevenLabs (quality) or Cartesia (latency) win.
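The rule of thumb in this answer can be written down as a small decision function. This is only an illustrative sketch: `VoiceJob`, `suggest_provider`, and the `priority` values are hypothetical names invented here, and the logic does not represent Nagent's actual routing or its KARMIC scoring.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceJob:
    """Hypothetical description of a TTS workload (not a Nagent type)."""
    languages: tuple[str, ...]  # language codes such as "en", "es", "ja"
    priority: str               # assumed values: "cost", "quality", "latency"


def suggest_provider(job: VoiceJob) -> str:
    """Apply the FAQ's rule of thumb to a single workload."""
    if not job.languages:
        raise ValueError("at least one language is required")

    # "Single-language English work" per the FAQ answer.
    english_only = len(job.languages) == 1 and job.languages[0].lower().startswith("en")

    if english_only and job.priority == "quality":
        return "ElevenLabs"
    if english_only and job.priority == "latency":
        return "Cartesia"
    # Broad multilingual coverage at predictable PAYG pricing.
    return "Minimax"


multilingual_support = VoiceJob(languages=("en", "es", "ja"), priority="cost")
english_narration = VoiceJob(languages=("en",), priority="quality")
```

A real routing decision would also weigh voice availability and measured quality per language, which this sketch deliberately leaves out.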

What's the deal with character-based pricing?

$60 per 1M characters is a flat meter, which is easier to forecast than per-second voice subscriptions when traffic patterns are unpredictable. It is particularly useful for content TTS, where cost tracks the length of the input text rather than the duration of the audio stream.
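To make the flat meter concrete, here is a minimal forecasting sketch. The only figure taken from this page is the $60 per 1M characters rate. The function name and the monthly volumes are hypothetical, and the sketch assumes every input character is billed at the same rate with no minimums or tiers.

```python
PRICE_PER_MILLION_CHARS_USD = 60.0  # flat rate quoted on this page


def estimate_cost_usd(total_characters: int) -> float:
    """Estimate TTS spend for a given number of input characters."""
    if total_characters < 0:
        raise ValueError("total_characters must be non-negative")
    return PRICE_PER_MILLION_CHARS_USD * total_characters / 1_000_000


# Hypothetical monthly volumes per language, in characters of input text.
monthly_characters = {"en": 400_000, "es": 350_000, "ja": 250_000}

# One meter covers all languages, so the forecast is a single sum.
monthly_cost = estimate_cost_usd(sum(monthly_characters.values()))
```

With these assumed volumes the total is exactly 1,000,000 characters, so the monthly estimate comes to $60.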

How does the 90ms latency feel in practice?

Fine for most conversational use, but noticeably slower than Cartesia's 40ms in side-by-side comparisons on high-stakes real-time workloads. Acceptable for content TTS, voice notifications, and async voice work where the user is not actively waiting.

Can I use Minimax for podcast generation?

Yes — particularly good for multilingual podcast localisation at predictable cost. For premium-quality original podcast production in a single language, ElevenLabs remains the standard.

Ready to use Minimax inside your agents?

Get started in minutes — no API key management required.
