curl --request POST \
--url https://api.minimax.io/v1/t2a_v2 \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '
{
"model": "speech-2.8-hd",
"text": "Omg(sighs), the real danger is not that computers start thinking like people, but that people start thinking like computers. Computers can only help us with simple tasks.",
"stream": false,
"language_boost": "auto",
"output_format": "hex",
"voice_setting": {
"voice_id": "English_expressive_narrator",
"speed": 1,
"vol": 1,
"pitch": 0
},
"pronunciation_dict": {
"tone": [
"Omg/Oh my god"
]
},
"audio_setting": {
"sample_rate": 32000,
"bitrate": 128000,
"format": "mp3",
"channel": 1
},
"voice_modify": {
"pitch": 0,
"intensity": 0,
"timbre": 0,
"sound_effects": "spacious_echo"
}
}
'
{
"data": {
"audio": "<hex encoded audio>",
"status": 2
},
"extra_info": {
"audio_length": 11124,
"audio_sample_rate": 32000,
"audio_size": 179926,
"bitrate": 128000,
"word_count": 163,
"invisible_character_ratio": 0,
"usage_characters": 163,
"audio_format": "mp3",
"audio_channel": 1
},
"trace_id": "01b8bf9bb7433cc75c18eee6cfa8fe21",
"base_resp": {
"status_code": 0,
"status_msg": "success"
}
}
Use this API for synchronous speech synthesis (t2a) over HTTP.
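For readers calling the endpoint from Python rather than curl, the request above might be assembled roughly as follows. This is a minimal sketch using only the standard library; the helper name `build_t2a_request` is hypothetical, and only a subset of the fields from the curl example is included.

```python
import json
import urllib.request

API_URL = "https://api.minimax.io/v1/t2a_v2"  # endpoint taken from the curl example


def build_t2a_request(token: str, text: str, voice_id: str,
                      model: str = "speech-2.8-hd",
                      url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request. Hypothetical helper."""
    payload = {
        "model": model,
        "text": text,
        "stream": False,
        "output_format": "hex",
        "voice_setting": {"voice_id": voice_id, "speed": 1, "vol": 1, "pitch": 0},
        "audio_setting": {"sample_rate": 32000, "bitrate": 128000,
                          "format": "mp3", "channel": 1},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can then be passed to `urllib.request.urlopen(...)` and the response body parsed with `json.load`; the network call is left out here so the sketch stays side-effect free.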
Alternative endpoint, reduced Time to First Audio (TTFA):
https://api-uw.minimax.io/v1/t2a_v2
HTTP: Bearer Auth
Bearer API_key, which can be found in Account Management > API Keys.
The media type of the request body. Must be set to application/json to ensure the data is sent in JSON format.
The speech synthesis model version to use. Options include:
speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.
The text to be converted into speech. Must be less than 10,000 characters.
For texts over 3,000 characters, streaming output is recommended.
Paragraph breaks should be marked with newline characters.
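The two length limits above can be checked before sending a request. The sketch below is illustrative only: the function name `streaming_recommended` is hypothetical, and it assumes characters are counted with Python's `len()`, which the source does not specify.

```python
MAX_TEXT_CHARS = 10_000    # text must be shorter than this
STREAM_HINT_CHARS = 3_000  # above this, streaming output is recommended


def streaming_recommended(text: str) -> bool:
    """Return True when streaming is recommended; raise if the text is too long.

    Assumption: length is measured in Unicode code points via len().
    """
    n = len(text)
    if n >= MAX_TEXT_CHARS:
        raise ValueError(f"text has {n} characters; must be less than {MAX_TEXT_CHARS}")
    return n > STREAM_HINT_CHARS
```

A caller could use the return value to set the stream flag in the request body.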
Pause control: You can customize speech pauses by adding markers in the form <#x#>, where x is the pause duration in seconds. Valid range: [0.01, 99.99], up to two decimal places. Pause markers must be placed between speakable text segments and cannot be used consecutively.
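A small helper can keep pause markers within the documented range and format. This is a sketch, not part of the API: the names `pause` and `has_consecutive_pauses` are hypothetical, and treating markers separated only by whitespace as "consecutive" is an assumption.

```python
import re

# Two markers with nothing but optional whitespace between them.
# Assumption: whitespace alone does not count as speakable text.
_CONSECUTIVE_RE = re.compile(r"<#[\d.]+#>\s*<#[\d.]+#>")


def pause(seconds: float) -> str:
    """Format a pause marker such as <#1.50#>, enforcing the range [0.01, 99.99]."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause duration must be within [0.01, 99.99] seconds")
    return f"<#{seconds:.2f}#>"  # at most two decimal places


def has_consecutive_pauses(text: str) -> bool:
    """Return True if two pause markers appear back to back."""
    return _CONSECUTIVE_RE.search(text) is not None
```

For example, `"Hello" + pause(0.5) + "world"` places a half-second pause between two speakable segments.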
Inline pronunciation: Wrap Mandarin Pinyin (with tone numbers 1–5), IPA symbols, or Cantonese Jyutping (with tone numbers 1–6) in half-width parentheses to override the pronunciation of the target word or polyphonic character.
"The word live is pronounced (lɪv) as a verb and (laɪv) as an adjective."
"This is (he2)平, not (huo4)面."
"去街市買啲(sung3)。"
Interjection tags: Only supported when using the speech-2.8-hd or speech-2.8-turbo models. Supported interjections: (laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (sneezes).
Whether to enable streaming output. Defaults to false.
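Returning to the interjection tags described for the text above: because they are limited to the speech-2.8 models, it can help to scan the text before choosing a model. The sketch below is one possible check; the function names are hypothetical. Parenthesized content that is not in the supported list (for example inline pronunciation such as (lɪv)) is deliberately ignored.

```python
import re

SUPPORTED_INTERJECTIONS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath", "pant",
    "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts", "burps",
    "lip-smacking", "humming", "hissing", "emm", "sneezes",
}
INTERJECTION_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
_PAREN_RE = re.compile(r"\(([^()]+)\)")


def find_interjections(text: str) -> list:
    """Return the supported interjection tags found in text, in order."""
    return [tag for tag in _PAREN_RE.findall(text) if tag in SUPPORTED_INTERJECTIONS]


def check_interjections(text: str, model: str) -> list:
    """Raise if the text uses interjection tags with a model that lacks support."""
    tags = find_interjections(text)
    if tags and model not in INTERJECTION_MODELS:
        raise ValueError(f"model {model!r} does not support interjection tags: {tags}")
    return tags
```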
Timbre weights (legacy field)
Controls whether recognition for specific minority languages and dialects is enhanced. Default is null. If the language type is unknown, set to "auto" and the model will automatically detect it.
Note: The speech-01 and speech-02 series models do not currently support Persian, Filipino, or Tamil.
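The model restriction in the note above could be enforced client-side along these lines. This is a sketch under two assumptions: that the description belongs to the language_boost field shown in the request example, and that a model's series can be recognized by its name prefix. The function name is hypothetical.

```python
from typing import Optional

# Languages the note says the speech-01 and speech-02 series do not support.
LEGACY_UNSUPPORTED = {"Persian", "Filipino", "Tamil"}


def language_boost_supported(model: str, language_boost: Optional[str]) -> bool:
    """Return False only for a speech-01/02 model paired with an unsupported language.

    Assumption: the series is identified by the "speech-01"/"speech-02" prefix.
    """
    if language_boost is None:  # null is the documented default
        return True
    is_legacy = model.startswith(("speech-01", "speech-02"))
    return not (is_legacy and language_boost in LEGACY_UNSUPPORTED)
```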
Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
Voice effects configuration.
Supported audio formats:
mp3, wav, flac
mp3
Controls whether subtitles are enabled. Default is false. Available for models: speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.
Subtitle granularity. Default is sentence. Options:
sentence: sentence-level timestamps
word: word-level timestamps
word_streaming: word-level timestamps optimized for streaming, only valid when stream=true
Controls the output format. Options: [url, hex]. Default is hex. Only effective in non-streaming scenarios. In streaming, only hex is supported. Returned url is valid for 24 hours.
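The subtitle granularity and output format options both interact with the stream flag, so a quick consistency check before sending may be useful. The sketch below is illustrative: the parameter name `subtitle_type` is hypothetical (the source does not name the field), and the function simply reports conflicts rather than describing how the server reacts to them.

```python
def check_output_options(stream: bool,
                         output_format: str = "hex",
                         subtitle_type: str = "sentence") -> list:
    """Return a list of human-readable conflicts; an empty list means none found."""
    problems = []
    if output_format not in ("url", "hex"):
        problems.append(f"unknown output format: {output_format!r}")
    elif stream and output_format != "hex":
        problems.append("only hex output is supported when streaming")
    if subtitle_type not in ("sentence", "word", "word_streaming"):
        problems.append(f"unknown subtitle granularity: {subtitle_type!r}")
    elif subtitle_type == "word_streaming" and not stream:
        problems.append("word_streaming is only valid when stream=true")
    return problems
```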
The synthesized audio data object. The returned data object may be null, so a null check is required.
The session ID, used for troubleshooting and support.
Additional audio information.
Status code and details of this request.
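Putting the response fields together, a client might extract the audio bytes as sketched below. It assumes the request used hex output, and that a status_code of 0 indicates success, as in the example response; other status codes are not covered by this page. The function name `extract_audio` is hypothetical.

```python
def extract_audio(response: dict) -> bytes:
    """Decode hex audio from a parsed response, with the required null check."""
    base = response.get("base_resp") or {}
    if base.get("status_code") != 0:  # assumption: 0 means success
        raise RuntimeError(
            f"request failed: {base.get('status_msg')} "
            f"(trace_id={response.get('trace_id')})"
        )
    data = response.get("data")
    if not data or not data.get("audio"):  # data may be null
        raise RuntimeError(f"no audio returned (trace_id={response.get('trace_id')})")
    return bytes.fromhex(data["audio"])
```

Including the trace_id in error messages keeps the session ID at hand for troubleshooting and support.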