Use this API to create an asynchronous Text-to-Speech task.

Request example:

curl --request POST \
  --url https://api.minimax.io/v1/t2a_async_v2 \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "speech-2.8-hd",
  "text": "Omg(sighs), the real danger is not that computers start thinking like people, but that people start thinking like computers. Computers can only help us with simple tasks.",
  "language_boost": "auto",
  "voice_setting": {
    "voice_id": "English_expressive_narrator",
    "speed": 1,
    "vol": 1,
    "pitch": 1
  },
  "pronunciation_dict": {
    "tone": [
      "Omg/Oh my god"
    ]
  },
  "audio_setting": {
    "audio_sample_rate": 32000,
    "bitrate": 128000,
    "format": "mp3",
    "channel": 2
  },
  "voice_modify": {
    "pitch": 0,
    "intensity": 0,
    "timbre": 0,
    "sound_effects": "spacious_echo"
  }
}
'

Response example:

{
  "task_id": 95157322514444,
  "task_token": "eyJhbGciOiJSUz",
  "file_id": 95157322514444,
  "usage_characters": 101,
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
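
The request shown above can also be assembled in Python. The following is a minimal sketch using only the standard library. The helper names `build_payload`, `build_request`, and `create_task` are illustrative and not part of any MiniMax SDK, the 30-second timeout is an arbitrary assumption, and the API key is a placeholder you must supply.

```python
import json
import urllib.request

API_URL = "https://api.minimax.io/v1/t2a_async_v2"


def build_payload(text, model="speech-2.8-hd",
                  voice_id="English_expressive_narrator"):
    """Build a request body mirroring the curl example (hypothetical helper)."""
    return {
        "model": model,
        "text": text,
        "language_boost": "auto",
        "voice_setting": {"voice_id": voice_id, "speed": 1, "vol": 1, "pitch": 1},
        "audio_setting": {
            "audio_sample_rate": 32000,
            "bitrate": 128000,
            "format": "mp3",
            "channel": 2,
        },
    }


def build_request(api_key, payload):
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(api_key, payload, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `create_task("<your API key>", build_payload("Hello"))` would perform the network call. Keeping `build_request` separate makes it possible to inspect the headers and body before anything is sent.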
Authorization (header, HTTP: Bearer Auth)
Bearer API_key. The key can be found in Account Management > API Keys.

Content-Type (header)
The media type of the request body. Must be set to application/json to ensure the data is sent in JSON format.

Body (application/json)

model
Model version to call. Supported values: speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.

text
Text content to convert to audio, max length 50,000 characters. Mutually exclusive with text_file_id (one is required).
Interjection tags are supported only by the speech-2.8-hd and speech-2.8-turbo models. Supported interjections: (laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (whistles), (sneezes), (crying), (applause).

text_file_id
ID of the text file to synthesize. Max 1,000,000 characters. Supported formats: txt, zip. Mutually exclusive with text (one is required).
txt file: Supports customizing speech pauses by adding markers in the form <#x#>, where x is the pause duration in seconds. Valid range: [0.01, 99.99], up to two decimal places. Pause markers must be placed between speakable text segments and cannot be used consecutively.
zip file: Must contain files of the same type (txt or json).
json file: Supports the ["title", "content", "extra"] fields. Each non-empty field generates an audio file, subtitles, and metadata, which are stored in a folder. If a field is empty, no files are generated for it.
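
The input rules above (exactly one of text or text_file_id, the 50,000-character limit on text, and the <#x#> pause marker constraints for txt files) can be checked on the client before a task is submitted. The sketch below is one possible reading of those rules, not the service's own validation. The function names are hypothetical, and treating whitespace-only text as "not speakable" is an assumption.

```python
import re

PAUSE_RE = re.compile(r"<#(.*?)#>")
# One or two integer digits, optionally followed by one or two decimals.
VALID_DURATION = re.compile(r"\d{1,2}(\.\d{1,2})?")


def check_input_source(text=None, text_file_id=None):
    """Raise ValueError unless exactly one input source is given."""
    if (text is None) == (text_file_id is None):
        raise ValueError("provide exactly one of text or text_file_id")
    if text is not None and len(text) > 50_000:
        raise ValueError("text exceeds 50,000 characters")


def check_pause_markers(content):
    """Return a list of problems found with <#x#> pause markers."""
    problems = []
    prev_end = None
    for m in PAUSE_RE.finditer(content):
        raw = m.group(1)
        if not VALID_DURATION.fullmatch(raw) or not 0.01 <= float(raw) <= 99.99:
            problems.append(f"invalid duration: {m.group(0)}")
        # Assumption: some non-whitespace text must separate this marker
        # from the start of the content or from the previous marker.
        start = 0 if prev_end is None else prev_end
        if not content[start:m.start()].strip():
            problems.append(f"marker not preceded by speakable text: {m.group(0)}")
        prev_end = m.end()
    if prev_end is not None and not content[prev_end:].strip():
        problems.append("last marker is not followed by speakable text")
    return problems
```

For example, `check_pause_markers("Hello<#1.5#>world")` returns an empty list, while two back-to-back markers or a duration such as `<#100#>` each produce one reported problem.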
language_boost
Controls whether recognition for specific minority languages and dialects is enhanced. Default is null. If the language type is unknown, set it to "auto" and the model will detect the language automatically.
Note: The speech-01 and speech-02 series models do not currently support Persian, Filipino, or Tamil.
Supported values: Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto

voice_modify
Voice effects configuration.
Supported audio formats: mp3, wav, flac. (Other formats such as pcm, pcmu_raw, pcmu_wav, and opus are not supported and will be rejected with a parameter error.)
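
The two restrictions above can be caught before a request is sent. This is a rough sketch under stated assumptions: the function name `check_settings` is hypothetical, the "speech-01 and speech-02 series" are matched by model-name prefix, and the format list is assumed to constrain `audio_setting.format` when `voice_modify` is present. The service may apply these rules differently.

```python
SUPPORTED_FORMATS = {"mp3", "wav", "flac"}
LEGACY_PREFIXES = ("speech-01", "speech-02")
LEGACY_UNSUPPORTED = {"Persian", "Filipino", "Tamil"}


def check_settings(payload):
    """Return a list of problems with language_boost and the audio format."""
    problems = []
    model = payload.get("model", "")
    boost = payload.get("language_boost")
    # speech-01 / speech-02 models do not support these three languages.
    if model.startswith(LEGACY_PREFIXES) and boost in LEGACY_UNSUPPORTED:
        problems.append(f"{model} does not support language_boost={boost!r}")
    # Assumption: the format restriction applies when voice effects are used.
    fmt = payload.get("audio_setting", {}).get("format")
    if "voice_modify" in payload and fmt is not None and fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format with voice_modify: {fmt!r}")
    return problems
```

An empty list means no problem was detected by these two checks. It does not guarantee that the service will accept the request.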
Response

task_id
Task ID.

file_id
ID of the corresponding audio file, returned once the task is successfully created.
When the task is complete, you can use the file_id to call the File (Retrieve) API to download the file.
If the request fails, this field is not returned.
Note: The download URL is valid for 9 hours (32,400 seconds) from the time it is generated. After it expires, the file is no longer available and the generated data is lost, so download the file within the validity period.

task_token
Token for completing the task.

usage_characters
Number of billed characters.

base_resp
Status code and details.
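
A client typically needs to confirm that task creation succeeded and to keep track of when the download URL will expire. The sketch below assumes that a `status_code` of 0 indicates success (as in the response example) and that any other value indicates failure. The function names are hypothetical, and the caller must supply the time at which the download URL was generated.

```python
from datetime import datetime, timedelta, timezone

# 9 hours, i.e. 32,400 seconds, per the note on file_id.
DOWNLOAD_URL_TTL = timedelta(hours=9)


def parse_create_response(resp):
    """Return the task fields, or raise if the status code is non-zero."""
    base = resp.get("base_resp", {})
    if base.get("status_code") != 0:
        raise RuntimeError(f"task creation failed: {base.get('status_msg')}")
    return {
        "task_id": resp["task_id"],
        "file_id": resp["file_id"],
        "task_token": resp.get("task_token"),
        "usage_characters": resp.get("usage_characters"),
    }


def download_deadline(url_generated_at):
    """Return the moment after which the download URL is no longer valid."""
    return url_generated_at + DOWNLOAD_URL_TTL


example = {
    "task_id": 95157322514444,
    "task_token": "eyJhbGciOiJSUz",
    "file_id": 95157322514444,
    "usage_characters": 101,
    "base_resp": {"status_code": 0, "status_msg": "success"},
}
task = parse_create_response(example)
deadline = download_deadline(datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc))
```

Storing the deadline alongside the `file_id` lets a job scheduler prioritize downloads that are close to expiring.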