POST /v1/t2a_v2
curl --request POST \
  --url https://api.minimax.io/v1/t2a_v2 \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "speech-2.8-hd",
  "text": "Omg(sighs), the real danger is not that computers start thinking like people, but that people start thinking like computers. Computers can only help us with simple tasks.",
  "stream": false,
  "language_boost": "auto",
  "output_format": "hex",
  "voice_setting": {
    "voice_id": "English_expressive_narrator",
    "speed": 1,
    "vol": 1,
    "pitch": 0
  },
  "pronunciation_dict": {
    "tone": [
      "Omg/Oh my god"
    ]
  },
  "audio_setting": {
    "sample_rate": 32000,
    "bitrate": 128000,
    "format": "mp3",
    "channel": 1
  },
  "voice_modify": {
    "pitch": 0,
    "intensity": 0,
    "timbre": 0,
    "sound_effects": "spacious_echo"
  }
}
'
Example response:

{
  "data": {
    "audio": "<hex encoded audio>",
    "status": 2
  },
  "extra_info": {
    "audio_length": 11124,
    "audio_sample_rate": 32000,
    "audio_size": 179926,
    "bitrate": 128000,
    "word_count": 163,
    "invisible_character_ratio": 0,
    "usage_characters": 163,
    "audio_format": "mp3",
    "audio_channel": 1
  },
  "trace_id": "01b8bf9bb7433cc75c18eee6cfa8fe21",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}

Alternative endpoint with reduced time to first audio (TTFA): https://api-uw.minimax.io/v1/t2a_v2

Authorizations

Authorization (string, header, required)

HTTP Bearer authentication, sent as Authorization: Bearer <token>.

Headers

Content-Type (enum<string>, required, default: application/json)

The media type of the request body. Must be application/json, because the request body is sent as JSON.

Available options: application/json
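A minimal Python sketch of assembling an authenticated request with only the standard library. The helper name build_t2a_request is hypothetical (not part of any MiniMax SDK); the two URLs, the Bearer header, and the Content-Type value come from this page. The request is built but not sent.

```python
import json
import urllib.request

# Endpoints listed on this page; the api-uw host is the alternative
# endpoint with reduced time to first audio (TTFA).
DEFAULT_URL = "https://api.minimax.io/v1/t2a_v2"
LOW_TTFA_URL = "https://api-uw.minimax.io/v1/t2a_v2"


def build_t2a_request(token: str, body: dict, low_ttfa: bool = False) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to /v1/t2a_v2."""
    url = LOW_TTFA_URL if low_ttfa else DEFAULT_URL
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON request body
        headers={
            "Authorization": f"Bearer {token}",  # HTTP Bearer auth
            "Content-Type": "application/json",  # the only accepted media type
        },
        method="POST",
    )


req = build_t2a_request("<token>", {"model": "speech-2.8-hd", "text": "Hello"})
```

Passing req to urllib.request.urlopen would send it; timeouts, retries, and error handling are left out of this sketch.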

Body (application/json)

model (enum<string>, required)

The speech synthesis model version to use.

Available options: speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo

text (string, required)

The text to convert into speech. Must be fewer than 10,000 characters.

  • For texts over 3,000 characters, streaming output is recommended.

  • Paragraph breaks should be marked with newline characters.

  • Pause control: Insert a marker of the form <#x#> to add a custom pause, where x is the pause duration in seconds. Valid range: [0.01, 99.99], with at most two decimal places. Pause markers must sit between speakable text segments, and two markers cannot be placed back to back.

  • Inline pronunciation: To override the pronunciation of a target word or polyphonic character, write Mandarin Pinyin (tone numbers 1–5), IPA symbols, or Cantonese Jyutping (tone numbers 1–6) in half-width parentheses. For example:

    • "The word live is pronounced (lɪv) as a verb and (laɪv) as an adjective."
    • "This is (he2)平, not (huo4)面."
    • "去街市買啲(sung3)。"
  • Interjection tags: Supported only by the speech-2.8-hd and speech-2.8-turbo models. Available tags: (laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (sneezes).
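The rules above can be pre-checked on the client before a request is sent. This Python sketch is an approximation under stated assumptions: it counts characters with len(), reads "cannot be used consecutively" as two markers separated only by whitespace, reads "between speakable text segments" as "not at the very start or end", and recognizes interjections only as the listed lowercase tags. The names pause and check_text are hypothetical; the server remains the authority on what it accepts.

```python
import re

MAX_CHARS = 10_000          # text must be shorter than this
STREAM_HINT_CHARS = 3_000   # streaming is recommended above this length
INTERJECTION_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
INTERJECTIONS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "sneezes",
}
PAUSE_RE = re.compile(r"<#([^#]*)#>")


def pause(seconds: float) -> str:
    """Format a pause marker, e.g. pause(1.5) -> '<#1.50#>'."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be within [0.01, 99.99] seconds")
    return f"<#{seconds:.2f}#>"


def check_text(text: str, model: str, stream: bool = False) -> list:
    """Return human-readable problems found in `text` (empty list if none)."""
    problems = []
    if len(text) >= MAX_CHARS:
        problems.append("text must be fewer than 10,000 characters")
    if len(text) > STREAM_HINT_CHARS and not stream:
        problems.append("streaming is recommended above 3,000 characters")
    for m in PAUSE_RE.finditer(text):
        value = m.group(1)
        # Seconds with at most two decimal places, within [0.01, 99.99].
        if not re.fullmatch(r"\d{1,2}(\.\d{1,2})?", value) or not 0.01 <= float(value) <= 99.99:
            problems.append(f"invalid pause marker {m.group(0)}")
    # Assumption: "consecutive" means only whitespace between two markers.
    if re.search(r"#>\s*<#", text):
        problems.append("pause markers cannot be consecutive")
    # Assumption: a marker must not open or close the text.
    stripped = text.strip()
    if stripped.startswith("<#") or stripped.endswith("#>"):
        problems.append("pause markers must sit between text segments")
    if model not in INTERJECTION_MODELS:
        for tag in re.findall(r"\(([a-z-]+)\)", text):
            if tag in INTERJECTIONS:
                problems.append(f"({tag}) requires speech-2.8-hd or speech-2.8-turbo")
    return problems
```

For example, check_text("Hello <#1.5#> world", "speech-2.6-hd") reports no problems, while the example request's "Omg(sighs)" would be flagged if sent to a speech-02 model.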

stream (boolean, default: false)

Whether to enable streaming output.

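stream_options (object)

voice_setting (object)

Voice parameters. The example request sets voice_id, speed, vol, and pitch.

audio_setting (object)

Output audio parameters. The example request sets sample_rate, bitrate, format, and channel.

pronunciation_dict (object)

Custom pronunciations. In the example request, the tone array contains "Omg/Oh my god", mapping "Omg" to "Oh my god".

timbre_weights (object[])

Timbre weights (legacy field).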
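For reference, this Python sketch assembles a body with the same nested objects and values as the example request. build_body is a hypothetical helper; the "word/replacement" form of the pronunciation_dict.tone entries is inferred from the single example entry "Omg/Oh my god", and stream_options is omitted because its fields are not described on this page.

```python
import json
from typing import Dict, Optional


def build_body(text: str, model: str = "speech-2.8-hd",
               voice_id: str = "English_expressive_narrator",
               stream: bool = False,
               pronunciations: Optional[Dict[str, str]] = None) -> dict:
    """Hypothetical helper mirroring the example request's values."""
    body = {
        "model": model,
        "text": text,
        "stream": stream,
        "language_boost": "auto",
        "voice_setting": {"voice_id": voice_id, "speed": 1, "vol": 1, "pitch": 0},
        "audio_setting": {"sample_rate": 32000, "bitrate": 128000,
                          "format": "mp3", "channel": 1},
    }
    if pronunciations:
        # Inferred entry format: "word/replacement", e.g. "Omg/Oh my god".
        body["pronunciation_dict"] = {
            "tone": [f"{word}/{spoken}" for word, spoken in pronunciations.items()]
        }
    return body


body = build_body("Omg(sighs), computers can only help us with simple tasks.",
                  pronunciations={"Omg": "Oh my god"})
payload = json.dumps(body)  # ready to send as the request body
```

Keeping the defaults identical to the example makes it easy to diff a generated body against the documented request when debugging.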

language_boost (enum<string>)

Enhances recognition of a specified minority language or dialect. Default is null. If the language is unknown, set it to "auto" so the model detects it automatically.

Note: The speech-01 and speech-02 series models do not currently support Persian, Filipino, or Tamil.

Available options: "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish", "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian", "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai", "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian", "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian", "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk", "Tamil", "Afrikaans", "auto"
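A small Python check of a language_boost value against the options and the note above. language_boost_ok is a hypothetical name, and it assumes that every model whose name starts with speech-01 or speech-02 belongs to the series the note refers to.

```python
LANGUAGE_BOOST_OPTIONS = {
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
}
# Not currently supported by the speech-01 and speech-02 series.
LEGACY_UNSUPPORTED = {"Persian", "Filipino", "Tamil"}


def language_boost_ok(value, model: str) -> bool:
    """Return True if `value` is a valid language_boost for `model`."""
    if value is None:
        return True  # null is the documented default
    if value not in LANGUAGE_BOOST_OPTIONS:
        return False
    legacy = model.startswith(("speech-01", "speech-02"))
    return not (legacy and value in LEGACY_UNSUPPORTED)
```

Note that "Chinese,Yue" is a single option value, so it must be matched whole rather than split on the comma.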
voice_modify (object)

Voice effects configuration. The example request sets pitch, intensity, timbre, and sound_effects (spacious_echo).

Supported audio formats:

  • Non-streaming: mp3, wav, flac
  • Streaming: mp3
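A Python sketch that attaches a voice_modify object only when the requested audio format appears in the supported list above. attach_voice_modify is a hypothetical helper; the field names and the spacious_echo value come from the example request, while the accepted numeric ranges for pitch, intensity, and timbre are not stated on this page and are not checked. It expects audio_setting.format to be set explicitly in the body.

```python
def attach_voice_modify(body: dict, pitch: int = 0, intensity: int = 0,
                        timbre: int = 0, sound_effects=None) -> dict:
    """Return a copy of `body` with voice_modify added (hypothetical helper)."""
    fmt = body["audio_setting"]["format"]
    streaming = body.get("stream", False)  # stream defaults to false
    allowed = {"mp3"} if streaming else {"mp3", "wav", "flac"}
    if fmt not in allowed:
        mode = "streaming" if streaming else "non-streaming"
        raise ValueError(f"voice_modify does not support {fmt} for {mode} output")
    effects = {"pitch": pitch, "intensity": intensity, "timbre": timbre}
    if sound_effects is not None:
        effects["sound_effects"] = sound_effects
    return {**body, "voice_modify": effects}
```

Returning a copy rather than mutating the input keeps a base body reusable across requests with different effects.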
subtitle_enable (boolean, default: false)

Controls whether subtitles are enabled. Available for models: speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.

subtitle_type (enum<string>, default: sentence)

Subtitle granularity. Options:

  • sentence: sentence-level timestamps
  • word: word-level timestamps
  • word_streaming: word-level timestamps optimized for streaming; only valid when stream is true

Available options: sentence, word, word_streaming
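The constraint that word_streaming requires stream to be true can be enforced while building the body. This Python sketch uses a hypothetical helper, subtitle_fields, that returns the two subtitle keys to merge into a request body.

```python
SUBTITLE_TYPES = {"sentence", "word", "word_streaming"}


def subtitle_fields(stream: bool, subtitle_type: str = "sentence") -> dict:
    """Return subtitle_enable/subtitle_type keys for a request body."""
    if subtitle_type not in SUBTITLE_TYPES:
        raise ValueError(f"unknown subtitle_type: {subtitle_type}")
    if subtitle_type == "word_streaming" and not stream:
        raise ValueError("word_streaming is only valid when stream is true")
    return {"subtitle_enable": True, "subtitle_type": subtitle_type}


body = {"model": "speech-2.8-hd", "text": "Hello", "stream": True,
        **subtitle_fields(True, "word_streaming")}
```

Passing the same stream flag to both the body and the helper keeps the two settings from drifting apart.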
output_format (enum<string>, default: hex)

Format in which the audio is returned: url or hex (the default). This setting only takes effect for non-streaming requests; streaming responses always use hex. A returned URL is valid for 24 hours.

Available options: url, hex
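A Python sketch of how a client might predict which format it will receive and track URL expiry. effective_output_format and url_expires_at are hypothetical names; measuring the 24-hour window from the moment the response arrives is an assumption, since the page does not say when the clock starts.

```python
from datetime import datetime, timedelta, timezone

URL_TTL = timedelta(hours=24)  # returned URLs are valid for 24 hours


def effective_output_format(requested: str = "hex", stream: bool = False) -> str:
    """Format the response will use; output_format only applies to non-streaming."""
    if requested not in ("url", "hex"):
        raise ValueError(f"unknown output_format: {requested}")
    return "hex" if stream else requested


def url_expires_at(received_at: datetime) -> datetime:
    """Approximate expiry, assuming the window starts when the response arrives."""
    return received_at + URL_TTL


expires = url_expires_at(datetime.now(timezone.utc))
```

Downloading URL output promptly, well inside the window, avoids depending on exactly when the server starts the 24-hour clock.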

Response

data (object)

The synthesized audio data object. In the example response it contains audio (the hex-encoded audio) and status. data may be null, so check for null before reading it.

trace_id (string)

The session ID, used for troubleshooting and support.

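extra_info (object)

Additional audio information. In the example response it includes audio_length, audio_sample_rate, audio_size, bitrate, word_count, invisible_character_ratio, usage_characters, audio_format, and audio_channel.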

base_resp (object)

Status code and message for this request. In the example response, a successful request returns status_code 0 and status_msg "success".
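A Python sketch of handling a non-streaming response with output_format set to hex, following the notes above: check base_resp first, guard against a null data object, then decode the hex string into bytes. T2AError and extract_audio are hypothetical names, and treating any non-zero status_code as a failure is an assumption based on the example's 0/"success" pair.

```python
from typing import Optional


class T2AError(RuntimeError):
    """Raised when base_resp reports a non-zero status_code (hypothetical)."""


def extract_audio(resp: dict) -> Optional[bytes]:
    """Return decoded audio bytes, or None when no audio is present."""
    base = resp.get("base_resp") or {}
    if base.get("status_code") != 0:
        # Include trace_id so the failure can be reported to support.
        raise T2AError(
            f"status_code={base.get('status_code')} "
            f"status_msg={base.get('status_msg')!r} trace_id={resp.get('trace_id')}"
        )
    data = resp.get("data")
    if data is None:  # data may be null
        return None
    audio_hex = data.get("audio")
    return bytes.fromhex(audio_hex) if audio_hex else None
```

The returned bytes can then be written to a file whose extension matches audio_setting.format (mp3 in the example).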