Returned File Information
The return result for a single file input is shown below. If the input is a compressed package containing multiple files, a corresponding folder is generated for each file, and each folder has the same contents as the output for a single file input.
Input File Type: txt File
- Output Files:
  - Audio File: Format follows the request body settings.
  - Subtitle File: Sentence-level subtitle information.
  - Extra JSON File: Additional information related to the audio file.
Input File Type: json File
- title Field Output Files (if this field is empty, no files will be generated):
  - Audio File: Format follows the request body settings.
  - Subtitle File: Sentence-level subtitle information.
  - Extra JSON File: Additional information related to the audio file.
- content Field Output Files (if this field is empty, no files will be generated):
  - Audio File: Format follows the request body settings.
  - Subtitle File: Sentence-level subtitle information.
  - Extra JSON File: Additional information related to the audio file.
- extra Field Output Files (if this field is empty, no files will be generated):
  - Audio File: Format follows the request body settings.
  - Subtitle File: Sentence-level subtitle information.
  - Extra JSON File: Additional information related to the audio file.
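The rule that an empty field produces no files can be checked locally before uploading a json input. The sketch below is only an illustration: the helper name is hypothetical, and treating a missing, null, or whitespace-only field as "empty" is an assumption, since the page does not define emptiness precisely.

```python
import json

# Field names taken from the json input description on this page.
JSON_FIELDS = ("title", "content", "extra")


def fields_with_output(raw_json: str) -> list[str]:
    """Return the json fields that are expected to produce output files.

    Assumption: a field is "empty" when it is missing, not a string,
    or contains only whitespace.
    """
    doc = json.loads(raw_json)
    return [
        name
        for name in JSON_FIELDS
        if isinstance(doc.get(name), str) and doc[name].strip()
    ]
```

For an input whose content field is blank, only the title and extra fields would be expected to yield an audio file, a subtitle file, and an extra JSON file.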
Authorizations
HTTP: Bearer Auth
- Security Scheme Type: http
- HTTP Authorization Scheme: Bearer API_key. The API key can be found in Account Management > API Keys.
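As a rough sketch, the request headers for Bearer authentication could be assembled as follows. The function name is hypothetical; the header values follow the Bearer scheme above and the application/json media type required under Headers.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build request headers for Bearer authentication with a JSON body."""
    if not api_key or not api_key.strip():
        raise ValueError("API key must be a non-empty string")
    return {
        # Standard HTTP Bearer scheme: "Bearer <key>".
        "Authorization": f"Bearer {api_key.strip()}",
        # The request body must be sent as JSON.
        "Content-Type": "application/json",
    }
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control.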
Headers
The media type of the request body. Must be set to application/json so that the data is sent in JSON format.
Body
Model version to call. Supported values: speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-02-hd, speech-02-turbo, speech-01-hd, speech-01-turbo.
Text content to convert to audio. Maximum length: 50,000 characters. Mutually exclusive with text_file_id (one of the two is required).
ID of the text file to synthesize. Max 100,000 characters. Supported formats: txt, zip. Mutually exclusive with text (one is required).
- txt file: Supports customizing speech pauses by adding markers in the form <#x#>, where x is the pause duration in seconds. Valid range: [0.01, 99.99], up to two decimal places. Pause markers must be placed between speakable text segments and cannot be used consecutively.
- zip file: Must contain files of the same type (txt or json).
  - json format supports ["title", "content", "extra"] fields. Each non-empty field generates an audio file, subtitles, and metadata, which are stored in a folder.
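The pause-marker rules for txt files can be checked before upload. The following is a minimal sketch, not the service's own validation: the function name and messages are hypothetical, and treating whitespace-only text between two markers as "consecutive" is an assumption.

```python
import re

# Anything shaped like <#...#> is treated as a candidate pause marker.
MARKER = re.compile(r"<#(.*?)#>")
# Digits with an optional fraction of at most two decimal places.
NUMBER = re.compile(r"\d+(?:\.\d{1,2})?")


def check_pause_markers(text: str) -> list[str]:
    """Return a list of problems found with <#x#> pause markers in text."""
    problems: list[str] = []
    # Splitting on a pattern with one capture group yields
    # [text, value, text, value, ..., text].
    parts = MARKER.split(text)
    segments, values = parts[0::2], parts[1::2]

    for value in values:
        # Valid range is [0.01, 99.99] with up to two decimal places.
        if not NUMBER.fullmatch(value) or not 0.01 <= float(value) <= 99.99:
            problems.append(f"invalid pause duration: {value!r}")

    if values:
        last = len(segments) - 1
        for i, segment in enumerate(segments):
            if segment.strip():
                continue  # speakable text is present on this side
            if i == 0:
                problems.append("pause marker at start of text")
            elif i == last:
                problems.append("pause marker at end of text")
            else:
                problems.append("consecutive pause markers")
    return problems
```

An empty result means the text passed these local checks; the service may still apply rules that are not described here.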
Controls whether recognition for specific minority languages and dialects is enhanced. Default is null. If the language type is unknown, set to "auto" and the model will automatically detect it.
Supported values: Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
Voice effect settings. Supported formats: mp3, flac.
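The constraint that exactly one of text and text_file_id must be supplied, along with the 50,000-character limit on text, can be enforced client-side. This sketch makes several assumptions: the function name is hypothetical, the "model" key is assumed because the page does not show the parameter name, and the type of text_file_id is not stated here. Other body parameters are omitted.

```python
from __future__ import annotations

MAX_TEXT_CHARS = 50_000
SUPPORTED_MODELS = {
    "speech-2.5-hd-preview", "speech-2.5-turbo-preview",
    "speech-02-hd", "speech-02-turbo",
    "speech-01-hd", "speech-01-turbo",
}


def build_body(model: str, text: str | None = None,
               text_file_id: str | int | None = None) -> dict:
    """Build a minimal request body with exactly one text source."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    # text and text_file_id are mutually exclusive, and one is required.
    if (text is None) == (text_file_id is None):
        raise ValueError("provide exactly one of text or text_file_id")

    body: dict = {"model": model}  # key name "model" is an assumption
    if text is not None:
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError("text exceeds 50,000 characters")
        body["text"] = text
    else:
        body["text_file_id"] = text_file_id
    return body
```

Failing early on the client avoids spending a request on a body the service would reject.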
Response
Task ID
The corresponding audio file ID is returned once the task is successfully created.
When the task is complete, you can use the file_id to call the File (Retrieve) API to download the file.
If the request fails, this field will not be returned.
Note: The download URL is valid for 9 hours (32,400 seconds) from the time it is generated. After expiration, the file will no longer be available and the generated data will be lost, so please ensure you download it within the validity period.
Token for completing the task
Number of billed characters
Status code and details.
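The 9-hour (32,400-second) validity window of the download URL noted above can be tracked on the client so that a file is fetched before it expires. This is a small sketch with hypothetical helper names; it assumes the client records the time at which the URL was generated.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Validity period of the download URL: 9 hours = 32,400 seconds.
DOWNLOAD_URL_TTL = timedelta(seconds=32_400)


def download_deadline(generated_at: datetime) -> datetime:
    """Return the moment at which the download URL stops being valid."""
    return generated_at + DOWNLOAD_URL_TTL


def is_expired(generated_at: datetime, now: datetime | None = None) -> bool:
    """Return True once the validity window has elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= download_deadline(generated_at)
```

Using timezone-aware datetimes for both arguments avoids comparison errors between naive and aware values.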






