Text-to-Speech (TTS) models are AI models that convert text into spoken output. They generate natural, expressive speech from input text and are suitable for a variety of use cases.
When using a system-predefined voice in a request, prefix the voice name with the model name, separated by a colon. For example, FunAudioLLM/CosyVoice2-0.5B:alex refers to the alex voice of the FunAudioLLM/CosyVoice2-0.5B model.
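The naming rule above can be sketched in a few lines of Python. This is only an illustration of the model:voice format described in the text; the helper name qualified_voice is hypothetical and not part of any SDK, and how the resulting string is passed in a request is not shown here.

```python
def qualified_voice(model: str, voice: str) -> str:
    """Build a voice identifier in the '<model>:<voice>' form.

    Hypothetical helper for illustration only; it simply joins the
    model name and the voice name with a colon.
    """
    if not model or not voice:
        raise ValueError("both model and voice must be non-empty")
    return f"{model}:{voice}"


# Example taken from the text: the alex voice of CosyVoice2-0.5B.
voice_id = qualified_voice("FunAudioLLM/CosyVoice2-0.5B", "alex")
print(voice_id)
```

Keeping the model name and the voice name as separate values until the request is built makes it easier to switch models without retyping the full identifier.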
Note: Using user-predefined voices requires real-name authentication.
To ensure the quality of the generated voice, it is recommended that users upload a voice sample that is 8 to 10 seconds long, with clear pronunciation and no background noise or interference.
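As a rough local check before uploading, the length of a sample can be measured with Python's standard wave module. This is a minimal sketch that assumes the sample is an uncompressed WAV file (the text does not state which formats are accepted); the function names and the constants are illustrative, and the check covers duration only, not pronunciation clarity or background noise.

```python
import wave

# Recommended range from the text above, in seconds.
MIN_SECONDS = 8.0
MAX_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(path: str) -> bool:
    """Check whether the sample falls within the recommended 8 to 10 seconds."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

A sample outside the range is not necessarily rejected by the service; the range is a recommendation for voice quality.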
Note: The list of supported TTS models may change. Filter by the “Speech” tag on the “Models” page to see the models currently supported.