1. Model Output Encoding Issues
Currently, some models are prone to encoding issues when parameters are not set. In such cases, you can try setting parameters such as temperature, top_k, top_p, and frequency_penalty.
Modify the payload as follows, adjusting as needed for different languages:
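A minimal sketch of such a payload, sent via the platform's OpenAI-compatible chat completions endpoint (the model name and parameter values below are illustrative assumptions; tune them for your language and use case):

```python
import requests

url = "https://api.siliconflow.cn/v1/chat/completions"
headers = {
    "Authorization": "Bearer <your_api_key>",
    "Content-Type": "application/json",
}
payload = {
    "model": "deepseek-ai/DeepSeek-V3",  # example model name
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,        # sampling temperature; lower = more deterministic
    "top_k": 50,               # sample only from the top-k candidate tokens
    "top_p": 0.7,              # nucleus sampling threshold
    "frequency_penalty": 0.5,  # penalize tokens that already appear frequently
}

resp = requests.post(url, headers=headers, json=payload)
print(resp.json())
```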
2. Explanation of max_tokens
The maximum value of max_tokens equals the model's context length. Since some model inference services are still being updated, please do not set max_tokens to this maximum (the context length) when making a request. It is recommended to reserve around 10K tokens of space for the input content.
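For example, a rough way to pick max_tokens under this guidance (the context length below is an assumed figure; check the model details page for the actual value):

```python
# Assumed figures, for illustration only.
context_length = 131_072   # the model's context length, from its details page
input_reserve = 10_000     # headroom reserved for the input content

max_tokens = context_length - input_reserve  # stay below the hard maximum
```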
3. Explanation of context_length
The context_length varies for different LLM models. You can search for the specific model on the Models page to view the model details.
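If you prefer to check programmatically, the platform's OpenAI-compatible API also exposes a model listing route; a minimal sketch (assuming the standard /v1/models endpoint, which lists model names; the model details page remains the place to look up context lengths):

```python
import requests

resp = requests.get(
    "https://api.siliconflow.cn/v1/models",
    headers={"Authorization": "Bearer <your_api_key>"},
)
for model in resp.json().get("data", []):
    print(model["id"])  # e.g. "deepseek-ai/DeepSeek-V3"
```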
4. What Are the Differences Between Pro and Non-Pro Models
- For some models, the platform provides both a free version and a paid version. The free version keeps the original model name, while the paid version is prefixed with “Pro/” to distinguish it. The free version has fixed Rate Limits, whereas the paid version has variable Rate Limits. For specific rules, please refer to: Rate Limits.
- For the DeepSeek R1 and DeepSeek V3 models, the platform distinguishes and names them based on the payment method. The Pro version only supports payment with the recharged balance, while the non-Pro version supports payment with both the granted balance and the recharged balance.
5. Are There Any Time and Quality Requirements for Custom Voice Samples in the Voice Models
- For cosyvoice2, the custom voice sample must be less than 30 seconds.
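If you are unsure about a sample's length, you can check it locally before uploading; a quick sketch assuming a WAV file (the file name is a placeholder):

```python
import wave

# Placeholder path; point this at your actual sample.
with wave.open("voice_sample.wav", "rb") as f:
    duration = f.getnframes() / f.getframerate()

# cosyvoice2 requires the custom voice sample to be under 30 seconds.
assert duration < 30, f"Sample is {duration:.1f}s; trim it to under 30s."
```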
6. Output Truncation Issues in Model Inference
Here are several aspects to troubleshoot the issue:
- When encountering output truncation through API requests:
  - Max Tokens Setting: Set max_tokens to an appropriate value. If the output exceeds max_tokens, it will be truncated. For the DeepSeek R1 series, max_tokens can be set up to 16,384.
  - Stream Request Setting: Use streaming requests; in non-streaming requests, long output content is prone to 504 timeouts. See the sketch after this list.
  - Client Timeout Setting: Increase the client timeout to prevent the connection from being cut off before the output is fully returned.
- When encountering output truncation through third-party client requests:
  - Cherry Studio has a default max_tokens of 4,096. Enable the “Enable Message Length Limit” switch to set max_tokens to an appropriate value.
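A minimal sketch combining the API-side settings above: a streaming request with an explicit max_tokens and a generous client timeout (the model name and values are illustrative assumptions):

```python
import requests

payload = {
    "model": "deepseek-ai/DeepSeek-R1",  # example model name
    "messages": [{"role": "user", "content": "Write a long essay."}],
    "max_tokens": 16_384,  # upper bound for the R1 series
    "stream": True,        # stream to avoid 504 timeouts on long outputs
}

# timeout=(connect, read): give the read side ample time for long generations.
resp = requests.post(
    "https://api.siliconflow.cn/v1/chat/completions",
    headers={"Authorization": "Bearer <your_api_key>"},
    json=payload,
    stream=True,
    timeout=(10, 600),
)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))  # each line is an SSE "data: ..." chunk
```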

7. Troubleshooting 429 Error During Model Usage
Here are some areas to check for the issue:
- General Users: Verify your user tier and the corresponding Rate Limits for the model. If a request exceeds the Rate Limits, consider retrying after some time (see the backoff sketch after this list).
- Dedicated Instance Users: Dedicated instances typically do not have Rate Limits. If a 429 error occurs, first confirm whether the correct model name for the dedicated instance is being called, and check if the api_key used matches the dedicated instance.
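For general users, a simple retry with exponential backoff is usually enough; a hedged sketch (the helper name and wait times are illustrative):

```python
import time
import requests

def post_with_retry(url, headers, payload, max_retries=5):
    """Retry on HTTP 429 with exponential backoff (illustrative helper)."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between retries
    return resp
```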
8. Account Balance Insufficient Despite Successful Recharge
Here are some areas to check for the issue:
- Ensure the api_key being used matches the account that was just recharged.
- If the api_key is correct, there may be a network delay during the recharge process. Consider waiting a few minutes and then retrying.
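To confirm which account a key belongs to, you can query the account info route with that key; note that the /v1/user/info endpoint and its response fields are assumptions here, so verify them against the platform's API reference:

```python
import requests

# Assumed endpoint; verify it against the platform's API reference.
resp = requests.get(
    "https://api.siliconflow.cn/v1/user/info",
    headers={"Authorization": "Bearer <your_api_key>"},
)
print(resp.json())  # inspect the returned account and balance fields
```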
9. Unable to Access Certain Models Despite Completing Real-name Verification
Here are some areas to check for the issue:
- Confirm that the api_key being used matches the account that completed real-name verification.
- If the api_key is correct, visit the Real-name Verification page to check the verification status. If the status shows “Verification in Progress,” you can try canceling and re-verifying.
10. Issues with fnlp/MOSS-TTSD-v0.5
- The model tends to produce errors when the input text is too short.
- When using this model for dialogue synthesis, the input text format should be as follows:
  - [S1] indicates Speaker 1 is speaking; [S2] indicates Speaker 2 is speaking.
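A minimal sketch of building such an input (the utterances are placeholders):

```python
# Placeholder dialogue; [S1]/[S2] mark the two speakers' turns.
text = (
    "[S1]Hi, did you catch the game last night?"
    "[S2]I did! That final quarter was incredible."
    "[S1]Right? I did not expect that comeback at all."
)
```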
If you encounter other issues, please click on the SiliconCloud MaaS Online Requirement Collection Form to provide feedback.