Other Issues
1. Model Output Encoding Issues
Currently, some models are prone to encoding issues in their output when sampling parameters are not set. In such cases, try setting parameters such as temperature, top_k, top_p, and frequency_penalty.
Modify the payload as follows, adjusting as needed for different languages:
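As an illustration, a minimal sketch of such a payload is shown below. The model name and parameter values are placeholders, not recommendations; tune them for your language and task.

```python
# Illustrative chat-completions payload with explicit sampling parameters.
# The model name and values are placeholders; adjust per language/task.
payload = {
    "model": "deepseek-ai/DeepSeek-V3",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,        # lower values reduce sampling randomness
    "top_p": 0.7,              # nucleus-sampling probability mass
    "top_k": 50,               # keep only the top-k candidate tokens
    "frequency_penalty": 0.5,  # penalize frequently repeated tokens
}
```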
2. Explanation of max_tokens
For the LLM models provided by the platform:
- Models with a max_tokens limit of 16384:
  - Pro/deepseek-ai/DeepSeek-R1
  - Qwen/QVQ-72B-Preview
  - deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  - Pro/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  - Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Models with a max_tokens limit of 8192:
  - Qwen/QwQ-32B-Preview
  - AIDC-AI/Marco-o1
  - deepseek-ai/DeepSeek-R1
- Models with a max_tokens limit of 4096:
  - All other LLM models not listed above
If you have special requirements, please provide feedback by clicking on the SiliconCloud MaaS Online Requirement Collection Form.
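The limits above can be expressed as a small lookup, sketched below. The helper function is hypothetical (not part of any platform SDK); the table values are transcribed from the lists in this section, with 4096 as the default for unlisted models.

```python
# Hypothetical helper that clamps a requested max_tokens to the platform
# limits listed in this section. Values come from this document; models
# not listed default to 4096.
MAX_TOKENS_LIMITS = {
    "Pro/deepseek-ai/DeepSeek-R1": 16384,
    "Qwen/QVQ-72B-Preview": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B": 16384,
    "Pro/deepseek-ai/DeepSeek-R1-Distill-Llama-8B": 16384,
    "Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 16384,
    "Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B": 16384,
    "Qwen/QwQ-32B-Preview": 8192,
    "AIDC-AI/Marco-o1": 8192,
    "deepseek-ai/DeepSeek-R1": 8192,
}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Return `requested` capped at the model's max_tokens limit."""
    limit = MAX_TOKENS_LIMITS.get(model, 4096)  # unlisted models: 4096
    return min(requested, limit)
```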
3. Explanation of context_length
The context_length varies for different LLM models. You can search for the specific model on the Model Square to view the model details.
4. About DeepSeek-R1 and DeepSeek-V3 Model Calls Returning 429
- Unverified Users: Limited to 100 requests per day. If the daily limit of 100 requests is exceeded, a 429 error is returned with the message “Details: RPD limit reached. Could only send 100 requests per day without real name verification.” Completing real-name verification unlocks higher rate limits.
- Verified Users: Have higher rate limits; see the Model Square for specific values. If these limits are exceeded, a 429 error is also returned.
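A common client-side response to 429 is to retry with exponential backoff. The wrapper below is a sketch, not an official client feature; `send` stands for any callable that performs the HTTP request and returns an object with a `status_code` attribute (for example, a `requests` call).

```python
import random
import time

def request_with_backoff(send, max_retries=5):
    """Retry send() with exponential backoff while it returns HTTP 429.

    `send` is any zero-argument callable returning a response object
    with a `status_code` attribute. This is a sketch of a generic
    retry loop, not part of any platform SDK.
    """
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        # Wait 1s, 2s, 4s, ... plus jitter before the next attempt.
        time.sleep(2 ** attempt + random.random())
    return resp  # still 429 after max_retries attempts
```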
5. What Are the Differences Between Pro and Non-Pro Models
- For some models, the platform provides both a free version and a paid version. The free version keeps its original name, while the paid version is prefixed with “Pro/” to distinguish it. The free version has fixed Rate Limits, whereas the paid version has variable Rate Limits. For specific rules, please refer to: Rate Limits.
- For the DeepSeek-R1 and DeepSeek-V3 models, the platform distinguishes and names them based on the payment method. The Pro version only supports payment with recharged balance, while the non-Pro version supports payment with both granted balance and recharged balance.
6. Are There Any Time and Quality Requirements for Custom Voice Samples in the Voice Models
- For cosyvoice2, the custom voice sample must be less than 30 seconds.
- For GPT-SoVITS, the custom voice sample should be between 3 and 10 seconds.
- For fishaudio, there are no special restrictions.
To ensure the quality of the generated voice, it is recommended that users upload a voice sample that is 8 to 10 seconds long, with clear pronunciation and no background noise or interference.
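The duration rules above can be checked before uploading. The function below is a hypothetical pre-upload check for WAV files using Python's standard wave module; the per-model thresholds mirror the list in this section.

```python
import wave

def sample_duration_ok(path: str, model: str) -> bool:
    """Check a WAV voice sample against the per-model duration rules
    listed in this section. Hypothetical helper, not a platform API."""
    with wave.open(path, "rb") as wf:
        seconds = wf.getnframes() / wf.getframerate()
    if model == "cosyvoice2":
        return seconds < 30           # must be under 30 seconds
    if model == "GPT-SoVITS":
        return 3 <= seconds <= 10     # must be 3-10 seconds
    return True                       # fishaudio: no special restriction
```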
7. Output Truncation Issues in Model Inference
Here are several aspects to troubleshoot the issue:
- When encountering output truncation through API requests:
- Max Tokens Setting: Set max_tokens to an appropriate value; output that exceeds max_tokens is truncated. For the DeepSeek-R1 series, max_tokens can be set up to 16,384.
- Stream Request Setting: Non-streaming requests are prone to 504 timeouts when the output is long; prefer streaming requests for long outputs.
- Client Timeout Setting: Increase the client timeout to prevent truncation before the output is fully completed.
- When encountering output truncation through third-party client requests:
- CherryStdio has a default max_tokens of 4,096. Users can enable the “Enable Message Length Limit” switch and set max_tokens to an appropriate value.
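When switching to streaming requests, the client reassembles the full answer from incremental chunks. The function below sketches that reassembly for an OpenAI-style SSE stream (as used by many chat-completions APIs); the exact wire format of a given endpoint may differ.

```python
import json

def collect_stream(lines):
    """Join the content deltas from an OpenAI-style SSE stream.

    `lines` is an iterable of raw SSE data lines (str or bytes). This
    is a sketch of client-side reassembly, not an official SDK helper.
    """
    parts = []
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        if not line.startswith("data: "):
            continue                      # skip keep-alives / blank lines
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break                         # end-of-stream sentinel
        delta = json.loads(data)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```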