Rate limits
1. Rate limits overview
1.1 What are rate limits
Rate limits refer to the rules governing the frequency of API requests a user can make to the SiliconCloud platform services within a specified time period.
1.2 Why implement rate limits
Rate limits are a common practice for APIs, and the reasons for implementing them include:
- Ensuring fair and efficient resource use: Preventing individual users from sending so many requests that the experience of other users is degraded.
- Avoiding overload: Managing overall platform load so that sudden increases in requests do not cause performance issues, which improves service reliability.
- Security protection: Preventing malicious attacks that could overload the platform and cause service interruptions.
1.3 Rate limits metrics
Currently, rate limits are measured by seven metrics:
- RPM (requests per minute, the maximum number of requests allowed per minute)
- RPH (requests per hour, the maximum number of requests allowed per hour)
- RPD (requests per day, the maximum number of requests allowed per day)
- TPM (tokens per minute, the maximum number of tokens allowed per minute)
- TPD (tokens per day, the maximum number of tokens allowed per day)
- IPM (images per minute, the maximum number of images generated per minute)
- IPD (images per day, the maximum number of images generated per day)
1.4 Rate limits metrics for different models
Model type | Rate limit metrics | Current limits |
---|---|---|
Language model (chat) | RPM, TPM | RPM: 1000-10000, TPM: 50000-5000000 |
Vector model (embedding) | RPM, TPM | RPM: 2000-10000, TPM: 500000-10000000 |
Re-ranking model (reranker) | RPM, TPM | RPM: 2000, TPM: 500000 |
Image generation model (image) | IPM, IPD | IPM: 2-, IPD: 400- |
Multimodal model (multimodal) | - | - |
A rate limit is triggered by whichever metric (RPM, RPH, RPD, TPM, TPD, IPM, IPD) reaches its limit first.
For example, with an RPM limit of 20 and a TPM limit of 200K, if an account sends 20 requests to ChatCompletions in a minute, each with 100 tokens, the limit will be triggered even if the account did not use up 200K tokens in these 20 requests.
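The example above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `limits_reached` and the dictionary layout are assumptions made for this example, not part of the platform API.

```python
def limits_reached(usage, limits):
    """Return the metrics whose usage in the current window has hit the limit."""
    return [name for name, cap in limits.items() if usage.get(name, 0) >= cap]


# Limits from the example: 20 requests per minute, 200K tokens per minute.
limits = {"RPM": 20, "TPM": 200_000}

# 20 requests of 100 tokens each, all sent within the same minute.
usage = {"RPM": 20, "TPM": 20 * 100}

print(limits_reached(usage, limits))  # ['RPM']
```

Only 2,000 of the 200,000 available tokens were used, yet the request count alone is enough to trigger the limit.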
1.5 Rate limits subject
- Rate limits are defined at the user account level, not at the API key level.
- Each model has its own rate limits. Exceeding the rate limits for one model does not affect the use of other models.
2. Rate limits rules
- The Rate limits for free models are fixed values, while those for paid models vary based on the account’s usage level and are displayed in the Rate Limits section.
- For the same usage level, the peak Rate Limits vary depending on the model category and the size of the model parameters.
2.1 Free model rate limits
- After verifying your identity, you can use all free models.
- Calls to free models are not charged. They appear in your account bill with a cost of 0.
- The rate limits for free models are fixed. Some models are available in both free and paid versions. The free version keeps the original model name, while the paid version is prefixed with “Pro/” to distinguish it. For example, the free version of Qwen2.5-7B-Instruct is named “Qwen/Qwen2.5-7B-Instruct,” and the paid version is named “Pro/Qwen/Qwen2.5-7B-Instruct.”
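The naming rule above can be expressed as a small helper. This is a sketch only: the function names are hypothetical, and it applies just to models that are offered in both a free and a paid version.

```python
PAID_PREFIX = "Pro/"


def paid_variant(free_name):
    """Build the paid model name from the free model name."""
    return PAID_PREFIX + free_name


def free_variant(model_name):
    """Strip the paid prefix if present, returning the free model name."""
    if model_name.startswith(PAID_PREFIX):
        return model_name[len(PAID_PREFIX):]
    return model_name


print(paid_variant("Qwen/Qwen2.5-7B-Instruct"))      # Pro/Qwen/Qwen2.5-7B-Instruct
print(free_variant("Pro/Qwen/Qwen2.5-7B-Instruct"))  # Qwen/Qwen2.5-7B-Instruct
```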
2.2 Paid model rate limits
- You are charged based on usage. API calls are included in your account bill.
- Rate Limits are tiered based on the account’s usage level. The peak Rate Limits increase with the usage level.
- For the same usage level, the peak Rate Limits vary depending on the model category and the size of the model parameters.
2.3 User usage level and rate limits
The platform categorizes accounts into different usage levels based on the monthly consumption amount. Each level has its own Rate Limits standards. When the monthly consumption reaches the criteria for a higher level, the account is automatically upgraded to that level. The upgrade takes effect immediately and provides more generous rate limits.
- Monthly consumption amount: This counts consumption of both your recharged balance and any gifted balance.
- Level setting: The usage level is determined by the higher of two amounts: the consumption in the previous calendar month and the consumption in the current month (from the 1st to today). New users start at L0.
Usage level | Qualification (in RMB) |
---|---|
L0 | Monthly highest consumption amount < ¥50 |
L1 | ¥50 ≤ Monthly highest consumption amount < ¥200 |
L2 | ¥200 ≤ Monthly highest consumption amount < ¥2000 |
L3 | ¥2000 ≤ Monthly highest consumption amount < ¥5000 |
L4 | ¥5000 ≤ Monthly highest consumption amount < ¥10000 |
L5 | ¥10000 ≤ Monthly highest consumption amount |
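The level rule and the table above can be sketched as a lookup. The thresholds are taken from the table; the function and variable names are assumptions made for illustration.

```python
# Lower bound in RMB for each usage level, highest level first (from the table above).
LEVEL_THRESHOLDS = [
    (10000, "L5"),
    (5000, "L4"),
    (2000, "L3"),
    (200, "L2"),
    (50, "L1"),
    (0, "L0"),
]


def usage_level(previous_month, current_month):
    """Return the usage level for the given monthly consumption amounts in RMB.

    The higher of the previous calendar month and the current month to date
    decides the level.
    """
    highest = max(previous_month, current_month)
    for lower_bound, level in LEVEL_THRESHOLDS:
        if highest >= lower_bound:
            return level
    return "L0"


print(usage_level(previous_month=120, current_month=2500))  # L3
print(usage_level(previous_month=0, current_month=49.99))   # L0
```

In the first call the current month (¥2500) is higher than the previous month (¥120), so the account is at L3 even though last month alone would only qualify for L1.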
2.4 Specific model rate limits
The platform currently offers models in five categories: text generation, image generation, vectorization, re-ranking, and speech. The rate limits for each specific model can be found in the model square.
2.5 deepseek-ai/DeepSeek-R1 and deepseek-ai/DeepSeek-V3 rate limits
To ensure the quality of platform services and the rational allocation of resources, the following adjustments to Rate Limits policies are now in effect:
- Adjustments
1. New RPH limit (requests per hour)
- Model Scope: deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3
- Applicable Users: All users
- Limit Standard: 30 requests/hour
2. New RPD limit (requests per day)
- Model Scope: deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3
- Applicable Users: Users who have not completed real-name authentication
- Limit Standard: 100 requests/day
Please note that these policies may be adjusted at any time based on traffic and load changes. SiliconFlow reserves the right to interpret these policies.
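A client can stay under limits such as 30 requests/hour by throttling itself before sending. The sketch below uses a sliding window on the client side. It is an assumption-laden illustration: the document does not state how the platform counts its windows, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._timestamps = deque()

    def try_acquire(self):
        """Record a request and return True if it fits in the window, else False."""
        now = self.clock()
        # Forget requests that have left the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


# Limits quoted above for deepseek-ai/DeepSeek-R1 and deepseek-ai/DeepSeek-V3.
rph_limiter = SlidingWindowLimiter(max_requests=30, window_seconds=3600)
rpd_limiter = SlidingWindowLimiter(max_requests=100, window_seconds=86400)
```

Calling `rph_limiter.try_acquire()` before each request and skipping or delaying the request when it returns `False` keeps the client within the hourly budget. Because rate limits apply per account, one limiter should be shared by every API key of the same account.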
3. Handling exceeding rate limits
3.1 Error messages for exceeding rate limits
If an API call exceeds the rate limits, the request fails. You need to wait until your usage is back within the rate limits before calling again. The corresponding HTTP error message is:
3.2 Handling exceeding rate limits
- You can refer to the Handling Rate Limits example to avoid errors under existing rate limits.
- You can also increase your usage level to increase the peak Rate Limits for your models.
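One common way to handle rate limit errors is to retry with exponential backoff. The sketch below is not the platform's official example. It assumes that a rate-limited request is signalled by HTTP status 429 (the conventional code for too many requests), and `send` is a hypothetical callable that performs one request and returns `(status_code, body)`.

```python
import random
import time

RATE_LIMIT_STATUS = 429  # assumption: conventional "Too Many Requests" status


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it is no longer rate limited or retries run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != RATE_LIMIT_STATUS:
            return status, body
        if attempt == max_retries:
            break
        # Double the wait on each attempt, capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add up to 10% jitter so parallel clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("still rate limited after %d retries" % max_retries)
```

With the defaults, the waits are roughly 1, 2, 4, 8, and 16 seconds. For per-minute limits such as RPM and TPM this is usually enough for the window to clear. For hourly or daily limits, retrying is unlikely to help and it is better to fail fast.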
4. How to increase model rate limits
4.1 Ways to increase rate limits
- Automatic upgrade: Your usage level rises as your monthly consumption amount grows. When the consumption meets the criteria for a higher level, the account is upgraded automatically.
- Quick Upgrade with Package: If you need to quickly reach a higher usage level and increase the peak Rate Limits, you can purchase a package to boost your usage level.
4.2 Package purchase details
- Online purchase: Please go to the platform to purchase the package online.
- Validity period: Packages take effect immediately after purchase and are valid for the current month (N) and the next calendar month (N+1). Starting from the month after next (N+2), the account’s usage level is recalculated based on the consumption of the previous month (N+1).
- Payment method: Packages can only be paid with the platform’s recharge balance and cannot be paid with gifted balance.
- Invoice: For details on how to issue an invoice for packages, refer to the invoice section.
- Exclusive instances: Packages are not applicable for exclusive instance needs. If you have such needs, please contact your exclusive account manager.
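The validity period rule in the list above can be worked through with a short calculation. The function names are hypothetical and the sketch only covers the calendar arithmetic for months N, N+1, and N+2.

```python
from datetime import date


def add_months(year, month, offset):
    """Return the (year, month) that lies offset months after the given month."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def package_window(purchase_date):
    """Return the months a package covers and the date the level is recalculated."""
    n = (purchase_date.year, purchase_date.month)
    n1 = add_months(n[0], n[1], 1)
    n2 = add_months(n[0], n[1], 2)
    return {
        "valid_months": [n, n1],                    # months N and N+1
        "recalculated_from": date(n2[0], n2[1], 1), # first day of month N+2
    }


window = package_window(date(2025, 11, 20))
print(window["valid_months"])       # [(2025, 11), (2025, 12)]
print(window["recalculated_from"])  # 2026-01-01
```

A package bought on 20 November 2025 therefore covers November and December 2025, and from 1 January 2026 the usage level depends on the consumption in December 2025.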
4.3 Other scenarios
- Contact Us: For scenarios not covered above, please contact us.