Rate limits

1. Rate limits overview

1.1 What are rate limits

Rate limits are the rules that govern how frequently a user can make API requests to SiliconCloud platform services within a given time period.

1.2 Why implement rate limits

Rate limiting is common practice for APIs. The reasons for implementing it include:

Ensuring fair and efficient resource use: Prevents some users from making so many requests that they degrade the experience of other users.
Avoiding overload: Improves service reliability by managing overall platform load, which avoids performance issues caused by sudden spikes in requests.
Security protection: Guards against malicious attacks that could overload the platform and cause service interruptions.

1.3 Rate limits metrics

Currently, rate limits are measured by seven metrics:

RPM (requests per minute): the maximum number of requests allowed per minute
RPH (requests per hour): the maximum number of requests allowed per hour
RPD (requests per day): the maximum number of requests allowed per day
TPM (tokens per minute): the maximum number of tokens allowed per minute
TPD (tokens per day): the maximum number of tokens allowed per day
IPM (images per minute): the maximum number of images generated per minute
IPD (images per day): the maximum number of images generated per day

1.4 Rate limits metrics for different models

Model type	Rate limit metrics	Current limits
Language model (chat)	RPM, TPM	RPM: 1000-10000, TPM: 50000-5000000
Vector model (embedding)	RPM, TPM	RPM: 2000-10000, TPM: 500000-10000000
Re-ranking model (reranker)	RPM, TPM	RPM: 2000, TPM: 500000
Image generation model (image)	IPM, IPD	IPM: 2-, IPD: 400-
Multimodal model (multimodal)	-	-

A rate limit is triggered as soon as any one of the applicable metrics (RPM, RPH, RPD, TPM, TPD, IPM, IPD) reaches its limit. For example, with an RPM limit of 20 and a TPM limit of 200K, an account that sends 20 requests of 100 tokens each to ChatCompletions within one minute hits the RPM limit, even though those 20 requests used only 2,000 of the 200K tokens.
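The arithmetic in this example can be checked with a short sketch. The function name `first_limit_reached` and the assumption that every request uses the same number of tokens are illustrative only; they say nothing about how the platform counts usage internally.

```python
import math


def first_limit_reached(rpm_limit, tpm_limit, tokens_per_request):
    """Return (metric, requests_sent, tokens_used) for the limit reached first.

    Assumes all requests are sent within one minute and each request
    uses exactly tokens_per_request tokens (a simplification).
    """
    requests_to_hit_rpm = rpm_limit
    requests_to_hit_tpm = math.ceil(tpm_limit / tokens_per_request)
    if requests_to_hit_rpm <= requests_to_hit_tpm:
        return "RPM", requests_to_hit_rpm, requests_to_hit_rpm * tokens_per_request
    return "TPM", requests_to_hit_tpm, requests_to_hit_tpm * tokens_per_request


# The case from the text: RPM limit 20, TPM limit 200K, 100 tokens per request.
metric, n_requests, n_tokens = first_limit_reached(20, 200_000, 100)
print(metric, n_requests, n_tokens)  # RPM 20 2000
```

With larger requests the token limit can become the binding one instead: at 1,000 tokens per request against a TPM limit of 50,000, the TPM limit is reached after 50 requests, long before an RPM limit of 1,000.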

1.5 Rate limits subject

Rate limits are defined at the user account level, not at the API key level.
Each model has its own rate limits. Exceeding the rate limits for one model does not affect the use of other models.
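Because limits apply per account and per model, a client that wants to stay under them could keep one request counter per model and share it across every API key that belongs to the same account. The following is a minimal client-side sketch. The class name `PerModelRequestBudget`, the 60-second sliding window, and the RPM values passed in are all assumptions; the platform's actual windowing method is not documented here.

```python
import time
from collections import defaultdict, deque


class PerModelRequestBudget:
    """Client-side request counter with a separate 60-second window per model."""

    def __init__(self, rpm_limits, clock=time.monotonic):
        self.rpm_limits = rpm_limits      # hypothetical mapping: {model_name: rpm}
        self.clock = clock                # injectable clock, useful for testing
        self.sent = defaultdict(deque)    # model_name -> timestamps of recent requests

    def try_acquire(self, model):
        """Return True and record the request if the model's budget allows it."""
        now = self.clock()
        window = self.sent[model]
        # Drop timestamps that are 60 seconds old or older.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.rpm_limits[model]:
            return False
        window.append(now)
        return True
```

Exhausting the budget for one model leaves the budgets of the other models untouched, which mirrors the per-model behavior described above.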

2. Rate limits rules

The rate limits for free models are fixed values, while those for paid models vary with the account’s usage level and are displayed in the Rate Limits section.
For the same usage level, the peak rate limits vary depending on the model category and the model’s parameter count.

2.1 Free model rate limits

After verifying your identity, you can use all free models.
Calls to free models are not charged; they appear in your account bill with a cost of 0.
The rate limits for free models are fixed. Some models are available in both free and paid versions. The free version keeps the original name, while the paid version is prefixed with “Pro/” to distinguish it. For example, the free version of Qwen2.5-7B-Instruct is named “Qwen/Qwen2.5-7B-Instruct,” and the paid version is named “Pro/Qwen/Qwen2.5-7B-Instruct.”
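The naming convention can be expressed as a pair of small helpers. The function names `paid_variant` and `free_variant` are hypothetical, and the sketch only manipulates strings: it does not check whether a given model is actually offered in both versions.

```python
PAID_PREFIX = "Pro/"


def paid_variant(model_name):
    """Return the paid-version name by adding the "Pro/" prefix if it is missing."""
    if model_name.startswith(PAID_PREFIX):
        return model_name
    return PAID_PREFIX + model_name


def free_variant(model_name):
    """Return the free-version name by stripping a leading "Pro/" prefix."""
    if model_name.startswith(PAID_PREFIX):
        return model_name[len(PAID_PREFIX):]
    return model_name


print(paid_variant("Qwen/Qwen2.5-7B-Instruct"))      # Pro/Qwen/Qwen2.5-7B-Instruct
print(free_variant("Pro/Qwen/Qwen2.5-7B-Instruct"))  # Qwen/Qwen2.5-7B-Instruct
```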

2.2 Paid model rate limits

You are charged based on usage. API calls are included in your account bill.
Rate limits are tiered by the account’s usage level; the peak rate limits increase with the usage level.
For the same usage level, the peak rate limits vary depending on the model category and the model’s parameter count.

2.3 User usage level and rate limits

The platform assigns each account a usage level based on its monthly consumption amount. Each level has its own rate limit standards. When monthly consumption meets the criteria for a higher level, the account is upgraded to that level automatically. The upgrade takes effect immediately and provides more generous rate limits.

Monthly consumption amount: This includes both the amount you recharge and any gifted amounts.
Level setting: The usage level is determined by the higher of two amounts: the consumption in the previous calendar month and the consumption in the current month so far (from the 1st to today). New users start at L0.

Usage level	Qualification (in RMB)
L0	Monthly highest consumption amount < ¥50
L1	¥50 ≤ Monthly highest consumption amount < ¥200
L2	¥200 ≤ Monthly highest consumption amount < ¥2000
L3	¥2000 ≤ Monthly highest consumption amount < ¥5000
L4	¥5000 ≤ Monthly highest consumption amount < ¥10000
L5	¥10000 ≤ Monthly highest consumption amount
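The level rule and the thresholds in the table can be combined into a short lookup. This is a sketch only: the function name `usage_level` and its arguments are hypothetical, and the amounts are assumed to be in RMB.

```python
# (minimum monthly highest consumption in RMB, level), highest tier first,
# taken from the table above.
LEVEL_THRESHOLDS = [
    (10000, "L5"),
    (5000, "L4"),
    (2000, "L3"),
    (200, "L2"),
    (50, "L1"),
]


def usage_level(previous_month_rmb, current_month_to_date_rmb):
    """Return the usage level for the higher of the two consumption amounts."""
    highest = max(previous_month_rmb, current_month_to_date_rmb)
    for minimum, level in LEVEL_THRESHOLDS:
        if highest >= minimum:
            return level
    return "L0"


print(usage_level(30, 250))  # L2: current-month consumption of 250 is the higher amount
```

Because the higher of the two months is used, an account that consumed ¥2,500 last month stays at L3 this month even if it has consumed nothing yet.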

2.4 Specific model rate limits

The platform currently offers five model categories: text generation, image generation, vectorization, re-ranking, and speech. The rate limits for a specific model can be found in the Models section.

3. Handling exceeding rate limits

3.1 Error messages for exceeding rate limits

If an API call exceeds the rate limits, the request fails. You must wait until usage falls back within the rate limits before calling again. The corresponding HTTP error response is:

    HTTP/1.1 429 Too Many Requests
    Content-Type: application/json

    Request was rejected due to rate limiting. If you want more, please contact contact@siliconflow.cn

3.2 Handling exceeding rate limits

You can refer to the Handling Rate Limits example to avoid errors under existing rate limits.
You can also raise your usage level to increase the peak rate limits for your models.
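One common way to stay within existing limits is to retry a rejected request after an exponentially growing delay. The sketch below is not the platform's Handling Rate Limits example; it is a generic illustration. The function name `call_with_backoff`, the `send` callable (any zero-argument function returning a `(status_code, body)` pair), and the delay values are all assumptions.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a status other than 429 or retries run out.

    send:  hypothetical zero-argument callable returning (status_code, body).
    sleep: injectable so the waiting can be replaced in tests.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the delay on each attempt, capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add random jitter so that many clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
    return status, body
```

In practice `send` would wrap the actual HTTP request. If every attempt is rejected, the last 429 response is returned so the caller can decide what to do next.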

4. How to increase model rate limits

4.1 Ways to increase rate limits

Automatic upgrade: As your usage grows, your monthly consumption amount grows with it. When the amount meets the criteria for a higher level, the account is upgraded automatically.
Quick upgrade with a package: If you need to reach a higher usage level quickly and increase the peak rate limits, you can purchase a package to boost your usage level.

4.2 Package purchase details

Online purchase: Please go to the platform to purchase the package online.
Validity period: Packages take effect immediately after purchase and are valid for the current month (N) and the next calendar month (N+1). Starting from the month after next (N+2), the account’s usage level is recalculated based on the consumption of the previous month (N+1).
Payment method: Packages can only be paid with the platform’s recharge balance and cannot be paid with gifted balance.
Invoice: For details on how to issue an invoice for packages, refer to the invoice section.
Exclusive instances: Packages are not applicable for exclusive instance needs. If you have such needs, please contact your exclusive account manager.
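The validity period described above can be worked out with a small date calculation. This sketch assumes that "valid for months N and N+1" means the package remains in effect through the last day of month N+1; the function name `package_valid_through` is hypothetical, and time zones are ignored.

```python
import calendar
from datetime import date


def package_valid_through(purchase_date):
    """Return the last day of the calendar month after the purchase month (N+1)."""
    year, month = purchase_date.year, purchase_date.month + 1
    if month > 12:
        # A December purchase rolls over into January of the next year.
        year, month = year + 1, 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


print(package_valid_through(date(2025, 1, 15)))  # 2025-02-28
```

Under this reading, a package bought on any day of January 2025 would apply through February 28, 2025, and the usage level for March would be recalculated from February's consumption.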
