1. Overview

Reasoning models are AI systems based on deep learning that solve complex tasks through logical deduction, knowledge association, and context analysis. Typical applications include mathematical problem solving, code generation, logical judgment, and multi-step reasoning scenarios. These types of models typically have the following characteristics:

  • Structured thinking: Using techniques like Chain-of-Thought to break down complex problems
  • Knowledge integration: Combining domain knowledge bases with common sense reasoning capabilities
  • Self-correction mechanism: Enhancing result reliability through feedback loops
  • Multimodal processing: Some advanced models support mixed input of text, code, and formulas

2. Supported model list

  • Qwen Series:
    • Tongyi-Zhiwen/QwenLong-L1-32B
    • Qwen/Qwen3-30B-A3B
    • Qwen/Qwen3-32B
    • Qwen/Qwen3-14B
    • Qwen/Qwen3-8B
    • Qwen/Qwen3-235B-A22B
    • Qwen/QwQ-32B
  • THUDM Series:
    • THUDM/GLM-Z1-32B-0414
    • THUDM/GLM-Z1-Rumination-32B-0414
    • THUDM/GLM-Z1-9B-0414
  • deepseek-ai Series:
    • deepseek-ai/DeepSeek-R1
    • Pro/deepseek-ai/DeepSeek-R1
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    • Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    • Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

3. Usage recommendations

3.1 API parameters

3.1.1 Input parameters

  • Maximum Chain-of-Thought Length (thinking_budget): the number of tokens the model may use for internal reasoning. Adjust thinking_budget to control the length of the chain-of-thought process.

  • Maximum Response Length (max_tokens): limits the number of tokens in the final output returned to the user, excluding the chain-of-thought portion. Set it as usual to control the maximum length of the response.

  • Maximum Context Length (context_length): the maximum total length, including the user input, the chain of thought, and the output. It is not a request parameter and does not need to be set by the user. (A minimal sketch of the two request parameters follows this list.)
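
As a minimal sketch of how the two request parameters fit together (the model name and token values here are illustrative, and thinking_budget is passed via extra_body because it is not a standard OpenAI SDK argument, matching the full examples in Section 4):

from openai import OpenAI

client = OpenAI(base_url="https://api.siliconflow.cn/v1/", api_key="your api_key")
response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # illustrative model choice
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    max_tokens=4096,                       # caps the final answer, excluding the chain of thought
    extra_body={"thinking_budget": 1024},  # caps the internal chain-of-thought tokens
)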

The maximum response length, maximum reasoning chain length, and maximum context length supported by different models are shown in the table below:

| Model                      | Maximum Response Length | Maximum Reasoning Chain Length | Maximum Context Length |
|----------------------------|-------------------------|--------------------------------|------------------------|
| DeepSeek-R1                | 16384                   | 32768                          | 98304                  |
| DeepSeek-R1-Distill Series | 16384                   | 32768                          | 131072                 |
| Qwen3 Series               | 8192                    | 32768                          | 131072                 |
| QwQ-32B                    | 32768                   | 16384                          | 131072                 |
| GLM-Z1 Series              | 16384                   | 32768                          | 131072                 |
  • With the reasoning model’s chain-of-thought process decoupled from the response length, the output follows these rules:

    • If the number of tokens generated during the thinking phase reaches the thinking_budget, Qwen3 series reasoning models, which natively support this parameter, forcibly stop the chain-of-thought reasoning. Other reasoning models may continue to output thinking content.
    • If the response length exceeds the max_tokens limit or the total length exceeds the context_length restriction, the response content is truncated and the finish_reason field in the response is set to length, indicating that the output was terminated due to a length constraint (see the sketch below).
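
To make the truncation rule concrete, here is a minimal sketch that detects a length-limited response by inspecting finish_reason (the model name and the deliberately small max_tokens are illustrative):

from openai import OpenAI

client = OpenAI(base_url="https://api.siliconflow.cn/v1/", api_key="your api_key")
response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # illustrative model choice
    messages=[{"role": "user", "content": "Summarize the history of the Olympic Games."}],
    max_tokens=64,  # deliberately small so the answer is likely to be truncated
)
if response.choices[0].finish_reason == "length":
    print("Output was cut off by max_tokens or context_length.")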

3.1.2 Return parameters

  • reasoning_content: the reasoning chain content, returned at the same level as content.
  • content: the final answer content (a sketch for reading both fields follows this list).
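
For non-streaming responses, both fields sit on the message object. A short sketch for reading them (reasoning_content is not a field the OpenAI SDK declares, so it is read defensively here; `response` is assumed to come from a non-streaming request as in Section 4.2):

message = response.choices[0].message
answer = message.content                                 # final answer
reasoning = getattr(message, "reasoning_content", None)  # chain of thought, may be absent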

3.2 DeepSeek-R1 Usage Recommendations

  • Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
  • Set the value of top_p to 0.95.
  • Avoid adding a system prompt; all instructions should be contained within the user prompt.
  • For mathematical problems, include a directive in your prompt such as: “Please reason step by step, and put your final answer within \boxed{}.” (These settings are applied in the sketch after this list.)
  • When evaluating model performance, it is recommended to conduct multiple tests and average the results.
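
A minimal sketch applying these recommendations together (the sample problem is illustrative):

from openai import OpenAI

client = OpenAI(base_url="https://api.siliconflow.cn/v1/", api_key="your api_key")
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{
        # No system prompt: the instruction lives entirely in the user turn.
        "role": "user",
        "content": "Solve x^2 - 5x + 6 = 0. Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }],
    temperature=0.6,  # recommended range: 0.5-0.7
    top_p=0.95,
)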

4. OpenAI request examples

4.1 Stream Mode Request

from openai import OpenAI

url = 'https://api.siliconflow.cn/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with streaming output.
content = ""
reasoning_content = ""
messages = [
    {"role": "user", "content": "Who are the legendary athletes of the Olympic Games?"}
]
response = client.chat.completions.create(
    model="Pro/deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True,  # Enable streaming output.
    max_tokens=4096,
    extra_body={
        "thinking_budget": 1024
    }
)
# Receive and process the response chunk by chunk.
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # The answer and the chain of thought arrive in separate fields;
    # reasoning_content is a non-standard field, so read it defensively.
    if delta.content:
        content += delta.content
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content

# Round 2: append the first answer, then continue the conversation.
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Go on"})
response = client.chat.completions.create(
    model="Pro/deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content

4.2 Non-Stream Mode Request

from openai import OpenAI

url = 'https://api.siliconflow.cn/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a non-streaming request.
messages = [
    {"role": "user", "content": "Who are the legendary athletes of the Olympic Games?"}
]
response = client.chat.completions.create(
    model="Pro/deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False,
    max_tokens=4096,
    extra_body={
        "thinking_budget": 1024
    }
)
content = response.choices[0].message.content
# reasoning_content is a non-standard field, so read it defensively.
reasoning_content = getattr(response.choices[0].message, "reasoning_content", None)

# Round 2: append the first answer, then continue the conversation.
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Go on"})
response = client.chat.completions.create(
    model="Pro/deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False
)
content = response.choices[0].message.content

5. Notes

  • API Key: Ensure you use the correct API key for authentication.
  • Stream Mode: Stream mode is suitable for scenarios where responses need to be received incrementally, while non-stream mode is suitable for scenarios where a complete response is needed at once.

6. Common questions

  • How to obtain the API key?

    Please visit SiliconFlow to register and obtain the API key.

  • How to handle long text?

    You can adjust the max_tokens parameter to control the length of the output, but note that the maximum response length depends on the model; see the table in Section 3.1.1 (for example, 16384 tokens for DeepSeek-R1).