Reasoning models

Overview

DeepSeek-R1 is a reasoning model developed by deepseek-ai. It improves the accuracy of final answers by first producing a reasoning chain, which is returned as a separate field (reasoning_content). When using this model, upgrade the OpenAI SDK first so that the new parameters are supported.

Supported Model List:

  • deepseek-ai/DeepSeek-R1
  • Pro/deepseek-ai/DeepSeek-R1
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  • deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Pro/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Installation and upgrade

Before using DeepSeek-R1, ensure that you have the latest version of the OpenAI SDK installed. You can upgrade it using the following command:

pip3 install -U openai
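
To confirm the upgrade, you can print the installed version (any recent release works with the examples below, since reasoning_content is read defensively in any case):

python3 -c "import openai; print(openai.__version__)"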

API parameters

  • Input parameters:

    • max_tokens: Maximum length of the response, including the reasoning chain output. Among the models listed above, the maximum value of max_tokens is 8K for deepseek-ai/DeepSeek-R1 and 16K for the other models.
  • Return parameters:

    • reasoning_content: The reasoning chain content, returned at the same level as content.

    • content: The final answer content.

  • Usage recommendations:

    • Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
    • Avoid adding a system prompt; all instructions should be contained within the user prompt.
    • For mathematical problems, it is advisable to include a directive in your prompt such as: “Please reason step by step, and put your final answer within \boxed{}.”
    • The DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., output "<think>\n\n</think>") on certain queries, which can adversely affect performance. To ensure the model engages in thorough reasoning, we recommend forcing the model to begin every output with "<think>\n". A minimal request applying these recommendations is sketched after this list.
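
The following is a minimal sketch applying the temperature and \boxed{} recommendations above, using the same endpoint and key placeholders as the request examples below; the sample problem is illustrative. Forcing the "<think>\n" opening depends on whether the endpoint supports assistant-prefix continuation, so it is not shown here.

from openai import OpenAI

client = OpenAI(
    base_url='https://api.siliconflow.cn/v1/',
    api_key='your api_key'
)

# Temperature 0.6 and the \boxed{} directive follow the recommendations above
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{
        "role": "user",
        "content": "Solve 2x + 3 = 11. Please reason step by step, "
                   "and put your final answer within \\boxed{}."
    }],
    temperature=0.6,
    max_tokens=4096
)
print(response.choices[0].message.content)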

Context concatenation

During each round of the conversation, the model outputs the reasoning chain content (reasoning_content) and the final answer (content). In the next round, the reasoning chain content from previous rounds is not concatenated into the context; only the final answer is carried forward, as sketched below.
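
A minimal sketch of this rule, assuming a non-streaming first round as in the examples below: only content is appended to the message history, and reasoning_content stays out of it.

from openai import OpenAI

client = OpenAI(base_url='https://api.siliconflow.cn/v1/', api_key='your api_key')
messages = [{"role": "user", "content": "Who are some legendary Olympic athletes?"}]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=4096
)

# Correct: carry only the final answer into the next round's context
messages.append({"role": "assistant", "content": response.choices[0].message.content})

# Incorrect: appending reasoning_content (alone or merged into content) would
# carry the reasoning chain into the context, which the API itself does not do
messages.append({"role": "user", "content": "Continue"})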

OpenAI request examples

Stream Mode Request

from openai import OpenAI

url = 'https://api.siliconflow.cn/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with streaming output
content = ""
reasoning_content = ""
messages = [
    {"role": "user", "content": "Who are some legendary Olympic athletes?"}
]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True,  # Enable streaming output
    max_tokens=4096
)
# Receive and process the response incrementally
for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content:
        content += delta.content
    # reasoning_content is not a standard SDK field; read it defensively
    reasoning_chunk = getattr(delta, 'reasoning_content', None)
    if reasoning_chunk:
        reasoning_content += reasoning_chunk

# Round 2: append only the final answer, not the reasoning chain
messages.append({"role": "assistant", "content": content})
messages.append({'role': 'user', 'content': "Continue"})
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True
)

Non-Stream Mode Request

from openai import OpenAI
url = 'https://api.siliconflow.cn/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with non-streaming output
messages = [
    {"role": "user", "content": "Who are some legendary Olympic athletes?"}
]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False,
    max_tokens=4096
)
content = response.choices[0].message.content
# reasoning_content is not a standard SDK field; read it defensively
reasoning_content = getattr(response.choices[0].message, 'reasoning_content', None)

# Round 2: append only the final answer, not the reasoning chain
messages.append({"role": "assistant", "content": content})
messages.append({'role': 'user', 'content': "Continue"})
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False
)

Notes

  • API Key: Ensure you use the correct API key for authentication.
  • Stream Mode: Stream mode suits scenarios where the response should be received and displayed incrementally; non-stream mode suits scenarios where the complete response is needed at once.

Common questions

  • How to obtain the API key?

    Please visit SiliconFlow to register and obtain the API key.

  • How to handle long text?

    You can adjust the max_tokens parameter to control the length of the output. Note that the maximum is 8K for deepseek-ai/DeepSeek-R1 and 16K for the other models listed above, as shown in the sketch below.
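
For example, a request that raises max_tokens toward the 16K cap of the distilled models (the model choice and prompt are illustrative, and 16K is assumed to mean 16384 tokens):

from openai import OpenAI

client = OpenAI(base_url='https://api.siliconflow.cn/v1/', api_key='your api_key')
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",  # a 16K-capable model from the list above
    messages=[{"role": "user", "content": "Summarize the history of the modern Olympic Games."}],
    max_tokens=16384  # assumed 16K cap; DeepSeek-R1 itself is capped at 8K
)
print(response.choices[0].message.content)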