Reasoning models
Overview
DeepSeek-R1 is an advanced language model developed by deepseek-ai. It improves the accuracy of final answers by outputting the reasoning chain content (reasoning_content) before the answer. When using this model, it is recommended to upgrade the OpenAI SDK first so that it supports the new parameters.
Supported Model List:
- deepseek-ai/DeepSeek-R1
- Pro/deepseek-ai/DeepSeek-R1
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Installation and upgrade
Before using DeepSeek-R1, ensure that you have the latest version of the OpenAI SDK installed. You can upgrade it using the following command:
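```bash
pip install -U openai
```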
API parameters
- Input parameters:
  - max_tokens: The maximum length of the response, including the reasoning chain output. For reference: the DeepSeek-R1 series models support a maximum output length (max_tokens) of 16K tokens, and the QwQ-32B model supports a maximum context length and a maximum output length of 32K tokens each. When making API requests, do not set max_tokens to the full 32K; leave it empty or set it to a value below 32K, so that the input tokens do not push the request past the context length and cause errors.
- Return parameters:
  - reasoning_content: The reasoning chain content, at the same level as content.
  - content: The final answer content.
Usage Recommendations
- Set the temperature within the range 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output (see the sketch after this list).
- Set top_p to 0.95.
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- For mathematical problems, include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
- When evaluating model performance, run multiple tests and average the results.
- The DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., output "<think>\n\n</think>") when responding to certain queries, which can adversely affect performance. To ensure the model engages in thorough reasoning, we recommend forcing it to begin every output with "<think>\n".
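Putting the sampling recommendations together, a request might look like the sketch below. The base_url, placeholder API key, and max_tokens value are assumptions for illustration; adjust them for your deployment.

```python
from openai import OpenAI

# Assumption: SiliconFlow's OpenAI-compatible endpoint; replace the key with your own.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    # No system prompt: all instructions live in the user prompt.
    messages=[{
        "role": "user",
        "content": "Please reason step by step, and put your final answer "
                   "within \\boxed{}. What is 17 * 24?",
    }],
    temperature=0.6,  # recommended range: 0.5-0.7
    top_p=0.95,
    max_tokens=4096,  # well below the 16K output limit
)
print(response.choices[0].message.content)
```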
Context concatenation
During each round of the conversation, the model outputs the reasoning chain content (reasoning_content) and the final answer (content). In the next round of the conversation, the reasoning chain content from the previous rounds will not be concatenated to the context.
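A minimal sketch of this behavior on the client side, assuming the same endpoint as above: only content, never reasoning_content, is appended to the message history for the next round.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.siliconflow.cn/v1")

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
)
message = response.choices[0].message

# Concatenate only the final answer; reasoning_content is deliberately dropped.
messages.append({"role": "assistant", "content": message.content})

# Round 2: the previous round's reasoning chain is not part of the context.
messages.append({"role": "user", "content": "Explain your answer in one sentence."})
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
)
print(response.choices[0].message.content)
```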
OpenAI request examples
Stream Mode Request
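A sketch of a streaming request, assuming the SiliconFlow base URL and that the SDK surfaces the extra reasoning_content field on each streamed delta (hence the defensive getattr):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    stream=True,
    max_tokens=4096,
)

for chunk in response:
    delta = chunk.choices[0].delta
    # reasoning_content streams the thought chain; content streams the final answer.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
```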
Non-Stream Mode Request
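A sketch of the same request without streaming, under the same assumptions; the reasoning chain is read from the completed message alongside content:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.siliconflow.cn/v1")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    max_tokens=4096,
)

message = response.choices[0].message
print("reasoning_content:", getattr(message, "reasoning_content", None))
print("content:", message.content)
```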
Notes
- API Key: Ensure you use the correct API key for authentication.
- Stream Mode: Stream mode is suitable for scenarios where responses need to be received incrementally, while non-stream mode is suitable for scenarios where a complete response is needed at once.
Common questions
- How to obtain the API key?
  Please visit SiliconFlow to register and obtain the API key.
- How to handle long text?
  You can adjust the max_tokens parameter to control the length of the output, but note that the maximum output length is 16K tokens.