Reasoning models
Overview
DeepSeek-R1 is a large language model developed by deepseek-ai. It improves the accuracy of its final answers by outputting the reasoning chain content (reasoning_content) alongside the answer. When using this model, it is recommended to upgrade the OpenAI SDK so that the new parameters are supported.
Supported Model List:
- deepseek-ai/DeepSeek-R1
- Pro/deepseek-ai/DeepSeek-R1
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Pro/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Installation and upgrade
Before using DeepSeek-R1, ensure that you have the latest version of the OpenAI SDK installed. You can upgrade it using the following command:
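```shell
pip install --upgrade openai
```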
API parameters
- Input parameters:
  - max_tokens: maximum length of the response, including the reasoning chain output. Among the models listed above, the maximum value of max_tokens is 8K for deepseek-ai/DeepSeek-R1 and 16K for the other models.
- Return parameters:
  - reasoning_content: reasoning chain content, at the same level as content.
  - content: final answer content.
Usage Recommendations
- Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
- The DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., outputting "<think>\n\n</think>") when responding to certain queries, which can adversely affect the model's performance. To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "<think>\n" at the beginning of every output.
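The prompt-related recommendations above can be sketched as a small helper. The helper name `build_prompt` is illustrative, not part of any official API; only the math directive wording comes from the recommendations:

```python
# Sketch of prompt construction following the recommendations above:
# no system message, and the step-by-step \boxed{} directive for math.

MATH_DIRECTIVE = (
    "Please reason step by step, and put your final answer "
    "within \\boxed{}."
)

def build_prompt(question: str, is_math: bool = False) -> list[dict]:
    """Return a single-user-message list with no system prompt;
    math questions get the \\boxed{} directive appended."""
    content = f"{question}\n{MATH_DIRECTIVE}" if is_math else question
    return [{"role": "user", "content": content}]

messages = build_prompt("Solve x^2 = 4 for x.", is_math=True)
```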
Context concatenation
During each round of the conversation, the model outputs the reasoning chain content (reasoning_content) and the final answer (content). In the next round of the conversation, the reasoning chain content from the previous rounds will not be concatenated to the context.
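This rule can be sketched as a helper that assembles the next round's context. The function name `build_next_messages` is hypothetical; the point is that `reasoning_content` from earlier assistant turns is dropped and only `content` is carried forward:

```python
def build_next_messages(history: list[dict], new_user_message: str) -> list[dict]:
    """Assemble the message list for the next conversation round.

    Assistant turns in `history` may carry a `reasoning_content` field
    alongside `content`; only role and content are carried forward,
    since previous reasoning chains are not concatenated to the context.
    """
    messages = [{"role": t["role"], "content": t["content"]} for t in history]
    messages.append({"role": "user", "content": new_user_message})
    return messages

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4",
     "reasoning_content": "2 + 2 equals 4 because ..."},
]
msgs = build_next_messages(history, "And 3 + 3?")
```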
OpenAI request examples
Stream Mode Request
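A minimal streaming sketch with the OpenAI SDK. The base URL and model name follow this document; `reasoning_content` on the delta is the extended field described above, so it is read defensively with `getattr` since it is not part of the stock SDK types:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",
    api_key="YOUR_API_KEY",  # replace with your SiliconFlow API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=4096,
    temperature=0.6,
    stream=True,
)

for chunk in response:
    delta = chunk.choices[0].delta
    # The reasoning chain arrives first, then the final answer in content.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
```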
Non-Stream Mode Request
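A minimal non-streaming sketch under the same assumptions; the full reasoning chain and final answer are returned together on the message object:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",
    api_key="YOUR_API_KEY",  # replace with your SiliconFlow API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=4096,
    temperature=0.6,
)

message = response.choices[0].message
# reasoning_content is the extended field; read defensively.
print("Reasoning chain:", getattr(message, "reasoning_content", None))
print("Final answer:", message.content)
```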
Notes
- API Key: Ensure you use the correct API key for authentication.
- Stream Mode: Stream mode is suitable for scenarios where responses need to be received incrementally, while non-stream mode is suitable for scenarios where a complete response is needed at once.
Common questions
- How to obtain the API key?
  Please visit SiliconFlow to register and obtain the API key.
- How to handle long text?
  You can adjust the max_tokens parameter to control the length of the output, but please note that the maximum is 16K (8K for deepseek-ai/DeepSeek-R1).