Reasoning models are deep-learning-based AI systems that solve complex tasks through logical deduction, knowledge association, and context analysis. Typical applications include mathematical problem solving, code generation, logical judgment, and multi-step reasoning. Such models typically share the following characteristics:
Structured thinking: Using techniques like Chain-of-Thought to break down complex problems
Knowledge integration: Combining domain knowledge bases with common sense reasoning capabilities
Self-correction mechanism: Enhancing result reliability through feedback loops
Multimodal processing: Some advanced models support mixed input of text, code, and formulas
Maximum Chain-of-Thought Length (thinking_budget): The maximum number of tokens the model may spend on internal reasoning. Adjusting thinking_budget controls the length of the chain-of-thought process.
Maximum Response Length (max_tokens): Limits the number of tokens in the final answer returned to the user. Set it as usual to cap the maximum length of the response.
Maximum Context Length (context_length): The maximum combined length of the prompt, chain-of-thought, and response. It is a model property, not a request parameter, and does not need to be set by the user.
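The relationship between these parameters can be sketched as a request payload for an OpenAI-compatible chat endpoint. The model name here is hypothetical, and thinking_budget is a provider-specific extra parameter that may not be accepted everywhere:

```python
# Sketch of a chat-completion request payload (assumed OpenAI-compatible
# endpoint; model name and thinking_budget support are assumptions).
payload = {
    "model": "Qwen/Qwen3-8B",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 1024,        # caps the final answer returned to the user
    "thinking_budget": 4096,   # caps tokens spent on chain-of-thought
    # context_length is NOT set here: it is a fixed model property, and
    # prompt + thinking + answer must all fit within it.
}
```

Note that thinking_budget and max_tokens are independent budgets: exhausting one does not consume the other.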
After decoupling the reasoning model’s chain-of-thought process from the response length, output behavior follows these rules:
If the number of tokens generated during the thinking phase reaches thinking_budget, Qwen3-series reasoning models, which natively support this parameter, will forcibly stop the chain-of-thought. Other reasoning models may continue generating thinking content.
If the response length exceeds the max_tokens limit, or the total content exceeds the context_length restriction, the response content will be truncated. The finish_reason field in the response will be set to length, indicating that the output was terminated due to length constraints.
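Clients should check finish_reason before trusting a response. A minimal sketch, assuming the OpenAI-compatible response shape where each choice carries a finish_reason field:

```python
def is_truncated(choice: dict) -> bool:
    """Return True when a response choice was cut off by a length limit."""
    return choice.get("finish_reason") == "length"

# Example response fragments (hand-written, not from a real API call):
full = {"finish_reason": "stop", "message": {"content": "The answer is 42."}}
cut = {"finish_reason": "length", "message": {"content": "The answer"}}
```

When is_truncated returns True, a typical remedy is to raise max_tokens or shorten the prompt and retry.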
Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
Set the value of top_p to 0.95.
Avoid adding a system prompt; include all instructions in the user prompt.
For mathematical problems, it is advisable to include a directive in your prompt such as: “Please reason step by step, and put your final answer within \boxed{}.”
When evaluating model performance, it is recommended to conduct multiple tests and average the results.
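The sampling and prompting recommendations above can be sketched together. The call_model function is a hypothetical stand-in for a real API call, so this only shows how the request is assembled and how repeated test scores are averaged:

```python
# Minimal sketch of the recommended settings: temperature 0.6, top_p 0.95,
# no system prompt, and accuracy averaged over repeated tests.
MATH_DIRECTIVE = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)

def build_request(question: str) -> dict:
    """Build a request following the recommendations above."""
    return {
        # All instructions live in the single user message; no system prompt.
        "messages": [
            {"role": "user", "content": f"{question}\n{MATH_DIRECTIVE}"}
        ],
        "temperature": 0.6,  # recommended midpoint of the 0.5-0.7 range
        "top_p": 0.95,
    }

def average_accuracy(scores: list[float]) -> float:
    """Average pass/fail scores collected from multiple test runs."""
    return sum(scores) / len(scores)
```

For example, scoring four runs of the same benchmark as [1, 0, 1, 1] and averaging gives a more stable estimate than any single run.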
API Key: Ensure you use the correct API key for authentication.
Stream Mode: Stream mode is suitable for scenarios where responses need to be received incrementally, while non-stream mode is suitable for scenarios where a complete response is needed at once.
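In stream mode the client receives the answer as a sequence of delta fragments and must reassemble them. A minimal sketch, assuming the OpenAI-compatible streaming chunk format (this is an assumption about the provider; the chunks below are hand-written examples, not real API output):

```python
def collect_stream(chunks) -> str:
    """Concatenate incremental delta fragments into the full reply text."""
    return "".join(
        chunk["choices"][0]["delta"].get("content", "") for chunk in chunks
    )

# Hand-written example chunks mimicking a streamed response:
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]
```

Non-stream mode skips this assembly step and returns the complete text in a single response object.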