Chat Completions
POST /messages
curl --request POST \
  --url https://api.siliconflow.cn/v1/messages \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "Pro/moonshotai/Kimi-K2-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "What opportunities and challenges will the Chinese large model industry face in 2025?"
    }
  ],
  "max_tokens": 8192
}'
{
  "id": "<string>",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "id": "<string>",
      "input": {},
      "name": "<string>",
      "type": "tool_use"
    }
  ],
  "model": "<string>",
  "stop_reason": "end_turn",
  "stop_sequence": "<string>",
  "usage": {
    "input_tokens": 2095,
    "output_tokens": 503
  }
}
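
For reference, the curl call above translates directly to other HTTP clients. A minimal Python sketch using the requests library, with all values taken from the example request:

import requests

API_KEY = "<token>"  # your API key

resp = requests.post(
    "https://api.siliconflow.cn/v1/messages",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "Pro/moonshotai/Kimi-K2-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What opportunities and challenges will the Chinese large model industry face in 2025?",
            }
        ],
        "max_tokens": 8192,
    },
)
data = resp.json()
# content is a list of blocks; a plain text reply arrives as
# {"type": "text", "text": "..."}.
print(data["content"])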

Authorizations

Authorization
string
header
required

Use the following format for authentication: Bearer <your api key>

Body

application/json
model
enum<string>
required

The name of the model to use. To maintain service quality, we make periodic changes to the models provided by this service, including but not limited to bringing models online or offline and adjusting model capabilities. Where feasible, we will notify you of such changes through announcements, message pushes, or similar channels.

Available options:
deepseek-ai/DeepSeek-V3.1,
Pro/moonshotai/Kimi-K2-Instruct,
moonshotai/Kimi-K2-Instruct,
Pro/deepseek-ai/DeepSeek-V3,
deepseek-ai/DeepSeek-V3,
moonshotai/Kimi-Dev-72B,
baidu/ERNIE-4.5-300B-A47B
Example:

"Pro/moonshotai/Kimi-K2-Instruct"

messages
object[]
required

A list of messages comprising the conversation so far.

Required array length: 1 - 10 elements
max_tokens
integer
required

The maximum number of tokens to generate before stopping.

Note that our models may stop before reaching this maximum. This parameter only specifies the absolute maximum number of tokens to generate.

Different models have different maximum values for this parameter. See models for details.

Example:

8192

system

System prompt.

A system prompt is a way of providing context and instructions to the LLM, such as specifying a particular goal or role.
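
As an illustration, a request body carrying a system prompt might look like the following sketch (the prompt string is hypothetical):

# Request body with a system prompt; the prompt text is illustrative only.
request_body = {
    "model": "Pro/moonshotai/Kimi-K2-Instruct",
    "max_tokens": 1024,
    "system": "You are a concise technical assistant.",
    "messages": [
        {"role": "user", "content": "Summarize nucleus sampling in one sentence."}
    ],
}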

stop_sequences
string[]

Custom text sequences that will cause the model to stop generating.

Our models will normally stop when they have naturally completed their turn, which will result in a response stop_reason of "end_turn".

If you want the model to stop generating when it encounters custom strings of text, you can use the stop_sequences parameter. If the model encounters one of the custom sequences, the response stop_reason value will be "stop_sequence" and the response stop_sequence value will contain the matched stop sequence.
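
For example, a request that stops at a custom delimiter might be sketched like this ("###" is an arbitrary choice of sequence):

# Request body using a custom stop sequence.
request_body = {
    "model": "Pro/moonshotai/Kimi-K2-Instruct",
    "max_tokens": 256,
    "stop_sequences": ["###"],
    "messages": [
        {"role": "user", "content": "List three colors, separated by ###"}
    ],
}
# If generation halts on the custom string, the response carries:
#   "stop_reason": "stop_sequence"
#   "stop_sequence": "###"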

stream
boolean

If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with data: [DONE].

Example:

true
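
A minimal consumption loop in Python, assuming the "data: ..." framing terminated by data: [DONE] as documented above (the per-event payload shape is not specified on this page, so the loop simply parses and prints each event):

import json
import requests

with requests.post(
    "https://api.siliconflow.cn/v1/messages",
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json",
    },
    json={
        "model": "Pro/moonshotai/Kimi-K2-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 256,
        "stream": True,
    },
    stream=True,  # keep the connection open so events arrive incrementally
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # stream terminator
            break
        print(json.loads(payload))  # one event per data line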

temperature
number

Determines the degree of randomness in the response.

Required range: 0 <= x <= 2
Example:

0.7

top_p
number

The top_p (nucleus sampling) parameter dynamically adjusts the number of candidate tokens considered for each predicted token, based on their cumulative probability.

Required range: 0.1 <= x <= 1
Example:

0.7

top_k
number

Samples from the k highest-probability candidate tokens at each step.

Required range: 0 <= x <= 50
Example:

50

tools
object[]

Each tool definition includes:

  • name: Name of the tool.

  • description: Optional, but strongly recommended description of the tool.

  • input_schema: JSON schema for the tool input shape that the model will produce in tool_use output content blocks.
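
Put together, a single tool definition might look like the following sketch (the get_weather tool and its schema are hypothetical):

# Hypothetical tool definition; the name, description, and schema are
# illustrative only.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"],
    },
}

If the model decides to call it, the response content includes a tool_use block carrying this name and a matching input object, as in the response example at the top of this page.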

tool_choice
object

How the model should use the provided tools. The model can be forced to use a specific tool, allowed to use any available tool, left to decide by itself, or prevented from using tools at all. By default, the model decides automatically whether to use tools.
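
The exact shape of this object is not spelled out here; if the endpoint follows the Anthropic-style Messages convention (an assumption worth verifying against this service), the behaviors map to objects like these:

# Assumed Anthropic-style tool_choice values; verify against this service.
tool_choice_auto = {"type": "auto"}  # let the model decide by itself
tool_choice_any = {"type": "any"}  # require the model to use some tool
tool_choice_named = {"type": "tool", "name": "get_weather"}  # force one tool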

Response

200

id
string
type
enum<string>
default:message

Object type.

For Messages, this is always "message".

Available options:
message
role
enum<string>
default:assistant

Conversational role of the generated message.

This will always be "assistant".

Available options:
assistant
content
Tool use · object[]

Content generated by the model.

This is an array of content blocks, each of which has a type that determines its shape.

Example:

[{"type": "text", "text": "Hi"}]

If the request input messages ended with an assistant turn, then the response content will continue directly from that last turn. You can use this to constrain the model's output.

For example, if the input messages were:

[
  {"role": "user", "content": "What's the Greek name for Sun? (A) Sol (B) Helios (C) Sun"},
  {"role": "assistant", "content": "The best answer is ("}
]

Then the response content might be:

[{"type": "text", "text": "B)"}]
model
string

The model that handled the request.

stop_reason
enum<string>

The reason that we stopped.

This may be one of the following values:

  • "end_turn": the model reached a natural stopping point
  • "stop_sequence": one of your provided custom stop_sequences was generated
  • "max_tokens": we exceeded the requested max_tokens or the model's maximum
  • "tool_use": the model invoked one or more tools
  • "refusal": streaming classifiers intervened to handle a potential policy violation

In non-streaming mode this value is always non-null. In streaming mode, it is null in the message_start event and non-null otherwise.

Available options:
end_turn,
stop_sequence,
max_tokens,
tool_use,
refusal
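
A small sketch of dispatching on this field in a non-streaming response (resp_json stands for the parsed response body):

def handle_stop(resp_json: dict) -> None:
    """Branch on the stop_reason of a parsed (non-streaming) response."""
    reason = resp_json["stop_reason"]
    if reason == "end_turn":
        pass  # natural completion of the turn
    elif reason == "stop_sequence":
        print(f"Stopped on custom sequence: {resp_json['stop_sequence']!r}")
    elif reason == "max_tokens":
        print("Output truncated: raise max_tokens or shorten the prompt.")
    elif reason == "tool_use":
        calls = [b for b in resp_json["content"] if b["type"] == "tool_use"]
        print(f"Model requested {len(calls)} tool call(s).")
    elif reason == "refusal":
        print("Generation was halted by a streaming classifier.")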
stop_sequence
string

Which custom stop sequence was generated, if any.

This value will be a non-null string if one of your custom stop sequences was generated.

usage
object

Billing and rate-limit usage.