Authorizations
Use the following format for authentication: Bearer <your api key>
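For example, a request authenticated with this header might look like the following Python sketch (the endpoint URL is a placeholder, not taken from this page):

```python
import requests

API_KEY = "your-api-key"  # replace with your actual key
URL = "https://api.example.com/v1/messages"  # placeholder endpoint; check your provider's docs

# The Authorization header carries the API key in Bearer format.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

body = {
    "model": "Pro/moonshotai/Kimi-K2-Instruct",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}

resp = requests.post(URL, headers=headers, json=body)
print(resp.json())
```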
Body
model
Corresponding model name. To maintain service quality, we make periodic changes to the models provided by this service, including but not limited to bringing models online or offline and adjusting model service capabilities. Where feasible, we will notify you of such changes through appropriate means such as announcements or message pushes.
Available options: deepseek-ai/DeepSeek-V3.1, Pro/moonshotai/Kimi-K2-Instruct, moonshotai/Kimi-K2-Instruct, Pro/deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-V3, moonshotai/Kimi-Dev-72B, baidu/ERNIE-4.5-300B-A47B

Example: "Pro/moonshotai/Kimi-K2-Instruct"
messages
A list of messages comprising the conversation so far.
Required range: 1 - 10 elements
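For illustration, a multi-turn conversation is passed as an ordered list of role/content pairs, along these lines:

```python
# Each entry pairs a conversational role with its content.
messages = [
    {"role": "user", "content": "What's the Greek name for Sun?"},
    {"role": "assistant", "content": "The Greek name for Sun is Helios."},
    {"role": "user", "content": "And for Moon?"},
]
```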
max_tokens
The maximum number of tokens to generate before stopping.
Note that our models may stop before reaching this maximum. This parameter only specifies the absolute maximum number of tokens to generate.
Different models have different maximum values for this parameter. See models for details.
Example: 8192
system
System prompt.
A system prompt is a way of providing context and instructions to the LLM, such as specifying a particular goal or role.
stop_sequences
Custom text sequences that will cause the model to stop generating.

Our models will normally stop when they have naturally completed their turn, which will result in a response stop_reason of "end_turn".

If you want the model to stop generating when it encounters custom strings of text, you can use the stop_sequences parameter. If the model encounters one of the custom sequences, the response stop_reason value will be "stop_sequence" and the response stop_sequence value will contain the matched stop sequence.
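A sketch of a request using stop_sequences, reusing the placeholder endpoint and headers from the authentication example above:

```python
body = {
    "model": "Pro/moonshotai/Kimi-K2-Instruct",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Count from 1 to 10, one number per line."}],
    # Generation halts as soon as the model emits "7".
    "stop_sequences": ["7"],
}

# If a custom sequence is matched, the response will report:
#   stop_reason   == "stop_sequence"
#   stop_sequence == "7"
```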
stream
If set, tokens are returned as Server-Sent Events as they are made available. The stream terminates with data: [DONE].
Example: true
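A minimal sketch of consuming the event stream, assuming the placeholder endpoint from the authentication example and the requests library:

```python
import json
import requests

API_KEY = "your-api-key"
URL = "https://api.example.com/v1/messages"  # placeholder endpoint, as above

body = {
    "model": "Pro/moonshotai/Kimi-K2-Instruct",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}

with requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=body,
    stream=True,
) as resp:
    for raw in resp.iter_lines():
        if not raw:
            continue
        line = raw.decode("utf-8")
        if line.startswith("data: "):          # SSE payloads are prefixed with "data: "
            payload = line[len("data: "):]
            if payload == "[DONE]":            # end-of-stream sentinel
                break
            print(json.loads(payload))
```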
temperature
Determines the degree of randomness in the response.
Required range: 0 <= x <= 2
Example: 0.7
top_p
The top_p (nucleus) parameter is used to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities.
Required range: 0.1 <= x <= 1
Example: 0.7
top_k
Required range: 0 <= x <= 50
Example: 50
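The sampling parameters can be set together on a single request; the values below are simply the example values listed above, and top_k is the assumed name for the last parameter:

```python
body = {
    "model": "Pro/moonshotai/Kimi-K2-Instruct",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
    "temperature": 0.7,  # 0 = near-deterministic, 2 = maximum randomness
    "top_p": 0.7,        # nucleus sampling: keep tokens within 70% cumulative probability
    "top_k": 50,         # assumed parameter name; consider only the 50 most likely tokens
}
```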
tools
Each tool definition includes:

- name: Name of the tool.
- description: Optional, but strongly-recommended description of the tool.
- input_schema: JSON schema for the tool input shape that the model will produce in tool_use output content blocks.
tool_choice
How the model should use the provided tools. The model can use a specific tool, any available tool, decide by itself, or not use tools at all. By default, the model will automatically decide whether to use tools.
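As an illustration, a tool definition following this shape might look like the sketch below; the get_weather tool and its schema are invented, and tool_choice is omitted since its exact value format is not shown on this page:

```python
body = {
    "model": "Pro/moonshotai/Kimi-K2-Instruct",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "name": "get_weather",  # hypothetical tool name
            "description": "Get the current weather for a city.",
            "input_schema": {       # JSON schema for the tool's input shape
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        }
    ],
}

# If the model decides to call the tool, the response will have
# stop_reason == "tool_use" and a tool_use content block containing
# generated input that matches the schema above.
```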
Response
200
type
Object type.
For Messages, this is always "message".
Available options: message
role
Conversational role of the generated message.
This will always be "assistant".
Available options: assistant
content
Content generated by the model.
This is an array of content blocks, each of which has a type that determines its shape.

Example: [{"type": "text", "text": "Hi"}]
If the request input messages ended with an assistant turn, then the response content will continue directly from that last turn. You can use this to constrain the model's output.

For example, if the input messages were:

[
  {"role": "user", "content": "What's the Greek name for Sun? (A) Sol (B) Helios (C) Sun"},
  {"role": "assistant", "content": "The best answer is ("}
]

Then the response content might be:

[{"type": "text", "text": "B)"}]
model
The model that handled the request.
stop_reason
The reason that we stopped.

This may be one of the following values:

- "end_turn": the model reached a natural stopping point or one of your provided custom stop_sequences was generated
- "max_tokens": we exceeded the requested max_tokens or the model's maximum
- "tool_use": the model invoked one or more tools
- "refusal": when streaming classifiers intervene to handle potential policy violations

In non-streaming mode this value is always non-null. In streaming mode, it is null in the message_start event and non-null otherwise.

Available options: end_turn, max_tokens, tool_use, refusal
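In client code, stop_reason can drive control flow; a minimal sketch (the "stop_sequence" branch comes from the stop_sequences description above, though that value is not listed among the options here):

```python
def handle_response(resp: dict) -> None:
    # Dispatch on why generation stopped.
    reason = resp.get("stop_reason")
    if reason == "end_turn":
        print("Model finished its turn naturally.")
    elif reason == "max_tokens":
        print("Hit the max_tokens limit; consider raising it.")
    elif reason == "tool_use":
        print("Model requested a tool call; run it and send the result back.")
    elif reason == "refusal":
        print("Streaming classifier intervened for a potential policy violation.")
    elif reason == "stop_sequence":
        print(f"Matched custom stop sequence: {resp.get('stop_sequence')!r}")
```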
stop_sequence
Which custom stop sequence was generated, if any.
This value will be a non-null string if one of your custom stop sequences was generated.
usage
Billing and rate-limit usage.