POST /chat/completions

Authorizations

Authorization
string
header, required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
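As a sketch, a client can attach the bearer token like this (the base URL shown is an assumption; substitute your actual endpoint and token):

```python
import json
import urllib.request

API_KEY = "your-auth-token"  # placeholder; use your real auth token
BASE_URL = "https://api.siliconflow.cn/v1"  # assumed base URL; check your provider's docs

def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST /chat/completions request with the Bearer auth header."""
    return urllib.request.Request(
        url=BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request({"model": "deepseek-ai/DeepSeek-V2.5",
                     "messages": [{"role": "user", "content": "Hello"}]})
print(req.get_header("Authorization"))  # Bearer your-auth-token
```

Sending the request (e.g. with urllib.request.urlopen) is omitted here; the point is the header shape.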

Body

application/json
model
enum<string>
default: deepseek-ai/DeepSeek-V2.5
required

The name of the model to query.

Available options:
deepseek-ai/DeepSeek-V2-Chat,
deepseek-ai/DeepSeek-Coder-V2-Instruct,
deepseek-ai/DeepSeek-V2.5,
Qwen/Qwen2.5-72B-Instruct-128K,
Qwen/Qwen2.5-72B-Instruct,
Qwen/Qwen2.5-32B-Instruct,
Qwen/Qwen2.5-14B-Instruct,
Qwen/Qwen2.5-7B-Instruct,
Qwen/Qwen2.5-Math-72B-Instruct,
Qwen/Qwen2.5-Coder-7B-Instruct,
Qwen/Qwen2-72B-Instruct,
Qwen/Qwen2-7B-Instruct,
Qwen/Qwen2-1.5B-Instruct,
Qwen/Qwen2-57B-A14B-Instruct,
TeleAI/TeleChat2,
01-ai/Yi-1.5-34B-Chat-16K,
01-ai/Yi-1.5-9B-Chat-16K,
01-ai/Yi-1.5-6B-Chat,
THUDM/chatglm3-6b,
THUDM/glm-4-9b-chat,
Vendor-A/Qwen/Qwen2-72B-Instruct,
Vendor-A/Qwen/Qwen2.5-72B-Instruct,
internlm/internlm2_5-7b-chat,
internlm/internlm2_5-20b-chat,
meta-llama/Meta-Llama-3.1-405B-Instruct,
meta-llama/Meta-Llama-3.1-70B-Instruct,
meta-llama/Meta-Llama-3.1-8B-Instruct,
meta-llama/Meta-Llama-3-8B-Instruct,
meta-llama/Meta-Llama-3-70B-Instruct,
google/gemma-2-27b-it,
google/gemma-2-9b-it,
Pro/Qwen/Qwen2.5-7B-Instruct,
Pro/Qwen/Qwen2-7B-Instruct,
Pro/Qwen/Qwen2-1.5B-Instruct,
Pro/01-ai/Yi-1.5-9B-Chat-16K,
Pro/01-ai/Yi-1.5-6B-Chat,
Pro/THUDM/chatglm3-6b,
Pro/THUDM/glm-4-9b-chat,
Pro/internlm/internlm2_5-7b-chat,
Pro/meta-llama/Meta-Llama-3-8B-Instruct,
Pro/meta-llama/Meta-Llama-3.1-8B-Instruct,
Pro/google/gemma-2-9b-it
messages
object[]
required

A list of messages comprising the conversation so far.

messages.role
enum<string>
default: user
required

The role of the message's author. One of: system, user, or assistant.

Available options:
user,
assistant,
system
messages.content
string
default: SiliconCloud推出分层速率方案与免费模型RPM提升10倍,对于整个大模型应用领域带来哪些改变?
required

The contents of the message. (The Chinese default above is a sample prompt asking: "SiliconCloud has launched tiered rate plans and a 10x RPM increase for free models; what changes will this bring to the LLM application landscape?")
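For illustration, a messages array covering all three roles can be built like this (the content strings are placeholders):

```python
# Each message is an object with a role (system / user / assistant) and content.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is nucleus sampling?"},
]

# Multi-turn conversations append the model's reply and the next user turn:
messages.append({"role": "assistant",
                 "content": "Nucleus sampling keeps the smallest set of tokens "
                            "whose probabilities sum to top_p."})
messages.append({"role": "user", "content": "And what does top_k do?"})

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```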

stream
boolean
default: false

If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with a final data: [DONE] event.
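A minimal sketch of consuming the stream, assuming each event line has the form data: &lt;json&gt; with OpenAI-style delta chunks (verify the chunk shape against real responses):

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from Server-Sent Event lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # stream terminator
        chunk = json.loads(data)
        # Assumed chunk shape: choices[0].delta.content, as in OpenAI-style streams.
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Simulated stream for illustration:
fake_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(fake_stream)))  # Hello
```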

max_tokens
integer
default: 512

The maximum number of tokens to generate.

stop
string[]

A list of string sequences that will truncate (stop) inference text output.
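Conceptually, generation stops at the first occurrence of any stop sequence; the effect can be sketched client-side like this (the server applies it during generation):

```python
def apply_stop(text: str, stop: list) -> str:
    """Truncate text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(apply_stop("Answer: 42\nQuestion:", ["\nQuestion:", "###"]))  # Answer: 42
```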

temperature
number
default: 0.7

Determines the degree of randomness in the response.

top_p
number
default: 0.7

The top_p (nucleus) parameter is used to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities.

top_k
number
default: 50

Limits sampling to the k most likely tokens at each step.

frequency_penalty
number
default: 0.5

Penalizes tokens in proportion to how often they have already appeared, reducing repetition.
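The sampling knobs interact: temperature rescales the distribution, top_k keeps only the k most likely tokens, and top_p then keeps the smallest prefix whose cumulative probability reaches p. A toy sketch of that filtering order (illustrative only, not the server's implementation):

```python
import math

def filter_candidates(logits, temperature=0.7, top_k=50, top_p=0.7):
    """Return the renormalized candidate set after temperature / top_k / top_p."""
    # Temperature: divide logits by T, then softmax (numerically stabilized).
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exp = {t: math.exp(v - m) for t, v in scaled.items()}
    z = sum(exp.values())
    probs = sorted(((t, e / z) for t, e in exp.items()), key=lambda x: -x[1])
    # top_k: keep the k most likely tokens.
    probs = probs[:top_k]
    # top_p: keep the smallest prefix with cumulative probability >= top_p.
    kept, cum = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break
    z2 = sum(p for _, p in kept)
    return {t: p / z2 for t, p in kept}

cands = filter_candidates({"a": 2.0, "b": 1.0, "c": 0.0},
                          temperature=1.0, top_k=2, top_p=0.9)
print(sorted(cands))  # ['a', 'b']
```

With these logits, "c" is dropped by top_k and the remaining pair already covers 90% of the mass, so only "a" and "b" survive.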
n
integer
default: 1

Number of generations to return.

response_format
object

An object specifying the format that the model must output.

response_format.type
string

The type of the response format.
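As an illustrative payload fragment (the accepted type values are not enumerated here; json_object below is an assumption to verify against the provider's docs):

```python
payload = {
    "model": "deepseek-ai/DeepSeek-V2.5",
    "messages": [{"role": "user",
                  "content": "Reply with a JSON object listing three colors."}],
    # Assumed value; confirm which response_format types the endpoint accepts.
    "response_format": {"type": "json_object"},
}
print(payload["response_format"]["type"])  # json_object
```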

Response

200 - application/json
id
string
choices
object[]
choices.message
object
choices.message.role
string
choices.message.content
string
choices.finish_reason
enum<string>
Available options:
stop,
eos,
length,
tool_calls
usage
object
usage.prompt_tokens
integer
usage.completion_tokens
integer
usage.total_tokens
integer
created
integer
model
string
object
enum<string>
Available options:
chat.completion
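Putting the response fields together, a successful 200 body can be unpacked like this (the sample JSON below is fabricated for illustration and mirrors the schema above):

```python
import json

sample = json.loads("""
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "deepseek-ai/DeepSeek-V2.5",
  "choices": [
    {"message": {"role": "assistant", "content": "Hello!"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}
""")

choice = sample["choices"][0]
print(choice["message"]["content"])     # Hello!
print(choice["finish_reason"])          # stop
print(sample["usage"]["total_tokens"])  # 12
```

A finish_reason of length means max_tokens was hit; stop and eos indicate normal termination.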