Create Chat Completion Request
Creates a model response for the given chat conversation.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
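As a minimal sketch, the Bearer header described above can be built like this (the token value is a placeholder, and the Content-Type header is an assumption for a JSON API):

```python
def auth_headers(token: str) -> dict:
    """Return request headers carrying Bearer authentication."""
    return {
        "Authorization": f"Bearer {token}",  # form: "Bearer <token>"
        "Content-Type": "application/json",  # assumed JSON request body
    }

headers = auth_headers("sk-example")
```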
Body
The name of the model to query. Available options:
deepseek-ai/DeepSeek-V2-Chat
deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-ai/DeepSeek-V2.5
Qwen/Qwen2.5-72B-Instruct-128K
Qwen/Qwen2.5-72B-Instruct
Qwen/Qwen2.5-32B-Instruct
Qwen/Qwen2.5-14B-Instruct
Qwen/Qwen2.5-7B-Instruct
Qwen/Qwen2.5-Math-72B-Instruct
Qwen/Qwen2.5-Coder-7B-Instruct
Qwen/Qwen2-72B-Instruct
Qwen/Qwen2-7B-Instruct
Qwen/Qwen2-1.5B-Instruct
Qwen/Qwen2-57B-A14B-Instruct
TeleAI/TeleChat2
01-ai/Yi-1.5-34B-Chat-16K
01-ai/Yi-1.5-9B-Chat-16K
01-ai/Yi-1.5-6B-Chat
THUDM/chatglm3-6b
THUDM/glm-4-9b-chat
Vendor-A/Qwen/Qwen2-72B-Instruct
Vendor-A/Qwen/Qwen2.5-72B-Instruct
internlm/internlm2_5-7b-chat
internlm/internlm2_5-20b-chat
meta-llama/Meta-Llama-3.1-405B-Instruct
meta-llama/Meta-Llama-3.1-70B-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct
meta-llama/Meta-Llama-3-8B-Instruct
meta-llama/Meta-Llama-3-70B-Instruct
google/gemma-2-27b-it
google/gemma-2-9b-it
Pro/Qwen/Qwen2.5-7B-Instruct
Pro/Qwen/Qwen2-7B-Instruct
Pro/Qwen/Qwen2-1.5B-Instruct
Pro/01-ai/Yi-1.5-9B-Chat-16K
Pro/01-ai/Yi-1.5-6B-Chat
Pro/THUDM/chatglm3-6b
Pro/THUDM/glm-4-9b-chat
Pro/internlm/internlm2_5-7b-chat
Pro/meta-llama/Meta-Llama-3-8B-Instruct
Pro/meta-llama/Meta-Llama-3.1-8B-Instruct
Pro/google/gemma-2-9b-it
A list of messages comprising the conversation so far.
The role of the message's author. One of: system, user, or assistant.
The contents of the message.
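A minimal request body combining the model and messages parameters described above might look like the following sketch. The JSON field names "model", "messages", "role", and "content" are assumed from the descriptions in this reference:

```python
# Sketch of a minimal chat completion request body.
body = {
    "model": "Qwen/Qwen2.5-7B-Instruct",  # one of the models listed above
    "messages": [
        # the conversation so far, oldest first
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
```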
If set, tokens are returned as Server-Sent Events as they become available. The stream terminates with a data: [DONE] message.
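A sketch of consuming such a stream, assuming the standard SSE line format "data: <payload>" with the terminal sentinel "data: [DONE]" noted above (the event payload shape is a placeholder):

```python
def parse_sse_lines(lines):
    """Yield the payload of each data event, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # stream terminator
        yield payload

# Hypothetical raw lines as they might arrive over the wire.
raw = ['data: {"token": "Hi"}', "data: [DONE]", "data: never reached"]
events = list(parse_sse_lines(raw))
```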
The maximum number of tokens to generate.
A list of string sequences that will truncate (stop) inference text output.
Determines the degree of randomness in the response.
The top_p (nucleus) parameter dynamically adjusts the number of choices considered for each predicted token, based on their cumulative probabilities.
The number of generations to return.
An object specifying the format that the model must output.
The type of the response format.
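Putting the body parameters together, a full payload could be sketched as follows. The parameter keys (max_tokens, stop, temperature, top_p, n, response_format, stream) are not named explicitly in this reference; they are assumptions following common OpenAI-compatible naming:

```python
# Sketch of a complete request payload; key names are assumptions.
payload = {
    "model": "deepseek-ai/DeepSeek-V2.5",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "stream": False,                      # no Server-Sent Events
    "max_tokens": 256,                    # cap on generated tokens
    "stop": ["\n\n"],                     # sequences that truncate output
    "temperature": 0.7,                   # degree of randomness
    "top_p": 0.9,                         # nucleus sampling cutoff
    "n": 1,                               # number of generations to return
    "response_format": {"type": "text"},  # required output format
}
```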
Response
The reason the model stopped generating. One of: stop, eos, length, or tool_calls.
The object type: chat.completion.
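As an illustration only, a response might be inspected as below. The finish reasons (stop, eos, length, tool_calls) and the object type chat.completion come from this reference; the surrounding "choices" and "finish_reason" field names are assumptions based on common OpenAI-compatible response shapes:

```python
# Hypothetical response body; only the enum values are from the reference.
response = {
    "object": "chat.completion",
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",  # stop | eos | length | tool_calls
        },
    ],
}

reason = response["choices"][0]["finish_reason"]
truncated = reason == "length"  # True when max token limit was hit
```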