1. Using Stream Mode in Python
1.1 Stream Mode with the OpenAI Library
It is recommended to use the OpenAI library for stream mode in most scenarios.
from openai import OpenAI

client = OpenAI(
    base_url='https://api.siliconflow.cn/v1',
    api_key='your-api-key'
)

# Send a request with streaming output enabled
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V2.5",
    messages=[
        {"role": "user", "content": "SiliconCloud is now in public beta, giving every user 300 million free tokens to unlock the innovative capabilities of open-source large models. What changes will this bring to the large-model application field?"}
    ],
    stream=True  # Enable streaming output
)

# Receive and process the response incrementally
for chunk in response:
    if not chunk.choices:
        continue
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.choices[0].delta.reasoning_content:
        print(chunk.choices[0].delta.reasoning_content, end="", flush=True)
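The per-chunk handling above can be factored into a small reusable helper that accumulates the streamed fragments into complete strings. This is a minimal sketch; `collect_stream` is a hypothetical name, and it assumes chunks shaped like the OpenAI SDK's streaming objects (`choices[0].delta.content` / `reasoning_content`):

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed delta fragments into full strings.

    Accepts any iterable of chat-completion chunks exposing
    choices[0].delta.content and choices[0].delta.reasoning_content,
    and returns (full_content, full_reasoning_content).
    """
    content_parts, reasoning_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # keep-alive or empty chunks carry no delta
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            content_parts.append(delta.content)
        if getattr(delta, "reasoning_content", None):
            reasoning_parts.append(delta.reasoning_content)
    return "".join(content_parts), "".join(reasoning_parts)
```

You would pass the `response` iterator from the example above straight to `collect_stream(response)`; because it only touches the delta attributes, it also works with mock chunks in tests.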
1.2 Stream Mode with the Requests Library
If you call the API directly with the requests library instead of the OpenAI SDK, you must enable stream mode in two places: the request payload ("stream": true) and the requests.post() call itself (stream=True).
import requests
import json

url = "https://api.siliconflow.cn/v1/chat/completions"

payload = {
    "model": "deepseek-ai/DeepSeek-V2.5",  # Replace with your model
    "messages": [
        {
            "role": "user",
            "content": "SiliconCloud is now in public beta, giving every user 300 million free tokens to unlock the innovative capabilities of open-source large models. What changes will this bring to the large-model application field?"
        }
    ],
    "stream": True  # Stream mode must be enabled in the payload
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "Bearer your-api-key"
}

response = requests.post(url, json=payload, headers=headers, stream=True)  # The request itself must also enable stream mode

# Print the streamed response as it arrives
if response.status_code == 200:
    full_content = ""
    full_reasoning_content = ""
    for chunk in response.iter_lines():
        if chunk:
            chunk_str = chunk.decode('utf-8').replace('data: ', '')
            if chunk_str != "[DONE]":
                chunk_data = json.loads(chunk_str)
                delta = chunk_data['choices'][0].get('delta', {})
                content = delta.get('content', '')
                reasoning_content = delta.get('reasoning_content', '')
                if content:
                    print(content, end="", flush=True)
                    full_content += content
                if reasoning_content:
                    print(reasoning_content, end="", flush=True)
                    full_reasoning_content += reasoning_content
else:
    print(f"Request failed with status code: {response.status_code}")
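The loop above is decoding server-sent events (SSE): each non-empty line has the form `data: {json}`, and the stream ends with a `data: [DONE]` sentinel. That per-line parsing can be isolated into a small function, shown here as a sketch; `parse_sse_line` is a hypothetical helper name, not part of any library:

```python
import json


def parse_sse_line(raw: bytes):
    """Parse one SSE line from the streaming endpoint.

    Returns the delta dict from the first choice, or None for empty
    lines, non-data lines, and the terminating "[DONE]" sentinel.
    """
    line = raw.decode("utf-8").strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    return json.loads(payload)["choices"][0].get("delta", {})
```

With this helper, the body of the `iter_lines()` loop reduces to calling `parse_sse_line(chunk)` and reading `content` / `reasoning_content` out of the returned dict.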
2. Using Stream Mode in curl
By default, curl buffers its output, so even though the server sends data in chunks, you only see content once the buffer fills or the connection closes. Passing the -N (or --no-buffer) option disables this buffering, so each chunk of data is printed to the terminal as soon as it arrives, producing streaming output.
curl -N -s \
--request POST \
--url https://api.siliconflow.cn/v1/chat/completions \
--header 'Authorization: Bearer token' \
--header 'Content-Type: application/json' \
--data '{
"model": "Qwen/Qwen2.5-72B-Instruct",
"messages": [
    {"role":"user","content":"Is there a Nobel Prize in mathematics?"}
],
"stream": true
}'
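On the command line, the streamed SSE lines can also be post-processed, for example to extract just the generated text. This is a sketch assuming jq is installed; the sample lines below are illustrative data, not a real API response:

```shell
# Feed sample SSE lines (as the API would emit them) through a small
# pipeline: strip the "data: " prefix, drop the [DONE] sentinel, and
# join the content fragments with jq.
printf 'data: {"choices":[{"delta":{"content":"Hello"}}]}\ndata: {"choices":[{"delta":{"content":" world"}}]}\ndata: [DONE]\n' |
  sed 's/^data: //' |
  grep -v '^\[DONE\]' |
  jq -j '.choices[0].delta.content // empty'
```

In practice you would pipe the output of the curl command above into the same `sed | grep | jq` stages; note that buffering in intermediate tools can reintroduce the delay that -N removes.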