1. Using Stream Mode in Python

1.1 Stream Mode with the OpenAI Library

It is recommended to use the OpenAI library for stream mode in most scenarios.

from openai import OpenAI

client = OpenAI(
    base_url='https://api.siliconflow.cn/v1',
    api_key='your-api-key'
)

# Send a request with streaming enabled
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V2.5",
    messages=[
        {"role": "user", "content": "SiliconCloud has launched its public beta, giving every user 300 million free tokens to unlock the innovation potential of open-source large models. What changes will this bring to the large-model application field?"}
    ],
    stream=True  # Enable streaming output
)

# Receive and process the response incrementally
for chunk in response:
    chunk_message = chunk.choices[0].delta.content
    if chunk_message:  # The delta of some chunks (e.g. the final one) may be empty
        print(chunk_message, end='', flush=True)
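
If you also need the complete reply after the stream finishes (for example, to store or post-process it), you can collect the deltas while printing them. A minimal sketch that replaces the printing loop above, since a stream can only be iterated once:

# Collect the streamed deltas while printing them
collected = []
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end='', flush=True)
        collected.append(delta)

full_reply = ''.join(collected)  # The complete generated text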

1.2 Stream Mode with the Requests Library

If you call the API with the requests library instead of the OpenAI SDK (for example, other SiliconCloud API scenarios), you need to enable stream mode in both the request payload and the requests call itself.

import requests

url = "https://api.siliconflow.cn/v1/chat/completions"

payload = {
    "model": "deepseek-ai/DeepSeek-V2.5",  # Replace with your model
    "messages": [
        {
            "role": "user",
            "content": "SiliconCloud has launched its public beta, giving every user 300 million free tokens to unlock the innovation potential of open-source large models. What changes will this bring to the large-model application field?"
        }
    ],
    "stream": True  # Stream mode must be enabled in the payload
}

headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "Bearer your-api-key"
}

response = requests.post(url, json=payload, headers=headers, stream=True)  # The request itself must also enable stream mode

# Print the streamed response as it arrives
if response.status_code == 200: 
    for chunk in response.iter_content(chunk_size=8192): 
        if chunk:
            decoded_chunk = chunk.decode('utf-8')
            print(decoded_chunk, end='')
else:
    print('Request failed with status code:', response.status_code)
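
The raw chunks printed above are Server-Sent Events, where each line has the form "data: {...}" and the stream ends with "data: [DONE]". If you only want the generated text, you can parse each line and extract the delta content. A minimal sketch, assuming the same url, payload, and headers as above and the standard OpenAI-compatible SSE format:

import json

response = requests.post(url, json=payload, headers=headers, stream=True)

if response.status_code == 200:
    for line in response.iter_lines():
        if not line:
            continue  # Skip SSE keep-alive blank lines
        decoded = line.decode('utf-8')
        if decoded.startswith("data: "):
            data = decoded[len("data: "):]
            if data == "[DONE]":
                break  # End of the stream
            parsed = json.loads(data)
            choices = parsed.get("choices") or []
            if choices:
                delta = choices[0].get("delta", {}).get("content")
                if delta:
                    print(delta, end='', flush=True)
else:
    print('Request failed with status code:', response.status_code)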