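Using DeepSeek's thinking parameters

DeepSeek reasoning requests can use thinking.type to control whether the model thinks on the current turn, and reasoning_effort to express the desired reasoning intensity. In tool-calling scenarios, you also need to preserve the reasoning field returned by the model.

Set thinking.type for the current request

When the current request needs thinking, set thinking.type: "enabled". For simple or latency-sensitive requests, you can test how the model behaves with thinking turned off. The following Python example uses the OpenAI SDK.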

language-python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url="https://cloud.infini-ai.com/maas/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "What is 2 + 2? Give only the final answer."}],
    max_tokens=256,
    extra_body={"thinking": {"type": "enabled"}},
)

To turn thinking off, use:

language-python

extra_body = {"thinking": {"type": "disabled"}}

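If the target model is a dedicated reasoning model, disabling thinking may not be a normally supported path. Verify the behavior once against the target model before going live.

In GenStudio, deepseek-v4-flash can also use the unified compatibility parameter enable_thinking. deepseek-v4-pro uses thinking.type.

Verify the thinking.type request body with curl

Start with curl to confirm that DeepSeek's thinking.type is actually being sent in the request body. Before running the command, set the API_KEY environment variable in the current terminal. The curl command below works in POSIX-style shells such as bash and zsh (macOS/Linux, WSL, Git Bash). If you use Windows PowerShell or CMD, adjust the command to that shell's syntax.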

language-shell

curl --request POST \
  --url "https://cloud.infini-ai.com/maas/v1/chat/completions" \
  --header "Accept: application/json, text/event-stream" \
  --header "Authorization: Bearer $API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "deepseek-v4-pro",
    "messages": [
      {
        "role": "user",
        "content": "What is 2 + 2? Give only the final answer."
      }
    ],
    "max_tokens": 256,
    "thinking": {
      "type": "enabled"
    }
  }'

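Measure the effect before setting reasoning_effort

When you need the model to put more effort into reasoning, you can set reasoning_effort on models that support it. Whether the parameter changes quality, latency, and token usage has to be verified with your own business prompts.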

language-python

extra_body = {
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",
}

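Do not assume that reasoning intensity changed in a way your business can observe just because the request was accepted. Record latency, output tokens, and task quality before deciding on a default value.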
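One way to record those numbers is sketched below. The helper name `measure_call` is hypothetical, and the sketch assumes the response exposes an OpenAI-style `usage.completion_tokens` field, which may be absent on some responses. A stand-in response object is used so the example runs without network access.

```python
import time
from types import SimpleNamespace


def measure_call(send_request):
    """Time one request and collect the numbers worth comparing.

    send_request is any zero-argument callable that returns a
    chat-completion-style response (hypothetical helper, not an SDK API).
    """
    started = time.perf_counter()
    response = send_request()
    elapsed = time.perf_counter() - started

    usage = getattr(response, "usage", None)
    return {
        "latency_s": elapsed,
        # Assumed OpenAI-style usage field; None when it is missing.
        "completion_tokens": getattr(usage, "completion_tokens", None),
        "answer": response.choices[0].message.content,
    }


# Stand-in response so the sketch runs offline.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(completion_tokens=12),
    choices=[SimpleNamespace(message=SimpleNamespace(content="4"))],
)

record = measure_call(lambda: fake_response)
print(record)
```

With a real client, the callable would wrap one `client.chat.completions.create(...)` call per reasoning_effort value, using the same prompt each time. Task quality still has to be judged separately against your own criteria.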

Read DeepSeek's reasoning_content

DeepSeek reasoning content is usually found in reasoning_content. When reading it, handle the case where the field is missing.

language-python

message = response.choices[0].message
reasoning = getattr(message, "reasoning_content", None)

if reasoning:
    print(reasoning)
print(message.content)

In streaming responses, read delta.reasoning_content and skip chunks whose choices list is empty.
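The streaming case can be sketched as follows. The function name `collect_stream` is hypothetical, and the fake chunks only imitate the shape of OpenAI SDK stream chunks (`chunk.choices[0].delta`), so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning text and answer text from stream chunks."""
    reasoning_parts = []
    answer_parts = []
    for chunk in chunks:
        # Some chunks (for example usage-only chunks) carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Both fields may be missing or None on any given chunk.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def _chunk(**delta_fields):
    """Build a fake chunk with a single choice (test helper only)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    SimpleNamespace(choices=[]),          # empty choices: skipped
    _chunk(reasoning_content="2 + 2 "),
    _chunk(reasoning_content="is 4."),
    _chunk(content="4"),
]

reasoning_text, answer_text = collect_stream(fake_stream)
print(reasoning_text)
print(answer_text)
```

With the real SDK, the chunks would come from iterating over the result of `client.chat.completions.create(..., stream=True)`.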

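In ordinary multi-turn conversations, send back only the user-visible answer

When an ordinary multi-turn conversation only needs to continue from the user-visible answer, you can send back the assistant's content. Do not hand-write reasoning_content just to make the history look complete.

If your business needs to audit or display the reasoning content, you can store it on the application side. Whether to keep passing it to the model should be decided by the tool-calling case and the target model's requirements.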
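A minimal sketch of that separation is shown below. Both helper names (`visible_history_entry`, `audit_record`) are hypothetical, and the message is a stand-in object with the same attributes as an SDK message.

```python
from types import SimpleNamespace


def visible_history_entry(message):
    """Assistant entry for the next request: visible answer only."""
    return {"role": "assistant", "content": message.content or ""}


def audit_record(message):
    """Application-side record; this is stored, not sent back to the model."""
    return {
        "content": message.content,
        "reasoning_content": getattr(message, "reasoning_content", None),
    }


# Stand-in for response.choices[0].message.
fake_message = SimpleNamespace(content="4", reasoning_content="2 + 2 is 4.")

history = [{"role": "user", "content": "What is 2 + 2?"}]
history.append(visible_history_entry(fake_message))
stored = audit_record(fake_message)

print(history[-1])
print(stored)
```

Keeping the two structures apart makes it harder to send stored reasoning back to the model by accident in an ordinary turn.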

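Preserve reasoning_content after tool calls

When an assistant message initiates a tool call and the response included reasoning_content, keep that field when you send the follow-up request.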

language-python

assistant_message = {
    "role": "assistant",
    "content": response.choices[0].message.content or "",
    "tool_calls": [
        {
            "id": tool_call.id,
            "type": "function",
            "function": {
                "name": tool_call.function.name,
                "arguments": tool_call.function.arguments,
            },
        }
        for tool_call in response.choices[0].message.tool_calls
    ],
}

reasoning = getattr(response.choices[0].message, "reasoning_content", None)
if reasoning:
    assistant_message["reasoning_content"] = reasoning

messages.append(assistant_message)

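If the SDK or framework drops extension fields during serialization, the next request after the tool call may be unable to keep its original context.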
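One way to catch such a drop is to compare the messages you built with the body that is actually sent. The sketch below assumes you can capture the outgoing JSON body as a string (for example through HTTP client logging or a proxy). The function name `missing_reasoning_indexes`, the tool name `get_weather`, and the whitelist serializer are all hypothetical.

```python
import json


def missing_reasoning_indexes(messages, serialized_body):
    """Return indexes of messages that lost reasoning_content when serialized."""
    sent = json.loads(serialized_body)["messages"]
    missing = []
    for index, (before, after) in enumerate(zip(messages, sent)):
        if "reasoning_content" in before and "reasoning_content" not in after:
            missing.append(index)
    return missing


messages = [
    {"role": "user", "content": "Weather in Paris?"},
    {
        "role": "assistant",
        "content": "",
        "reasoning_content": "I should call the weather tool.",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }
        ],
    },
]

# A body that keeps every field.
intact_body = json.dumps({"model": "deepseek-v4-pro", "messages": messages})

# Simulate a serializer that only keeps a fixed set of known keys.
allowed = {"role", "content", "tool_calls"}
stripped = [{k: v for k, v in m.items() if k in allowed} for m in messages]
stripped_body = json.dumps({"model": "deepseek-v4-pro", "messages": stripped})

intact_result = missing_reasoning_indexes(messages, intact_body)
stripped_result = missing_reasoning_indexes(messages, stripped_body)
print(intact_result, stripped_result)
```

A check like this fits well in an integration test, so that a framework upgrade that starts filtering unknown fields shows up before production traffic does.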

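Troubleshoot sampling parameters and reasoning parameters separately

When a reasoning request does not change as expected, first verify sampling tuning and reasoning tuning separately.

First, keep only thinking and reasoning_effort and confirm that the request is accepted.
Then add sampling parameters such as temperature and top_p one at a time.
If the model documentation states that certain sampling parameters are ignored or incompatible, defer to the model documentation and the actual request results.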

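Do not change the prompt, sampling parameters, and reasoning parameters all in the same troubleshooting request. Otherwise it is hard to pinpoint which change caused the difference in behavior.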
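The one-change-at-a-time approach can be sketched as a small generator of request bodies. The function name `isolation_steps` and the sampling values are hypothetical. The request dictionaries follow the flat JSON body shape used in the curl example, and nothing is sent over the network.

```python
def isolation_steps(base_request, reasoning_params, sampling_params):
    """Yield (label, request) pairs that add one parameter per step."""
    current = dict(base_request)
    current.update(reasoning_params)
    # Step 1: reasoning parameters only, no sampling parameters.
    yield "reasoning only", dict(current)
    # Later steps: add sampling parameters one at a time.
    for name, value in sampling_params.items():
        current[name] = value
        yield f"+ {name}", dict(current)


base = {
    "model": "deepseek-v4-pro",
    "messages": [
        {"role": "user", "content": "What is 2 + 2? Give only the final answer."}
    ],
    "max_tokens": 256,
}

steps = list(
    isolation_steps(
        base,
        reasoning_params={"thinking": {"type": "enabled"}, "reasoning_effort": "high"},
        sampling_params={"temperature": 0.2, "top_p": 0.9},  # illustrative values
    )
)

for label, request in steps:
    print(label, sorted(request))
```

The prompt stays identical across every step, so any change in behavior can be attributed to the single parameter added at that step.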
