Using Kimi's thinking parameter

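On the OpenAI-compatible Chat Completions path, Kimi models use two parameters. thinking.type controls whether the current request thinks, and applies to the current turn. thinking.keep controls whether historical reasoning content is retained, and applies to history. The Anthropic-compatible path uses a different protocol shape, so do not apply this page's reasoning_content pass-back rules to it directly.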

Distinguish forced-thinking models from switchable models

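First determine whether the target model can turn thinking off at all. For a forced-thinking model, disabling thinking should not be part of the normal production path.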

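kimi-k2.6 and kimi-k2.5 have thinking enabled by default and can disable it explicitly. Turn thinking off for simple tasks and keep it on for complex ones.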

If a model ID carries an explicit thinking suffix, treat it as a forced-thinking model unless you have confirmed otherwise.
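The rules above can be sketched as a small helper that decides what to send in extra_body. This is a minimal sketch under these assumptions:

- The switchable set contains only the two model IDs named on this page.
- The "thinking suffix" check is a simple endswith heuristic. The exact suffix format is an assumption.
- thinking_extra_body is a hypothetical helper name.

```python
# Models this page documents as switchable (thinking on by default, can be disabled).
SWITCHABLE_MODELS = {"kimi-k2.6", "kimi-k2.5"}


def is_forced_thinking(model_id: str) -> bool:
    """Assumed heuristic: an ID ending in "thinking" is treated as forced-thinking."""
    return model_id.lower().endswith("thinking")


def thinking_extra_body(model_id: str, want_thinking: bool) -> dict:
    """Return the extra_body for one request (hypothetical helper)."""
    if is_forced_thinking(model_id):
        # Forced-thinking model: do not rely on disabling, leave the model default.
        return {}
    if model_id in SWITCHABLE_MODELS:
        mode = "enabled" if want_thinking else "disabled"
        return {"thinking": {"type": mode}}
    # Unknown model: send nothing until its behavior is confirmed.
    return {}


print(thinking_extra_body("kimi-k2.6", want_thinking=False))
# → {'thinking': {'type': 'disabled'}}
```

Returning an empty dict for unknown IDs is a conservative choice: the request then falls back to the model's default behavior instead of sending a field the model may not support.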

Control the current request with thinking.type

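To turn thinking off for the current request only, set thinking.type: "disabled". To turn it on, use "enabled". The following Python example uses the OpenAI SDK. Because thinking is not a standard SDK parameter, it is passed through extra_body, which the SDK merges into the request JSON.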

language-python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url="https://cloud.infini-ai.com/maas/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "What is 2 + 2? Give only the final answer."}],
    max_tokens=256,
    extra_body={"thinking": {"type": "disabled"}},  # non-standard field, merged into the request JSON
)

If reasoning fields still appear after thinking is disabled, first check whether the target model is a forced-thinking model.
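A quick check after a disabled request could look like the following sketch. It makes these assumptions:

- It reads reasoning_content with getattr, because the SDK type may not declare the field.
- describe_disabled_response is a hypothetical helper.
- The suffix test is the same assumed heuristic for forced-thinking IDs.
- A SimpleNamespace stands in for response.choices[0].message so the sketch runs offline.

```python
from types import SimpleNamespace


def describe_disabled_response(model_id: str, message) -> str:
    """Interpret a response to a request sent with thinking.type = "disabled"."""
    reasoning = getattr(message, "reasoning_content", None)
    if not reasoning:
        return "ok"
    if model_id.lower().endswith("thinking"):  # assumed forced-thinking heuristic
        return "forced-thinking model: reasoning is expected"
    return "unexpected reasoning: re-check the model ID and request body"


# Offline stand-in for response.choices[0].message.
fake = SimpleNamespace(content="4", reasoning_content=None)
print(describe_disabled_response("kimi-k2.6", fake))  # ok
```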

Verify the thinking.type request body with curl

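Use curl first to confirm that a switchable Kimi model accepts thinking.type: "disabled". Before running it, set the API_KEY environment variable in the current terminal. The curl command below is written for POSIX-style shells such as bash and zsh (macOS/Linux, WSL, Git Bash). If you use Windows PowerShell or CMD, adapt the command to that shell's syntax.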

language-shell

curl --request POST \
  --url "https://cloud.infini-ai.com/maas/v1/chat/completions" \
  --header "Accept: application/json, text/event-stream" \
  --header "Authorization: Bearer $API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "kimi-k2.6",
    "messages": [
      {
        "role": "user",
        "content": "What is 2 + 2? Give only the final answer."
      }
    ],
    "max_tokens": 256,
    "thinking": {
      "type": "disabled"
    }
  }'
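If adapting the curl syntax for PowerShell or CMD is inconvenient, the same request body can be sent with Python's standard library, which behaves the same on every platform. This sketch mirrors the curl command above. build_request and send are hypothetical helper names, and nothing is sent until send is called.

```python
import json
import urllib.request

URL = "https://cloud.infini-ai.com/maas/v1/chat/completions"


def build_request(api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (thinking disabled)."""
    body = {
        "model": "kimi-k2.6",
        "messages": [
            {"role": "user", "content": "What is 2 + 2? Give only the final answer."}
        ],
        "max_tokens": 256,
        "thinking": {"type": "disabled"},
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(api_key: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(api_key), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

After setting API_KEY, call `send(os.environ["API_KEY"])` (with `import os`). Then inspect `["choices"][0]["message"]` in the result.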


Retain historical reasoning with thinking.keep

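thinking.keep controls history retention. It does not turn on thinking for the current request. When a tool-calling or agent flow needs reasoning continuity across turns, set it together with the current request's thinking.type.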

language-python

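# Pass as extra_body= to client.chat.completions.create(...)
extra_body = {
    "thinking": {
        "type": "enabled",  # think in the current request
        "keep": "all",  # retain historical reasoning
    }
}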

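To turn off thinking for the current request only, setting thinking.type is enough. Reserve thinking.keep for tool-calling or agent flows that need continuity of historical reasoning.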
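To keep the two concerns separate in application code, the thinking object can be assembled from two independent flags. This is a minimal sketch under these assumptions:

- build_thinking is a hypothetical helper.
- "all" is the only keep value shown on this page.
- keep is always sent alongside an explicit type, as recommended above.

```python
def build_thinking(enabled: bool, keep_history: bool = False) -> dict:
    """Assemble extra_body["thinking"]: type controls this turn, keep controls history."""
    thinking = {"type": "enabled" if enabled else "disabled"}
    if keep_history:
        # Only for tool-calling or agent flows that need reasoning continuity.
        thinking["keep"] = "all"
    return {"thinking": thinking}


# Agent step that thinks now and keeps earlier reasoning:
print(build_thinking(enabled=True, keep_history=True))
# → {'thinking': {'type': 'enabled', 'keep': 'all'}}

# Simple one-off request with thinking off:
print(build_thinking(enabled=False))
# → {'thinking': {'type': 'disabled'}}
```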

Read Kimi's reasoning_content

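Kimi returns its reasoning in the reasoning_content field. The OpenAI SDK's type definitions may not declare this field, so application code should read it with getattr or through the SDK's own mechanism for accessing extra fields.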

language-python

message = response.choices[0].message
# reasoning_content may be missing from the SDK type, so read it defensively.
reasoning = getattr(message, "reasoning_content", None)

if reasoning:
    print(reasoning)
print(message.content)

In streaming responses, read delta.reasoning_content the same way. Do not assume every chunk contains the field.

Pass back the assistant message that made the tool call

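When Kimi makes a tool call after thinking, the assistant message usually contains both reasoning_content and tool_calls. To continue the conversation, pass this assistant message back as returned, then append the tool results. The code below continues from an earlier response and an existing messages list.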

language-python

message = response.choices[0].message

# Rebuild the assistant turn from the model's original response.
assistant_message = {
    "role": "assistant",
    "content": message.content or "",
    "tool_calls": [
        {
            "id": tool_call.id,
            "type": "function",
            "function": {
                "name": tool_call.function.name,
                "arguments": tool_call.function.arguments,
            },
        }
        for tool_call in message.tool_calls
    ],
}

# Pass back reasoning_content exactly as returned; omit it when the turn had none.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    assistant_message["reasoning_content"] = reasoning

messages.append(assistant_message)
# Append one tool result per tool call, matched by tool_call_id.
for tool_call in assistant_message["tool_calls"]:
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": '{"weather": "Sunny"}',  # placeholder: use the real tool output
        }
    )

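If thinking was disabled for one turn in the history, it is normal for that turn's assistant message to have no reasoning_content. Keep the history as it actually happened. Do not backfill reasoning, and do not replace the model's original output with an empty string or a summary.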

Check thinking.type and thinking.keep separately

When troubleshooting a Kimi request, check the current turn and history retention separately.

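Whether the current turn thinks: look only at thinking.type.
Whether historical reasoning is retained: look only at thinking.keep and the assistant messages passed back.
Whether tool calls can continue: check that tool_calls and reasoning_content come from the same original assistant response, and that each tool_call_id matches an id in that response's tool_calls.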

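If a request returns 400, first remove any manually assembled reasoning content, then retry with the assistant message exactly as the model originally returned it.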
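Part of the tool-call check can be automated before each request or retry. This sketch works on plain dict messages like the ones built in the tool-call example. check_tool_history is a hypothetical helper, and get_weather is an illustrative tool name. The helper only verifies that every tool_call id in an assistant message is answered by a tool message with the same tool_call_id. It cannot tell whether reasoning_content is the model's original output, so that still has to be ensured where the messages are built.

```python
def check_tool_history(messages: list[dict]) -> list[str]:
    """Return problems where assistant tool_calls and tool results do not line up."""
    problems = []
    pending = set()  # tool_call ids still waiting for a tool message
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in pending:
                pending.discard(call_id)
            else:
                problems.append(f"message {i}: tool_call_id {call_id!r} has no matching tool call")
            continue
        # Any non-tool message closes the previous assistant turn.
        if pending:
            problems.append(f"message {i}: unanswered tool calls {sorted(pending)}")
            pending = set()
        if role == "assistant":
            pending = {call["id"] for call in msg.get("tool_calls") or []}
    if pending:
        problems.append(f"end of history: unanswered tool calls {sorted(pending)}")
    return problems


history = [
    {"role": "user", "content": "What is the weather?"},
    {
        "role": "assistant",
        "content": "",
        "reasoning_content": "(original reasoning from the model)",
        "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "get_weather", "arguments": "{}"}}
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '{"weather": "Sunny"}'},
]
print(check_tool_history(history))  # []
```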
