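Use GLM's thinking parameter

The GLM series uses the thinking object to control whether the current request thinks. When you need to keep historical reasoning across tool calls or long-running tasks, additionally use clear_thinking to decide whether the historical reasoning context is cleared.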

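Check GLM's default thinking behavior

Use this section to decide whether you need to pass thinking.type explicitly. Models that think by default do not necessarily need thinking.type: "enabled", but passing it explicitly makes the request's behavior clearer.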

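glm-5.2, glm-5.1, glm-5, and glm-4.7 enable thinking by default. You can explicitly disable it for simple requests; for complex requests, keep the default or enable it explicitly.
glm-4.6 uses hybrid or automatic thinking. Re-verify with your target request, and record the conclusion per target model.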

If your workload is latency-sensitive, first test answer quality and response time on the target model with thinking disabled.
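The comparison described above can be sketched as a small timing harness. This is a minimal sketch, not a benchmark: `send` is a hypothetical callable that you supply, which takes an extra_body dict, performs the chat completion against your target model, and returns the answer text. The names `thinking_body` and `compare_latency` are illustrative only.

```python
import time
from typing import Callable


def thinking_body(enabled: bool) -> dict:
    """Build the extra_body fragment that toggles thinking for one request."""
    return {"thinking": {"type": "enabled" if enabled else "disabled"}}


def compare_latency(send: Callable[[dict], str]) -> dict:
    """Call `send` once per setting and record the answer and elapsed seconds.

    `send` is a hypothetical callable (an assumption of this sketch): it
    receives an extra_body dict and returns the model's answer text.
    """
    results = {}
    for label, enabled in (("enabled", True), ("disabled", False)):
        start = time.perf_counter()
        answer = send(thinking_body(enabled))
        results[label] = {
            "answer": answer,
            "seconds": time.perf_counter() - start,
        }
    return results
```

A single call per setting is noisy. For a real decision, repeat the comparison over a representative set of prompts and review the answers side by side.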

Set thinking.type for the current request

When you only need to control this one generation, set thinking.type. The following Python example uses the OpenAI SDK.

language-python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url="https://cloud.infini-ai.com/maas/v1",
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "What is 2 + 2? Give only the final answer."}],
    max_tokens=256,
    extra_body={"thinking": {"type": "enabled"}},
)

To disable thinking, change only type:

language-python

extra_body = {"thinking": {"type": "disabled"}}

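Set reasoning_effort for GLM-5.2

reasoning_effort applies only to GLM-5.2. Do not copy this field directly into requests for glm-5.1, glm-5, or glm-4.7.

The default reasoning effort for GLM-5.2 is max. If you want to control reasoning effort explicitly, start by using only the values that actually differ in behavior:

high: higher reasoning effort.
max: the highest reasoning effort, and the default.

Other compatible values are merged on the model side or change the meaning of the thinking switch:

none and minimal skip thinking; to disable thinking, prefer thinking.type: "disabled".
low and medium are mapped to high.
xhigh is mapped to max.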

language-python

extra_body = {
    "thinking": {"type": "enabled"},
    "reasoning_effort": "max",
}
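To keep client code from sending values that the model side silently merges, you could normalize reasoning_effort before building the request. The following is a minimal sketch that encodes only the mapping listed in this section; the helper name `glm52_extra_body` is hypothetical, and the mapping should be re-checked if the service changes its behavior.

```python
# Mapping copied from the list in this section; treat it as a snapshot.
_EFFORT_ALIASES = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}
# Values that skip thinking instead of selecting an effort level.
_SKIP_THINKING = {"none", "minimal"}


def glm52_extra_body(effort: str = "max") -> dict:
    """Return an extra_body dict for GLM-5.2 with a normalized effort value."""
    effort = effort.lower()
    if effort in _SKIP_THINKING:
        # Express "no thinking" through thinking.type, as recommended above.
        return {"thinking": {"type": "disabled"}}
    if effort not in _EFFORT_ALIASES:
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    return {
        "thinking": {"type": "enabled"},
        "reasoning_effort": _EFFORT_ALIASES[effort],
    }
```

With this helper, `glm52_extra_body("medium")` sends "high" explicitly, so logs show the effort level the model actually uses.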

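Verify the thinking.type request body with curl

First use curl to confirm that the request body contains the native GLM thinking.type field. Before running it, set the API_KEY environment variable in the current terminal. The following curl command works in POSIX-style shells such as bash and zsh (macOS/Linux, WSL, Git Bash). If you use Windows PowerShell or CMD, adjust the command to that shell's syntax.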

language-shell

curl --request POST \
  --url "https://cloud.infini-ai.com/maas/v1/chat/completions" \
  --header "Accept: application/json, text/event-stream" \
  --header "Authorization: Bearer $API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "user",
        "content": "What is 2 + 2? Give only the final answer."
      }
    ],
    "max_tokens": 256,
    "thinking": {
      "type": "enabled"
    }
  }'

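Preserve historical reasoning across tool calls

If you want the model to carry its reasoning context through the same task before and after tool calls, set clear_thinking: false. This parameter controls whether historical reasoning is kept; it is not the thinking switch for the current request.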

language-python

extra_body = {
    "thinking": {
        "type": "enabled",
        "clear_thinking": False,
    }
}

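Preserving historical reasoning only makes sense when the request history actually contains reasoning_content that the model returned. Do not add the field to assistant messages that came back without reasoning content.

Send back assistant messages that carry reasoning_content

When an assistant message contains both tool_calls and reasoning_content, put that assistant message back into messages[] unchanged, then append the tool result.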

language-python

assistant_message = {
    "role": "assistant",
    "content": response.choices[0].message.content or "",
    "tool_calls": [
        {
            "id": tool_call.id,
            "type": "function",
            "function": {
                "name": tool_call.function.name,
                "arguments": tool_call.function.arguments,
            },
        }
        for tool_call in response.choices[0].message.tool_calls
    ],
}

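# The SDK types may not declare reasoning_content, so read it dynamically
# and attach it only when the model actually returned it.
reasoning = getattr(response.choices[0].message, "reasoning_content", None)
if reasoning:
    assistant_message["reasoning_content"] = reasoning

messages.append(assistant_message)
# Follow the assistant message with the tool result, matched by tool_call_id.
# The content here is a sample payload for the first tool call.
messages.append(
    {
        "role": "tool",
        "tool_call_id": assistant_message["tool_calls"][0]["id"],
        "content": '{"weather": "Sunny", "temperature": "25 C"}',
    }
)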

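When sending the message back, do not rewrite, summarize, or reorder reasoning_content. Such changes may cause the request to be rejected or break the continuity of subsequent reasoning.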
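The example above answers only the first tool call. When an assistant message carries several tool_calls, a common pattern in OpenAI-compatible APIs is to append one tool message per call, each with its own tool_call_id. The sketch below assumes that convention holds for this endpoint; `run_tool` is a hypothetical dispatcher that you supply.

```python
import json
from typing import Callable


def tool_result_messages(
    assistant_message: dict,
    run_tool: Callable[[str, dict], dict],
) -> list:
    """Build one tool message per tool call, in the order the calls appear.

    `run_tool` is a hypothetical callable (an assumption of this sketch):
    it receives the function name and parsed arguments, and returns a
    JSON-serializable result.
    """
    messages = []
    for call in assistant_message.get("tool_calls", []):
        # Arguments arrive as a JSON string; an empty string means no args.
        args = json.loads(call["function"]["arguments"] or "{}")
        result = run_tool(call["function"]["name"], args)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            }
        )
    return messages
```

Append the assistant message first, then extend messages with the returned list, so every tool result directly follows the call it answers.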

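Keep the real history when toggling thinking

GLM supports controlling thinking per turn, but the message history must reflect what the model actually returned.

If the Turn 0 assistant message has reasoning_content, Turn 1 runs with thinking disabled and has no reasoning_content, and Turn 2 enables thinking again, then the Turn 2 request should:

Keep the original reasoning_content in the Turn 0 assistant message.
Not add reasoning_content to the Turn 1 assistant message.
Set thinking.clear_thinking: false when preserved thinking is needed.

If you intend to discard historical reasoning for this request, you can skip preserved thinking, but this sacrifices the continuity of historical reasoning and may affect cache hits.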
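The Turn 2 rules above can be sketched as a request builder that copies the stored history as-is and only varies the thinking settings. This is a minimal sketch: the function name `build_turn_request` and its parameters are hypothetical, and the returned dict mirrors the JSON body shown in the curl example rather than any SDK type.

```python
import copy


def build_turn_request(
    model: str,
    history: list,
    user_text: str,
    *,
    thinking: bool = True,
    preserve: bool = True,
) -> dict:
    """Build a request body for the next turn without altering the history."""
    # Deep-copy so the stored history is never mutated or "repaired".
    messages = copy.deepcopy(history)
    messages.append({"role": "user", "content": user_text})

    thinking_cfg = {"type": "enabled" if thinking else "disabled"}
    if preserve:
        # Ask the service to keep historical reasoning (preserved thinking).
        thinking_cfg["clear_thinking"] = False
    return {"model": model, "messages": messages, "thinking": thinking_cfg}


# Turn 0 returned reasoning_content; Turn 1 ran with thinking disabled.
history = [
    {"role": "user", "content": "Plan the trip."},
    {"role": "assistant", "content": "Plan A.", "reasoning_content": "r0"},
    {"role": "user", "content": "Shorter, please."},
    {"role": "assistant", "content": "Plan A, short."},
]
turn2 = build_turn_request("glm-5.1", history, "Now add a budget.")
```

In `turn2`, the Turn 0 assistant message still carries its original reasoning_content and the Turn 1 assistant message has none, which matches what the model actually returned.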

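Read reasoning_content from a streaming response

In a streaming response, GLM's reasoning increments arrive in delta.reasoning_content. The OpenAI SDK types do not necessarily declare this field, so read it with dynamic attribute access.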

language-python

for chunk in stream:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="")

    text = getattr(delta, "content", None)
    if text:
        print(text, end="")

Some chunks carry only usage or end-of-stream information, so skip chunks with empty choices when parsing.
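If you need the full reasoning text later, for example to send it back unchanged in the next request, you can accumulate the deltas instead of printing them. The sketch below uses stand-in chunk objects built with SimpleNamespace so that it runs without a network call; `collect_stream` and `_chunk` are hypothetical names, and a real stream from the SDK would be passed in the same way.

```python
from types import SimpleNamespace


def collect_stream(stream) -> tuple:
    """Accumulate reasoning and answer text from streamed chunks."""
    reasoning_parts, text_parts = [], []
    for chunk in stream:
        if not chunk.choices:
            # Usage-only or end-of-stream chunk: nothing to read.
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        text = getattr(delta, "content", None)
        if text:
            text_parts.append(text)
    return "".join(reasoning_parts), "".join(text_parts)


def _chunk(**delta):
    """Stand-in for an SDK chunk; only the attributes read above are modeled."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(**delta))]
    )


fake_stream = [
    _chunk(reasoning_content="2 + 2 "),
    _chunk(reasoning_content="is 4."),
    _chunk(content="4"),
    SimpleNamespace(choices=[]),  # usage-only chunk with empty choices
]
reasoning, answer = collect_stream(fake_stream)
```

Joining the parts in arrival order keeps the reasoning text byte-for-byte identical to what was streamed.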

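Fix 400 errors caused by historical reasoning

When you get a 400, check the message history first instead of adjusting the prompt.

When an assistant message contains tool_calls, the corresponding tool messages must follow it immediately and carry the correct tool_call_id.
If a historical assistant message originally returned reasoning_content, do not delete or rewrite it when sending it back.
If a turn ran with thinking disabled and returned no reasoning_content, do not add empty or hand-written reasoning.
If you do not need continuity of historical reasoning, clearing the history and starting a new session is safer than mixing in incomplete history.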
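Part of this checklist can be automated before a request is sent. The following is a minimal sketch of a local check, not the service's actual validation logic: it flags tool messages that do not line up with the preceding tool_calls and reasoning_content fields that are present but empty. It cannot detect reasoning that was deleted or rewritten, because that requires comparing against the original response. The name `find_history_problems` is hypothetical.

```python
def find_history_problems(messages: list) -> list:
    """Return human-readable problems found in a chat message history."""
    problems = []
    i = 0
    while i < len(messages):
        msg = messages[i]
        role = msg.get("role")
        if role == "assistant":
            if "reasoning_content" in msg and not msg["reasoning_content"]:
                problems.append(f"message {i}: empty reasoning_content")
            expected = {c["id"] for c in msg.get("tool_calls") or []}
            # Collect the tool messages that directly follow this message.
            seen = set()
            j = i + 1
            while j < len(messages) and messages[j].get("role") == "tool":
                seen.add(messages[j].get("tool_call_id"))
                j += 1
            if seen != expected:
                problems.append(
                    f"message {i}: tool_call ids {sorted(expected)} do not "
                    f"match following tool messages {sorted(map(str, seen))}"
                )
            i = j
            continue
        if role == "tool":
            # A tool message that does not directly follow an assistant turn.
            problems.append(f"message {i}: tool message without tool_calls")
        i += 1
    return problems
```

Running this check on the outgoing messages list before each request makes a mismatched tool_call_id visible locally instead of as a 400 response.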
