Large Language Model API Service Tutorial
This guide shows how to call the preset large language model API service in GenStudio using common tools.
TIP
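GenStudio also supports deploying models to dedicated instances that provide a private API service. Note that the API domain of a self-deployed model service differs from the public API domain provided by the platform. For details, see Deploying Model Services.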
OpenAI API Compatibility
The GenStudio large language model API service provides an API endpoint that implements OpenAI's /v1/chat/completions interface:
https://cloud.infini-ai.com/maas/v1/chat/completions
NOTE
For details on the endpoint path, parameters, and related settings, see the Large Language Model API Reference.
Prerequisites
Go to the Model Plaza (模型广场) of the large model service platform and choose the model you want to try through the API.
Using curl
You can send API requests directly with the curl commands in the examples below.
TIP
Replace $API_KEY with the API key you obtained.
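Verifying a single-turn conversation (non-streaming)
The following request starts a single-turn conversation. It does not set the stream parameter, so the API service uses its default response mode (non-streaming). The example loads the API key and base URL from environment variables:
- API_KEY: your GenStudio API key
- DEFAULT_BASE_URL: the GenStudio API URL; use https://cloud.infini-ai.com/maas/v1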
curl "${DEFAULT_BASE_URL}/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "megrez-3b-instruct",
"messages": [
{ "role": "user", "content": "你是谁?" }
]
}'
A single-turn conversation can also carry a system message, for example:
"messages": [
{ "role": "system", "content": "请以嘲讽语气回答" },
{ "role": "user", "content": "你是谁?" }
]
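The curl request above, with or without a system message, can also be built in Python using only the standard library. The following is a minimal sketch, assuming the endpoint is the base URL followed by /chat/completions; the helper names build_chat_request and send_chat_request are illustrative and not part of GenStudio.

```python
import json
import os
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the request only; nothing is sent until send_chat_request is called.
request = build_chat_request(
    os.getenv("DEFAULT_BASE_URL", "https://cloud.infini-ai.com/maas/v1"),
    os.getenv("API_KEY", ""),
    "megrez-3b-instruct",
    [
        {"role": "system", "content": "请以嘲讽语气回答"},
        {"role": "user", "content": "你是谁?"},
    ],
)
print(request.full_url)
```

Calling send_chat_request(request) performs the actual network call and requires a valid API key.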
By default, the API service responds in non-streaming mode. When the request succeeds, the generated content is returned all at once.
Example response body:
{
"id": "chatcmpl-n5McEDBxBdxNDbx2CA8Rz8",
"object": "chat.completion",
"created": 1708497105,
"model": "qwen2.5-7b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "我是来自阿里云的大规模语言模型。"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 22,
"total_tokens": 39,
"completion_tokens": 17
}
}
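To make the structure of this response concrete, here is a small sketch that pulls out the assistant text and the token counts. The function name summarize_completion is hypothetical; the field names are taken from the example response above.

```python
import json


def summarize_completion(payload):
    """Return the assistant text, finish reason, and token usage of a non-streaming response."""
    choice = payload["choices"][0]
    usage = payload.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Trimmed copy of the example response body shown above.
example = json.loads("""
{
  "id": "chatcmpl-n5McEDBxBdxNDbx2CA8Rz8",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "我是来自阿里云的大规模语言模型。"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 22, "total_tokens": 39, "completion_tokens": 17}
}
""")

summary = summarize_completion(example)
print(summary["content"])
```

In this example, total_tokens equals prompt_tokens plus completion_tokens (22 + 17 = 39).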
IMPORTANT
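- If the text fails content moderation, a blocked field with the value true is added, and no further output is returned.
- If the text passes content moderation, there is no blocked field and the rest of the response is output normally.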
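A client may want to check for this field before using the output. The sketch below assumes that blocked appears as a top-level field of the response object (or of a streamed chunk); the text above does not state where the field is placed, so adjust the lookup to match the actual responses. Both function names are hypothetical.

```python
def is_blocked(payload):
    """Return True when the payload carries blocked=true.

    Assumption: the field sits at the top level of the JSON object.
    """
    return payload.get("blocked") is True


def take_until_blocked(payloads):
    """Yield payloads in order, stopping at the first one flagged as blocked."""
    for payload in payloads:
        if is_blocked(payload):
            break
        yield payload


# Made-up payloads used only to illustrate the control flow.
chunks = [{"id": "a"}, {"id": "b", "blocked": True}, {"id": "c"}]
print([chunk["id"] for chunk in take_until_blocked(chunks)])
```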
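Verifying a single-turn conversation (streaming)
The following example explicitly sets the stream parameter to true, so the API service returns the content as a streaming response. The example loads the API key and base URL from environment variables:
- API_KEY: your GenStudio API key
- DEFAULT_BASE_URL: the GenStudio API URL; use https://cloud.infini-ai.com/maas/v1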
curl "${DEFAULT_BASE_URL}/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "megrez-3b-instruct",
"stream": true,
"messages": [
{ "role": "user", "content": "你是谁?" }
]
}'
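In streaming mode, when the request succeeds, the generated content is returned incrementally as server-sent events (SSE).
Example response body: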
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "我是"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "来自"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "无问"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "芯穹"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "的"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "超"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "大规模"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "语言"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "模型"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": ","}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "我"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "叫"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "无"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "问"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "天"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "权"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "。"}, "finish_reason": null}]}
{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "object": "chat.completion.chunk", "created": 1708486029, "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 22, "total_tokens": 40, "completion_tokens": 18}}
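The chunks above can be reassembled into the full reply by concatenating each delta.content value. The following sketch assumes one JSON object per line, as in the example; it also tolerates an optional "data:" prefix and a "[DONE]" terminator, which are OpenAI SSE conventions and an assumption here rather than something the example confirms. The function name assemble_stream is hypothetical.

```python
import json


def assemble_stream(lines):
    """Concatenate delta.content fragments from streamed chunk lines.

    Returns a tuple of (text, finish_reason, usage).
    """
    parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        # Assumption: raw SSE lines may carry a "data:" prefix.
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if not line or line == "[DONE]":
            continue
        chunk = json.loads(line)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(parts), finish_reason, usage


# A shortened version of the chunk sequence shown above.
sample_lines = [
    '{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}',
    '{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "我是"}, "finish_reason": null}]}',
    '{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {"content": "来自"}, "finish_reason": null}]}',
    '{"id": "chatcmpl-3HjYf888MzQ6XAHADiPanf", "model": "megrez-3b-instruct", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 22, "total_tokens": 40, "completion_tokens": 18}}',
]

text, finish_reason, usage = assemble_stream(sample_lines)
print(text, finish_reason, usage)
```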
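Verifying a multi-turn conversation
The API service accepts multi-turn conversation requests. One user message plus one assistant message counts as one turn (a system message may also be included).
The following example shows a multi-turn conversation request. It loads the API key and base URL from environment variables:
- API_KEY: your GenStudio API key
- DEFAULT_BASE_URL: the GenStudio API URL; use https://cloud.infini-ai.com/maas/v1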
curl "${DEFAULT_BASE_URL}/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "megrez-3b-instruct",
"messages": [
{ "role": "user", "content": "你是谁?" },
{ "role": "assistant", "content": "我是大模型回答助手" },
{ "role": "user", "content": "你能做什么?" }
]
}'
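In a multi-turn exchange, the client resends the whole message history with each request. The sketch below shows one way to keep that history in Python; the Conversation class and its method names are illustrative and not part of any SDK.

```python
import json


class Conversation:
    """Keeps the ordered message history for a multi-turn chat request."""

    def __init__(self, system=None):
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

    def to_payload(self, model):
        """Return the request body as a dict (a copy of the history plus the model name)."""
        return {"model": model, "messages": list(self.messages)}


# Rebuild the request body from the curl example above.
conversation = Conversation()
conversation.add_user("你是谁?")
conversation.add_assistant("我是大模型回答助手")
conversation.add_user("你能做什么?")

payload = conversation.to_payload("megrez-3b-instruct")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Serializing with json.dumps also avoids hand-written JSON mistakes such as a trailing comma after the last message.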
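Using the OpenAI Python SDK
The 无问芯穹 large model API service can be called through the official OpenAI Python SDK.
Initializing the client
The GenStudio API service provides an API endpoint that implements OpenAI's /v1/chat/completions interface, so you can connect with the OpenAI Python client. The example loads the following environment variables:
- GENSTUDIO_API_KEY: your GenStudio API key
- DEFAULT_BASE_URL: when using the default endpoint, https://cloud.infini-ai.com/maas/v1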
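import os
from openai import OpenAI
# Read the API key and base URL from environment variables
API_KEY = os.getenv("GENSTUDIO_API_KEY")
DEFAULT_BASE_URL = os.getenv("DEFAULT_BASE_URL")
# The OpenAI client takes the endpoint through the base_url argument
client = OpenAI(api_key=API_KEY, base_url=DEFAULT_BASE_URL)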
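Because os.getenv returns None for an unset variable, a missing key or URL only surfaces later as a request error. The following sketch checks both variables up front; load_settings is a hypothetical helper, and the variable names are the ones used in this section.

```python
import os


def load_settings(env=None):
    """Return (api_key, base_url) from the environment, or raise if either is missing."""
    env = os.environ if env is None else env
    api_key = env.get("GENSTUDIO_API_KEY")
    base_url = env.get("DEFAULT_BASE_URL")
    missing = [
        name
        for name, value in (
            ("GENSTUDIO_API_KEY", api_key),
            ("DEFAULT_BASE_URL", base_url),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Drop a trailing slash so that paths can be appended consistently.
    return api_key, base_url.rstrip("/")
```

The returned pair can then be passed to the client as api_key and base_url.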
Example 1 (streaming)
stream = client.chat.completions.create(
    model="qwen1.5-72b-chat",
    messages=[{"role": "user", "content": "根据中国古代圣人孔子的思想,人生的意义是什么?"}],
    stream=True,
)
# Print each content fragment as soon as it arrives
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
孔子认为,人生的意义在于实现“仁”,即以仁爱之心对待他人,追求道德完善,以及实现社会和谐。他强调“修身、齐家、治国、平天下”,认为一个人应该首先修养自身,然后才能管理好家庭,进一步治理好国家,最终达到天下和平。此外,孔子也重视学习和知识,他认为“学而时习之,不亦说乎?”通过不断学习和实践,可以提升自我,接近人生的意义。
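If the full text is needed after streaming finishes, the fragments can be collected rather than only printed. The sketch below works on any iterable of objects shaped like the SDK's chunks (chunk.choices[0].delta.content); collect_stream_text and fake_chunk are hypothetical names, and the fake chunks exist only so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(stream):
    """Join the content fragments of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices (for example, a usage-only chunk).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object with the same attribute path as an SDK chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [fake_chunk(None), fake_chunk("孔子"), SimpleNamespace(choices=[]), fake_chunk("认为")]
print(collect_stream_text(demo_stream))
```

With a real client, the stream object returned by client.chat.completions.create(..., stream=True) would be passed in place of demo_stream.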
Example 2 (non-streaming)
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "say '谁是卧底。'",
        }
    ],
    model="qwen1.5-72b-chat",
)
# The complete reply is available once the call returns
print(chat_completion.choices[0].message.content)
谁是卧底?
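Using LangChain's OpenAI integration
The GenStudio large model API service can be called through LangChain's OpenAI integration.
Example 1 (streaming)
The following example loads the API URL and API key from environment variables:
- GENSTUDIO_API_KEY: your GenStudio API key
- DEFAULT_BASE_URL: the default GenStudio API endpoint, https://cloud.infini-ai.com/maas/v1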
import os
from typing import Any

from langchain.callbacks.base import BaseCallbackHandler
from langchain_openai import ChatOpenAI

# Read the API key and base URL from environment variables
API_KEY = os.getenv("GENSTUDIO_API_KEY")
DEFAULT_BASE_URL = os.getenv("DEFAULT_BASE_URL")

# Define a callback handler to process streaming tokens
class StreamHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        print(token, end="", flush=True)

# Initialize the ChatOpenAI model with streaming enabled
llm_streaming = ChatOpenAI(
    openai_api_key=API_KEY,
    openai_api_base=DEFAULT_BASE_URL,
    streaming=True,
    callbacks=[StreamHandler()],  # Pass the callback handler
)

# Define your messages
messages = [
    {"role": "system", "content": "You are a pedantic ancient Chinese scholar, who always answers in Simplified Chinese."},
    {"role": "user", "content": "Tell me a joke."},
]

# Get a response from the chat model
response = llm_streaming.invoke(input=messages)

# Output the response (optional, as it is already printed by the callback)
print("\n\n***********\n\n以下是 LLM 的完整回复:\n\n", response)
有一只深海鱼,每天都自由地游来游去,但它却一点也不开心。因为它压力很大。
***********
以下是 LLM 的完整回复:
content='有一只深海鱼,每天都自由地游来游去,但它却一点也不开心。因为它压力很大。' response_metadata={'finish_reason': 'stop'} id='run-25c3c02b-f027-491b-af88-f3eec5058760-0'
Example 2 (non-streaming)
from langchain_openai import ChatOpenAI

# Initialize the ChatOpenAI model with streaming disabled
# (API_KEY and DEFAULT_BASE_URL are loaded as in Example 1)
llm_non_streaming = ChatOpenAI(
    openai_api_key=API_KEY,
    openai_api_base=DEFAULT_BASE_URL,
    streaming=False,
)

# Define your messages
messages = [
    {"role": "system", "content": "You are a pedantic ancient Chinese scholar, who always answers in Simplified Chinese."},
    {"role": "user", "content": "Tell me a joke."},
]

# Get a response from the chat model
response = llm_non_streaming.invoke(input=messages)

# Output the complete response object
print("\n\n***********\n\n以下是 LLM 的完整回复:\n\n", response)
***********
以下是 LLM 的完整回复:
content='有一只深海鱼,每天都自由地游来游去,但它却一点也不开心。因为它压力很大。' response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 36, 'total_tokens': 60}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-48bb7543-1312-4f94-8313-dc3b18b2d87e-0'
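The two outputs above differ in their response_metadata: the non-streaming result includes token_usage, while the streaming result shown earlier does not. A small sketch for reading the counts safely in both cases follows; token_usage_from_metadata is a hypothetical helper, and the dictionaries are copied from the outputs in this section.

```python
def token_usage_from_metadata(response_metadata):
    """Return token counts from a response_metadata dict, with None for missing values."""
    usage = response_metadata.get("token_usage") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Metadata copied from the non-streaming and streaming outputs in this section.
non_streaming_metadata = {
    "token_usage": {"completion_tokens": 24, "prompt_tokens": 36, "total_tokens": 60},
    "model_name": "gpt-3.5-turbo",
    "system_fingerprint": None,
    "finish_reason": "stop",
    "logprobs": None,
}
streaming_metadata = {"finish_reason": "stop"}

print(token_usage_from_metadata(non_streaming_metadata))
print(token_usage_from_metadata(streaming_metadata))
```

With the objects from the examples above, the helper would be called as token_usage_from_metadata(response.response_metadata).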