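You can customize both the inference-time and load-time parameters of a model. Inference parameters can be set per request, whereas load parameters are fixed when the model is loaded.

Inference Parameters

Set inference-time parameters such as temperature, maxTokens, and topP.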

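// `model` is an LLM handle and `chat` is the conversation to respond to.
const prediction = model.respond(chat, {
  temperature: 0.6, // sampling randomness
  maxTokens: 50, // upper bound on the number of generated tokens
});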

For all configurable fields, see LLMPredictionConfigInput.
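Because inference parameters are supplied per request, one common pattern is to keep a set of defaults and override individual fields for a given call. The following is a minimal sketch of that idea, not part of the SDK: the `InferenceParams` interface and the `withOverrides` helper are hypothetical names introduced here, and only the three field names come from the text above.

```typescript
// Hypothetical subset of the prediction settings; the authoritative list
// of fields is LLMPredictionConfigInput.
interface InferenceParams {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
}

// Application-wide defaults (illustrative values only).
const defaults: InferenceParams = { temperature: 0.6, maxTokens: 50 };

// Merge per-request overrides on top of the defaults.
// Neither input object is mutated.
function withOverrides(
  base: InferenceParams,
  overrides: InferenceParams = {},
): InferenceParams {
  return { ...base, ...overrides };
}

// A more deterministic request that keeps the default token limit.
const precise = withOverrides(defaults, { temperature: 0 });

// A longer request that keeps the default temperature.
const longer = withOverrides(defaults, { maxTokens: 200 });
```

The merged object can then be passed as the second argument of a call such as model.respond(chat, precise).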

Another useful inference-time parameter is structured, which lets you strictly enforce the structure of the output using a JSON or Zod schema.
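As a rough illustration of the JSON schema route, the sketch below defines a schema, wraps it in a prediction config, and re-checks the parsed output on the client side. Several things here are assumptions rather than statements from this page: the `{ type: "json", jsonSchema }` shape of the structured setting, the `bookSchema` fields, the `parseBook` helper, and the `result.content` property in the commented usage. Check LLMPredictionConfigInput before relying on any of them.

```typescript
// JSON Schema describing the desired output (illustrative fields).
const bookSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    author: { type: "string" },
    year: { type: "integer" },
  },
  required: ["title", "author", "year"],
} as const;

// Assumed shape of the `structured` setting when a JSON schema is used.
const predictionConfig = {
  structured: { type: "json", jsonSchema: bookSchema },
} as const;

interface Book {
  title: string;
  author: string;
  year: number;
}

// Parse the model's text output and re-check the fields the schema requires.
function parseBook(text: string): Book {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { title, author, year } = data as Record<string, unknown>;
  if (typeof title !== "string" || typeof author !== "string") {
    throw new Error("title and author must be strings");
  }
  if (typeof year !== "number" || !Number.isInteger(year)) {
    throw new Error("year must be an integer");
  }
  return { title, author, year };
}

// With a loaded model this would look roughly like:
//   const result = await model.respond("Tell me about The Hobbit.", predictionConfig);
//   const book = parseBook(result.content);
```

Re-validating on the client is optional when the schema is enforced during generation, but it keeps the rest of the program typed and catches a mismatched schema early.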

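Load Parameters

Set load-time parameters such as the context length and the GPU offload ratio.

.model() returns a handle to an already-loaded model, or loads a new one on demand (just-in-time loading).

Note: If the model is already loaded, the configuration is ignored.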

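import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();

// Get a handle to the model; the config applies only if the model is not loaded yet.
const model = await client.llm.model("qwen2.5-7b-instruct", {
  config: {
    contextLength: 8192, // context window, in tokens
    gpu: {
      ratio: 0.5, // GPU offload ratio
    },
  },
});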

For all configurable fields, see LLMLoadModelConfig.

The .load() method creates a new model instance and loads it with the specified configuration.

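import { LMStudioClient } from "@lmstudio/sdk";

const client = new LMStudioClient();

// Always load a new instance of the model with this config.
const model = await client.llm.load("qwen2.5-7b-instruct", {
  config: {
    contextLength: 8192, // context window, in tokens
    gpu: {
      ratio: 0.5, // GPU offload ratio
    },
  },
});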

For all configurable fields, see LLMLoadModelConfig.
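The difference between .model() and .load() can be easier to see in a toy model of the behavior described above. The sketch below is an in-memory simulation only, not the SDK's implementation: `ToyModelRegistry`, `LoadConfig`, and `ModelHandle` are hypothetical names, and no model is actually loaded.

```typescript
// Hypothetical stand-ins for the SDK types; the real load-time fields
// are listed in LLMLoadModelConfig.
interface LoadConfig {
  contextLength?: number;
}

interface ModelHandle {
  modelKey: string;
  instanceId: number;
  config: LoadConfig;
}

class ToyModelRegistry {
  private loaded: ModelHandle[] = [];
  private nextId = 1;

  // Mirrors .load(): always creates a new instance with the given config.
  load(modelKey: string, config: LoadConfig = {}): ModelHandle {
    const handle: ModelHandle = { modelKey, instanceId: this.nextId++, config };
    this.loaded.push(handle);
    return handle;
  }

  // Mirrors .model(): reuses a loaded instance if one exists (the config
  // argument is then ignored); otherwise loads one on demand.
  model(modelKey: string, config: LoadConfig = {}): ModelHandle {
    const existing = this.loaded.find((h) => h.modelKey === modelKey);
    return existing ?? this.load(modelKey, config);
  }
}

const registry = new ToyModelRegistry();

// Nothing is loaded yet, so this loads with contextLength 8192.
const first = registry.model("qwen2.5-7b-instruct", { contextLength: 8192 });

// Already loaded: the same handle comes back and the new config is ignored.
const second = registry.model("qwen2.5-7b-instruct", { contextLength: 4096 });

// load() creates a separate instance that does use the new config.
const third = registry.load("qwen2.5-7b-instruct", { contextLength: 4096 });
```

In this simulation, second is the same handle as first and still reports a context length of 8192, while third is a distinct instance with a context length of 4096. If a specific load configuration must take effect, .load() is the safer choice.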