LM Studio REST API (beta)
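Experimental feature
In addition to its OpenAI compatibility mode, LM Studio now has its own REST API.
The REST API reports enhanced statistics, such as tokens per second and time to first token (TTFT), along with rich information about each model: whether it is loaded or not loaded, its maximum context length, its quantization, and more.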
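GET /api/v0/models - List available models
GET /api/v0/models/{model} - Get information about a specific model
POST /api/v0/chat/completions - Chat completions (messages → assistant reply)
POST /api/v0/completions - Text completions (prompt → completion)
POST /api/v0/embeddings - Text embeddings (text → embedding vector)
To start the server, run the following command: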
lms server start
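You can run LM Studio as a service and have the server start automatically at boot, without launching the GUI. See the headless mode documentation for details.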
GET /api/v0/models
Lists all loaded and downloaded models.
Example request
curl http://localhost:1234/api/v0/models
Response format
{
  "object": "list",
  "data": [
    {
      "id": "qwen2-vl-7b-instruct",
      "object": "model",
      "type": "vlm",
      "publisher": "mlx-community",
      "arch": "qwen2_vl",
      "compatibility_type": "mlx",
      "quantization": "4bit",
      "state": "not-loaded",
      "max_context_length": 32768
    },
    {
      "id": "meta-llama-3.1-8b-instruct",
      "object": "model",
      "type": "llm",
      "publisher": "lmstudio-community",
      "arch": "llama",
      "compatibility_type": "gguf",
      "quantization": "Q4_K_M",
      "state": "not-loaded",
      "max_context_length": 131072
    },
    {
      "id": "text-embedding-nomic-embed-text-v1.5",
      "object": "model",
      "type": "embeddings",
      "publisher": "nomic-ai",
      "arch": "nomic-bert",
      "compatibility_type": "gguf",
      "quantization": "Q4_0",
      "state": "not-loaded",
      "max_context_length": 2048
    }
  ]
}
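The listing above can be filtered on the client side. The following is a minimal Python sketch using only the standard library, not an official client. The helper names `fetch_models` and `select_models` are hypothetical, and the base URL is assumed from the curl example. The sample response only shows the state value "not-loaded", so any other state string you filter on is an assumption to verify against your own server.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: host and port taken from the curl example in this section.
BASE_URL = "http://localhost:1234"


def fetch_models(base_url: str = BASE_URL) -> dict:
    """GET /api/v0/models and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/v0/models") as resp:
        return json.load(resp)


def select_models(
    listing: dict,
    model_type: Optional[str] = None,
    state: Optional[str] = None,
) -> List[str]:
    """Return the ids of models whose "type" and/or "state" match (hypothetical helper)."""
    ids = []
    for model in listing.get("data", []):
        if model_type is not None and model.get("type") != model_type:
            continue
        if state is not None and model.get("state") != state:
            continue
        ids.append(model["id"])
    return ids
```

With the server running, `select_models(fetch_models(), model_type="embeddings")` would return the ids of the embedding models in the listing.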
GET /api/v0/models/{model}
Gets information about one specific model.
Example request
curl http://localhost:1234/api/v0/models/qwen2-vl-7b-instruct
Response format
{
  "id": "qwen2-vl-7b-instruct",
  "object": "model",
  "type": "vlm",
  "publisher": "mlx-community",
  "arch": "qwen2_vl",
  "compatibility_type": "mlx",
  "quantization": "4bit",
  "state": "not-loaded",
  "max_context_length": 32768
}
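As an illustration of consuming this endpoint from a script, here is a small Python sketch. The helpers `model_url`, `fetch_model`, and `describe_model` are hypothetical names, and the default base URL is assumed from the curl example. Percent-encoding the model id is a precaution on the client side; how the server treats unusual characters in ids is not covered by this page.

```python
import json
import urllib.parse
import urllib.request


def model_url(model_id: str, base_url: str = "http://localhost:1234") -> str:
    """Build the GET /api/v0/models/{model} URL, escaping characters such as spaces."""
    encoded = urllib.parse.quote(model_id)
    return f"{base_url}/api/v0/models/{encoded}"


def fetch_model(model_id: str) -> dict:
    """Fetch the model record as a dict (needs a running server)."""
    with urllib.request.urlopen(model_url(model_id)) as resp:
        return json.load(resp)


def describe_model(info: dict) -> str:
    """Format a one-line summary from the fields shown in the response above."""
    return (
        f"{info['id']} ({info['type']}, {info['arch']}, {info['quantization']}): "
        f"{info['state']}, max context {info['max_context_length']} tokens"
    )
```

For example, `describe_model(fetch_model("qwen2-vl-7b-instruct"))` would produce a single summary line built from the fields in the response format above.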
POST /api/v0/chat/completions
Chat completions API. You provide an array of messages and receive the assistant's next reply in the chat.
Example request
curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Introduce yourself." }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
  }'
Response format
{
  "id": "chatcmpl-i3gkjwthhw96whukek9tz",
  "object": "chat.completion",
  "created": 1731990317,
  "model": "granite-3.0-2b-instruct",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Greetings, I'm a helpful AI, here to assist,\nIn providing answers, with no distress.\nI'll keep it short and sweet, in rhyme you'll find,\nA friendly companion, all day long you'll bind."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 53,
    "total_tokens": 77
  },
  "stats": {
    "tokens_per_second": 51.43709529007664,
    "time_to_first_token": 0.111,
    "generation_time": 0.954,
    "stop_reason": "eosFound"
  },
  "model_info": {
    "arch": "granite",
    "quant": "Q4_K_M",
    "format": "gguf",
    "context_length": 4096
  },
  "runtime": {
    "name": "llama.cpp-mac-arm64-apple-metal-advsimd",
    "version": "1.3.0",
    "supported_formats": ["gguf"]
  }
}
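The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl example; `build_chat_request`, `extract_reply`, and `chat` are hypothetical helper names, and the base URL is assumed from the example. It handles only the non-streaming case ("stream": false) and reads the "stats" fields that appear in the response format above.

```python
import json
import urllib.request


def build_chat_request(model, messages, base_url="http://localhost:1234", temperature=0.7):
    """Create a non-streaming POST request for /api/v0/chat/completions."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": -1,  # same value as in the curl example
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/v0/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict):
    """Return (assistant_text, tokens_per_second, time_to_first_token)."""
    message = response["choices"][0]["message"]
    stats = response.get("stats", {})
    return (
        message["content"],
        stats.get("tokens_per_second"),
        stats.get("time_to_first_token"),
    )


def chat(model, messages):
    """Send the request and return the decoded JSON (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(model, messages)) as resp:
        return json.load(resp)
```

A call such as `extract_reply(chat("granite-3.0-2b-instruct", [{"role": "user", "content": "Introduce yourself."}]))` would return the reply text together with the two performance figures, assuming that model is available on your machine.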
POST /api/v0/completions
Text completions API. You provide a prompt and receive a completion.
Example request
curl http://localhost:1234/api/v0/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "prompt": "the meaning of life is",
    "temperature": 0.7,
    "max_tokens": 10,
    "stream": false,
    "stop": "\n"
  }'
Response format
{
  "id": "cmpl-p9rtxv6fky2v9k8jrd8cc",
  "object": "text_completion",
  "created": 1731990488,
  "model": "granite-3.0-2b-instruct",
  "choices": [
    {
      "index": 0,
      "text": " to find your purpose, and once you have",
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 9,
    "total_tokens": 14
  },
  "stats": {
    "tokens_per_second": 57.69230769230769,
    "time_to_first_token": 0.299,
    "generation_time": 0.156,
    "stop_reason": "maxPredictedTokensReached"
  },
  "model_info": {
    "arch": "granite",
    "quant": "Q4_K_M",
    "format": "gguf",
    "context_length": 4096
  },
  "runtime": {
    "name": "llama.cpp-mac-arm64-apple-metal-advsimd",
    "version": "1.3.0",
    "supported_formats": ["gguf"]
  }
}
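In the sample above the completion stopped because the token limit was reached ("finish_reason": "length", "stop_reason": "maxPredictedTokensReached"). A caller may want to detect that case and log the statistics. The following Python sketch shows one way to do so on an already decoded response; `was_truncated` and `summarize_completion` are hypothetical helper names, and the code assumes the response contains the "usage" and "stats" fields shown above.

```python
def was_truncated(response: dict) -> bool:
    """True when the first choice ended because the token limit was reached."""
    return response["choices"][0].get("finish_reason") == "length"


def summarize_completion(response: dict) -> str:
    """Build a one-line summary from the "usage" and "stats" objects."""
    usage = response["usage"]
    stats = response["stats"]
    return (
        f"{usage['completion_tokens']} tokens at "
        f"{stats['tokens_per_second']:.1f} tok/s, "
        f"TTFT {stats['time_to_first_token']}, "
        f"stopped: {stats['stop_reason']}"
    )
```

If `was_truncated` returns True, raising "max_tokens" in the request is the obvious next step when a longer completion is wanted.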
POST /api/v0/embeddings
Text embeddings API. You provide a text and receive its embedding vector representation.
Example request
curl http://127.0.0.1:1234/api/v0/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-nomic-embed-text-v1.5",
    "input": "Some text to embed"
  }'
Example response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.016731496900320053,
        0.028460891917347908,
        -0.1407836228609085,
        ... (truncated for brevity) ...,
        0.02505224384367466,
        -0.0037634256295859814,
        -0.04341062530875206
      ],
      "index": 0
    }
  ],
  "model": "text-embedding-nomic-embed-text-v1.5@q4_k_m",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
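A common use of the returned vectors is comparing two texts. The sketch below is an illustration only: `embed` and `cosine_similarity` are hypothetical helpers, the URL and model name are copied from the curl example, and cosine similarity is one common way to compare embeddings rather than something this API prescribes.

```python
import json
import math
import urllib.request


def embed(text, model="text-embedding-nomic-embed-text-v1.5",
          base_url="http://127.0.0.1:1234"):
    """POST /api/v0/embeddings and return the first embedding (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/api/v0/embeddings",
        data=json.dumps({"model": model, "input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With the server running, `cosine_similarity(embed("a cat"), embed("a kitten"))` would give a score between -1 and 1, where higher values indicate more similar texts.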
Please report bugs by opening an issue on GitHub.