DeepSeek R1:开源推理模型
上周,中国 AI 公司 DeepSeek 发布了备受期待的开源推理模型,命名为 DeepSeek R1。无论是蒸馏版*还是全尺寸版,DeepSeek R1 模型均可在 Mac、Windows 和 Linux 上的 LM Studio 中本地运行。
蒸馏至 Qwen 7B (MLX, 4-bit) 的 DeepSeek R1 正在 M1 Mac 上 100% 离线解答代数问题。
如果你在过去一周左右上过网,大概率不会错过关于 DeepSeek 的新闻。
DeepSeek R1 模型代表了开源模型的一个重要且令人兴奋的里程碑:现在你可以在本地系统上运行与 OpenAI o1 模型风格类似的“推理”模型。你只需要足够的内存(RAM)。
DeepSeek 的发布包含:
可以,只要你有足够的内存。
操作步骤如下:
推理模型经过训练,在提供最终答案之前会进行“思考”。这是通过一种称为“思维链”(CoT)的技术实现的。CoT 鼓励模型将复杂问题分解为更小、更易处理的步骤。这使得模型能够通过一系列中间步骤得出最终答案,而不是试图一次性解决问题。DeepSeek 的思维链包含在 <think>...</think> 标记中。
当被问及非平凡的问题时,DeepSeek 模型会以 <think> 标记开始回答。随后的内容通常看起来像长篇的心流记录,模型在此逐步解决问题。在闭合标记 </think> 之后,模型开始生成常规内容,即问题的最终答案。 </think> 之后的内容直接受到 <think> 部分内容的影响。
以下是 DeepSeek-R1-Distill-Qwen-7B 的输出示例,展示了它通过“思考”来全面回答“西红柿是水果吗?”这一问题的能力。思考部分被包裹在 <think>...</think> 标签中。
用户问题
西红柿是水果吗?
模型回答
DeepSeek-R1-Distill-Qwen-7B: <think> Okay, so I'm trying to figure out whether tomatoes are fruits or vegetables. I know that in many places like Italy, they're definitely considered fruits because you can eat them raw and they have a sweet taste. But here in the US, it's usually said they're vegetables. Hmm, why is there this difference? I remember learning that fruits come from flowers that aren't pollinated, right? Like roses where bees don't help them reproduce. Tomatoes grow on vines and are typically picked when ripe. They have seeds inside which you eat. On the other hand, vegetables usually come from plants that do get pollinated, like carrots or tomatoes when they're on a plant with flowers. Wait, but tomatoes can be both eaten raw and cooked, just like many fruits. So maybe it's more about how they're grown rather than their natural state. I think the confusion comes from different regions having different classifications. In Italy, they're definitely fruits because they're used as such in dishes without needing to be cooked down or anything. I guess in the US, tomatoes are considered vegetables because they come from plants that do produce flowers and require pollination by insects like bees. So even though you can eat them raw, their classification as a vegetable is based on their origin and how they're typically used in cooking here. But then again, I've seen them used both ways—like people making sauces or just eating them as snacks. But maybe in the culinary sense, they're more often used as vegetables because of their typical preparation methods like slicing or chopping for soups or salads. So it's probably a matter of regional definitions and how they are primarily consumed here versus where they might be considered fruits naturally. </think> Tomatoes can be both fruits and vegetables depending on the context: Fruit Perspective: In many countries, including Italy, tomatoes are classified as fruits because they grow on vines without needing pollination and can be eaten raw with a sweet taste. Vegetable Perspective (US Context): In the United States, tomatoes are considered vegetables due to their origin from plants that do produce flowers and require pollination by insects like bees. They are often used in cooking methods such as slicing or chopping for soups and salads, which aligns more with vegetable preparation. Thus, the classification of tomatoes can vary based on regional definitions and usage.
根据 AIME 2024、MATH-500 和 CodeForces 等多个流行推理基准测试,开源旗舰级 671B 参数的 DeepSeek-R1 模型表现与 OpenAI 的全尺寸 o1 推理模型相当。较小的 DeepSeek-R1 “蒸馏”模型表现与 OpenAI 的 o1-mini 推理模型相当。
“蒸馏” DeepSeek-R1 意味着:采用较小的“稠密模型”(如 Llama3 和 Qwen2.5),并使用大型模型生成的成果对其进行微调,旨在赋予它们类似于大型模型的能力。
DeepSeek 通过从 DeepSeek-R1 中精选约 80 万条(60 万条推理类,20 万条非推理类)高质量生成内容,并据此训练 Llama3 和 Qwen2.5 模型来实现这一目标(来源:DeepSeek R1 论文)。
这是一种“教授”现有小型模型如何像 DeepSeek-R1 一样推理的高效技术。
DeepSeek-R1 主要使用无监督强化学习进行训练。这是一个重要的成就,因为它意味着人类不需要准备大量的标记化监督微调 (SFT) 数据。
DeepSeek-R1 的前身 DeepSeek-R1-Zero 仅 使用强化学习进行微调。然而,它存在可读性差和语言混杂的问题。
DeepSeek 最终为 R1 开发了一套多阶段训练流水线,将 SFT 和 RL 技术结合起来,在解决 DeepSeek-R1-Zero 缺陷的同时,保留了 RL 的新颖性和成本优势。
更多关于训练的详细信息可以在 DeepSeek R1 论文 中找到。
你可以利用 LM Studio 的 API 在代码中调用 DeepSeek R1 模型。
以下是一些相关的文档链接:
lms:LM Studio 的命令行工具 (CLI)