DeepSeek R1:开源推理模型
上周,中国人工智能公司 DeepSeek 发布了备受期待的开源推理模型,名为 DeepSeek R1
。DeepSeek R1 模型,包括蒸馏版*和完整版,均可在 Mac、Windows 和 Linux 上的 LM Studio 中本地运行。
DeepSeek R1 蒸馏为 Qwen 7B (MLX, 4-bit) 模型,在 M1 Mac 上 100% 离线解决了代数问题。
如果您最近一周左右有上网,那您不太可能错过 DeepSeek 的新闻。
DeepSeek R1 模型代表了公开可用模型的一个重要且令人振奋的里程碑:您现在可以在本地系统上运行风格类似于 OpenAI o1 模型的“推理”模型。您所需要的只是足够的内存。
DeepSeek 的发布包括:
可以,如果您的内存足够。
操作步骤如下
推理模型经过训练,在提供最终答案之前会“思考”。这是通过一种称为“思维链”(CoT)的技术实现的。CoT 是一种鼓励模型将复杂问题分解成更小、更易于管理的步骤的技术。这使得模型能够通过一系列中间步骤得出最终答案,而不是试图一次性解决问题。DeepSeek 的 CoT 包含在 <think>...</think>
标记中。
当被问及一个非简单问题时,DeepSeek 模型会以 <think>
标记开始其回复。之后的内容通常看起来像一长串的意识流,模型会逐步解决问题。在结束 </think>
标记之后,模型开始生成常规内容,这是问题的最终答案。</think>
标记之后的内容直接受 <think>
部分内容的影响。
以下是 DeepSeek-R1-Distill-Qwen-7B 的输出,展示了其“思考”能力,以整体性地回答“番茄是水果吗?”这个问题。思考部分包含在 <think>...</think>
标签中。
用户提问
番茄是水果吗?
模型回复
DeepSeek-R1-Distill-Qwen-7B: <think> Okay, so I'm trying to figure out whether tomatoes are fruits or vegetables. I know that in many places like Italy, they're definitely considered fruits because you can eat them raw and they have a sweet taste. But here in the US, it's usually said they're vegetables. Hmm, why is there this difference? I remember learning that fruits come from flowers that aren't pollinated, right? Like roses where bees don't help them reproduce. Tomatoes grow on vines and are typically picked when ripe. They have seeds inside which you eat. On the other hand, vegetables usually come from plants that do get pollinated, like carrots or tomatoes when they're on a plant with flowers. Wait, but tomatoes can be both eaten raw and cooked, just like many fruits. So maybe it's more about how they're grown rather than their natural state. I think the confusion comes from different regions having different classifications. In Italy, they're definitely fruits because they're used as such in dishes without needing to be cooked down or anything. I guess in the US, tomatoes are considered vegetables because they come from plants that do produce flowers and require pollination by insects like bees. So even though you can eat them raw, their classification as a vegetable is based on their origin and how they're typically used in cooking here. But then again, I've seen them used both ways—like people making sauces or just eating them as snacks. But maybe in the culinary sense, they're more often used as vegetables because of their typical preparation methods like slicing or chopping for soups or salads. So it's probably a matter of regional definitions and how they are primarily consumed here versus where they might be considered fruits naturally. </think> Tomatoes can be both fruits and vegetables depending on the context: Fruit Perspective: In many countries, including Italy, tomatoes are classified as fruits because they grow on vines without needing pollination and can be eaten raw with a sweet taste. Vegetable Perspective (US Context): In the United States, tomatoes are considered vegetables due to their origin from plants that do produce flowers and require pollination by insects like bees. They are often used in cooking methods such as slicing or chopping for soups and salads, which aligns more with vegetable preparation. Thus, the classification of tomatoes can vary based on regional definitions and usage.
根据 AIME 2024、MATH-500 和 CodeForces 等多个流行的推理基准测试,开源旗舰版 671B 参数 DeepSeek-R1 模型的性能与 OpenAI 的完整版 o1 推理模型相当。较小的 DeepSeek-R1“蒸馏”模型性能与 OpenAI 的 o1-mini 推理模型相当。
“蒸馏”DeepSeek-R1 意味着:获取 Llama3 和 Qwen2.5 等较小的“密集模型”,并使用由大型模型生成的产物对其进行微调,目的是使其具备类似于大型模型的能力。
DeepSeek 通过从 DeepSeek-R1 中精选大约 80 万个(60 万个推理,20 万个非推理)高质量生成内容,并用这些内容训练 Llama3 和 Qwen2.5 模型来实现这一点(来源:DeepSeek 的 R1 论文)。
这是一种高效的技术,可以“教导”较小的现有模型像 DeepSeek-R1 一样进行推理。
DeepSeek-R1 主要通过无监督强化学习进行训练。这是一项重要成就,因为它意味着人类无需整理大量标注的监督微调(SFT)数据。
DeepSeek-R1 的前身 DeepSeek-R1-Zero 仅使用强化学习进行了微调。然而,它存在可读性和语言混合问题。
DeepSeek 最终为 R1 采用了多阶段训练流程,混合了 SFT 和 RL 技术,以便在解决 DeepSeek-R1-Zero 缺点的同时,保持 RL 的新颖性和成本效益。
有关训练的更详细信息,请参见DeepSeek 的 R1 论文。
您可以利用 LM Studio 的 API 从您的代码中调用 DeepSeek R1 模型。
以下是一些相关的文档链接
lms
:LM Studio 的命令行界面