DeepSeek R1:开源推理模型
上周,中国人工智能公司DeepSeek发布了备受期待的开源推理模型,名为DeepSeek R1
。DeepSeek R1 模型,包括精简版*和完整版,均可在 Mac、Windows 和 Linux 上的 LM Studio 中本地运行。
DeepSeek R1 精简为 Qwen 7B(MLX,4 位),在 M1 Mac 上 100% 离线解决代数问题。
如果你最近一周左右上网,你几乎不可能错过 DeepSeek 的新闻。
DeepSeek R1 模型代表了开放模型的一个重要而令人兴奋的里程碑:你现在可以在本地系统上运行“推理”模型,其风格类似于 OpenAI 的 o1 模型。你所需要的只是足够的 RAM。
DeepSeek 的发布包括
是的,如果你有足够的内存。
以下是操作方法
推理模型经过训练,在提供最终答案之前会“思考”。这是通过一种称为“思维链”(CoT)的技术完成的。CoT 是一种鼓励模型将复杂问题分解为更小、更易于管理步骤的技术。这使得模型能够通过一系列中间步骤得出最终答案,而不是试图一次性解决问题。DeepSeek 的 CoT 包含在<think>...</think>
标记中。
当被问及非琐碎的问题时,DeepSeek 模型将以<think>
标记开始其响应。随后的内容通常看起来像一长串意识流,模型正在逐步解决问题。在结束</think>
标记之后,模型开始生成常规内容,这是问题的最终答案。</think>
标记之后的内容直接受<think>
部分内容的影响。
以下是 DeepSeek-R1-Distill-Qwen-7B 的输出,它展示了其“思考”能力,以全面回答问题“番茄是水果吗?”思考部分包含在<think>...</think>
标签中。
用户问题
番茄是水果吗?
模型的回复
DeepSeek-R1-Distill-Qwen-7B: <think> Okay, so I'm trying to figure out whether tomatoes are fruits or vegetables. I know that in many places like Italy, they're definitely considered fruits because you can eat them raw and they have a sweet taste. But here in the US, it's usually said they're vegetables. Hmm, why is there this difference? I remember learning that fruits come from flowers that aren't pollinated, right? Like roses where bees don't help them reproduce. Tomatoes grow on vines and are typically picked when ripe. They have seeds inside which you eat. On the other hand, vegetables usually come from plants that do get pollinated, like carrots or tomatoes when they're on a plant with flowers. Wait, but tomatoes can be both eaten raw and cooked, just like many fruits. So maybe it's more about how they're grown rather than their natural state. I think the confusion comes from different regions having different classifications. In Italy, they're definitely fruits because they're used as such in dishes without needing to be cooked down or anything. I guess in the US, tomatoes are considered vegetables because they come from plants that do produce flowers and require pollination by insects like bees. So even though you can eat them raw, their classification as a vegetable is based on their origin and how they're typically used in cooking here. But then again, I've seen them used both ways—like people making sauces or just eating them as snacks. But maybe in the culinary sense, they're more often used as vegetables because of their typical preparation methods like slicing or chopping for soups or salads. So it's probably a matter of regional definitions and how they are primarily consumed here versus where they might be considered fruits naturally. </think> Tomatoes can be both fruits and vegetables depending on the context: Fruit Perspective: In many countries, including Italy, tomatoes are classified as fruits because they grow on vines without needing pollination and can be eaten raw with a sweet taste. Vegetable Perspective (US Context): In the United States, tomatoes are considered vegetables due to their origin from plants that do produce flowers and require pollination by insects like bees. They are often used in cooking methods such as slicing or chopping for soups and salads, which aligns more with vegetable preparation. Thus, the classification of tomatoes can vary based on regional definitions and usage.
根据 AIME 2024、MATH-500 和 CodeForces 等几个流行的推理基准测试,开源旗舰 671B 参数 DeepSeek-R1 模型与 OpenAI 的完整 o1 推理模型表现相当。较小的 DeepSeek-R1 “精简”模型与 OpenAI 的 o1-mini 推理模型表现相当。
“蒸馏”DeepSeek-R1 意味着:获取 Llama3 和 Qwen2.5 等较小的“密集模型”,并使用由较大模型生成的工件对它们进行微调,旨在使其具备类似于较大模型的能力。
DeepSeek 通过精心策划约 80 万(60 万推理,20 万非推理)高质量的 DeepSeek-R1 生成数据,并用它们训练 Llama3 和 Qwen2.5 模型来实现这一点(来源:DeepSeek 的 R1 出版物)。
这是一种高效的技术,可以“教导”较小的现有模型像 DeepSeek-R1 一样进行推理。
DeepSeek-R1 主要通过无监督强化学习进行训练。这是一项重要的成就,因为它意味着人类不必整理那么多带标签的监督微调 (SFT) 数据。
DeepSeek-R1 的前身 DeepSeek-R1-Zero 仅使用强化学习进行微调。然而,它存在可读性和语言混合问题。
DeepSeek 最终为 R1 采用了多阶段训练流程,结合了 SFT 和 RL 技术,以在解决 DeepSeek-R1-Zero 缺陷的同时,保持 RL 的新颖性和成本优势。
有关训练的更多详细信息,请参阅DeepSeek 的 R1 出版物。
您可以利用 LM Studio 的 API 从您自己的代码中调用 DeepSeek R1 模型。
以下是一些相关的文档链接
lms
:LM Studio 的 CLI