DeepSeek R1: Open-Source Reasoning Models

2025-01-29

Last week, the Chinese AI company DeepSeek released its highly anticipated open-source reasoning model, DeepSeek R1. The DeepSeek R1 models, both the distilled* versions and the full version, can run locally in LM Studio on Mac, Windows, and Linux.

* Read on below to learn what distilled models are and how they're made

DeepSeek R1 distilled into Qwen 7B (MLX, 4-bit), solving an algebra problem while running 100% offline on an M1 Mac.

DeepSeek R1 Models, Distilled and Full

If you've been on the internet in the last week or so, it's unlikely you've missed the news about DeepSeek.

The DeepSeek R1 models represent an important and exciting milestone for publicly available models: you can now run "reasoning" models, in the style of OpenAI's o1, on your local machine. All you need is enough RAM.

DeepSeek's release includes:

  • DeepSeek-R1 - the flagship 671B-parameter reasoning model
  • DeepSeek-R1 distilled models: a collection of smaller pre-trained models fine-tuned on data generated by DeepSeek-R1 (1.5B, 7B, 8B, 14B, 32B, and 70B parameters), e.g. DeepSeek-R1-Distill-Qwen-7B
  • DeepSeek-R1-Zero - an R1 prototype fine-tuned using only unsupervised reinforcement learning (RL)

Can I run DeepSeek R1 models locally?

Yes, if you have enough RAM.

Here's how:

  • Download LM Studio for your operating system here
  • Click the 🔎 icon in the sidebar and search for "DeepSeek"
  • Pick an option that fits your system. For example, with 16GB of RAM you can run the 7B or 8B parameter distilled models; with ~192GB+ of RAM you can run the full 671B-parameter model.
  • Load the model in a chat and start asking questions!

Reasoning models: what are they?

Reasoning models are trained to "think" before providing a final answer. They use a technique called "chain of thought" (CoT), which encourages the model to break a complex problem down into smaller, more manageable steps. This lets the model arrive at the final answer through a series of intermediate steps rather than trying to solve the problem in one shot. DeepSeek's CoT is wrapped in <think>...</think> tokens.

When asked a non-trivial question, DeepSeek models begin their response with a <think> token. What follows typically reads like a long stream of consciousness in which the model works through the problem step by step. After the closing </think> token, the model generates regular content: the final answer to the question. The content after the </think> token is directly informed by what was produced inside the <think> section.
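If you consume this output programmatically, you will often want to separate the reasoning trace from the final answer. Below is a minimal Python sketch of one way to do that; the tag format follows the <think>...</think> convention described above, while the function name and the sample string are purely illustrative.

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a DeepSeek R1 style response into (reasoning, final_answer).

    The reasoning trace is wrapped in <think>...</think>; everything after
    the closing tag is the regular, user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole response as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

# Illustrative usage with a made-up response string
raw = "<think>The user asked 2 + 2. That is 4.</think>\n2 + 2 = 4."
thinking, answer = split_reasoning(raw)
print("reasoning:", thinking)
print("answer:", answer)
```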

Below is output from DeepSeek-R1-Distill-Qwen-7B demonstrating its "thinking" ability while answering the question "Is a tomato a fruit?". The thinking portion is wrapped in <think>...</think> tags.

The user's question

Is a tomato a fruit?

The model's response

DeepSeek-R1-Distill-Qwen-7B:
<think>
Okay, so I'm trying to figure out whether tomatoes are fruits or vegetables. I know that in many places like Italy, they're definitely considered fruits because you can eat them raw and they have a sweet taste. But here in the US, it's usually said they're vegetables. Hmm, why is there this difference?
I remember learning that fruits come from flowers that aren't pollinated, right? Like roses where bees don't help them reproduce. Tomatoes grow on vines and are typically picked when ripe. They have seeds inside which you eat. On the other hand, vegetables usually come from plants that do get pollinated, like carrots or tomatoes when they're on a plant with flowers.

Wait, but tomatoes can be both eaten raw and cooked, just like many fruits. So maybe it's more about how they're grown rather than their natural state. I think the confusion comes from different regions having different classifications. In Italy, they're definitely fruits because they're used as such in dishes without needing to be cooked down or anything.

I guess in the US, tomatoes are considered vegetables because they come from plants that do produce flowers and require pollination by insects like bees. So even though you can eat them raw, their classification as a vegetable is based on their origin and how they're typically used in cooking here.

But then again, I've seen them used both ways—like people making sauces or just eating them as snacks. But maybe in the culinary sense, they're more often used as vegetables because of their typical preparation methods like slicing or chopping for soups or salads.

So it's probably a matter of regional definitions and how they are primarily consumed here versus where they might be considered fruits naturally.
</think>

Tomatoes can be both fruits and vegetables depending on the context:

Fruit Perspective: In many countries, including Italy, tomatoes are classified as fruits because they grow on vines without needing pollination and can be eaten raw with a sweet taste.

Vegetable Perspective (US Context): In the United States, tomatoes are considered vegetables due to their origin from plants that do produce flowers and require pollination by insects like bees. They are often used in cooking methods such as slicing or chopping for soups and salads, which aligns more with vegetable preparation.

Thus, the classification of tomatoes can vary based on regional definitions and usage.

Performance

According to several popular reasoning benchmarks such as AIME 2024, MATH-500, and CodeForces, the open-source flagship 671B-parameter DeepSeek-R1 model performs comparably to OpenAI's full-size o1 reasoning model. The smaller DeepSeek-R1 "distilled" models perform comparably to OpenAI's o1-mini reasoning model.

Distillation

"Distilling" DeepSeek-R1 means taking smaller "dense models", such as Llama3 and Qwen2.5, and fine-tuning them on artifacts generated by the larger model, with the goal of instilling capabilities similar to the larger model's into them.

DeepSeek did this by curating roughly 800k high-quality generations from DeepSeek-R1 (600k reasoning, 200k non-reasoning) and training the Llama3 and Qwen2.5 models on this data (source: DeepSeek's R1 publication).

This is an effective technique for "teaching" smaller pre-trained models to reason like DeepSeek-R1.
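To make the idea concrete, here is a rough, hypothetical sketch of what a single distillation training record could look like: the prompt becomes the user turn, and the teacher's full generation (reasoning trace plus final answer) becomes the assistant turn that the smaller student model is fine-tuned to reproduce. The field names and the sample text are assumptions for illustration; DeepSeek's actual data format is described in their R1 publication.

```python
# Hypothetical sketch of one distillation (SFT) example: the prompt goes in
# as the user turn, and the teacher model's full generation -- reasoning
# trace plus final answer -- becomes the target the student learns to match.

teacher_prompt = "Is a tomato a fruit?"

teacher_generation = (
    "<think>Botanically, a tomato develops from a flower and contains seeds, "
    "so it is a fruit; in cooking it is usually treated as a vegetable.</think>\n"
    "Botanically a tomato is a fruit, though it is commonly used as a vegetable."
)

# One training record in a chat-style SFT format (field names are illustrative).
sft_example = {
    "messages": [
        {"role": "user", "content": teacher_prompt},
        {"role": "assistant", "content": teacher_generation},
    ]
}

# DeepSeek curated ~800k such records (~600k reasoning, ~200k non-reasoning)
# and fine-tuned Qwen2.5 and Llama3 models on them.
print(sft_example)
```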

Training

DeepSeek-R1 was trained primarily with unsupervised reinforcement learning. This is a significant achievement because it means humans don't have to curate as much labeled supervised fine-tuning (SFT) data.

DeepSeek-R1-Zero, the precursor to DeepSeek-R1, was fine-tuned using reinforcement learning alone. However, it suffered from readability and language-mixing issues.

DeepSeek ultimately developed a multi-stage training pipeline for R1 that mixes SFT and RL techniques, preserving the novelty and cost benefits of RL while addressing DeepSeek-R1-Zero's shortcomings.

For more details on training, see DeepSeek's R1 publication.

Use DeepSeek R1 models locally from your own code

You can use LM Studio's API to call DeepSeek R1 models from your own code.
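As a minimal sketch, LM Studio's local server exposes OpenAI-compatible endpoints, so you can point the standard OpenAI client at it. The example below assumes the server is running on its default port (1234) and uses a placeholder model identifier; substitute whichever identifier LM Studio shows for the DeepSeek R1 model you downloaded.

```python
# Minimal sketch: calling a locally running DeepSeek R1 distill through
# LM Studio's OpenAI-compatible local server (assumed default: localhost:1234).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local server (assumption: default port)
    api_key="lm-studio",                  # the local server does not check this value
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-7b",  # placeholder: use the identifier shown in LM Studio
    messages=[{"role": "user", "content": "Is a tomato a fruit?"}],
    temperature=0.6,
)

content = response.choices[0].message.content
print(content)  # includes the <think>...</think> reasoning trace followed by the answer
```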

Here are some relevant documentation links:
