DeepSeek R1: The Open-Source Reasoning Model

2025-01-29

Last week, the Chinese AI company DeepSeek released its highly anticipated open-source reasoning model, DeepSeek R1. Both the distilled* and full-size DeepSeek R1 models can run locally in LM Studio on Mac, Windows, and Linux.

* Read on below to learn about the distilled models and how they are made

DeepSeek R1 distilled to Qwen 7B (MLX, 4-bit) solving an algebra problem 100% offline on an M1 Mac.

DeepSeek R1 Models: Distilled vs. Full-Size

If you've been online at any point in the past week or so, you've probably not missed the news about DeepSeek.

The DeepSeek R1 models represent an important and exciting milestone for open-source models: you can now run "reasoning" models in the style of OpenAI's o1 on your local system. All you need is enough memory (RAM).

DeepSeek's release includes:

  • DeepSeek-R1 - the flagship reasoning model with 671B parameters
  • DeepSeek-R1 distilled models: a set of smaller pretrained models (available in 1.5B, 7B, 8B, 14B, 32B, and 70B parameters) fine-tuned on data generated by DeepSeek-R1. For example, DeepSeek-R1-Distill-Qwen-7B
  • DeepSeek-R1-Zero - a prototype of R1 fine-tuned using only unsupervised reinforcement learning (RL)

Can I run DeepSeek R1 models locally?

Yes, as long as you have enough memory.

Here's how:

  • Download LM Studio for your operating system here
  • Click the 🔎 icon in the sidebar and search for "DeepSeek".
  • Pick an option that fits your system. For example, with 16GB of RAM you can run the 7B or 8B parameter distilled models. With 192GB+ of RAM, you can run the full 671B-parameter model.
  • Load the model in a chat window and start asking questions!
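How do the memory figures above come about? As a rough rule of thumb, a model's weights need about (parameters × bits per weight ÷ 8) bytes of memory, plus overhead for the KV cache and runtime buffers. Here is a back-of-the-envelope sketch; the 20% overhead factor is an illustrative assumption, not an LM Studio figure:

```python
def approx_weight_memory_gb(params_billions: float,
                            bits_per_weight: int,
                            overhead: float = 0.2) -> float:
    """Rough estimate of the memory needed to hold a model's weights.

    params_billions: model size in billions of parameters
    bits_per_weight: quantization level (e.g. 4 for a 4-bit quant)
    overhead: illustrative fudge factor for KV cache / runtime buffers
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9  # decimal GB

# A 7B model quantized to 4 bits fits comfortably in 16GB of RAM:
print(round(approx_weight_memory_gb(7, 4), 1))    # 4.2
# The full 671B model, even at 4 bits, needs hundreds of GB:
print(round(approx_weight_memory_gb(671, 4), 1))  # 402.6
```

Actual usage also grows with context length, so treat these numbers as lower bounds when picking a model for your machine.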

What is a reasoning model?

Reasoning models are trained to "think" before giving a final answer. This is achieved through a technique called "chain of thought" (CoT). CoT encourages the model to break a complex problem down into smaller, more manageable steps. This lets the model arrive at the final answer through a series of intermediate steps rather than trying to solve the problem in one shot. DeepSeek's chain of thought is wrapped in <think>...</think> tags.

When asked a non-trivial question, DeepSeek models begin their response with a <think> tag. What follows typically reads like a long stream of consciousness in which the model works through the problem step by step. After the closing </think> tag, the model starts generating regular content: the final answer to the question. The content after </think> is directly informed by the content of the <think> section.

Below is example output from DeepSeek-R1-Distill-Qwen-7B, showing how it "thinks" its way to a thorough answer to the question "Is a tomato a fruit?". The thinking portion is wrapped in <think>...</think> tags.

User question

Is a tomato a fruit?

Model response

DeepSeek-R1-Distill-Qwen-7B:
<think>
Okay, so I'm trying to figure out whether tomatoes are fruits or vegetables. I know that in many places like Italy, they're definitely considered fruits because you can eat them raw and they have a sweet taste. But here in the US, it's usually said they're vegetables. Hmm, why is there this difference?
I remember learning that fruits come from flowers that aren't pollinated, right? Like roses where bees don't help them reproduce. Tomatoes grow on vines and are typically picked when ripe. They have seeds inside which you eat. On the other hand, vegetables usually come from plants that do get pollinated, like carrots or tomatoes when they're on a plant with flowers.

Wait, but tomatoes can be both eaten raw and cooked, just like many fruits. So maybe it's more about how they're grown rather than their natural state. I think the confusion comes from different regions having different classifications. In Italy, they're definitely fruits because they're used as such in dishes without needing to be cooked down or anything.

I guess in the US, tomatoes are considered vegetables because they come from plants that do produce flowers and require pollination by insects like bees. So even though you can eat them raw, their classification as a vegetable is based on their origin and how they're typically used in cooking here.

But then again, I've seen them used both ways—like people making sauces or just eating them as snacks. But maybe in the culinary sense, they're more often used as vegetables because of their typical preparation methods like slicing or chopping for soups or salads.

So it's probably a matter of regional definitions and how they are primarily consumed here versus where they might be considered fruits naturally.
</think>

Tomatoes can be both fruits and vegetables depending on the context:

Fruit Perspective: In many countries, including Italy, tomatoes are classified as fruits because they grow on vines without needing pollination and can be eaten raw with a sweet taste.

Vegetable Perspective (US Context): In the United States, tomatoes are considered vegetables due to their origin from plants that do produce flowers and require pollination by insects like bees. They are often used in cooking methods such as slicing or chopping for soups and salads, which aligns more with vegetable preparation.

Thus, the classification of tomatoes can vary based on regional definitions and usage.
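Because the reasoning trace is delimited by <think>...</think>, it is easy to separate it from the final answer when post-processing model output. Here is a minimal sketch; the tag names come from the output above, but the helper function itself is hypothetical and not part of any DeepSeek or LM Studio API:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek-style response into (reasoning, final_answer).

    If no <think>...</think> block is present, the reasoning part is empty.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>Botanically, a tomato develops from a flower and has seeds.</think>\n"
    "Yes, botanically a tomato is a fruit."
)
print(answer)  # Yes, botanically a tomato is a fruit.
```

Stripping the reasoning like this is useful when you only want to display or log the final answer.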

Performance

According to several popular reasoning benchmarks, including AIME 2024, MATH-500, and CodeForces, the open-source flagship 671B-parameter DeepSeek-R1 model performs comparably to OpenAI's full-size o1 reasoning model. The smaller, "distilled" DeepSeek-R1 models perform comparably to OpenAI's o1-mini reasoning model.

Distillation

"Distilling" DeepSeek-R1 means taking smaller "dense models" (such as Llama3 and Qwen2.5) and fine-tuning them on outputs generated by the larger model, with the aim of giving them capabilities similar to the larger model's.

DeepSeek did this by curating about 800k high-quality generations from DeepSeek-R1 (600k reasoning and 200k non-reasoning samples) and training the Llama3 and Qwen2.5 models on them (source: the DeepSeek R1 paper).

This is an efficient technique for "teaching" existing smaller models to reason like DeepSeek-R1.
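The data side of this process can be pictured as collecting prompt/completion pairs sampled from the teacher model and filtering them before fine-tuning the student. The toy sketch below illustrates that filtering step only; the record format and the quality gate are simplified assumptions and do not reproduce DeepSeek's actual pipeline:

```python
def build_distillation_set(teacher_outputs, min_length=20):
    """Keep only teacher generations that pass a simple quality gate.

    teacher_outputs: iterable of (prompt, completion) pairs sampled
    from the large model. Real pipelines apply much stronger filters
    (correctness checks, language consistency, deduplication, etc.).
    """
    dataset = []
    for prompt, completion in teacher_outputs:
        # Require a minimum length and a complete reasoning trace.
        if len(completion) >= min_length and "</think>" in completion:
            dataset.append({"prompt": prompt, "completion": completion})
    return dataset

samples = [
    ("What is 2+2?", "<think>2 plus 2 is 4.</think> 4"),
    ("Bad sample", "ok"),  # too short, dropped by the filter
]
print(len(build_distillation_set(samples, min_length=10)))  # 1
```

The resulting records would then feed a standard supervised fine-tuning run on the smaller model.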

Training

DeepSeek-R1 was trained primarily with unsupervised reinforcement learning. This is a significant achievement because it means humans did not need to prepare large amounts of labeled supervised fine-tuning (SFT) data.

DeepSeek-R1's predecessor, DeepSeek-R1-Zero, was fine-tuned using reinforcement learning. However, it suffered from poor readability and language mixing.

DeepSeek ultimately developed a multi-stage training pipeline for R1 that combines SFT and RL techniques, addressing DeepSeek-R1-Zero's shortcomings while preserving the novelty and cost advantages of RL.

More details about the training can be found in the DeepSeek R1 paper.

Using DeepSeek R1 models locally from code

You can use LM Studio's API to call DeepSeek R1 models from code.
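LM Studio serves an OpenAI-compatible API, by default at http://localhost:1234/v1. Here is a minimal sketch using only the Python standard library; the model identifier is an assumption, so substitute whatever name the model shows in your LM Studio instance:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"          # LM Studio's default local server
MODEL = "deepseek-r1-distill-qwen-7b"          # assumption: use your local model's name

def build_chat_request(question: str) -> dict:
    """Construct an OpenAI-style chat completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.6,
    }

def ask(question: str) -> str:
    """Send the question to the local server and return the model's reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With LM Studio's server running, you could then do:
# print(ask("Is a tomato a fruit?"))
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries also work by pointing their base URL at the local server.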

Here are some relevant documentation links:

More