DeepSeek R1:开源推理模型

2025-01-29

上周,中国人工智能公司DeepSeek发布了备受期待的开源推理模型,名为DeepSeek R1。DeepSeek R1 模型,包括精简版*和完整版,均可在 Mac、Windows 和 Linux 上的 LM Studio 中本地运行。

* 阅读下文了解精简模型及其制作方式

DeepSeek R1 精简为 Qwen 7B(MLX,4 位),在 M1 Mac 上 100% 离线解决代数问题。

DeepSeek R1 模型,精简版和完整版

如果你最近一周左右上网,你几乎不可能错过 DeepSeek 的新闻。

DeepSeek R1 模型代表了开放模型的一个重要而令人兴奋的里程碑:你现在可以在本地系统上运行“推理”模型,其风格类似于 OpenAI 的 o1 模型。你所需要的只是足够的 RAM。

DeepSeek 的发布包括

  • DeepSeek-R1 - 旗舰级 671B 参数推理模型
  • DeepSeek-R1 精简模型:一系列较小的现有模型,使用 DeepSeek-R1 生成的数据进行微调(1.5B、7B、8B、14B、32B、70B 参数)。一个例子是DeepSeek-R1-Distill-Qwen-7B
  • DeepSeek-R1-Zero - 仅使用无监督强化学习(RL)进行微调的 R1 原型

我可以在本地运行 DeepSeek R1 模型吗?

是的,如果你有足够的内存。

以下是操作方法

  • 这里下载适用于你的操作系统的LM Studio
  • 点击侧边栏上的🔎图标,搜索“DeepSeek”
  • 选择一个适合你系统的选项。例如,如果你有 16GB 的 RAM,你可以运行 7B 或 8B 参数的精简模型。如果你有大约 192GB+ 的 RAM,你可以运行完整的 671B 参数模型。
  • 在聊天中加载模型,然后开始提问!

推理模型是什么?

推理模型经过训练,在提供最终答案之前会“思考”。这是通过一种称为“思维链”(CoT)的技术完成的。CoT 是一种鼓励模型将复杂问题分解为更小、更易于管理步骤的技术。这使得模型能够通过一系列中间步骤得出最终答案,而不是试图一次性解决问题。DeepSeek 的 CoT 包含在<think>...</think>标记中。

当被问及非琐碎的问题时,DeepSeek 模型将以<think>标记开始其响应。随后的内容通常看起来像一长串意识流,模型正在逐步解决问题。在结束</think>标记之后,模型开始生成常规内容,这是问题的最终答案。</think>标记之后的内容直接受<think>部分内容的影响。

以下是 DeepSeek-R1-Distill-Qwen-7B 的输出,它展示了其“思考”能力,以全面回答问题“番茄是水果吗?”思考部分包含在<think>...</think>标签中。

用户问题

番茄是水果吗?

模型的回复

点击展开/折叠
DeepSeek-R1-Distill-Qwen-7B:
<think>
Okay, so I'm trying to figure out whether tomatoes are fruits or vegetables. I know that in many places like Italy, they're definitely considered fruits because you can eat them raw and they have a sweet taste. But here in the US, it's usually said they're vegetables. Hmm, why is there this difference?
I remember learning that fruits come from flowers that aren't pollinated, right? Like roses where bees don't help them reproduce. Tomatoes grow on vines and are typically picked when ripe. They have seeds inside which you eat. On the other hand, vegetables usually come from plants that do get pollinated, like carrots or tomatoes when they're on a plant with flowers.

Wait, but tomatoes can be both eaten raw and cooked, just like many fruits. So maybe it's more about how they're grown rather than their natural state. I think the confusion comes from different regions having different classifications. In Italy, they're definitely fruits because they're used as such in dishes without needing to be cooked down or anything.

I guess in the US, tomatoes are considered vegetables because they come from plants that do produce flowers and require pollination by insects like bees. So even though you can eat them raw, their classification as a vegetable is based on their origin and how they're typically used in cooking here.

But then again, I've seen them used both ways—like people making sauces or just eating them as snacks. But maybe in the culinary sense, they're more often used as vegetables because of their typical preparation methods like slicing or chopping for soups or salads.

So it's probably a matter of regional definitions and how they are primarily consumed here versus where they might be considered fruits naturally.
</think>

Tomatoes can be both fruits and vegetables depending on the context:

Fruit Perspective: In many countries, including Italy, tomatoes are classified as fruits because they grow on vines without needing pollination and can be eaten raw with a sweet taste.

Vegetable Perspective (US Context): In the United States, tomatoes are considered vegetables due to their origin from plants that do produce flowers and require pollination by insects like bees. They are often used in cooking methods such as slicing or chopping for soups and salads, which aligns more with vegetable preparation.

Thus, the classification of tomatoes can vary based on regional definitions and usage.

性能

根据 AIME 2024、MATH-500 和 CodeForces 等几个流行的推理基准测试,开源旗舰 671B 参数 DeepSeek-R1 模型与 OpenAI 的完整 o1 推理模型表现相当。较小的 DeepSeek-R1 “精简”模型与 OpenAI 的 o1-mini 推理模型表现相当。

蒸馏

“蒸馏”DeepSeek-R1 意味着:获取 Llama3 和 Qwen2.5 等较小的“密集模型”,并使用由较大模型生成的工件对它们进行微调,旨在使其具备类似于较大模型的能力。

DeepSeek 通过精心策划约 80 万(60 万推理,20 万非推理)高质量的 DeepSeek-R1 生成数据,并用它们训练 Llama3 和 Qwen2.5 模型来实现这一点(来源:DeepSeek 的 R1 出版物)。

这是一种高效的技术,可以“教导”较小的现有模型像 DeepSeek-R1 一样进行推理。

训练

DeepSeek-R1 主要通过无监督强化学习进行训练。这是一项重要的成就,因为它意味着人类不必整理那么多带标签的监督微调 (SFT) 数据。

DeepSeek-R1 的前身 DeepSeek-R1-Zero 使用强化学习进行微调。然而,它存在可读性和语言混合问题。

DeepSeek 最终为 R1 采用了多阶段训练流程,结合了 SFT 和 RL 技术,以在解决 DeepSeek-R1-Zero 缺陷的同时,保持 RL 的新颖性和成本优势。

有关训练的更多详细信息,请参阅DeepSeek 的 R1 出版物

从您自己的代码本地使用 DeepSeek R1 模型

您可以利用 LM Studio 的 API 从您自己的代码中调用 DeepSeek R1 模型。

以下是一些相关的文档链接

更多内容