【Paper】DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

RWYQ阿伟 | 2025-11-12 | Papers

Abstract

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

Keywords

Paper; AI; DeepSeek

Authors

DeepSeek-AI

Date

Unknown

Language

English

Format

PDF

Size

1.26 MB

Pages

22

Download

Baidu Netdisk | Quark Netdisk

Archive password

www.awnotebook.com

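Disclaimer

Some of the images, resources, books, software, and other content on this site come from the Internet. Materials provided here are for study purposes only; no one may use them for any other purpose or redistribute them, and anyone who does is solely liable to the actual rights holders. Because some materials on this site come from other media, if a missing or incorrect source attribution infringes your rights, please let me know and I will deal with it immediately. Please support the original publishers.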

Permalink: https://www.awnotebook.com/post/789.html

Tags: Papers, AI

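Related articles on “【Paper】DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”

【Website】Scenario (2025-07-23)

【Website】D.DESIGN堆友 (2025-09-22)

【Paper】Design of a Control System for Large Wind Turbine Generator Sets (2025-11-11)

【Paper】Analysis and Design of a Power Plant SIS Supervisory Information System (2025-11-11)

【Paper】Key Technologies of the Internet of Things (2025-11-12)

【Paper】Grasping Analysis and Optimized Design of a Novel Underactuated Robotic Gripper (2025-11-12)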

Article information

Title: 【Paper】DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Author: RWYQ阿伟
Reads: 102
Created: 2025-11-12
Updated: 2025-11-12
Category: Papers
Tags: Papers, AI

Shanxi Public Security Filing No. 14030302000174 | Shanxi ICP Filing No. 18012902-3 | Sitemap
Copyright 2020-2025 阿伟的笔记本 by RWYQ阿伟. Some Rights Reserved.

Powered by Z-BlogPHP. Theme by TOYEAN.