diff --git a/docs/深度生成模型/大语言模型/CMU11-667.en.md b/docs/深度生成模型/大语言模型/CMU11-667.en.md new file mode 100644 index 00000000..89fe193a --- /dev/null +++ b/docs/深度生成模型/大语言模型/CMU11-667.en.md @@ -0,0 +1,31 @@ +# CMU11-667: Large Language Models: Methods and Applications + +## Course Overview + +- University: Carnegie Mellon University +- Prerequisites: Solid background in machine learning (equivalent to CMU 10-301/10-601) and natural language processing (equivalent to 11-411/11-611); proficiency in Python and familiarity with PyTorch or similar deep learning frameworks. +- Programming Language: Python +- Course Difficulty: 🌟🌟🌟🌟 +- Estimated Study Hours: 100+ hours + +This graduate-level course provides a comprehensive overview of methods and applications of Large Language Models (LLMs), covering a wide range of topics from core architectures to cutting-edge techniques. Course content includes: + +1. **Foundations**: Neural network architectures for language modeling, training procedures, inference, and evaluation metrics. +2. **Advanced Topics**: Model interpretability, alignment methods, emergent capabilities, and applications in both textual and non-textual domains. +3. **System & Optimization Techniques**: Large-scale pretraining strategies, deployment optimization, and efficient training/inference methods. +4. **Ethics & Safety**: Addressing model bias, adversarial attacks, and legal/regulatory concerns. + +The course blends lectures, readings, quizzes, interactive exercises, assignments, and a final project to offer students a deep and practical understanding of LLMs, preparing them for both research and real-world system development. + +**Self-Study Tips**: + +- Thoroughly read all assigned papers and materials before each class. +- Become proficient with PyTorch and implement core models and algorithms by hand. +- Complete the assignments diligently to build practical skills and reinforce theoretical understanding. 
+ +## Course Resources + +- Course Website: +- Course Videos: Selected lecture slides and materials are available on the website; full lecture recordings may require CMU internal access. +- Course Materials: Curated research papers and supplementary materials, with the full reading list available on the course site. +- Assignments: Six programming assignments covering data preparation, Transformer implementation, retrieval-augmented generation, model evaluation and debiasing, and training efficiency. Details at diff --git a/docs/深度生成模型/大语言模型/CMU11-667.md b/docs/深度生成模型/大语言模型/CMU11-667.md new file mode 100644 index 00000000..88938ceb --- /dev/null +++ b/docs/深度生成模型/大语言模型/CMU11-667.md @@ -0,0 +1,31 @@ +# CMU11-667: Large Language Models: Methods and Applications + +## 课程简介 + +- 所属大学:Carnegie Mellon University +- 先修要求:具备机器学习基础(相当于 CMU 的 10-301/10-601)和自然语言处理基础(相当于 11-411/11-611);熟练掌握 Python,熟悉 PyTorch 等深度学习框架。 +- 编程语言:Python +- 课程难度:🌟🌟🌟🌟 +- 预计学时:100 学时以上 + +该研究生课程全面介绍了大型语言模型(LLM)的方法与应用,涵盖从基础架构到前沿技术的广泛主题。课程内容包括: + +1. **基础知识**:语言模型的网络架构、训练、推理和评估方法。 +2. **进阶主题**:模型解释性、对齐方法、涌现能力,以及在语言任务和非文本任务中的应用。 +3. **扩展技术**:大规模预训练技术、模型部署优化,以及高效的训练和推理方法。 +4. **伦理与安全**:模型偏见、攻击方法、法律问题等。 + +课程采用讲座、阅读材料、小测验、互动活动、作业和项目相结合的方式进行,旨在为学生提供深入理解 LLM 的机会,并为进一步的研究或应用打下坚实基础。 + +**自学建议**: + +- 认真阅读每次课前指定的论文和材料。 +- 熟悉 PyTorch 等深度学习框架,动手实现模型和算法。 +- 扎实完成课程作业。 + +## 课程资源 + +- 课程网站: +- 课程视频:部分讲座幻灯片和材料可在课程网站获取,完整视频可能需通过 CMU 内部平台访问。 +- 课程教材:精选论文和资料,具体阅读列表详见课程网站。 +- 课程作业:共六次作业,涵盖预训练数据准备、Transformer 实现、检索增强生成、模型比较与偏见缓解、训练效率提升等主题,详情见 diff --git a/docs/深度生成模型/大语言模型/CMU11-711.en.md b/docs/深度生成模型/大语言模型/CMU11-711.en.md new file mode 100644 index 00000000..3de7e84a --- /dev/null +++ b/docs/深度生成模型/大语言模型/CMU11-711.en.md @@ -0,0 +1,27 @@ +# CMU 11-711: Advanced Natural Language Processing (ANLP) + +## Course Overview + +* University: Carnegie Mellon University +* Prerequisites: No strict prerequisites, but students should have experience with Python programming, as well as a background in probability and linear algebra. 
Prior experience with neural networks is recommended. +* Programming Language: Python +* Course Difficulty: 🌟🌟🌟🌟 +* Estimated Workload: 100 hours + +This is a graduate-level course covering both foundational and advanced topics in Natural Language Processing (NLP). The syllabus spans word representations, sequence modeling, attention mechanisms, Transformer architectures, and cutting-edge topics such as large language model pretraining, instruction tuning, complex reasoning, multimodality, and model safety. Compared to similar courses, this course stands out for the following reasons: + +1. **Comprehensive and research-driven content**: In addition to classical NLP methods, it offers in-depth discussions of recent trends and state-of-the-art techniques such as LLaMA and GPT-4. +2. **Strong practical component**: Each lecture includes code demonstrations and online quizzes, and the final project requires reproducing and improving upon a recent research paper. +3. **Highly interactive**: Active engagement is encouraged through Piazza discussions, Canvas quizzes, and in-class Q&A, resulting in an immersive and well-paced learning experience. + +Self-study tips: + +* Read the recommended papers before class and follow the reading sequence step-by-step. +* Set up a Python environment and become familiar with PyTorch and Hugging Face, as many hands-on examples are based on these frameworks. 
+ 
## Course Resources

* Course Website: [https://www.phontron.com/class/anlp-fall2024/](https://www.phontron.com/class/anlp-fall2024/) +* Course Videos: Lecture recordings are available on Canvas (CMU login required) +* Course Texts: Selected classical and cutting-edge research papers + chapters from *A Primer on Neural Network Models for Natural Language Processing* by Yoav Goldberg +* Course Assignments: [https://www.phontron.com/class/anlp-fall2024/assignments/](https://www.phontron.com/class/anlp-fall2024/assignments/) diff --git a/docs/深度生成模型/大语言模型/CMU11-711.md b/docs/深度生成模型/大语言模型/CMU11-711.md new file mode 100644 index 00000000..d6406d06 --- /dev/null +++ b/docs/深度生成模型/大语言模型/CMU11-711.md @@ -0,0 +1,28 @@ +# CMU 11-711: Advanced Natural Language Processing (ANLP) + +## 课程简介 + +* 所属大学：Carnegie Mellon University +* 先修要求：无硬性先修要求，但需具备 Python 编程经验，以及概率论和线性代数基础；有神经网络使用经验者更佳。 +* 编程语言：Python +* 课程难度：🌟🌟🌟🌟 +* 预计学时：100 学时 + +该课程为研究生级别的 NLP 入门与进阶课程，覆盖从词表征、序列建模，到注意力机制、Transformer 架构，再到大规模语言模型预训练、指令微调与复杂推理、多模态和安全性等前沿主题。与其他同类课程相比，本课程： + +1. **内容全面且紧跟最新研究**：除经典算法外，深入讲解近年热门的大模型方法（如 LLaMA、GPT-4 等）。 +2. **实践性强**：每次课配套代码演示与在线小测，学期末项目需复现并改进一篇前沿论文。 +3. 
**互动良好**:Piazza 讨论、Canvas 测验及现场答疑,学习体验沉浸而有节奏。 + +自学建议: + +* 提前阅读课前推荐文献,跟着阅读顺序循序渐进。 +* 准备好 Python 环境并熟悉 PyTorch/Hugging Face,因为大量实战代码示例基于此。 +* 扎实完成课程作业。 + +## 课程资源 + +* 课程网站:[https://www.phontron.com/class/anlp-fall2024/](https://www.phontron.com/class/anlp-fall2024/) +* 课程视频:课堂讲座录制并上传至 Canvas(需 CMU 帐号登录) +* 课程教材:各类经典与前沿论文+Goldberg《A Primer on Neural Network Models for Natural Language Processing》章节阅读 +* 课程作业:[https://www.phontron.com/class/anlp-fall2024/assignments/](https://www.phontron.com/class/anlp-fall2024/assignments/) diff --git a/docs/深度生成模型/大语言模型/CMU11-868.en.md b/docs/深度生成模型/大语言模型/CMU11-868.en.md new file mode 100644 index 00000000..e221399d --- /dev/null +++ b/docs/深度生成模型/大语言模型/CMU11-868.en.md @@ -0,0 +1,40 @@ +# CMU 11-868: Large Language Model Systems + +## Course Overview + +- University: Carnegie Mellon University +- Prerequisites: Strongly recommended to have taken Deep Learning (11-785) or Advanced NLP (11-611 or 11-711) +- Programming Language: Python +- Course Difficulty: 🌟🌟🌟🌟 +- Estimated Workload: 120 hours + +This graduate-level course focuses on the full stack of large language model (LLM) systems — from algorithms to engineering. The curriculum covers, but is not limited to: + +1. **GPU Programming and Automatic Differentiation**: Master CUDA kernel calls, fundamentals of parallel programming, and deep learning framework design. +2. **Model Training and Distributed Systems**: Learn efficient training algorithms, communication optimizations (e.g., ZeRO, FlashAttention), and distributed training frameworks like DDP, GPipe, and Megatron-LM. +3. **Model Compression and Acceleration**: Study quantization (GPTQ), sparsity (MoE), compiler technologies (JAX, Triton), and inference-time serving systems (vLLM, CacheGen). +4. **Cutting-Edge Topics and Systems Practice**: Includes retrieval-augmented generation (RAG), multimodal LLMs, RLHF systems, and end-to-end deployment, monitoring, and maintenance. 
+ +Compared to similar courses, this one stands out for its **tight integration with recent papers and open-source implementations** (hands-on work expanding CUDA support in the miniTorch framework), a **project-driven assignment structure** (five programming assignments + a final project), and **guest lectures from industry experts**, offering students real-world insights into LLM engineering challenges and solutions. + +**Self-Study Tips**: + +- Set up a CUDA-compatible environment in advance (NVIDIA GPU + CUDA Toolkit + PyTorch). +- Review fundamentals of parallel computing and deep learning (autograd, tensor operations). +- Carefully read the assigned papers and slides before each lecture, and follow the assignments to extend the miniTorch framework from pure Python to real CUDA kernels. + +This course assumes a solid understanding of deep learning and is **not suitable for complete beginners**. See the [FAQ](https://llmsystem.github.io/llmsystem2024spring/docs/FAQ) for more on prerequisites. + +The assignments are fairly challenging and include: + +1. **Assignment 1**: Implement an autograd framework + custom CUDA ops + basic neural networks +2. **Assignment 2**: Build a GPT2 model from scratch +3. **Assignment 3**: Accelerate training with custom CUDA kernels for Softmax and LayerNorm +4. 
**Assignment 4**: Implement distributed model training (difficult to configure independently for self-study) + +## Course Resources + +- Course Website: +- Syllabus: +- Assignments: +- Course Texts: Selected research papers + selected chapters from *Programming Massively Parallel Processors (4th Edition)* diff --git a/docs/深度生成模型/大语言模型/CMU11-868.md b/docs/深度生成模型/大语言模型/CMU11-868.md new file mode 100644 index 00000000..420e8bc6 --- /dev/null +++ b/docs/深度生成模型/大语言模型/CMU11-868.md @@ -0,0 +1,39 @@ +# CMU 11-868: Large Language Model Systems + +## 课程简介 + +- 所属大学：Carnegie Mellon University +- 先修要求：强烈建议已修读 Deep Learning (11-785) 或 Advanced NLP (11-611 或 11-711) +- 编程语言：Python +- 课程难度：🌟🌟🌟🌟 +- 预计学时：120 学时 + +该课程面向研究生开设，聚焦“从算法到工程”的大语言模型系统构建全过程。课程内容包括但不限于： + +1. **GPU 编程与自动微分**：掌握 CUDA kernel 调用、并行编程基础，以及深度学习框架设计原理。 +2. **模型训练与分布式系统**：学习高效的训练算法、通信优化（ZeRO、FlashAttention）、分布式训练框架（DDP、GPipe、Megatron-LM）。 +3. **模型压缩与加速**：量化（GPTQ）、稀疏化（MoE）、编译技术（JAX、Triton）、以及推理时的服务化设计（vLLM、CacheGen）。 +4. **前沿技术与系统实践**：涵盖检索增强生成（RAG）、多模态 LLM、RLHF 系统，以及端到端的在线维护和监控。 + +与同类课程相比，本课程的优势在于**紧密结合最新论文与开源实现**（通过 miniTorch 框架动手扩展 CUDA 支持）；**项目驱动**的作业体系（五次编程作业 + 期末大项目）；以及**工业嘉宾讲座**，能让学生近距离了解真实世界中 LLM 工程实践的挑战与解决方案。 + +**自学建议**： + +- 提前配置好支持 CUDA 的开发环境（NVIDIA GPU + CUDA Toolkit + PyTorch）。 +- 复习并行计算和深度学习基础（自动微分、张量运算）。 +- 阅读每次课前指定的论文与幻灯片，跟着作业把 miniTorch 框架从纯 Python 拓展到真实 CUDA 内核。 + +该课程要求你对深度学习有一定的预备知识，不适合纯小白入手，可见 [FAQ](https://llmsystem.github.io/llmsystem2024spring/docs/FAQ) 的先修要求。 +实验总体来说是有难度的，主要内容如下： + +1. Assignment 1: 自动微分框架 + CUDA 手写算子 + 基础神经网络构建 +2. Assignment 2: GPT2 模型构建 +3. Assignment 3: 通过手写 CUDA 的 Softmax 和 LayerNorm 算子优化模型训练速度 +4. 
Assignment 4: 分布式模型训练，自学的话可能不太好配置环境 + +## 课程资源 + +- 课程网站: +- 课程大纲: +- 课程作业: +- 课程教材：精选论文 + 《Programming Massively Parallel Processors, 4th Ed》 部分章节 diff --git a/mkdocs.yml b/mkdocs.yml index 9288c0e5..aef35d18 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -114,6 +114,7 @@ plugins: "国立台湾大学: 李宏毅机器学习": NTU Machine Learning 深度生成模型: Deep Generative Models 学习路线图: Roadmap + "大语言模型": Large Language Models 机器学习进阶: Advanced Machine Learning 学习路线图: Roadmap 后记: Postscript @@ -282,6 +283,10 @@ nav: - "UCB CS285: Deep Reinforcement Learning": "深度学习/CS285.md" - 深度生成模型: - "学习路线图": "深度生成模型/roadmap.md" + - "大语言模型": + - "CMU 11-868: Large Language Model Systems": "深度生成模型/大语言模型/CMU11-868.md" + - "CMU 11-667: Large Language Models: Methods and Applications": "深度生成模型/大语言模型/CMU11-667.md" + - "CMU 11-711: Advanced Natural Language Processing": "深度生成模型/大语言模型/CMU11-711.md" - 机器学习进阶: - "学习路线图": "机器学习进阶/roadmap.md" - "CMU 10-708: Probabilistic Graphical Models": "机器学习进阶/CMU10-708.md"