Phase 10: Building an LLM from Scratch
This phase contains 24 lessons.
Original course source: AI Engineering from Scratch (MIT License)
- Tokenizers: BPE, WordPiece, SentencePiece
- Building a Tokenizer from Scratch
- Data Pipelines for Pre-Training
- Pre-Training a Mini GPT (124M Parameters)
- Scaling: Distributed Training, FSDP, DeepSpeed
- Instruction Tuning (SFT)
- RLHF: Reward Model + PPO
- DPO: Direct Preference Optimization
- Constitutional AI and Self-Improvement
- Evaluation: Benchmarks, Evals, LM Harness
- Quantization: Making Models Fit
- Inference Optimization
- Building a Complete LLM Pipeline
- Open Models: Architecture Walkthroughs
- Speculative Decoding and EAGLE-3
- Differential Attention (V2)
- Native Sparse Attention (DeepSeek NSA)
- Multi-Token Prediction (MTP)
- DualPipe Parallelism
- DeepSeek-V3 Architecture Walkthrough
- Jamba — Hybrid SSM-Transformer
- Async and Hogwild! Inference
- Speculative Decoding and EAGLE
- Gradient Checkpointing and Activation Recomputation