Phase 07: Transformers in Depth
This phase contains 16 lessons.
Original course source: AI Engineering from Scratch (MIT License)
- Why Transformers — The Problems with RNNs
- Self-Attention from Scratch
- Multi-Head Attention
- Positional Encoding — Sinusoidal, RoPE, ALiBi
- The Full Transformer — Encoder + Decoder
- BERT — Masked Language Modeling
- GPT — Causal Language Modeling
- T5, BART — Encoder-Decoder Models
- Vision Transformers (ViT)
- Audio Transformers — Whisper Architecture
- Mixture of Experts (MoE)
- KV Cache, Flash Attention & Inference Optimization
- Scaling Laws
- Build a Transformer from Scratch — The Capstone
- Attention Variants — Sliding Window, Sparse, Differential
- Speculative Decoding — Draft, Verify, Repeat