CMU Advanced NLP Spring 2025 (20): Advanced Post-Training

This lecture (by Sean Welleck) for CMU CS 11-711, Advanced NLP, covers supervised fine-tuning, reward modeling, reinforcement learning, and direct preference optimization.