"Efficient Finetuning of Large Language Models via Large-Width Analysis" - Soufiane Hayou

Abstract: Finetuning Large Language Models (LLMs) enhances their performance on downstream tasks, a desirable outcome if the model is used for a specific task. Parameter-efficient finetuning methods such as LoRA (Low-Rank Adaptation) are popular because they allow finetuning large models at relatively low cost. When using LoRA, two hyperparameters critically shape learning: the learning rate and the initialization. In this talk, I’ll present several results on the role of initialization and learning rate in LoRA finetuning and distill these insights into practical defaults.

Bio: Soufiane Hayou is currently an assistant professor at Johns Hopkins in the Department of Applied Mathematics and Statistics, with a secondary appointment in the Department of Computer Science. He is also a member of the Data Science and AI Institute. Previously, he was a research fellow at the Simons Institute, UC Berkeley, and a visiting assistant professor of mathematics at the National University of Singapore. He graduated from Ecole Polytechnique in 2018 and obtained his PhD in statistics and machine learning from the University of Oxford in 2021. His research focuses mainly on the theory and practice of learning at scale: theoretical analysis of large-scale neural networks, with the goal of obtaining principled methods for training and finetuning.