Fall 2024 GRASP on Robotics: Ruslan Salakhutdinov, Carnegie Mellon University
“Multimodal AI Agents”

ABSTRACT
In recent years, the rise of Large Language Models (LLMs) with advanced general capabilities has paved the way towards building language-guided agents that can perform complex, multi-step tasks on behalf of users, much like human assistants. Building agents that can perceive, plan, and act autonomously has long been a central goal of artificial intelligence research. In this talk I will introduce multimodal AI agents capable of planning, reasoning, and executing actions on the web, which can not only comprehend textual information but also effectively navigate and interact with visual settings. I will next present an inference-time search algorithm for agents to explicitly perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space, and is complementary to most existing state-of-the-art agents. Finally, I will introduce VisualWebArena, a novel framework for evaluating multimodal autonomous language agents, and offer insights towards building stronger autonomous agents for both digital and physical environments.

Presenter
Russ Salakhutdinov earned his PhD in computer science from the University of Toronto, where he was advised by Nobel Laureate Geoffrey Hinton. After spending two post-doctoral years at MIT, he joined the University of Toronto and later moved to CMU. He also served as a director of AI research at Apple. Russ’s primary interests lie in deep learning, machine learning, and generative AI. He is an action editor of the Journal of Machine Learning Research, has served on the senior program committee of several top-tier machine learning conferences including NeurIPS, ICLR, and ICML, and was a program co-chair for ICML 2019 and general chair for ICML 2024. He has authored over 250 research papers and his work has received over 200,000 citations according to Google Scholar. He is an Alfred P. Sloan Research Fellow and a Microsoft Research Faculty Fellow, and a recipient of the Early Researcher Award, Google Faculty Award, and Nvidia’s Pioneers of AI award.


