[CMU VASC Seminar] Foundation Models for Robotic Manipulation: Opportunities and Challenges

Abstract: Foundation models, such as GPT-4 Vision, have marked significant achievements in natural language and vision, demonstrating exceptional abilities to adapt to new tasks and scenarios. However, physical interaction, such as cooking, cleaning, or caregiving, remains a frontier where foundation models and robotic systems have yet to achieve the desired level of adaptability and generalization. In this talk, I will discuss the opportunities for incorporating foundation models into classic robotic pipelines to endow robots with capabilities beyond those achievable with traditional robotic tools. The talk will focus on three key improvements: (1) task specification, (2) low-level scene modeling, and (3) high-level scene modeling. The core idea behind this series of research is to introduce novel representations and integrate structural priors into robot learning systems, incorporating the commonsense knowledge learned from foundation models to achieve the best of both worlds. I will demonstrate how such integration allows robots to interpret instructions given in free-form natural language and to generalize in a few- or zero-shot manner to challenging manipulation tasks. Additionally, we will explore how foundation models can enable category-level generalization for free and how this can be augmented with an action-conditioned scene graph for a wide range of real-world manipulation tasks involving rigid, articulated, nested (e.g., Matryoshka dolls), and deformable objects. Towards the end of the talk, I will discuss the challenges that still lie ahead and potential avenues for addressing them.

Bio: Yunzhu Li is an Assistant Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC). Before joining UIUC, he collaborated with Fei-Fei Li and Jiajun Wu during his postdoc at Stanford. Yunzhu earned his PhD from MIT under the guidance of Antonio Torralba and Russ Tedrake. His work stands at the intersection of robotics, computer vision, and machine learning, with the goal of helping robots perceive and interact with the physical world as dexterously and effectively as humans do. Yunzhu’s work has been recognized with the Best Systems Paper Award and as a Best Paper Award finalist at the Conference on Robot Learning (CoRL). Yunzhu is also the recipient of the Adobe Research Fellowship and was selected as the First Place Recipient of the Ernst A. Guillemin Master’s Thesis Award in Artificial Intelligence and Decision Making at MIT. His research has been published in top journals and conferences, including Nature, NeurIPS, CVPR, and RSS, and featured by major media outlets, including CNN, BBC, The Wall Street Journal, Forbes, The Economist, and MIT Technology Review.

Homepage: https://yunzhuli.github.io/

U of T Robotics Institute Seminar: Sergey Levine (UC Berkeley)

[NUS Robotics Seminar] Foundation Models for Robotic Manipulation: Opportunities and Challenges

RI Seminar: Jitendra Malik : Robot Learning, With Inspiration From Child Development

[ICLR-21 simDL] [Invited Talk] Compositional Dynamics Modeling for Physical Inference and Control

Debate: Is Scaling Enough to Deploy General Purpose Robots

MIT Robotics - Dieter Fox - Toward Foundational Robot Manipulation Skills

Yunzhu Li - Scaling Robotic Manipulation via Structured World Models and Tactile Sensing

RI Seminar: Dieter Fox: Where's RobotGPT?

What do tech pioneers think about the AI revolution? - The Engineers, BBC World Service

OpenVLA: LeRobot Research Presentation #5 by Moo Jin Kim

Robotics: why now? - Quan Vuong and Jost Tobias Springberg, Physical Intelligence

Stanford Seminar - Multitask Transfer in TRI’s Large Behavior Models for Dexterous Manipulation

A New Era for Generalist Robotics: The Rise of Humanoids | NVIDIA GTC 2025

Fully autonomous robots are much closer than you think – Sergey Levine

Prof. Sergey Levine: Robotic Foundation Models

RI Seminar: Yuke Zhu : Toward Generalist Humanoid Robots

MIT Robotics - Andrew Davison - From SLAM to Spatial AI
