About the talk
Computer vision has recently excelled on a wide range of tasks such as image classification, segmentation and captioning. This impressive progress now powers many internet imaging applications, and yet current methods still fall short of an embodied understanding of visual scenes. What will happen if an object is pushed over the edge of a table? What precise actions are required to plant a tree? Building systems that can answer such questions from visual inputs will empower future applications in robotics and personal visual assistants, while enabling methods to operate in unstructured real-world environments.
Following this motivation, this talk will address models and learning methods that derive procedural knowledge from instructional videos. I will then describe our recent work on visual manipulation and present a new dataset for long-term, story-level video understanding.
---
Every month, the Munich AI Lectures invite top-level AI researchers to give a glimpse into their work and the future of AI. The Munich AI Lectures are a joint initiative of the baiosphere, Bavarian Academy of Sciences and Humanities (BAdW), Helmholtz Munich, Ludwig Maximilian University of Munich (LMU), Technical University of Munich (TUM), AI-HUB@LMU, ELLIS Chapter Munich, Konrad Zuse School of Excellence in Reliable AI (relAI), Munich Center for Machine Learning (MCML), Munich Data Science Institute (MDSI) at TUM, and Munich Institute of Robotics and Machine Intelligence (MIRMI).
Each lecture consists of a short presentation followed by a Q&A to enable a lively discussion with our speakers. Each lecture lasts about one hour and will be streamed live on the Munich AI Lectures' YouTube channel. Recordings will be available afterwards.