Events

Cordelia Schmid is a pioneer in AI research. She invented procedures in the field of image recognition that enabled computers to semantically interpret image and video content. Her computer vision algorithms are key to the development of robotic assistants that will, in the future, be able to recognize their surroundings and respond to spoken commands. Her work has been honored with important awards, most recently the Körber Prize, which is endowed with one million euros. Join her talk on 25. January 2024, 17:00 - 18:00 CET.

Advances in Dense Video Captioning, Vision-Guided Navigation and Robot Manipulation
Dr. Cordelia Schmid (Inria Institute and Google Research) | ONE MUNICH Distinguished Lectures on AI & Robotics - Next Generation Human-Centered Robotics | 25. January 2024, 17:00 - 18:00 CET

Location on site: Karl Max von Bauernfeind Hörsaal, room number 0507.03.750 (3rd floor), Arcisstr. 21, 80333 München.
Location online: https://tum-conf.zoom-x.de/j/64026708312?pwd=ZStuNEl1dlR0Skt0S2NZSlZtcHM3QT09
Meeting ID: 640 2670 8312 | Password: mirmi

In this talk, we first present recent progress in large-scale learning of multimodal video representations. We present Vid2Seq, a model for dense video captioning that takes as input video and speech and predicts both temporal boundaries and textual descriptions simultaneously. We then present an approach for video question answering and image captioning that relies on a retrieval-augmented visual language model that learns to encode world knowledge into a large-scale memory and to retrieve from it to answer knowledge-intensive queries. We show that our approach achieves state-of-the-art results in visual question answering and image captioning.

In the second part of the talk, we introduce recent work on vision-guided navigation and robot manipulation given language instructions. This work builds on and extends vision-language transformers by integrating action history and predicting actions. The History Aware Multimodal Transformer (HAMT) outperforms the state of the art on different vision-language-navigation benchmarks. Further improvements are achieved by integrating map information into the transformer architecture. We show object goal navigation in the real world, here on the Tiago robot. Next, we demonstrate that such a transformer-based approach can also be used for manipulation, and we provide evidence of the importance of 3D visual representations. Our approach achieves excellent real-world performance on a UR5 arm.

Speaker

Dr. Cordelia Schmid

Cordelia Schmid holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate in Computer Science from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis on "Local Greyvalue Invariants for Image Matching and Retrieval" received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled "From Image Matching to Learning Visual Models". Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University from 1996 to 1997. Since 1997 she has held a permanent research position at Inria, where she is a research director. Dr. Schmid is a member of the German National Academy of Sciences, Leopoldina and a fellow of IEEE and the ELLIS society. She was awarded the Longuet-Higgins Prize in 2006, 2014 and 2016, the Koenderink Prize in 2018, and the Helmholtz Prize in 2023, for fundamental contributions in computer vision that have withstood the test of time. She received an ERC advanced grant in 2013, the Humboldt Research Award in 2015, the Inria & French Academy of Science Grand Prix in 2016, the Royal Society Milner Award in 2020, the PAMI Distinguished Researcher Award in 2021, and the Körber European Science Prize in 2023. Dr. Schmid has been an Associate Editor for IEEE PAMI (2001-2005) and for IJCV (2004-2012), an editor-in-chief for IJCV (2013-2018), a program chair of IEEE CVPR 2005 and ECCV 2012, as well as a general chair of IEEE CVPR 2015, ECCV 2020 and ICCV 2023. Since 2018 she has held a joint appointment with Google Research.

ONE MUNICH Distinguished Lectures on AI & Robotics is an initiative of the ONE MUNICH Strategy Forum - Next Generation Human-Centered Robotics, which unites three leading Munich research institutes: Technical University of Munich (TUM), Ludwig Maximilian University of Munich (LMU) and Helmholtz Zentrum München (HMGU). Together we aim to address the societal health challenges of today and the future by connecting our highly complementary expertise and efforts in fundamental, theoretical, and applied machine intelligence, system sciences, and translational medicine. The realization of these talks is supported by a network of initiatives and institutions dedicated to robotics and AI in Germany, which includes, among others, baiosphere – The Bavarian AI network and the Munich Data Science Institute (MDSI).
