AutoSens Academy: Instructor spotlight on Prof. Dr. Abhinav Valada

April, 2021

Prof. Dr. Abhinav Valada is a visionary who has transformed the autonomous vehicle perception landscape with numerous innovative deep learning techniques for some of the most fundamental problems. From state-of-the-art scene understanding algorithms to multimodal perception and visual state estimation techniques, his work has enabled autonomous systems to perceive the real world more accurately, more robustly and more efficiently.

Prof. Valada received his bachelor’s degree in Electronics and Instrumentation Engineering from VIT University in India and his master’s degree in Robotics from the prestigious Robotics Institute of Carnegie Mellon University, USA.

Subsequently, he worked for three years as a Systems/Software Engineer at the world-renowned Field Robotics Center and the National Robotics Engineering Center of Carnegie Mellon University, where he advanced robotic perception. Several techniques that he developed during this time are integrated into commercial systems today, including Caterpillar haulage trucks, the DARPA Robotics Challenge robot CHIMP (which won third place), and DARPA subterranean robots, among others. He then co-founded Platypus LLC, a company that develops autonomous robotic boats, and served as its Director of Operations for three years.

Prof. Valada then moved to Germany to pursue his PhD in robotics and machine learning at the University of Freiburg. He received his PhD with his thesis titled “Discovering and Leveraging Deep Multimodal Structure for Reliable Robot Perception and Localization”, which was also a finalist for the Georges Giralt PhD award for the best robotics thesis in Europe. He subsequently worked as a Postdoctoral Research Scientist at the Autonomous Intelligent Systems lab, and was then appointed as an Assistant Professor and Director of the Robot Learning Lab at the University of Freiburg, Germany, where he currently works. Prof. Valada is also a core faculty member in the European Laboratory for Learning and Intelligent Systems (ELLIS) unit, an Associate Editor for the IEEE Robotics and Automation Letters journal, ICRA, and IROS, and an Area Chair of CoRL.

Prof. Valada has developed several state-of-the-art semantic segmentation algorithms that have significantly boosted performance on standard benchmarks including Cityscapes, Mapillary Vistas, KITTI, SUN RGB-D, and ScanNet. A key challenge addressed here is enabling networks to learn multiscale features effectively and ensuring that their effective receptive field is large enough to capture broad context. Previous techniques that achieve this require a substantial number of parameters, which makes them infeasible to employ in real-world applications that require fast inference times. Prof. Valada addressed these problems by introducing two novel fully-convolutional encoder-decoder architectures, AdapNet and AdapNet++. These models have defined the state-of-the-art in semantic segmentation while being more than twice as fast as competing methods.

Although state-of-the-art scene understanding models perform exceedingly well in ideal perceptual conditions, they often perform very poorly in adverse conditions such as rain, snow and fog. To alleviate this problem, Prof. Valada has proposed techniques that adaptively leverage features from complementary modalities such as depth and infrared, and fuse them with visual RGB information. The major challenge in this problem is enabling the network to exploit complementary features effectively, because several factors influence this decision, including the spatial location of the objects in the world, the semantic category of the objects and the scene context. Most existing multimodal networks directly add or concatenate modality-specific features, which does not enable the networks to account for these factors as they change from scene to scene.

Prof. Valada introduced two novel adaptive multimodal fusion mechanisms that are widely used today. The first, the CMoDE fusion approach, learns to probabilistically weigh the features of individual modality streams according to the semantic class. During inference, the model adaptively fuses the features depending on the semantic objects in the scene as well as the information contained in the individual modalities. The second, the SSMA approach, dynamically fuses the features from individual modality streams according to the object classes, their spatial locations in the scene, the scene context and the information contained in the modalities. These techniques have significantly improved the robustness of scene understanding models in adverse perceptual conditions such as rain and fog.
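To make the contrast between fixed and adaptive fusion concrete, the following is a minimal NumPy sketch of a generic input-dependent gate. It is not the CMoDE or SSMA implementation; the function names, the gate parameters (`gate_weights`, `gate_bias`) and the feature shapes are assumptions chosen for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def additive_fusion(rgb_feat, depth_feat):
    """Fixed fusion: both modalities contribute equally everywhere."""
    return rgb_feat + depth_feat


def gated_fusion(rgb_feat, depth_feat, gate_weights, gate_bias):
    """Input-dependent fusion (illustrative, hypothetical parameters).

    rgb_feat, depth_feat: feature maps of shape (C, H, W)
    gate_weights: shape (C, 2C), acts like a 1x1 convolution
    gate_bias: shape (C,)
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([rgb_feat, depth_feat], axis=0)
    # A 1x1 convolution is a per-pixel linear map over channels.
    logits = np.einsum("oc,chw->ohw", gate_weights, stacked)
    logits = logits + gate_bias[:, None, None]
    # One weight in (0, 1) per channel and spatial location.
    gate = sigmoid(logits)
    # Convex combination: the gate decides how much each modality counts.
    fused = gate * rgb_feat + (1.0 - gate) * depth_feat
    return fused, gate


rng = np.random.default_rng(0)
C, H, W = 4, 3, 3
rgb = rng.normal(size=(C, H, W))
depth = rng.normal(size=(C, H, W))
weights = rng.normal(size=(C, 2 * C))
bias = np.zeros(C)

fused, gate = gated_fusion(rgb, depth, weights, bias)
```

Because the gate is computed from the features themselves, the weighting can differ from pixel to pixel and from scene to scene, which plain addition or concatenation cannot do. In a trained network the gate parameters would be learned jointly with the rest of the model.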

More recently, Prof. Valada proposed the EfficientPS model for panoptic segmentation, which broke multiple records in the image recognition ability of self-driving cars. The model efficiently addresses the panoptic segmentation task, which enables reasoning about object instances as well as the general scene layout. EfficientPS is not only the best performing approach for panoptic segmentation to date, it is also the fastest and most computationally efficient. Furthermore, EfficientPS won the panoptic segmentation track of the ECCV Robust Vision Challenge 2020, which demonstrates the impact of this technique. Prof. Valada further pushed the boundaries by introducing a new perception task called Multi-Object Panoptic Tracking (MOPT). MOPT allows dynamic scenes to be modeled holistically by unifying panoptic segmentation and multi-object tracking, two essential tasks for self-driving cars, into a single coherent scene understanding problem. Prof. Valada proposed PanopticTrackNet, an end-to-end learning model that addresses the MOPT task. The new MOPT task and the PanopticTrackNet approach that addresses it have the potential to redefine how scene understanding and object tracking are performed in the self-driving car industry, where they are currently tackled by two large, separate teams. If these two essential tasks can indeed be performed by a single end-to-end learning model, it will revolutionize how they have been addressed all these years.
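As an illustration of what a panoptic output looks like, the sketch below merges a semantic label map with a set of instance masks into a single map that encodes both the class and the instance of every pixel. This is a simple greedy heuristic under stated assumptions, not the fusion module used in EfficientPS; the `merge_panoptic` function, the `LABEL_DIVISOR` encoding and the toy inputs are all hypothetical.

```python
import numpy as np

# Assumed encoding: panoptic_id = class_id * LABEL_DIVISOR + instance_id,
# where instance_id 0 means "no individual instance" (e.g. road, sky).
LABEL_DIVISOR = 1000


def merge_panoptic(semantic, instances):
    """Greedy merge of semantic and instance predictions.

    semantic: (H, W) integer array of class ids
    instances: list of (mask, class_id, score), mask is a bool (H, W) array
    """
    panoptic = semantic.astype(np.int64) * LABEL_DIVISOR
    taken = np.zeros(semantic.shape, dtype=bool)
    next_id = {}
    # Higher-scoring instances claim their pixels first.
    ranked = sorted(instances, key=lambda inst: inst[2], reverse=True)
    for mask, class_id, _score in ranked:
        free = mask & ~taken
        if not free.any():
            continue  # fully covered by more confident instances
        next_id[class_id] = next_id.get(class_id, 0) + 1
        panoptic[free] = class_id * LABEL_DIVISOR + next_id[class_id]
        taken |= free
    return panoptic


# Toy scene: class 0 = road, class 1 = car.
semantic = np.array([[0, 0, 1, 1],
                     [0, 1, 1, 1]])
car_a = np.array([[False, False, True, True],
                  [False, False, True, True]])
car_b = np.array([[False, False, False, False],
                  [False, True, True, False]])

panoptic = merge_panoptic(semantic, [(car_a, 1, 0.9), (car_b, 1, 0.6)])
```

In this toy scene the road pixels keep the value 0, the more confident car receives the id 1001, and the second car receives 1002 only for the pixel that the first car did not already claim.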

In summary, Prof. Valada has made a large number of highly innovative contributions by introducing learning methods that effectively address some of the most fundamental problems in autonomous perception. These techniques exceed current state-of-the-art methods in both benchmark performance and computational efficiency, and they are practical solutions that are being used in industry today. The contributions in the aforementioned works go beyond existing paradigms by enabling research in new directions whose impact will be seen for years to come.

Don’t miss this opportunity to learn from Prof. Valada directly in two of our AutoSens Academy modules.