Efficient visual detection is a crucial component of self-driving perception and lays the foundation for the later planning and control stages. Deep-network-based visual systems achieve state-of-the-art performance, but they are usually cumbersome and computationally infeasible for embedded devices (e.g., dash cams). Knowledge distillation (KD) is an effective way to derive more efficient models. However, most existing works target classification tasks and treat all instances equally. In this presentation, we first present our Adaptive Instance Distillation (AID) method for self-driving visual detection. It selectively imparts the teacher’s knowledge to the student by re-weighting each instance and each scale for distillation based on the teacher’s loss. In addition, to enable the student to effectively digest knowledge from multiple sources, we also propose a Multi-Teacher Adaptive Instance Distillation (M-AID) method. Our M-AID helps the student learn the most useful knowledge from each teacher for specific instances and scales. Unlike previous KD methods, our M-AID adjusts the distillation weights in an instance-, scale-, and teacher-adaptive manner. Experiments on the KITTI, COCO-Traffic, and SODA10M datasets show that our methods improve the performance of a wide variety of state-of-the-art KD methods on different detectors in self-driving scenarios. Compared to the baseline, our AID leads to average mAP increases of 2.28% and 2.98% for single-stage and two-stage detectors, respectively. By strategically integrating knowledge from multiple teachers, our M-AID method achieves an average mAP improvement of 2.92%.
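To make the instance-adaptive re-weighting idea concrete, below is a minimal PyTorch-style sketch of how per-instance distillation weights could be derived from the teacher's per-instance loss. The softmax normalization, the temperature parameter, and the function names are illustrative assumptions for this sketch, not the authors' exact AID formulation.

```python
# Illustrative sketch (not the released AID code): instance-adaptive
# re-weighting of a distillation loss based on the teacher's per-instance loss.
import torch

def adaptive_instance_weights(teacher_loss: torch.Tensor,
                              temperature: float = 1.0) -> torch.Tensor:
    """Map per-instance teacher losses to distillation weights.

    Instances the teacher handles well (low loss) receive larger weights,
    so the student imitates the teacher mainly where the teacher is reliable.
    The softmax normalization and temperature are illustrative choices.
    """
    return torch.softmax(-teacher_loss / temperature, dim=0) * teacher_loss.numel()

def weighted_distillation_loss(distill_loss_per_instance: torch.Tensor,
                               teacher_loss: torch.Tensor) -> torch.Tensor:
    """Combine per-instance distillation losses with adaptive weights."""
    weights = adaptive_instance_weights(teacher_loss).detach()  # no gradient through weights
    return (weights * distill_loss_per_instance).mean()

# Toy usage: a batch of 4 instances.
teacher_loss = torch.tensor([0.2, 1.5, 0.1, 0.8])
distill_loss = torch.rand(4, requires_grad=True)
loss = weighted_distillation_loss(distill_loss, teacher_loss)
loss.backward()
```

In the multi-teacher setting, the same weighting could in principle be computed per teacher (and per scale) so that, for each instance, the student relies more on the teacher whose loss is lowest there; the sketch above covers only the single-teacher, instance-level case.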