Finetuning for Object Level Open Vocabulary Image Retrieval

Event: AutoSens Europe | Session date: Thursday 9th October, 2025

Hear from:

Guy Heller, Researcher, General Motors

Modern ADAS and autonomous driving systems generate terabytes of sensor data, creating a need for intelligent systems that can retrieve images containing objects of interest described by natural language queries. Such systems have varied applications, including mining rare objects from large-scale unlabeled datasets, targeted data annotation, and streamlining system evaluation. The previous leading approach aggregates OpenAI CLIP features without any adaptation to the target domain, which limits its performance. Our work, “FOR: Finetuning for Object Level Open Vocabulary Image Retrieval” (WACV 2025), addresses this limitation by fine-tuning on a target dataset using closed-set labels while preserving the visual-language association that is crucial for open-set retrieval. FOR combines dedicated architectural elements built on CLIP with a multi-objective training framework. Together, these design choices yield a significant accuracy improvement over the previous SoTA across multiple datasets.
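To make the retrieval task concrete, the sketch below ranks images against a text query by comparing the query embedding with per-region image features in a shared embedding space. It is a generic illustration, not the FOR method or the previous approach: the random arrays stand in for CLIP-style embeddings, the function names are hypothetical, and max-pooling over regions is only one assumed way of aggregating local features into an object-level score.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def rank_images(query_emb, local_feats):
    """Rank images by their best-matching region for a text query.

    query_emb:   (d,) text embedding (assumed to share a space with the image features)
    local_feats: (n_images, n_regions, d) per-region image embeddings
    Returns (order, scores): image indices sorted best first, and per-image scores.
    """
    q = l2_normalize(query_emb)
    f = l2_normalize(local_feats)
    sims = f @ q                 # (n_images, n_regions) cosine similarities
    scores = sims.max(axis=1)    # assumed aggregation: best region per image
    order = np.argsort(-scores)  # descending by score
    return order, scores


# Stand-in data: random vectors in place of real text and image embeddings.
rng = np.random.default_rng(0)
d = 8
local_feats = rng.normal(size=(4, 5, d))
query = rng.normal(size=d)

# Plant one region in image 2 that points in the same direction as the query,
# mimicking a small object that matches the description.
local_feats[2, 3] = 3.0 * query

order, scores = rank_images(query, local_feats)
print("ranking:", order)
print("scores:", np.round(scores, 3))
```

Scoring by the best region rather than a single global image embedding is what lets a small object drive the match, which is why object-level retrieval is usually framed over local features.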