CSE5519 Advances in Computer Vision (Topic H: 2025: Safety, Robustness, and Evaluation of CV Models)

Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographic Robustness in Object Recognition

Does adding geographical context to CLIP prompts improve recognition across geographies?

Yes, by about 1%.
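As a rough illustration of what "adding geographical context" to a prompt can mean, the sketch below builds one text prompt per class, with and without a country name. The template wording and the function name `build_prompts` are assumptions made for this note, not the paper's exact templates; the resulting strings would then be passed to a CLIP text encoder.

```python
def build_prompts(class_names, country=None):
    """Return one text prompt per class name.

    If `country` is given, the country name is appended as geographic context.
    The template wording here is illustrative only.
    """
    prompts = []
    for name in class_names:
        if country:
            prompts.append(f"a photo of a {name} in {country}")
        else:
            prompts.append(f"a photo of a {name}")
    return prompts


# Default prompts versus geography-aware prompts for the same classes.
default_prompts = build_prompts(["stove", "house"])
geo_prompts = build_prompts(["stove", "house"], country="Nigeria")
```

Comparing classification accuracy under `default_prompts` and `geo_prompts` on the same images is one simple way to measure the effect of the added context.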

Can an LLM provide useful geographic descriptive knowledge to improve recognition?

Yes
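One common way to use LLM-generated descriptors in zero-shot recognition is to embed each descriptor as text and average its similarity with the image embedding per class. The sketch below shows that aggregation on toy 2-D vectors. It assumes the embeddings have already been computed (for example by CLIP's image and text encoders); the function names and the averaging rule are assumptions for illustration and may differ from what the paper does.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_with_descriptors(image_emb, descriptor_embs):
    """Score each class by the mean similarity to its descriptor embeddings.

    descriptor_embs maps a class name to a non-empty list of embedding
    vectors, one per LLM-generated descriptor (hypothetical inputs).
    Returns the best-scoring class name and the per-class scores.
    """
    scores = {
        name: sum(cosine(image_emb, d) for d in embs) / len(embs)
        for name, embs in descriptor_embs.items()
    }
    return max(scores, key=scores.get), scores


# Toy embeddings standing in for encoder outputs.
image = [1.0, 0.0]
descriptors = {
    "stove": [[1.0, 0.0], [1.0, 1.0]],
    "house": [[0.0, 1.0], [1.0, 1.0]],
}
label, scores = classify_with_descriptors(image, descriptors)
```

Averaging keeps a single noisy descriptor from dominating the class score, which matters when the LLM returns some irrelevant descriptions.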

How can we optimize soft prompts for CLIP using an accessible data source with consideration of target geographies not represented in the training set?

Where can soft prompts enhanced with geographical knowledge provide the most benefits?

Tip

The paper proposes an effective way to improve model performance by querying an LLM for geography-specific knowledge.

I wonder where the ultimate limit lies for performance gains from LLM-generated context. In principle, we could have an LLM generate most of the plausible contexts before making predictions and use them to improve performance. However, introducing additional (possibly irrelevant) information may lead to hallucinations. Is there a general approach for getting an LLM to generate suitable context for a given task and then using that context to improve performance?