Why are Visually-Grounded Language Models Bad at Image Classification?

Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy. NeurIPS 2024.

[Paper]