Are larger pretrained language models uniformly better? Comparing performance at the instance level

Ruiqi Zhong, Dhruba Ghosh, Dan Klein, Jacob Steinhardt. ACL Findings 2021.
