Improving few-shot performance of language models via nearest neighbor calibration

F Nie, M Chen, Z Zhang, X Cheng�- arXiv preprint arXiv:2212.02216, 2022 - arxiv.org
F Nie, M Chen, Z Zhang, X Cheng
arXiv preprint arXiv:2212.02216, 2022arxiv.org
Pre-trained language models (PLMs) have exhibited remarkable few-shot learning
capabilities when provided a few examples in a natural language prompt as demonstrations
of test instances, ie, in-context learning. However, the performance of in-context learning is
susceptible to the choice of prompt format, training examples and the ordering of the training
examples. In this paper, we propose a novel nearest-neighbor calibration framework for in-
context learning to ease this issue. It is inspired by a phenomenon that the in-context�…
Pre-trained language models (PLMs) have exhibited remarkable few-shot learning capabilities when provided a few examples in a natural language prompt as demonstrations of test instances, i.e., in-context learning. However, the performance of in-context learning is susceptible to the choice of prompt format, training examples and the ordering of the training examples. In this paper, we propose a novel nearest-neighbor calibration framework for in-context learning to ease this issue. It is inspired by a phenomenon that the in-context learning paradigm produces incorrect labels when inferring training instances, which provides a useful supervised signal to calibrate predictions. Thus, our method directly augments the predictions with a -nearest-neighbor (NN) classifier over a datastore of cached few-shot instance representations obtained by PLMs and their corresponding labels. Then adaptive neighbor selection and feature regularization modules are introduced to make full use of a few support instances to reduce the NN retrieval noise. Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning, while even achieving comparable performance with state-of-the-art tuning-based approaches in some sentiment analysis tasks.
arxiv.org
Showing the best result for this search. See all results