Can Large Language Models Understand Context?

Understanding context is key to understanding human language, an ability that Large Language Models (LLMs) have increasingly been shown to possess to an impressive extent. However, although the evaluation of LLMs spans various domains within Natural Language Processing, limited attention has been paid to probing their linguistic capability to understand contextual features. This paper presents a context-understanding benchmark by adapting existing datasets to suit the evaluation of generative models. The benchmark comprises four distinct tasks and nine datasets, all featuring prompts designed to assess the models' ability to understand context. First, we evaluate the performance of LLMs under the in-context learning scenario. Experimental results indicate that pre-trained dense models struggle to understand more nuanced contextual features compared with state-of-the-art fine-tuned models. Second, as LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context learning settings. We find that 3-bit post-training quantization leads to varying degrees of performance degradation on our benchmark. We conduct extensive analysis of these scenarios to substantiate our experimental results.
- † Georgetown University
- ** Work done while at Apple



