LaCy: What Small Language Models Can and Should Learn: It's Not Just a Question of Loss

This paper was accepted at the Workshop on Memory for LLM-Based Agentic Systems at ICLR.
Language models have grown continuously in order to squeeze more information about the world into their parameters, but the amount of information they can absorb during pre-training is limited by their parameter count. The capacity of small language models (SLMs) is particularly limited, leading to incorrect generations. This problem is often mitigated by giving the SLM access to an external source: the ability to query a larger model, documentation, or a database. In this setting, we study the important question of which tokens an SLM can and should learn during training, versus which ones it should entrust to the external source.
- † University of Cambridge
- ** Work done while at Apple
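
To make the setting concrete, below is a minimal, hypothetical sketch (not the method proposed in the paper) of token-level deferral: a small model generates tokens itself when confident and otherwise entrusts the next token to an external source such as a larger model. The names `slm`, `external_source`, `next_token`, and `CONFIDENCE_THRESHOLD` are illustrative assumptions, and the interfaces follow a generic Hugging Face style.

```python
import torch

CONFIDENCE_THRESHOLD = 0.5  # hypothetical cutoff for trusting the SLM's own prediction


def generate_with_deferral(slm, tokenizer, external_source, prompt, max_new_tokens=50):
    """Greedy decoding where low-confidence tokens are delegated to an external source."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = slm(input_ids).logits[:, -1, :]      # next-token logits from the SLM
        probs = torch.softmax(logits, dim=-1)
        confidence, token = probs.max(dim=-1)         # SLM's best guess and its probability
        if confidence.item() < CONFIDENCE_THRESHOLD:
            # Entrust this token to the external source (e.g., a larger model or a database lookup).
            token = external_source.next_token(input_ids)
        input_ids = torch.cat([input_ids, token.view(1, 1)], dim=-1)
    return tokenizer.decode(input_ids[0])
```

This is only meant to illustrate the trade-off the abstract poses: every token the SLM defers is a token it does not need to store in its parameters, at the cost of a query to the external source.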



