
LaCy: What Small Language Models Can and Should Learn: It's Not Just a Question of Loss

This paper was accepted at the Workshop on Memory for LLM-Based Agentic Systems at ICLR.

Language models have grown ever larger to pack more knowledge about the world into their parameters, but the knowledge they can absorb during pre-training is limited by their parameter count. The capacity of small language models (SLMs) in particular is limited, leading to incorrect generations. This problem is often mitigated by giving the SLM access to an external source: the ability to query a larger model, documentation, or a database. In this setting, we study the important question of which tokens an SLM can and should learn during training, versus which ones it should delegate by emitting a special signal. We find that this is not just a question of loss: although a high loss indicates that the predicted token does not match the ground truth, some predicted tokens are acceptable in that they are a valid alternative continuation of the training text, and should not trigger delegation even if their loss is high. We find that the spaCy tagger can help refine the loss signal to determine which tokens the SLM should learn to delegate in order to avoid factual errors, and which ones it can safely learn to predict even under high loss. We propose LaCy, a new training method based on this token-selection philosophy. Our experiments show that LaCy models successfully learn which tokens to predict and where to ask for help. This results in higher FactScores when generating in cascade with a large model, and outperforms SLMs trained with Rho- or LLM-judge-based token selection, while being simple and cheap.
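The token-selection idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the per-token losses, the loss threshold, and the use of a factual-content tag (e.g. a named-entity flag from a tagger such as spaCy's) as the refining signal are all assumptions made for the sake of example.

```python
# Hypothetical sketch of LaCy-style token selection (not the paper's code).
# Idea: a high loss alone does not mean a token must be delegated. Combine
# the loss with a linguistic tag ("is this token factual content, e.g. a
# named entity?") to assign each training token one of three labels:
#   - "learn":    low loss, the SLM can predict it itself
#   - "delegate": high loss AND factual content, the SLM should ask for help
#   - "tolerate": high loss but non-factual, an alternative continuation is
#                 acceptable and should not trigger delegation

LOSS_THRESHOLD = 2.0  # assumed value for illustration only

def select_label(loss: float, is_factual: bool) -> str:
    """Classify one training token from its loss and a factual-content tag."""
    if loss <= LOSS_THRESHOLD:
        return "learn"
    return "delegate" if is_factual else "tolerate"

# Toy example sentence with made-up per-token losses and entity tags.
tokens  = ["The", "capital", "of", "Australia", "is", "Canberra", "."]
losses  = [0.1,   2.5,       0.2,  3.1,         0.3,  4.5,        0.1]
factual = [False, False,     False, True,       False, True,      False]

labels = [select_label(l, f) for l, f in zip(losses, factual)]
for tok, lab in zip(tokens, labels):
    print(f"{tok:10s} {lab}")
```

Here "capital" has a high loss but is not tagged as factual content, so it is tolerated rather than delegated, while the high-loss entity tokens "Australia" and "Canberra" are marked for delegation.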

