
What Does Your Text Know? (The Answer May Surprise You!)

Recent work has shown that probing a model's internal representations can reveal a wealth of information that is not apparent in the model's generations. This creates a risk of unintended or malicious information leakage, where model users can read information that the model owner believes is inaccessible. Using visual language models as a testing ground, we present the first systematic comparison of information stored at different "levels of representation": we compare the rich information encoded in the residual stream against two natural restrictions of it, a low-dimensional projection of the residual stream obtained with a tuned lens, and the final top-k logits that influence the model response. We show that even the easily accessible view defined by the model's top logit values can leak information unrelated to the task at hand in an image-based query, in some cases yielding as much information as probes of the full residual stream.
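To make the two "levels of representation" concrete, here is a minimal, self-contained sketch of the idea. All names and sizes are illustrative assumptions (the paper's actual models and probes are not specified here): a tuned lens is an affine probe, trained per layer, that maps a hidden state into the model's final logit space, and the top-k logits view exposes only the k highest-scoring tokens of that projection.

```python
import math
import random

random.seed(0)
d_model, vocab = 16, 50  # toy sizes, purely illustrative

# Toy residual-stream activation at some intermediate layer.
h = [random.gauss(0, 1) for _ in range(d_model)]

# Tuned-lens-style affine probe (W, b). In practice these weights are
# trained to match the model's own unembedding; random stand-ins here.
W = [[random.gauss(0, 1) / math.sqrt(d_model) for _ in range(d_model)]
     for _ in range(vocab)]
b = [0.0] * vocab
logits = [sum(w * x for w, x in zip(row, h)) + b_j
          for row, b_j in zip(W, b)]

# Softmax over the projected logits.
m = max(logits)
exps = [math.exp(z - m) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The top-k logits view: only the k highest-scoring token indices and
# their probabilities are visible, not the full residual stream.
k = 5
top_k = sorted(range(vocab), key=lambda i: logits[i], reverse=True)[:k]
print([(i, round(probs[i], 4)) for i in top_k])
```

The abstract's claim is that even this narrow top-k window, the most restricted of the three views, can leak task-irrelevant information from an image-based query.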
