
CMU Researchers Propose QueRE: An AI Method for Extracting Useful Features from Black-Box LLMs

Large Language Models (LLMs) power a wide range of artificial intelligence systems, demonstrating capabilities in natural language processing, decision-making, and complex reasoning tasks. However, serious challenges remain in understanding and predicting their behavior. Treating LLMs as black boxes complicates efforts to assess their reliability, especially in settings where errors can have significant consequences. Traditional interpretability methods often rely on internal model states or gradients, which are not available for closed-source, API-based models. This limitation raises an important question: how can we effectively evaluate LLM behavior with only black-box access? The problem is compounded by the possibility that models served through APIs are quietly modified, highlighting the need for robust black-box solutions.

To address these challenges, researchers at Carnegie Mellon University have developed QueRE, a method designed for black-box LLMs that builds low-dimensional representations by querying a model with follow-up questions about its outputs. These representations, derived from the probabilities the model assigns to its answers to the elicitation questions, are used to train predictors of the model's performance. Notably, QueRE performs comparably to, or better than, some white-box techniques in reliability and overall performance.

Unlike methods that depend on internal model states or full output distributions, QueRE relies only on accessible outputs, such as the top-k token probabilities exposed by many APIs. Where such probabilities are unavailable, they can be estimated by sampling. QueRE's features also enable applications such as detecting adversarially influenced models and distinguishing between model architectures and sizes, making it a versatile tool for understanding and deploying LLMs.
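When an API does not expose token probabilities, the probability of a "yes" answer to a follow-up question can be roughly estimated by repeated sampling, as noted above. A minimal sketch, where `toy_model` is a hypothetical stand-in for a black-box LLM endpoint:

```python
import random
from collections import Counter

def estimate_yes_probability(query_model, prompt, n_samples=50):
    """Estimate P('yes') for a follow-up question by repeated sampling,
    for APIs that do not expose top-k token probabilities."""
    counts = Counter(query_model(prompt) for _ in range(n_samples))
    return counts["yes"] / n_samples

# Toy stand-in for a black-box LLM endpoint (hypothetical, not a real API):
def toy_model(prompt):
    return "yes" if random.random() < 0.8 else "no"

p = estimate_yes_probability(toy_model, "Are you confident in your answer?")
```

With enough samples, `p` converges to the model's true probability of answering "yes"; the sampling budget trades API cost against estimate variance.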

Technical Details and Benefits of QueRE

QueRE works by building feature vectors from follow-up questions posed to the LLM. Given a prompt and the model's response, these questions probe factors such as confidence and correctness. Questions like “Are you confident in your answer?” or “Can you explain your answer?” elicit probabilities that reflect the model's self-assessment.
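The feature-construction step can be sketched as follows: each elicitation question contributes one probability to a low-dimensional vector. Here `get_yes_probability` is a hypothetical wrapper around a black-box API that returns P("yes") for a prompt, and the questions are illustrative only:

```python
# Each follow-up question contributes one entry to the feature vector.
ELICITATION_QUESTIONS = [
    "Are you confident in your answer?",
    "Can you explain your answer?",
    "Is your answer likely to be correct?",
]

def quere_features(question, model_answer, get_yes_probability):
    """Build a QueRE-style feature vector for one (question, answer) pair."""
    context = f"Q: {question}\nA: {model_answer}\n"
    return [get_yes_probability(context + q) for q in ELICITATION_QUESTIONS]

# Toy probability function, for illustration only:
feats = quere_features(
    "What is the capital of France?", "Paris",
    lambda prompt: 0.9,
)
# feats is a 3-dimensional vector, one entry per elicitation question
```

The dimensionality of the representation equals the number of elicitation questions, which keeps downstream predictors small and cheap to train.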

The extracted features are then used to train linear predictors for various tasks:

  1. Performance prediction: Determining whether the model's output is correct at the instance level.
  2. Adversarial detection: Identifying when responses are influenced by adversarial prompts.
  3. Model differentiation: Distinguishing between architectures or configurations, such as detecting when a smaller model is being passed off as a larger one.
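All three tasks above reduce to fitting a simple linear classifier on QueRE feature vectors. A minimal sketch of such a predictor, using plain logistic regression via gradient descent; the feature matrix and correctness labels here are synthetic, invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))            # 200 examples, 3 elicited probabilities each
y = (X.mean(axis=1) > 0.5).astype(float)  # toy rule: "confident" answers are correct

# Plain gradient descent on the logistic log-loss.
w, b = np.zeros(3), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of correctness
    grad = p - y                              # gradient of log-loss w.r.t. logits
    w -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean()

preds = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
acc = (preds == y).mean()
```

In practice one would train on held-out labeled examples and evaluate on fresh data; the point is that a low-dimensional representation makes even this simplest of classifiers adequate.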

By relying on low-dimensional representations, QueRE generalizes well across tasks. Its simplicity ensures robustness and reduces the risk of overfitting, making it an effective tool for evaluating and auditing LLMs across different deployments.

Results and Insights

Experimental evaluations demonstrate the effectiveness of QueRE across several dimensions. In predicting LLM performance on question-answering (QA) tasks, QueRE consistently outperformed baselines that rely on internal model states. For example, on open-ended QA benchmarks such as SQuAD and Natural Questions (NQ), QueRE achieved an Area Under the Receiver Operating Characteristic curve (AUROC) of more than 0.95. It was likewise successful at detecting adversarially influenced models, outperforming other black-box methods.
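AUROC, the metric reported here, can be computed directly from predicted scores and binary labels with the rank-based Mann-Whitney formula. A small self-contained sketch with made-up scores (no tie handling, for brevity):

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC: probability that a random positive outranks
    a random negative (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfectly ranked -> 1.0
```

A score of 1.0 means the predictor ranks every correct answer above every incorrect one; 0.5 is chance level, which puts the reported 0.95+ figures in context.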

QueRE also proved robust and transferable. Its features generalized to out-of-distribution tasks and different LLM configurations, demonstrating its flexibility. The low-dimensional representations make training simple models efficient, ensuring computational feasibility and reliable convergence.

Another notable result was QueRE's ability to use random sequences of natural language as elicitation prompts. These sequences often matched or exceeded the performance of hand-crafted queries, highlighting the method's flexibility and its potential for broad application without extensive manual prompt engineering.

Conclusion

QueRE provides a practical and effective way to understand and evaluate black-box LLMs. By converting elicited responses into low-dimensional, actionable features, it offers an intuitive and robust framework for predicting model behavior, detecting adversarial influence, and distinguishing model architectures. Its success across evaluation benchmarks suggests it is a valuable tool for researchers and practitioners aiming to improve the reliability and safety of LLMs.

As AI systems evolve, methods like QueRE will play a key role in ensuring transparency and trust. Future work may explore extending QueRE to additional tasks or refining its elicitation techniques to further improve performance. For now, QueRE represents a thoughtful response to the challenges posed by modern AI systems.


Check out the Paper and the GitHub page. All credit for this research goes to the researchers of this project.



Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he explores the applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to convey complex AI concepts in a clear and accessible manner.

