Virtual Personas for Language Models via An Anthology of Backstories – Berkeley Artificial Intelligence Research Blog
We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and using naturalistic backstories rich in individual values and experiences.
What does it mean for large language models (LLMs) to be trained on massive text corpora, collectively produced by millions and billions of distinctive human authors?
In “Language Models as Agent Models”, compelling evidence suggests that recent language models can be considered models of agents: given a textual context, LLMs are capable of generating conditional text that represents the characteristics of an agent likely to have produced that context. This suggests that, with appropriate conditioning, LLMs could be guided to approximate the responses of a particular human voice, rather than the mixture of voices that otherwise emerges. If realized, this capability of LLMs would have significant implications for user research and the social sciences: conditioned language models serving as virtual personas of human subjects could enable cost-effective pilot studies and support best practices in human-subject research, e.g., the Belmont principles of justice and beneficence.
In this work, we introduce Anthology, an approach for steering LLMs to representative, consistent, and diverse virtual personas by providing richly detailed life narratives of individuals as conditioning context to the models.
In doing so, we also present methods for generating backstories from LLMs themselves as a means to efficiently produce massive sets of backstories covering a wide range of demographic attributes. By grounding language models in naturalistic backstories, Anthology allows LLMs to simulate individual human samples with improved fidelity, measured in terms of matching the distributions and consistency of human responses.
Our Method: Anthology
Conditioning Language Model Generation through Individual Life Narratives
An important limitation of prior approaches to conditioning LLMs on virtual populations has been the inability to reliably approximate individual human samples. Prior methods prompt LLMs with broad demographic information, e.g., “I am a 25-year-old from California. My highest level of education is less than high school,” which is a body of text synthesized from a tuple of demographic variables. With these methods, we are only able to approximate human samples at a population level, not at the individual level, which results in:
- Responses prone to LLMs defaulting to stereotypical and/or prototypical portrayals, as they are conditioned only on demographic variables (e.g., race and gender)
- Inability to provide important metrics of interest such as covariance and statistical significance, since individual responses are required for such computations
Anthology addresses these limitations by conditioning on individual subjects’ richly detailed backstories. Through these backstories, the model captures explicit and implicit markers of personal identity, including demographic traits and spontaneous references to cultural and socioeconomic background and life philosophies. Our approach involves generating a large collection of backstories representing a wide range of demographic attributes by querying language models with unrestricted, open-ended prompts such as, “Tell me about yourself.” We then match the virtual personas conditioned on each backstory to real-world survey samples.
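As an illustration, the two stages above can be sketched as prompt construction: an unrestricted query to elicit a naturalistic backstory, followed by a survey question posed to the persona that backstory defines. The function names and prompt wording below are our own illustration, not the paper’s exact prompts:

```python
# Sketch of Anthology-style conditioning. The prompt templates and helper
# names here are assumptions for illustration, not the paper's actual code.

BACKSTORY_PROMPT = "Tell me about yourself."


def build_backstory_request(temperature: float = 1.0) -> dict:
    """Unrestricted, open-ended request used to elicit a backstory from an
    LLM (the request format and temperature default are assumptions)."""
    return {"prompt": BACKSTORY_PROMPT, "temperature": temperature}


def build_survey_prompt(backstory: str, question: str, choices: list[str]) -> str:
    """Condition the model on a full life narrative, then pose a survey
    question in multiple-choice format."""
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    return (
        f"{backstory}\n\n"
        f"Question: {question}\n{options}\n"
        "Answer with a single option letter."
    )


prompt = build_survey_prompt(
    "I grew up on a small farm in Iowa and now teach high school math.",
    "How concerned are you about climate change?",
    ["Very concerned", "Somewhat concerned", "Not concerned"],
)
```

The key design point is that the entire backstory, not a tuple of demographic variables, forms the conditioning context for every subsequent survey question.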
Results: Approximating Public Opinion Surveys
For evaluation, we compare the effectiveness of different methods for conditioning virtual personas in the context of approximating three Pew Research Center ATP surveys: Waves 34, 92, and 99.
Results on approximating human responses to the Pew Research Center ATP surveys. Bold and underlined results indicate the values closest and second closest to those of humans, respectively.
As measures of success in approximating human samples with virtual personas, we consider the following metrics:
- Average Wasserstein distance (WD) between response distributions as a measure of representativeness
- Frobenius norm (Fro.) between correlation matrices as a measure of consistency
- Cronbach's alpha as an additional measure of internal consistency
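All three metrics can be computed directly from response matrices (rows = respondents, columns = survey items). The sketch below is our own minimal NumPy/SciPy implementation, not the paper’s evaluation code:

```python
import numpy as np
from scipy.stats import wasserstein_distance


def avg_wasserstein(human: np.ndarray, virtual: np.ndarray) -> float:
    """Average 1-D Wasserstein distance between per-question response
    distributions, as a measure of representativeness."""
    return float(np.mean([
        wasserstein_distance(human[:, j], virtual[:, j])
        for j in range(human.shape[1])
    ]))


def frobenius_gap(human: np.ndarray, virtual: np.ndarray) -> float:
    """Frobenius norm between the inter-question correlation matrices,
    as a measure of consistency."""
    return float(np.linalg.norm(
        np.corrcoef(human, rowvar=False) - np.corrcoef(virtual, rowvar=False)
    ))


def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha: internal consistency across k survey items."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = responses.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

Both distance-style metrics are zero when the virtual responses exactly reproduce the human ones, and Cronbach’s alpha equals 1 when all items are perfectly correlated.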
Prior to analyzing the virtual personas, we estimate the lower bound of each evaluation metric by repeatedly dividing the human population into two equal-sized groups at random and computing the metrics between these subgroups. We take the averaged values from 100 iterations as the lower-bound estimates.
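This random-split procedure can be sketched as follows, here for the average Wasserstein distance (a minimal sketch under the assumption that responses are stored as a respondents × questions array):

```python
import numpy as np
from scipy.stats import wasserstein_distance


def lower_bound_wd(responses: np.ndarray, n_iters: int = 100,
                   seed: int = 0) -> float:
    """Estimate a metric's lower bound by repeatedly splitting the human
    population into two equal halves at random, computing the metric
    between the halves, and averaging over iterations."""
    rng = np.random.default_rng(seed)
    n = responses.shape[0] // 2 * 2  # drop one respondent if count is odd
    vals = []
    for _ in range(n_iters):
        perm = rng.permutation(n)
        a = responses[perm[: n // 2]]
        b = responses[perm[n // 2 : n]]
        vals.append(np.mean([
            wasserstein_distance(a[:, j], b[:, j])
            for j in range(responses.shape[1])
        ]))
    return float(np.mean(vals))
```

Since both halves are drawn from the same human population, this average reflects the metric value attainable by a perfectly faithful simulation, i.e., a lower bound for any virtual population.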
We consistently observe that Anthology outperforms the other conditioning baselines on all metrics, for both Llama-3-70B and Mixtral-8x22B. When comparing the two matching methods, greedy matching tends to show better performance on the average Wasserstein distance across all Waves. We attribute the difference between the matching methods to the one-to-one correspondence condition of maximum-weight matching and the limited number of virtual personas available. Specifically, the weights assigned to matched virtual subjects in maximum-weight matching are inevitably lower than those in greedy matching, since the latter relaxes the constraint of one-to-one correspondence. This discrepancy can result in lower demographic similarity between matched human and virtual users compared to their counterparts from greedy matching. These results suggest that the richness of the backstories generated by our approach elicits more nuanced responses than the baselines.
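The contrast between the two matching schemes can be illustrated with a toy similarity matrix (the weights below are made up; in practice the matching would use demographic similarity between human respondents and virtual personas):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def greedy_matching(weights: np.ndarray) -> list[tuple[int, int]]:
    """Greedy matching: each human subject (row) independently takes the
    most similar virtual persona (column); personas may be reused."""
    return [(i, int(np.argmax(weights[i]))) for i in range(weights.shape[0])]


def max_weight_matching(weights: np.ndarray) -> list[tuple[int, int]]:
    """Maximum-weight matching: a one-to-one assignment maximizing total
    similarity, solved as a linear assignment problem."""
    rows, cols = linear_sum_assignment(-weights)  # negate: solver minimizes
    return list(zip(rows.tolist(), cols.tolist()))


# With one persona that is highly similar to everyone, greedy matching lets
# every subject claim it, while one-to-one matching forces some subjects
# onto weaker matches, lowering the per-subject weights.
w = np.array([[0.9, 0.2, 0.1],
              [0.8, 0.5, 0.2],
              [0.7, 0.4, 0.3]])
```

This is exactly the trade-off discussed above: relaxing the one-to-one constraint raises the similarity of each matched pair, at the cost of reusing the same personas.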
Final thoughts
Anthology marks a promising new direction in conditioning virtual personas in LLMs that could potentially reshape how we conduct user studies, opinion polls, and other social science applications by offering a scalable, and at times, more ethical alternative to traditional human surveys. However, the use of Anthology, as with any other application of language models in the social sciences, also brings several considerations to light: although the generated backstories help create more representative personas, there remains a risk of perpetuating biases or infringing on privacy, so results should be used and interpreted with caution.
In terms of future steps, we envision our approach benefiting from a more expansive and diverse set of backstories, each representing a consistent life narrative of an individual. Additionally, a valuable extension of this work would be to consider free-form response generation, enabling more natural and nuanced persona simulations beyond structured survey formats such as multiple choice. Finally, an exciting next dimension in applying LLMs to behavioral studies would involve simulating longitudinal effects, allowing virtual personas to model and assess changes over time.
All of these directions present a multitude of technical challenges; please let us know if you are interested in collaborating or would like to discuss our work further!
Read more about our work: link to full paper
@article{moon2024virtual,
title={Virtual personas for language models via an anthology of backstories},
author={Moon, Suhong and Abdulhai, Marwa and Kang, Minwoo and Suh, Joseph and Soedarmadji, Widyadewi and Behar, Eran Kohen and Chan, David M},
journal={arXiv preprint arXiv:2407.06576},
year={2024}
}