ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts

We introduce ProText, a dataset for measuring gendering and misgendering in long-form English texts of various styles. ProText spans three dimensions: Thematic Nouns (names, occupations, titles, kinship terms), Thematic Class (stereotypically masculine, stereotypically feminine, gender-neutral/non-gendered), and Pronoun Class (masculine, feminine, gender-neutral). The dataset is designed to probe (mis)gendering in text transformations such as summarization and paraphrasing performed by state-of-the-art Large Language Models, going beyond standard pronoun resolution benchmarks and beyond the gender binary. We validated ProText with a small case study, showing that even with just two prompts and two models, we can draw nuanced insights about gender bias, stereotyping, misgendering, and gendering. We reveal systematic gender bias, especially when the input contains explicit gender cues or when the models default to heteronormative assumptions.



