research
My research, guided by intersectionality, transnational feminism, and other critical theories, seeks to
- understand existing societal biases within LLMs and multimodal models, how they present in downstream tasks, and how they affect users
- develop robust algorithms, frameworks, and evaluations to mitigate and address these biases within existing systems and in downstream tasks
2024
- Evaluating the Social Impact of Generative AI Systems in Systems and Society. Irene Solaiman, Zeerak Talat, William Agnew, and 28 more authors. 2024.
Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts, but there is no official standard for means of evaluating those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality in two overarching categories: what can be evaluated in a base system independent of context and what can be evaluated in a societal context. Importantly, this refers to base systems that have no predetermined application or deployment context, including a model itself, as well as system components, such as training data. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to listed generative modalities and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in a broader societal context, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm.
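As a rough illustration only (not part of the paper), the seven base-system categories listed above could be tracked as a simple evaluation checklist; the enum and record structure below are assumptions made for the sake of a concrete sketch.

```python
# Illustrative checklist of the seven base-system social-impact categories
# named in the abstract; the checklist structure itself is an assumption.
from enum import Enum

class BaseSystemImpact(Enum):
    BIAS_STEREOTYPES_REPRESENTATIONAL_HARMS = "bias, stereotypes, and representational harms"
    CULTURAL_VALUES_SENSITIVE_CONTENT = "cultural values and sensitive content"
    DISPARATE_PERFORMANCE = "disparate performance"
    PRIVACY_DATA_PROTECTION = "privacy and data protection"
    FINANCIAL_COSTS = "financial costs"
    ENVIRONMENTAL_COSTS = "environmental costs"
    DATA_CONTENT_MODERATION_LABOR = "data and content moderation labor costs"

# A minimal per-evaluation record: which categories were assessed and where notes live.
evaluation_log = {category: {"evaluated": False, "notes": ""} for category in BaseSystemImpact}
evaluation_log[BaseSystemImpact.DISPARATE_PERFORMANCE]["evaluated"] = True
evaluation_log[BaseSystemImpact.DISPARATE_PERFORMANCE]["notes"] = "e.g., per-language benchmark gaps"
```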
- Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent. Jennifer Mickel. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024.
Racial diversity has become increasingly discussed within the AI and algorithmic fairness literature, yet little attention is paid to justifying the choices of racial categories and understanding how people are racialized into these chosen racial categories. Even less attention is given to how racial categories shift and how the racialization process changes depending on the context of a dataset or model. An unclear understanding of who comprises the racial categories chosen and how people are racialized into these categories can lead to varying interpretations of these categories. These varying interpretations can lead to harm when the understanding of racial categories and the racialization process is misaligned from the actual racialization process and racial categories used. Harm can also arise if the racialization process and racial categories used are irrelevant or do not exist in the context in which they are applied. In this paper, we make two contributions. First, we demonstrate how racial categories with unclear assumptions and little justification can lead to varying datasets that poorly represent groups obfuscated or unrepresented by the given racial categories and models that perform poorly on these groups. Second, we develop a framework, CIRCSheets, for documenting the choices and assumptions in choosing racial categories and the process of racialization into these categories to facilitate transparency in understanding the processes and assumptions made by dataset or model developers when selecting or using these racial categories.
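As a hedged sketch only: a CIRCSheets-style record for documenting racial-category choices might look something like the dataclass below. The field names are illustrative assumptions for the sake of a self-contained example, not the framework's actual schema.

```python
# Hypothetical encoding of a CIRCSheets-style documentation record.
# Field names are illustrative assumptions, not the published framework's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class RacialCategoryDocumentation:
    dataset_or_model: str
    racial_categories: List[str]                 # categories used, as released
    category_justification: str                  # why these categories were chosen
    racialization_process: str                   # how people were assigned to categories
    context_of_collection: str                   # geographic / institutional / temporal context
    groups_obfuscated_or_missing: List[str] = field(default_factory=list)
    known_limitations: str = ""

# Hypothetical example entry.
example = RacialCategoryDocumentation(
    dataset_or_model="example-tabular-dataset-v1",
    racial_categories=["Asian", "Black", "Hispanic/Latino", "White", "Other"],
    category_justification="Mirrors the source survey's categories; no further justification given.",
    racialization_process="Self-identification from a single-select survey question.",
    context_of_collection="United States, 2020 administrative records.",
    groups_obfuscated_or_missing=["Middle Eastern/North African", "multiracial respondents"],
    known_limitations="Single-select question collapses multiracial identities into 'Other'.",
)
print(example)
```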
- Intersectional Insights for Robust Models: Introducing FOG 😶🌫️ for Improving Worst Case Performance Without Group Information. Jennifer Mickel. Turing Scholars Honors Thesis, 2024.
Standard training through empirical risk minimization (ERM) can produce seemingly well-performing models that reach high accuracy on average but low accuracy on specific groups. Low group-level accuracy is of particular concern when groups are underrepresented in the training data or when spurious correlations are present in the data. Furthermore, instances can belong to multiple groups, as is the case with demographic groups. Previous approaches, such as group distributionally robust optimization (Group DRO), achieve high worst-group accuracy but require group information, which is not always available due to legal, data-quality, or cost constraints. Approaches that do not require group information exist, but performance gaps between them and Group DRO persist, and they seldom account for overlapping groups. We develop a model development cycle and algorithm, FOG, that improves worst-group performance without group information while accounting for overlapping groups. We first train a model using ERM and use the model's features for the training data to identify groups. We then use these identified groups with Group DRO to train a new model, and the process can be repeated to further improve performance. Using our method, we improve the performance of the worst-performing group compared to ERM and other algorithms that do not require group information, such as JTT.
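A minimal sketch of the two-stage loop described above, assuming synthetic data, k-means clustering of the ERM model's outputs as the group-identification step, and simple worst-group reweighting as a stand-in for the full Group DRO objective; none of this is the exact FOG implementation.

```python
# Sketch of the cycle: (1) fit a standard ERM model, (2) cluster its learned
# representation to infer pseudo-groups, (3) retrain with a group-robust step
# (here, worst-group reweighting as a crude stand-in for Group DRO).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data with a hidden group attribute and a group-correlated feature.
n = 2000
group = rng.integers(0, 2, size=n)               # hidden group label (never shown to training)
x_core = group + rng.normal(scale=1.0, size=n)   # noisy feature that drives the label
x_spur = group + rng.normal(scale=0.3, size=n)   # group-correlated, potentially spurious feature
X = np.stack([x_core, x_spur], axis=1)
y = (x_core + 0.5 * rng.normal(size=n) > 0.5).astype(int)

# Stage 1: standard ERM model.
erm = LogisticRegression().fit(X, y)

# Stage 2: infer pseudo-groups from the model. With a linear model we cluster
# per-example decision margins; a deep model would cluster penultimate-layer features.
margins = erm.decision_function(X).reshape(-1, 1)
pseudo_groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(margins)

# Stage 3: retrain, upweighting the worst-performing pseudo-group
# (a simple proxy for the Group DRO objective).
weights = np.ones(n)
for g in np.unique(pseudo_groups):
    mask = pseudo_groups == g
    acc = erm.score(X[mask], y[mask])
    weights[mask] = 1.0 / max(acc, 1e-3)          # worse pseudo-groups get larger weight
robust = LogisticRegression().fit(X, y, sample_weight=weights)

# Compare worst-group accuracy on the *true* hidden groups.
for name, model in [("ERM", erm), ("reweighted", robust)]:
    worst = min(model.score(X[group == g], y[group == g]) for g in (0, 1))
    print(f"{name}: worst-group accuracy = {worst:.3f}")
```

In a full implementation, the clustering would operate on learned embeddings of a deep network, the reweighting step would be replaced by the actual Group DRO objective, and the identify-then-retrain cycle would be iterated as described in the abstract.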
2023
- The Importance of Multi-Dimensional Intersectionality in Algorithmic Fairness and AI Model Development. Jennifer Mickel. Polymathic Scholars Honors Thesis, 2023.
People are increasingly interacting with artificial intelligence (AI) systems and algorithms, but oftentimes these models embed unfair biases. These biases can lead to harm when an AI system’s output is implicitly or explicitly racist, sexist, or derogatory. If the output is offensive to a person interacting with it, it can cause emotional harm that may manifest physically. Alternatively, if a person agrees with the model’s output, the person’s negative biases may be reinforced, inciting the person to engage in discriminatory behavior. Researchers have recognized the harm AI systems can cause, and they have worked to develop fairness definitions and methodologies for mitigating unfair biases in machine learning models. Unfortunately, these definitions (typically binary) and methodologies are insufficient for preventing AI models from learning unfair biases. To address this, fairness definitions and methodologies must account for intersectional identities in multicultural contexts. The limited scope of existing fairness definitions allows models to develop biases against people with intersectional identities that those definitions do not account for. Existing frameworks and methodologies for model development are based in the US cultural context, which may be insufficient for fair model development in other cultural contexts. To help machine learning practitioners understand the intersectional groups affected by their models, a database should be constructed detailing the intersectional identities, cultural contexts, and relevant model domains in which people may be affected. This can lead to fairer model development, as practitioners will be better equipped to test their models’ performance on intersectional groups.
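Purely as a hypothetical illustration of the proposed database, one record might look like the sketch below; all field names and the example entry are assumptions made for illustration, not content from the thesis.

```python
# Hypothetical record for the proposed database of intersectional identities,
# cultural contexts, and relevant model domains. Field names are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class IntersectionalGroupEntry:
    identity_axes: List[str]           # e.g., ["gender", "caste"]
    group_description: str             # the specific intersectional group
    cultural_context: str              # region or society in which the grouping is meaningful
    relevant_model_domains: List[str]  # domains where harms to this group may arise
    notes_for_practitioners: str = ""

# Hypothetical example entry.
entry = IntersectionalGroupEntry(
    identity_axes=["gender", "caste"],
    group_description="Dalit women",
    cultural_context="India",
    relevant_model_domains=["hiring", "content moderation", "credit scoring"],
    notes_for_practitioners="Gender-only fairness checks can miss caste-gender interactions.",
)
print(entry)
```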