SeeGULL
arXiv · 2023-05-20 · Updated 2024-06-21
Download link:
https://github.com/google-research-datasets/seegull
Overview:
SeeGULL is a stereotype benchmark with broad geo-cultural coverage, developed by researchers at Virginia Tech and Google Research. It contains stereotypes about identity groups from 179 countries, spanning 8 geopolitical regions across 6 continents, as well as state-level identities within the United States and India. To build it, the authors used few-shot prompting of large language models (LLMs) to generate candidate stereotypes. A globally diverse pool of raters then validated these candidates through socially situated annotation. The dataset's primary use is detecting and mitigating stereotypes in natural language processing (NLP) models, so that models deployed worldwide are less likely to reflect and spread harmful social biases.
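The two-stage process described above has two parts. First, LLM few-shot generation proposes candidate (identity, attribute) pairs. Second, raters filter those candidates. A minimal Python sketch of both stages follows. It is illustrative only and is not the authors' pipeline. The prompt format, the helper names `build_few_shot_prompt`, `parse_candidates`, and `aggregate_ratings`, the label strings, and the `min_votes=2` threshold are all hypothetical assumptions. The LLM call itself is omitted, so a hand-written string stands in for model output, and the group and trait names are abstract placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: (identity group, attribute)
Pair = Tuple[str, str]


def build_few_shot_prompt(seed_pairs: Iterable[Pair], identity: str) -> str:
    """Format seed pairs as few-shot examples and leave an open slot for
    `identity`. The tuple-per-line format is an assumption, not SeeGULL's prompt."""
    lines = [f"({ident}, {attr})" for ident, attr in seed_pairs]
    lines.append(f"({identity},")  # the model is expected to complete this tuple
    return "\n".join(lines)


def parse_candidates(generation: str) -> List[Pair]:
    """Extract '(identity, attribute)' tuples from raw model text.
    Assumes the model emits one complete tuple per line; other lines are skipped."""
    pairs: List[Pair] = []
    for line in generation.splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(")") and "," in line:
            ident, attr = line[1:-1].split(",", 1)
            pairs.append((ident.strip(), attr.strip()))
    return pairs


def aggregate_ratings(ratings: Dict[Pair, List[str]], min_votes: int = 2) -> List[Pair]:
    """Keep a candidate if at least `min_votes` raters labelled it 'stereotype'.
    The label names and the threshold are illustrative assumptions."""
    return [pair for pair, labels in ratings.items()
            if Counter(labels)["stereotype"] >= min_votes]


if __name__ == "__main__":
    prompt = build_few_shot_prompt([("Group A", "trait 1"), ("Group B", "trait 2")], "Group C")
    print(prompt)
    # Stand-in for LLM output; no model is called here.
    candidates = parse_candidates("(Group C, trait 3)\nnoise\n(Group D, trait 4)")
    ratings = {
        candidates[0]: ["stereotype", "stereotype", "non-stereotype"],
        candidates[1]: ["stereotype", "unsure", "non-stereotype"],
    }
    print(aggregate_ratings(ratings))
```

Separating generation from rating mirrors the description: the LLM only proposes candidates, and human judgments decide which pairs are kept.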
Provided by:
Virginia Tech
Created:
2023-05-20



