Classifying Gender Biased Language in University of Edinburgh Heritage Collections Archival Metadata Descriptions

Source: Scottish Government Open Data Portal. Updated: 2023-11-10. Added: 2026-03-28.

Download link:

https://doi.org/10.7488/ds/7539

Description:

These datasets were used to create discriminative text classification models that identify potentially gender biased language. There are datasets for three types of classification model: multilabel document classifiers, multiclass sequence classifiers, and multilabel token classifiers. The data source is the Archives catalog of the University of Edinburgh's Heritage Collections. The archival metadata descriptions extracted from the catalog were labeled according to the Taxonomy of Gendered and Gender Biased Language (published in Havens et al., 2021, linked as a related paper). The creation and contents of the datasets are documented in Lucy Havens's Ph.D. thesis, 'Recalibrating Machine Learning for Social Biases: Demonstrating a New Methodology through a Case Study Classifying Gender Biases in Archival Documentation', and in the related papers and GitHub repositories linked to this record.
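The record does not describe how labels are stored in the dataset files. The Python sketch below is only a rough illustration of how the three task types differ, not the dataset's actual format. It shows the shape of the training target each task would use. The label names `Label-A`, `Label-B` and `Label-C` and all function names are hypothetical placeholders. They are not the taxonomy's real categories.

```python
from typing import Dict, List, Sequence

# Hypothetical placeholder labels for illustration only; the real label set is
# defined by the Taxonomy of Gendered and Gender Biased Language.
LABELS: List[str] = ["Label-A", "Label-B", "Label-C"]
LABEL_INDEX: Dict[str, int] = {name: i for i, name in enumerate(LABELS)}


def multi_hot(labels: Sequence[str]) -> List[int]:
    """Encode zero or more labels as a 0/1 vector (a multilabel target)."""
    vector = [0] * len(LABELS)
    for name in labels:
        vector[LABEL_INDEX[name]] = 1
    return vector


def encode_document(doc_labels: Sequence[str]) -> List[int]:
    """Multilabel document classification: one multi-hot vector per document."""
    return multi_hot(doc_labels)


def encode_sequence(seq_label: str) -> int:
    """Multiclass sequence classification: exactly one class index per sequence."""
    return LABEL_INDEX[seq_label]


def encode_tokens(token_labels: Sequence[Sequence[str]]) -> List[List[int]]:
    """Multilabel token classification: one multi-hot vector per token."""
    return [multi_hot(labels) for labels in token_labels]


if __name__ == "__main__":
    print(encode_document(["Label-A", "Label-C"]))
    print(encode_sequence("Label-B"))
    print(encode_tokens([[], ["Label-A"], ["Label-A", "Label-B"]]))
```

The difference between the tasks is the shape of the target. A multilabel document or token target can switch on several labels at once, or none. A multiclass sequence target must pick exactly one.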

Created:

2023-11-10