Hollywood Identity Bias Dataset

Name: Hollywood Identity Bias Dataset
Creator: Language Technology Centre, Indian Institute of Technology Bombay
Published: 2022-06-01 13:43:53
License: not specified

Source: arXiv. Updated 2022-06-01. Indexed 2024-06-21.

Download link:

https://github.com/sahoonihar/HIBD_LREC_2022

Overview:

The Hollywood Identity Bias Dataset was created jointly by the Language Technology Centre at the Indian Institute of Technology Bombay and Accenture Labs to analyze identity bias in film dialogue. It contains the scripts of 35 Hollywood movies, 49,117 sentences in total, of which 1,181 are annotated as biased (about 2.4% of the dataset). The dataset was built through a detailed annotation process with labels for sensitivity, stereotypes, and several categories of identity bias. Its main application is in the film industry, where it can help identify and classify biased dialogue in scripts before release and so avoid the controversy and losses that such dialogue might otherwise cause.
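The sentence counts quoted in the overview imply a strongly imbalanced dataset, which matters for anyone training a classifier on it. The following is a minimal sketch that only redoes the arithmetic from those two reported counts; it does not read the dataset files, and the names `biased_share` and `unbiased_per_biased` are illustrative, not part of the dataset or its repository.

```python
# Counts as reported in the dataset overview (not read from the data files).
TOTAL_SENTENCES = 49_117
BIASED_SENTENCES = 1_181


def biased_share(biased: int, total: int) -> float:
    """Return the fraction of sentences annotated as biased."""
    if total <= 0:
        raise ValueError("total must be positive")
    return biased / total


share = biased_share(BIASED_SENTENCES, TOTAL_SENTENCES)

# How many unbiased sentences there are for every biased one.
unbiased_per_biased = (TOTAL_SENTENCES - BIASED_SENTENCES) / BIASED_SENTENCES

print(f"Biased share: {share:.2%}")
print(f"Unbiased sentences per biased sentence: {unbiased_per_biased:.1f}")
```

With these counts the biased share comes out slightly above 2.4%, that is, roughly forty unbiased sentences for each biased one.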

Provided by:

Language Technology Centre, Indian Institute of Technology Bombay

Created:

2022-06-01

Summary

Dataset Introduction