FAIR-Compliant Dataset

Name: FAIR-Compliant Dataset
Creator: Vector Institute for Artificial Intelligence
Published: 2024-04-03 18:34:10
License: No description available

Source: arXiv. Updated: 2024-04-03. Indexed: 2024-06-21.

Download link:

https://huggingface.co/collections/newsmediabias/biasscan-659d681ed7a5bc9d98cde11b

Resource overview:

The FAIR-Compliant Dataset, developed by the Vector Institute for Artificial Intelligence, comprises over 50,000 entries curated from news sources and websites between January and May 2023. Entries were selected using tags and topics such as #MediaBias and #SocialJustice to ensure broad coverage of social issues. The dataset's metadata includes title, description, author, creation date, version, and keywords (for example "LLMs, Training, Biases, News Media, NLP"), which support efficient retrieval in research portals. The dataset follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles to keep the data high-quality and well organized, with the aim of maximizing its usefulness for model training and improving model performance and reliability. Its particular focus is identifying and mitigating biases, especially linguistic biases against protected groups, before large language models are trained.
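
The metadata fields listed in the overview could be modeled roughly as follows. This is only an illustrative sketch: the class name `DatasetMetadata`, the `matches` helper, the version string, and the short description text are assumptions made for the example, not the dataset's published schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical record for the metadata fields named in the overview."""
    title: str
    description: str
    author: str
    creation_date: date
    version: str
    keywords: tuple = ()

    def matches(self, query: str) -> bool:
        """Case-insensitive exact keyword match, a stand-in for portal search."""
        q = query.strip().lower()
        return any(q == k.lower() for k in self.keywords)


record = DatasetMetadata(
    title="FAIR-Compliant Dataset",
    # Placeholder summary written for this example, not the official text.
    description="News entries curated for bias identification before LLM training.",
    author="Vector Institute for Artificial Intelligence",
    creation_date=date(2024, 1, 20),  # the "Created" date shown on this listing
    version="1.0",  # assumed; the listing does not state a version
    keywords=("LLMs", "Training", "Biases", "News Media", "NLP"),
)

print(record.matches("nlp"))
```

Keeping keywords as a separate field, instead of embedding them in the description, is what makes this kind of lookup cheap for a research portal.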

Provider:

Vector Institute for Artificial Intelligence

Created:

2024-01-20
