twinviews-13k
收藏TwinViews-13k 数据集概述
数据集描述
TwinViews-13k 是一个包含 13,855 对左倾和右倾政治声明的数据集,每对声明按主题匹配。该数据集旨在研究奖励模型和语言模型中的政治偏见,特别关注真实性与政治观点之间的关系。数据集使用 GPT-3.5 Turbo 生成,并经过严格审核以确保质量和意识形态平衡。
数据集详情
数据集结构
- 数据量: 13,855 对左倾和右倾政治声明。
- 字段:
l: 左倾政治声明。r: 右倾政治声明。topic: 声明对的主题(如税收、气候、教育)。
数据集用途
-
直接用途:
- 研究奖励模型和大语言模型(LLMs)中的政治偏见。
- 评估 LLMs 的对齐技术,特别是关于真实性和政治偏见。
- 在政治话语分析的背景下训练和/或评估模型。
- 研究政治观点和对齐目标在 AI 系统中的相互作用。
-
超出范围的用途:
- 不适合需要非常细粒度或人工标注的政治倾向的任务。
- 数据集中的“左”和“右”概念可能因国家和时间而异,用户应检查数据是否捕捉到感兴趣的意识形态维度。
数据集创建
- 数据生成: 使用 GPT-3.5 Turbo 生成。
- 审核: 确保相关性、意识形态对齐和质量。
- 最终数据集: 经过过滤和结构化,确保左右声明的平衡。
数据集来源
- 数据生产者: GPT-3.5 Turbo 生成,由 MIT 的数据集创建者进行广泛审核。
- 数据集存储库: https://github.com/sfulay/truth_politics
- 相关论文: https://arxiv.org/abs/2409.05283
数据集限制
- 来源背景: 数据集的政治和意识形态概念来自 2020 年代初的美国,可能不适用于其他文化或其他时期。
- 生成内容: 由于声明由 GPT-3.5 Turbo 生成,可能无法完全捕捉现实世界政治话语的细微差别或复杂性。
数据集引用
BibTeX
@inproceedings{fulayRelationshipTruthPolitical2024, author = {Fulay, Suyash and Brannon, William and Mohanty, Shrestha and Overney, Cassandra and Poole-Dayan, Elinor and Roy, Deb and Kabbara, Jad}, title = {On the Relationship between Truth and Political Bias in Language Models}, booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 24)}, year = {2024}, month = nov, publisher = {Association for Computational Linguistics}, note = {arXiv:2409.05283}, abstract = {Language model alignment research often attempts to ensure that models are not only helpful and harmless, but also truthful and unbiased. However, optimizing these objectives simultaneously can obscure how improving one aspect might impact the others. In this work, we focus on analyzing the relationship between two concepts essential in both language model alignment and political science: extit{truthfulness} and extit{political bias}. We train reward models on various popular truthfulness datasets and subsequently evaluate their political bias. Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias. We also find that existing open-source reward models (i.e. those trained on standard human preference datasets) already show a similar bias and that the bias is larger for larger models. These results raise important questions about both the datasets used to represent truthfulness and what language models capture about the relationship between truth and politics.} }
APA
Fulay, S., Brannon, W., Mohanty, S., Overney, C., Poole-Dayan, E., Roy, D., & Kabbara, J. (2024). On the Relationship between Truth and Political Bias in Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 24). Association for Computational Linguistics.
数据集作者
- William Brannon, wbrannon@mit.edu
- Suyash Fulay, sfulay@mit.edu




