Not explicitly mentioned
Source: arXiv · Updated: 2024-06-16 · Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2406.10773v1
Resource description:
This study uses a dataset of 2,100 human-written news articles, together with 56,700 synthetic articles generated by nine large language models (LLMs). The dataset is designed primarily to analyze and compare political bias in human-authored and machine-generated news articles. Its goal is to provide a basis for quantitative experiments on the political bias of LLMs in the news domain, specifically when they generate news articles. To build it, the authors selected articles from high-quality news summarization datasets and kept only those meeting predefined criteria on article length, summary length, and summary metrics, so that the articles are suitable for generation and comparison. The dataset's main application areas are news generation and political bias detection, with the aim of addressing the political bias that LLMs may introduce when generating news content.
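The article-selection step described above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline. The card names the criteria (article length, summary length, a summary metric) but gives neither their values nor the metric used. Every threshold below is therefore a hypothetical assumption, as are the word-count length measure and the summary-to-article compression ratio that stands in for the "summary metric".

```python
from dataclasses import dataclass

# HYPOTHETICAL thresholds: the source lists the criteria but not their values.
MIN_ARTICLE_WORDS = 300
MAX_ARTICLE_WORDS = 1000
MIN_SUMMARY_WORDS = 10
MAX_SUMMARY_WORDS = 60
# HYPOTHETICAL "summary metric": summary words / article words.
MAX_COMPRESSION = 0.1


@dataclass
class NewsItem:
    """One article paired with its reference summary."""
    article: str
    summary: str


def word_count(text: str) -> int:
    # Whitespace tokenization as a simple proxy for length.
    return len(text.split())


def passes_filters(item: NewsItem) -> bool:
    """Return True if the item satisfies all three (hypothetical) criteria."""
    a = word_count(item.article)
    s = word_count(item.summary)
    if not (MIN_ARTICLE_WORDS <= a <= MAX_ARTICLE_WORDS):
        return False  # article too short or too long
    if not (MIN_SUMMARY_WORDS <= s <= MAX_SUMMARY_WORDS):
        return False  # summary too short or too long
    return s / a <= MAX_COMPRESSION  # summary must be concise relative to article


def select_articles(items: list[NewsItem], limit: int) -> list[NewsItem]:
    """Keep qualifying items in corpus order, up to `limit` of them."""
    selected = [it for it in items if passes_filters(it)]
    return selected[:limit]
```

For example, `select_articles(corpus, limit=2100)` would keep the first 2,100 qualifying items in corpus order. The actual selection may have sampled or ordered candidates differently.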
Provided by:
University College London
Created:
2024-06-16



