Historical Texts Dataset

Indexed from arXiv on 2025-09-30
Download link:
https://github.com/PKU-Alignment/ProgressGym
Overview:
This dataset is a formatted and cleaned collection of historical texts spanning the 13th to the 21st century (nine centuries). Sources include public-domain books, academic articles, legal texts, newspaper archives, and transcriptions of historical speeches. The data went through multiple rounds of filtering and refinement to address quality issues such as mislabeling and OCR errors. It is intended to support progress alignment challenges, including tracking evolving values, anticipating moral progress, and regulating the feedback loop between shifts in human values and shifts in AI values.
Provider:
Not specified