dzxr/chinese-classical-corpus
Source: Hugging Face | Updated: 2026-04-29 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/dzxr/chinese-classical-corpus
Summary:
Chinese Classical Corpus is a structured corpus of classical Chinese texts. It contains the complete Thirteen Classics, the Shuowen Jiezi, the Zizhi Tongjian, and the first 15 of the Twenty-Four Histories, plus 1.97 million instruction pairs for classical-to-modern translation, modern-to-classical translation, and punctuation. All data is released under CC0 (public domain dedication): it may be used commercially, modified, and redistributed with no additional restrictions. The dataset is split into three configurations: corpus (source texts, 12,005 chapter-level records, 17.2M characters), translate (bidirectional classical-to-modern and modern-to-classical instruction data, 1,924,378 records), and punctuate (sentence segmentation and punctuation instructions, 46,546 records).
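A minimal sketch of loading one configuration by name, assuming the Hugging Face `datasets` library is installed. The dataset ID and the three config names come from this page. The helper names (`DOCUMENTED_COUNTS`, `instruction_pair_total`, `load_config`) are hypothetical. The column schema is not documented here, so the sketch assumes no field names. It also shows that the stated 1.97 million instruction pairs are the translate and punctuate records combined.

```python
import os
from typing import Optional

# Dataset ID and config names as listed on this page. The counts are the
# figures stated in the description; they are not re-verified here.
DATASET_ID = "dzxr/chinese-classical-corpus"
DOCUMENTED_COUNTS = {
    "corpus": 12_005,        # chapter-level source records (~17.2M characters)
    "translate": 1_924_378,  # classical <-> modern Chinese instruction pairs
    "punctuate": 46_546,     # punctuation instruction pairs
}


def instruction_pair_total() -> int:
    """Sum of the two instruction configs (hypothetical helper)."""
    return DOCUMENTED_COUNTS["translate"] + DOCUMENTED_COUNTS["punctuate"]


def load_config(name: str, mirror: Optional[str] = None):
    """Load one config with `datasets.load_dataset` (hypothetical helper).

    `mirror` (for example "https://hf-mirror.com") is applied through the
    HF_ENDPOINT environment variable. The Hugging Face libraries read it at
    import time, so it only takes effect if set before `datasets` is first
    imported, and setdefault leaves an existing HF_ENDPOINT untouched.
    """
    if name not in DOCUMENTED_COUNTS:
        raise ValueError(
            f"unknown config {name!r}; expected one of {sorted(DOCUMENTED_COUNTS)}"
        )
    if mirror:
        os.environ.setdefault("HF_ENDPOINT", mirror)
    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset
    # The second positional argument of load_dataset is the config name.
    return load_dataset(DATASET_ID, name)
```

For example, `load_config("punctuate", mirror="https://hf-mirror.com")` should return a `DatasetDict`. Printing it shows the actual splits and column names.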
Provided by:
dzxr