
dibao-research/wanli-dibao-corpus

Hugging Face · Updated 2026-04-05 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/dibao-research/wanli-dibao-corpus
Description:
---
language:
- zh
- lzh
license: cc-by-4.0
task_categories:
- text-generation
- token-classification
- text-classification
tags:
- classical-chinese
- historical-documents
- Ming-dynasty
- dibao
- imperial-gazette
- digital-humanities
- East-Asian-studies
- NLP
- corpus
pretty_name: "Wanli Dibao Corpus (萬曆邸鈔)"
size_categories:
- 10K<n<100K
source_datasets: []
---

# Wanli Dibao Corpus / 萬曆邸鈔校訂語料庫

## Dataset Description

### Summary

**English:** The **Wanli Dibao Corpus** is a structured, proofread digital corpus of the *Wanli Dichao* (萬曆邸鈔), a collection of manuscript copies of official gazettes (*dibao* 邸報) from the Wanli reign (1573–1620) of the Ming dynasty. The *dibao* system was the primary channel of official communication in imperial China, transmitting memorials, edicts, personnel appointments, and policy decisions from the capital to provincial officials across the empire.

The original manuscripts are held by the **National Central Library of Taiwan (國家圖書館)** and are in the public domain. This corpus is the first publicly available, digitally structured edition of these documents, with historical annotations and cross-references to the *Ming Shilu* (明實錄).

**中文:** **萬曆邸鈔語料庫**是對萬曆朝(1573–1620)邸鈔抄本的結構化校訂數位語料庫。邸報制度是明代官方信息傳播的核心渠道,負責將奏疏、詔令、人事任免、政策決定從京師傳遞至各省官員。

原始抄本藏於**臺灣國家圖書館**,屬公有領域。本語料庫是這批文獻首次以結構化數位形式公開發布的版本,附有歷史註釋及與《明實錄》的交叉參照。

### Research Background / 研究背景

The *dibao* has been studied as a precursor to modern journalism and as a key institution of political communication in the late imperial Chinese state.
Theoretical frameworks applied to this corpus include:

- **Habermas's public sphere theory**: examining the *dibao* as a semi-public channel of political information
- **Luhmann's systems theory**: analyzing the *dibao* as a functional subsystem of political communication
- **Innis's communication theory**: investigating the time-space bias of the *dibao* medium
- **Latour's Actor-Network Theory**: tracing the network of human and non-human actors in *dibao* production and circulation

邸報研究涉及政治傳播、新聞史、明代制度史的交叉領域。本語料庫的理論框架包括哈貝馬斯的公共領域理論、盧曼的系統理論、英尼斯的傳播理論,以及拉圖爾的行動者網絡理論。

### Languages

- Classical Chinese (文言文 / Literary Chinese, ISO 639-3: `lzh`)
- Annotations in Modern Chinese (`zh`) and English (`en`)

## Dataset Structure

### Data Fields

| Field | Type | Description (EN) | 說明 (中文) |
|---|---|---|---|
| `id` | string | Unique entry identifier | 條目唯一識別碼 |
| `year` | integer | Year (Wanli reign year + CE) | 年份(萬曆紀年 + 公元) |
| `month` | integer | Lunar month | 月份(農曆) |
| `day` | string | Day (if available) | 日期(如有) |
| `wanli_year` | integer | Wanli reign year (1–48) | 萬曆紀年(1–48) |
| `category` | string | Content category | 內容分類 |
| `title` | string | Document title / heading | 文書標題 |
| `raw_text` | string | Original proofread text | 校訂原文 |
| `persons` | list[string] | Person names mentioned | 涉及人名 |
| `offices` | list[string] | Official titles mentioned | 涉及官職名 |
| `locations` | list[string] | Place names mentioned | 涉及地名 |
| `institutions` | list[string] | Government institutions mentioned | 涉及機構名 |
| `mingshilu_ref` | list[string] | Cross-reference to Ming Shilu entries | 明實錄對應條目 |
| `notes` | string | Editorial and historical notes | 校注與歷史註釋 |

### Content Categories / 內容分類

| Category | Description | 說明 |
|---|---|---|
| `memorial` | Memorials to the throne (奏疏) | 臣僚上奏文書 |
| `edict` | Imperial edicts (詔令) | 皇帝詔令 |
| `appointment` | Personnel appointments (除授) | 官員任免 |
| `impeachment` | Impeachment cases (彈劾) | 彈劾案件 |
| `military` | Military affairs (軍事) | 軍事相關 |
| `disaster` | Natural disasters and relief (災異) | 災害與賑濟 |
| `fiscal` | Fiscal and taxation matters (財政) | 財政稅收 |
| `ritual` | Ritual and ceremonial matters (禮儀) | 禮制儀典 |
| `other` | Other | 其他 |

### Data Splits

| Split | Entries | Description |
|---|---|---|
| `full` | TBD | Complete corpus |
| By reign year | TBD | Partitioned by Wanli year (1–48) |

## Dataset Creation

### Curation Rationale

This dataset was created to support computational analysis of Ming dynasty political communication, enabling large-scale studies that were previously impractical with manual methods alone. The structured format allows cross-referencing with other historical corpora (Ming Shilu, Chinese literary collections, Joseon Wangjo Sillok).

本數據集旨在支持對明代政治傳播的計算分析,實現以往僅靠人工方法無法進行的大規模研究。結構化格式允許與其他歷史語料庫(明實錄、中國文集、朝鮮王朝實錄)進行交叉參照。

### Source Data

**Primary Source:** Manuscript copies of the *Wanli Dichao* (萬曆邸鈔), held by the National Central Library of Taiwan (臺灣國家圖書館). Downloaded as public domain digital images.

**Proofreading:** All texts have been manually proofread and corrected against the original manuscript images by domain experts with specialized knowledge of Ming dynasty paleography and administrative terminology.

### Annotations

- Named entities (persons, offices, locations, institutions) identified through expert annotation
- Cross-references to the *Ming Shilu* (明實錄) verified against the Academia Sinica digital edition
- Content categories assigned based on the nature of each entry

### Personal and Sensitive Information

This dataset contains historical records from the 16th and 17th centuries. All persons mentioned are historical figures who died roughly four centuries ago. No modern personal or sensitive information is included.

## Considerations for Using the Data

### Licensing

The original manuscript images are in the **public domain** (held by the National Central Library of Taiwan). The structured dataset, annotations, and editorial notes are released under **CC BY 4.0**.
### Known Limitations

1. The corpus is based on manuscript copies (*chaoben* 鈔本), not the original *dibao* documents themselves, which are lost. Copying errors may exist in the source material.
2. Coverage is uneven across the Wanli reign; some years have significantly more entries than others.
3. Proofread text accuracy depends on the legibility of the manuscript; some characters remain uncertain and are marked accordingly.

### Bias

The *dibao* system inherently reflects the perspective of the central government. Provincial and local perspectives are underrepresented. The selection of entries in the surviving *dichao* collections may not represent the full range of topics covered in the original gazette.

## Additional Information

### Dataset Curators

- **Kwanyong Kim** (humanet2): Digital Humanities, Ming-Qing political communication studies
- [Additional curator information to be added]

### Citation

If you use this dataset, please cite:

```bibtex
@dataset{wanli_dibao_corpus_2026,
  title={Wanli Dibao Corpus: A Structured Digital Edition of Ming Dynasty Official Gazettes},
  author={Kim, Kwanyong},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/dibao-research/wanli-dibao-corpus}
}
```

### Related Resources

| Resource | Description | Link |
|---|---|---|
| Ming Shilu (明實錄) | Veritable Records of the Ming Dynasty | [Academia Sinica](https://hanchi.ihp.sinica.edu.tw/) |
| Joseon Wangjo Sillok | Annals of the Joseon Dynasty | [sillok.history.go.kr](https://sillok.history.go.kr/) |
| National Central Library Taiwan | Original manuscript source | [ncl.edu.tw](https://www.ncl.edu.tw/) |

### Contact

For questions, corrections, or collaboration inquiries, please use the **Discussions** tab on this repository or contact via the `dibao-research` organization page.
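### Example: Entry Schema Sketch (illustrative)

The Data Fields and Content Categories tables can be read as a record schema. The following is a minimal Python sketch of that schema, not an official loader. It rests on several assumptions: that the `year` field holds the CE year, that a nominal conversion (Wanli 1 = 1573) is close enough for consistency checks even though lunar months near the year boundary can fall in the following CE year, and that the helper names `DibaoEntry`, `wanli_to_ce`, and `check_entry` are hypothetical. The sample record holds placeholder values only and is not taken from the corpus.

```python
from typing import List, TypedDict

# Category labels copied from the Content Categories table.
CATEGORIES = {
    "memorial", "edict", "appointment", "impeachment", "military",
    "disaster", "fiscal", "ritual", "other",
}


class DibaoEntry(TypedDict):
    """One corpus entry; names and types follow the Data Fields table."""
    id: str
    year: int            # assumed here to be the CE year
    month: int           # lunar month
    day: str             # may be empty when the day is not recorded
    wanli_year: int      # 1-48
    category: str
    title: str
    raw_text: str
    persons: List[str]
    offices: List[str]
    locations: List[str]
    institutions: List[str]
    mingshilu_ref: List[str]
    notes: str


def wanli_to_ce(wanli_year: int) -> int:
    """Nominal conversion of a Wanli reign year to CE (Wanli 1 = 1573)."""
    if not 1 <= wanli_year <= 48:
        raise ValueError(f"Wanli reign year out of range: {wanli_year}")
    return 1572 + wanli_year


def check_entry(entry: DibaoEntry) -> List[str]:
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    if entry["category"] not in CATEGORIES:
        problems.append(f"unknown category: {entry['category']}")
    if not 1 <= entry["wanli_year"] <= 48:
        problems.append(f"wanli_year out of range: {entry['wanli_year']}")
    elif entry["year"] != wanli_to_ce(entry["wanli_year"]):
        problems.append("year does not match wanli_year (nominal conversion)")
    return problems


# Placeholder record for illustration only; NOT real corpus data.
sample: DibaoEntry = {
    "id": "placeholder-0001",
    "year": 1582,
    "month": 6,
    "day": "",
    "wanli_year": 10,
    "category": "edict",
    "title": "(placeholder title)",
    "raw_text": "(placeholder text)",
    "persons": [],
    "offices": [],
    "locations": [],
    "institutions": [],
    "mingshilu_ref": [],
    "notes": "",
}

print(check_entry(sample))  # prints []
```

A check like this is most useful as a sanity pass over downloaded records before analysis. Entries flagged for a year mismatch are not necessarily wrong, since the nominal conversion ignores the offset between the lunar and CE calendars.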
Provider:
dibao-research