South Korean Election Campaign Booklet and Party Statements Corpora
Source: DataCite Commons. Updated: 2026-04-19. Indexed: 2025-04-16.
Download link:
https://osf.io/rct9y/
Description:
This project contains two comprehensive text datasets from South Korean politics: the South Korean Election Campaign Booklet Corpus and the South Korean Party Statements Corpus.
The Election Campaign Booklet Corpus comprises manifesto pamphlets filed by individual candidates who ran for office in presidential, National Assembly, and local elections in South Korea between 2000 and 2022. The corpus contains 49,678 observations.
The Party Statements Corpus contains official statements and leadership meeting records released by the two major political parties in South Korea from 2003 to 2022. The corpus comprises 35,115 entries from the Conservative Party and 48,086 entries from the Progressive Party, for a total of 83,201 observations.
For the campaign booklet corpus, this project now distributes two public variants. The original files, `sk_election_campaign_booklet_v2022.csv` and `sk_election_campaign_booklet_v2022.parquet`, contain the campaign booklet corpus as first released. The enriched files, `sk_election_campaign_booklet_enriched_v2022.csv` and `sk_election_campaign_booklet_enriched_v2022.parquet`, cover the same set of document rows as the original CSV source but add conservative NEC linkage fields such as `huboid`, `sg_id`, `sg_typecode`, `link_status`, `matcher_version`, and `nec_snapshot_id` to improve interoperability with `kr-elections-mcp` and related NEC-aligned workflows.
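One quick way to tell the two variants apart after download is to look for the linkage columns in the CSV header. The following is a minimal sketch using only Python's standard library. The helper name `missing_linkage_fields` is hypothetical, the field list covers only the six fields named above (the enriched files may carry others), and the sample headers are illustrative rather than the real column layout.

```python
import csv
import io

# Linkage fields named in the description above; the enriched files may
# contain additional fields that are not listed here.
LINKAGE_FIELDS = [
    "huboid", "sg_id", "sg_typecode",
    "link_status", "matcher_version", "nec_snapshot_id",
]

def missing_linkage_fields(csv_file):
    """Return the named linkage fields that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [name for name in LINKAGE_FIELDS if name not in present]

# Illustrative headers only (the `text` column is a made-up placeholder).
original_like = io.StringIO("code,text\n")
enriched_like = io.StringIO("code,text," + ",".join(LINKAGE_FIELDS) + "\n")
```

For a downloaded file, the same helper could be called on `open("sk_election_campaign_booklet_enriched_v2022.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption, not something the description states.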
Important note: `huboid` is a linked NEC identifier, not a native krpoltext identifier. In the enriched variant, rows with `link_status == "resolved"` are expected to have a non-null `huboid`. Some rows have missing `code` values, so row identity should not be inferred from `code` alone.
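The two caveats in this note can be checked mechanically. The sketch below, again standard-library Python, scans a CSV for rows marked `resolved` that lack a `huboid` and counts rows with no `code`. The function name `audit_enriched_rows` is hypothetical, the sample rows are invented for illustration, and two things are assumptions: that nulls are exported as empty strings, and that `unresolved` is a possible status (only `resolved` is documented).

```python
import csv
import io

def audit_enriched_rows(csv_file):
    """Find resolved rows without a huboid and count rows without a code."""
    resolved_without_huboid = []  # 0-based row positions, header excluded
    missing_code = 0
    for position, row in enumerate(csv.DictReader(csv_file)):
        # Assumption: null values appear as empty strings in the CSV export.
        huboid = (row.get("huboid") or "").strip()
        code = (row.get("code") or "").strip()
        if row.get("link_status") == "resolved" and not huboid:
            resolved_without_huboid.append(position)
        if not code:
            missing_code += 1
    return resolved_without_huboid, missing_code

# Invented rows for illustration; these are not taken from the corpus.
sample = io.StringIO(
    "code,huboid,link_status\n"
    "A1,100001,resolved\n"
    ",100002,resolved\n"
    "A3,,unresolved\n"
    "A4,,resolved\n"
)
violations, missing_code = audit_enriched_rows(sample)
```

On the real enriched file, an empty `violations` list would be consistent with the stated expectation, and a non-zero `missing_code` count is the reason to key rows on something other than `code` alone.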
The dataset was first made publicly available on September 13, 2024, and was updated on March 17, 2025, with new variables and previously missing observations. The accompanying data descriptor was corrected on March 20, 2025. As of April 19, 2026, both datasets are available in CSV and Parquet formats through the `krpoltext` R package, and the campaign booklet corpus is available in both original and enriched variants.
The data descriptor is available here: https://www.nature.com/articles/s41597-025-05220-4.
The `krpoltext` R package and documentation are available at https://taehyun-lim.github.io/krpoltext/, and the source code is available at https://github.com/taehyun-lim/krpoltext/.
Provider:
OSF
Created:
2024-09-26



