Raw frequency data: Thoughts on "Reliable" Learner's Vocabularies for Classical and Literary Chinese
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5638881
Description:
This dataset includes the raw frequency counts (classical_chinese_learners_vocabularies_raw_frequencies.zip) used in the article Thoughts on “Reliable” Learner’s Vocabularies for Classical and Literary Chinese.
Corpus I – Michael Loewe's Early Chinese Texts (1993)
Corpus II – Official Histories (zhengshi 正史)
Corpus III – Six Novels (xiaoshuo 小說), as defined in Hsia 1968
The download includes one folder per corpus, structured as follows:
xx_corpus.csv > list of texts and sources / used versions, token and type counts
xx_freq_1-1.csv > unigram / character frequencies and counts
xx_freq_1-4.csv > 1 to 4 character word frequencies and counts ("words" according to Hanyu da cidian 漢語大詞典 (Luo 1986–1994))
xx_freq_2-4.csv > 2 to 4 character words
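As a rough illustration of how one of the frequency tables listed above might be inspected, the following sketch reads a CSV and returns its most frequent entries. The column names `item` and `count` are assumptions made for this example; the actual headers in the download may differ and should be checked first. The sample data is made up.

```python
import csv
import io


def top_entries(csv_file, n=10, item_col="item", count_col="count"):
    """Return the n most frequent (item, count) pairs from a frequency table.

    item_col and count_col are hypothetical header names; adjust them to
    whatever the real files use.
    """
    reader = csv.DictReader(csv_file)
    rows = [(row[item_col], int(row[count_col])) for row in reader]
    # Sort by count, highest first.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Tiny in-memory stand-in for a file such as xx_freq_1-1.csv (invented numbers).
sample = io.StringIO("item,count\n之,120\n不,95\n也,80\n")
print(top_entries(sample, n=2))
```

For a real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, assuming the files are UTF-8 encoded.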
Additionally, pca_zhengshi_vs_loewe_vs_xiaoshuo.html is an interactive version of the Principal Component Analysis (PCA) presented in the article. Texts from the three corpora are represented using the 1,000 most frequent 1–4 character combinations from the dataset.
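To make the idea of representing texts by their most frequent character combinations more concrete, here is a minimal sketch of how per-text feature vectors could be assembled before running a PCA. It is not the article's actual pipeline: the function name, the input layout (a dict of per-text counts), and the choice to normalise by each text's total count are all assumptions made for illustration.

```python
from collections import Counter


def relative_frequency_matrix(text_counts, k=1000):
    """Build one relative-frequency vector per text.

    text_counts maps a text name to a dict of {item: count} (hypothetical layout).
    The features are the k most frequent items across all texts combined.
    """
    total = Counter()
    for counts in text_counts.values():
        total.update(counts)
    features = [item for item, _ in total.most_common(k)]

    matrix = {}
    for name, counts in text_counts.items():
        size = sum(counts.values())  # total tokens counted for this text
        matrix[name] = [counts.get(f, 0) / size if size else 0.0 for f in features]
    return features, matrix


# Invented toy counts for two texts.
toy = {
    "text_a": {"之": 6, "不": 4},
    "text_b": {"之": 3, "天下": 7},
}
features, matrix = relative_frequency_matrix(toy, k=2)
print(features)
print(matrix)
```

The resulting vectors could then be passed to any PCA implementation; that step is left out here to keep the sketch dependency-free.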
Created:
2021-11-03



