
Raw frequency data: Thoughts on "Reliable" Learner's Vocabularies for Classical and Literary Chinese

Indexed by NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/5638881
Description:
This dataset includes the raw frequency counts (classical_chinese_learners_vocabularies_raw_frequencies.zip) used in the article Thoughts on “Reliable” Learner’s Vocabularies for Classical and Literary Chinese.

The counts cover three corpora:
Corpus I: Michael Loewe's (1993) Early Chinese Texts
Corpus II: Official Histories (zhengshi 正史)
Corpus III: Six Novels (xiaoshuo 小說), as defined in Hsia 1968

The download includes one folder per corpus, structured as follows:
xx_corpus.csv: list of texts with sources / versions used, plus token and type counts
xx_freq_1-1.csv: unigram (character) frequencies and counts
xx_freq_1-4.csv: frequencies and counts of 1- to 4-character words, with "words" defined according to the Hanyu da cidian 漢語大詞典 (Luo 1986–1994)
xx_freq_2-4.csv: frequencies and counts of 2- to 4-character words

Additionally, pca_zhengshi_vs_loewe_vs_xiaoshuo.html is an interactive version of the Principal Component Analysis (PCA) presented in the article; texts from the three corpora are represented using the 1,000 most frequent 1–4 character combinations from the dataset.
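The file naming convention described above can be used to take stock of the archive after unpacking it. The following is a minimal sketch, not part of the dataset: the function names `describe` and `inventory` are hypothetical, the regular expression assumes that real file names follow the `xx_corpus.csv` / `xx_freq_N-M.csv` pattern exactly, and nothing is assumed about the column layout inside the CSV files.

```python
import re
from pathlib import Path

# Role of each file type, paraphrased from the dataset description.
FILE_ROLES = {
    "corpus": "list of texts, sources / versions used, token and type counts",
    "freq_1-1": "unigram (character) frequencies and counts",
    "freq_1-4": "1- to 4-character word frequencies and counts",
    "freq_2-4": "2- to 4-character word frequencies and counts",
}

# Assumption: names look like "<prefix>_corpus.csv" or "<prefix>_freq_N-M.csv".
_PATTERN = re.compile(r"^(?P<prefix>.+?)_(?P<role>corpus|freq_\d-\d)\.csv$")


def describe(filename):
    """Return (corpus prefix, role description), or None if the name does not match."""
    match = _PATTERN.match(Path(filename).name)
    if match is None or match.group("role") not in FILE_ROLES:
        return None
    return match.group("prefix"), FILE_ROLES[match.group("role")]


def inventory(root):
    """Map each recognised CSV file under `root` to its (prefix, role) pair."""
    found = {}
    for path in sorted(Path(root).rglob("*.csv")):
        info = describe(path.name)
        if info is not None:
            found[str(path)] = info
    return found
```

Calling `inventory()` on the directory where the zip was extracted lists the recognised files per corpus; files that do not follow the convention, such as the PCA HTML page, are skipped.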
Created:
2021-11-03