The 9,933 Most Frequently Used Chinese Characters Dataset
Indexed by 帕依提提 (Payititi) on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-2007.html
Resource summary:
The dataset grew out of a simple question asked by Reddit user areyde: "What does it mean to learn all Chinese characters?" This can be reduced to "What goal can you set for learning Chinese characters?" In his view, the most useful measure seemed to be how frequently each character occurs. He therefore listed all 9,933 characters from the corpus at http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO. For each character, the dataset stores the following fields:

- its number of occurrences in the corpus
- its calculated percentage of the corpus
- its radical and dictionary code
- its stroke count
- its pronunciation and meaning, where available
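The frequency-based goal described above can be turned into a concrete coverage target. Sort the characters by occurrence count, then find how many of the top-ranked characters are needed to account for a chosen share of all occurrences.

The sketch below is a minimal illustration and rests on several assumptions:

- The rows have already been loaded into memory. The download's actual file format and column names are not described on this page.
- The `CharRecord` type, its field names, and the `characters_for_coverage` function are hypothetical.
- Coverage is measured against the occurrences of the 9,933 listed characters only, not against the full corpus.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharRecord:
    """One dataset row. Field names are hypothetical, chosen to mirror the
    fields listed in the description (not the file's real column names)."""
    character: str
    count: int                          # occurrences in the corpus
    percentage: float                   # calculated percentage of the corpus
    radical: str
    dictionary_code: str
    strokes: int
    pronunciation: Optional[str] = None  # may be missing
    meaning: Optional[str] = None        # may be missing


def characters_for_coverage(records: List[CharRecord], target: float) -> int:
    """Hypothetical helper: return how many of the most frequent characters
    are needed so that their combined occurrence counts reach `target`
    (a fraction in (0, 1]) of all occurrences counted in `records`."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    # Rank characters from most to least frequent.
    ranked = sorted(records, key=lambda r: r.count, reverse=True)
    total = sum(r.count for r in ranked)
    running = 0
    for n, rec in enumerate(ranked, start=1):
        running += rec.count
        if running >= target * total:
            return n
    return len(ranked)
```

For example, `characters_for_coverage(records, 0.9)` would give the number of top-ranked characters that together account for 90% of the listed occurrences. This is one way to express a learning goal such as "recognize enough characters to cover 90% of running text".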
Provider:
帕依提提 (Payititi)
Collected and compiled
Dataset introduction

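Background and challenges
Background overview
This dataset collects the 9,933 most frequently used Chinese characters. Each character entry includes details such as occurrence frequency, radical, and stroke count. This makes the dataset suitable for learning Chinese characters and for natural language processing research.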
The above summary was collected and generated by 遇见数据集.



