National Institute of the Korean Language Corpus

Name: National Institute of the Korean Language Corpus
Creator: Alibaba Cloud Tianchi
Published: 2026-06-08 23:34:10
License: No description provided

Alibaba Cloud Tianchi. Updated 2026-06-08. Indexed 2024-03-07.

Download link:

https://tianchi.aliyun.com/dataset/90062

Resource description:

Word frequency is important information for both natural language processing (NLP) researchers and linguists. In NLP, very frequent words generally carry less information than infrequent ones and are often removed during text preprocessing. This dataset provides frequency information for Korean, a language spoken by 80 million people. For each entry, it gives the frequency (the number of times the entry appears in the corpus) and its rank relative to the other lemmas.
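The relationship between frequency, rank, and frequency-based filtering can be illustrated with a minimal Python sketch. It does not read the dataset itself, whose file layout is not described here; the function names `frequency_table` and `drop_most_frequent` and the toy token list are hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequency_table(tokens):
    """Return (item, frequency, rank) tuples, most frequent first.

    Rank 1 is the most frequent item. Items with equal counts receive
    consecutive ranks in the order they were first seen.
    """
    counts = Counter(tokens)
    return [
        (item, freq, rank)
        for rank, (item, freq) in enumerate(counts.most_common(), start=1)
    ]


def drop_most_frequent(tokens, top_n):
    """Remove every token that is among the top_n most frequent items."""
    frequent = {item for item, _, rank in frequency_table(tokens) if rank <= top_n}
    return [tok for tok in tokens if tok not in frequent]


# Toy placeholder tokens (made up for illustration, not real corpus data).
tokens = ["a", "b", "a", "c", "a", "b", "d"]
table = frequency_table(tokens)
filtered = drop_most_frequent(tokens, top_n=1)
```

With a published frequency list, the set of frequent items would presumably be taken from the list's rank column rather than recomputed from the text being processed.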

Provider:

Alibaba Cloud Tianchi

Created:

2021-02-02

Collection summary

Dataset introduction