Geonwoohong/aihub-webcorpus-morph-train-tokenized-ko
Source: Hugging Face | Updated: 2025-10-21 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/Geonwoohong/aihub-webcorpus-morph-train-tokenized-ko
Description:
This dataset is a morphologically analyzed Korean web corpus built from the AIHub Korean Web Corpus. It contains the original sentences plus two subsets: semantic morphemes, which carry content, and stylistic morphemes, which carry grammatical and stylistic information. The data has been cleaned and morphologically analyzed, and it is stored as Apache Arrow shards, which supports efficient streaming and loading.
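Since the description mentions streaming, the following is a minimal sketch of how the corpus might be read lazily with the Hugging Face `datasets` library. The helper names `open_stream` and `peek` are hypothetical, and the `split="train"` default and the optional `config_name` are assumptions: the listing does not give split or subset names, so check the dataset card before relying on them.

```python
from itertools import islice

# Repository ID taken from the listing above.
REPO_ID = "Geonwoohong/aihub-webcorpus-morph-train-tokenized-ko"


def open_stream(config_name=None, split="train"):
    """Open the dataset in streaming mode so shards are fetched on demand.

    Assumption: a "train" split exists. config_name is a placeholder for a
    subset name, which this listing does not specify.
    """
    # Imported lazily so peek() below works even without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=config_name, split=split, streaming=True)


def peek(records, n=3):
    """Return the first n records of any iterable without reading the rest."""
    return list(islice(records, n))
```

Under those assumptions, `peek(open_stream())` would fetch only the first few records over the network instead of downloading every shard, which is the main reason to prefer streaming for a large corpus.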
Provider:
Geonwoohong



