Chinese Corpus
Source: arXiv. Updated: 2016-11-15. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/1611.04358v2
Description:
This study constructs a large-scale Chinese dataset for investigating character-level convolutional neural networks in text classification on Chinese corpora. The dataset provides each text both as Chinese characters and in the corresponding pinyin format, totaling approximately 577,314 entries. It was created to address the shortage of large-scale character-level datasets for Chinese text classification. The accompanying experiments show that character-level convolutional networks perform better on the Chinese character dataset than on its pinyin-format counterpart.
Provided by:
University College London
Created:
2016-11-14



