Chinese Corpus (中文语料库)

Name: Chinese Corpus (中文语料库)
Creator: University College London
Published: 2016-11-15 22:41:23
License: No description available

Source: arXiv | Updated: 2016-11-15 | Indexed: 2024-08-06

Download link:

http://arxiv.org/abs/1611.04358v2

Description:

This study constructs a large-scale Chinese dataset for investigating character-level convolutional neural networks applied to text classification on Chinese corpora. The dataset provides each text both as Chinese characters and in the corresponding pinyin format, with approximately 577,314 entries in total. It was created to address the shortage of large character-level datasets for Chinese text classification. The accompanying experiments show that character-level convolutional networks perform better on the Chinese-character dataset than on the corresponding pinyin-format dataset.
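To make the idea of "character-level" input concrete, here is a minimal sketch of how one text could be turned into a fixed-length sequence of character ids in both representations. It is an illustration only, not the pipeline used to build this dataset: the helper names `build_vocab` and `encode`, the padding and unknown ids, the sequence lengths, and the hand-written pinyin string are all assumptions.

```python
from collections import Counter

PAD_ID = 0  # assumed id for padding positions
UNK_ID = 1  # assumed id for characters missing from the vocabulary


def build_vocab(texts, max_size=None):
    """Map each character seen in `texts` to an integer id, starting at 2."""
    counts = Counter(ch for text in texts for ch in text)
    return {ch: i + 2 for i, (ch, _) in enumerate(counts.most_common(max_size))}


def encode(text, vocab, length):
    """Truncate or pad `text` to `length` character ids."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:length]]
    return ids + [PAD_ID] * (length - len(ids))


# The same phrase in both formats; the pinyin is written by hand here.
hanzi = "中文语料库"
pinyin = "zhong wen yu liao ku"

hanzi_vocab = build_vocab([hanzi])
pinyin_vocab = build_vocab([pinyin])

hanzi_ids = encode(hanzi, hanzi_vocab, length=8)
pinyin_ids = encode(pinyin, pinyin_vocab, length=24)
```

The sketch highlights the trade-off between the two formats: the Chinese-character form yields short sequences over a large alphabet, while the pinyin form yields longer sequences over a small one.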

Provider:

University College London

Created:

2016-11-14
