
Dataset for the Chinese Automatic Text Generation Task

Source: Science Data Bank (ScienceDB). Updated: 2022-02-09. Indexed: 2026-04-23.
Download link:
https://www.scidb.cn/en/detail?dataSetId=1abfe11c5b0246f1bb16e29f0f179148
Description:
The dataset is stored as CSV files and describes restaurants. It consists of 17457 key-value pair groups (meaning representations, MRs) and 17246 human-written reference texts. Each MR contains 3 to 8 Chinese key-value pairs, with keys such as name, food, or region and their values, as shown in Table 3. Of these, 15568 instances are used for training, 1678 for validation, and the remaining 211 for testing. Each key-value pair group in the training and validation sets has multiple human-written reference texts; the aim is to provide references that are more natural, informative, and diverse than the MRs themselves. The parallel corpus of Chinese key-value pairs and reference texts was constructed manually through a series of processing steps: collection, cleaning, translation, screening, and sorting.

The dataset includes three data files:
(1) trainset.csv: the training set, 15568 instances;
(2) devset.csv: the validation set, 1678 instances;
(3) testset.csv: the test set, 211 instances.

Each training and validation instance consists of a key-value pair group and a human reference text; test instances contain only the key-value pair group.
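The description does not give the CSV column headers or the exact MR syntax. The sketch below assumes the corpus follows the layout of the English E2E NLG release, which it appears to resemble: an `mr` column holding bracketed `key[value]` pairs and, in the training and validation files only, a `ref` column with the reference text. The column names, the bracket syntax, and the helper names `parse_mr`, `load_split`, and `in_stated_range` are all hypothetical; inspect the real files before relying on them. The assertions on `SPLITS` only restate the arithmetic of the description.

```python
import csv
import io
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Split sizes as stated in the dataset description.
SPLITS = {"trainset.csv": 15568, "devset.csv": 1678, "testset.csv": 211}
# Arithmetic consistency: the three splits add up to the 17457 MR groups, and
# train + validation (the splits that carry references) add up to 17246.
assert sum(SPLITS.values()) == 17457
assert SPLITS["trainset.csv"] + SPLITS["devset.csv"] == 17246

# ASSUMPTION: an MR looks like "name[Golden Dragon], food[Chinese]" (E2E-style
# key[value] pairs). Keys may not contain brackets or ASCII/full-width commas.
_PAIR = re.compile(r"\s*([^\[\],，]+?)\s*\[([^\]]*)\]")


def parse_mr(mr: str) -> Dict[str, str]:
    """Convert a bracketed MR string into a {key: value} dict (hypothetical format)."""
    return {key: value for key, value in _PAIR.findall(mr)}


def in_stated_range(pairs: Dict[str, str]) -> bool:
    """True if the MR has 3 to 8 key-value pairs, the range given in the description."""
    return 3 <= len(pairs) <= 8


def load_split(
    lines: Iterable[str], has_ref: bool
) -> List[Tuple[Dict[str, str], Optional[str]]]:
    """Read one split; ASSUMPTION: header columns are named "mr" and, for train/dev, "ref"."""
    reader = csv.DictReader(lines)
    return [
        (parse_mr(row["mr"]), row["ref"] if has_ref else None) for row in reader
    ]


if __name__ == "__main__":
    # A made-up row for illustration only; it is not taken from the dataset.
    demo = io.StringIO(
        'mr,ref\n"名称[金龙]，食物[中餐]，地区[市中心]",金龙是一家位于市中心的中餐馆。\n'
    )
    for pairs, ref in load_split(demo, has_ref=True):
        print(pairs, in_stated_range(pairs), "->", ref)
```

To read a real file, pass an open handle such as `open("trainset.csv", encoding="utf-8", newline="")`, with `has_ref=False` for `testset.csv`. The UTF-8 encoding is also an assumption; Chinese CSV exports are sometimes GBK-encoded.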
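Providing institutions:
Academy of Plateau Science and Sustainability; State Key Laboratory of Tibetan Intelligent Information Processing and Application; Qinghai Normal University; Qinghai Branch of the National Tibetan Plateau Data Center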
Created:
2021-12-31