CMU-SE dataset

Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/clab/sp2016.11-731/tree/master/hw4/data
Description:
CMU-SE is a preprocessed collection of simple English sentences. It contains 44,016 sentences and has a vocabulary of 3,122 unique word types. The dataset is intended for sentence generation, specifically for training models to produce coherent sentences. During preprocessing, sentences with fewer than seven words were discarded and sentences with more than seven words were truncated to seven words.
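The length rule in the description can be illustrated with a minimal Python sketch. It is not the dataset authors' script: the whitespace tokenization, the function names `preprocess` and `vocabulary`, and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter

SENTENCE_LEN = 7  # fixed sentence length stated in the description


def preprocess(lines, length=SENTENCE_LEN):
    """Drop sentences shorter than `length` words; truncate longer ones."""
    kept = []
    for line in lines:
        tokens = line.split()  # assumption: plain whitespace tokenization
        if len(tokens) < length:
            continue  # too short: discarded
        kept.append(tokens[:length])  # longer sentences are cut to `length`
    return kept


def vocabulary(sentences):
    """Count word types over the retained (truncated) sentences."""
    return Counter(tok for sent in sentences for tok in sent)


# Hypothetical sample input, not taken from the dataset.
sample = [
    "the cat sat on the mat",                         # 6 words: discarded
    "she gave him a book about old ships yesterday",  # 9 words: truncated
    "we will meet at the station tomorrow",           # 7 words: kept as-is
]
sentences = preprocess(sample)
vocab = vocabulary(sentences)
print(len(sentences), "sentences,", len(vocab), "word types")
```

Under this rule every retained sentence has exactly seven tokens, which is convenient for generators that emit fixed-length sequences. Running the same two functions over the real files should give counts comparable to the 44,016 sentences and 3,122 word types quoted above, but only if the tokenization matches the original preprocessing.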