
yujin31/model_generated_summaries

Source: Hugging Face · Updated 2025-12-14 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/yujin31/model_generated_summaries
Description:

This is an English summarization dataset in the 100K to 1M size category. It contains 140,000 model-generated summaries, split into a training set (100,000 samples), a validation set (20,000 samples), and a test set (20,000 samples). The source documents come from four domains: news (from the cnn_dailymail dataset), arXiv research articles (from the arxiv-summarization dataset), Reddit posts (from the reddit_ds_479243 dataset), and Enron emails (from the enron-email dataset). Each domain/model pair contributes 5,000 training, 1,000 validation, and 1,000 test samples. The summaries were generated by five models: Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, Qwen2.5-7B-Instruct, granite-3.3-8b-instruct, and glm-4-9b-chat. Each sample's label is the name of the model that generated the summary.
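The split sizes follow directly from the per-pair counts: 4 domains × 5 models = 20 pairs, so the training set has 20 × 5,000 = 100,000 samples and the validation and test sets each have 20 × 1,000 = 20,000. The sketch below checks that arithmetic and builds an integer encoding for the model-name labels, as one might for a model-attribution classifier. The short domain keys and the `LABEL2ID`/`ID2LABEL` mapping are hypothetical illustrations; the dataset's actual column names and label encoding are not stated here.

```python
# Domains and models as listed in the description.
# The short domain keys are hypothetical, chosen only for illustration.
DOMAINS = ["news", "arxiv", "reddit", "enron"]
MODELS = [
    "Llama-3.1-8B-Instruct",
    "Mistral-7B-Instruct-v0.3",
    "Qwen2.5-7B-Instruct",
    "granite-3.3-8b-instruct",
    "glm-4-9b-chat",
]

# Samples contributed by each (domain, model) pair, per split.
PER_PAIR = {"train": 5_000, "validation": 1_000, "test": 1_000}


def split_sizes() -> dict:
    """Total samples per split: every (domain, model) pair contributes PER_PAIR[split]."""
    pairs = len(DOMAINS) * len(MODELS)  # 4 * 5 = 20 pairs
    return {split: n * pairs for split, n in PER_PAIR.items()}


# Hypothetical integer encoding of the model-name labels (sorted for determinism).
LABEL2ID = {name: i for i, name in enumerate(sorted(MODELS))}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}

if __name__ == "__main__":
    sizes = split_sizes()
    print(sizes)                # per-split totals
    print(sum(sizes.values()))  # overall dataset size
    print(LABEL2ID)
```

Running it prints `{'train': 100000, 'validation': 20000, 'test': 20000}` and a total of 140000, which matches the figures above.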
Provider:
yujin31