
yuyang/distil_cnndm

Source: Hugging Face | Updated: 2023-05-14 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/yuyang/distil_cnndm
Description:
# Distilled CNN/DailyMail Dataset

This folder contains the distilled data and the dataset loading script used to build a dataset on top of it.

- `cnn_bart_pl` is downloaded from [Saved Pseudo-Labels](https://github.com/huggingface/transformers/blob/main/examples/research_projects/seq2seq-distillation/precomputed_pseudo_labels.md). It was generated by facebook/bart-large-cnn and corresponds to version "1.0.0". It contains train/validation/test splits.
- `pegasus_cnn_cnn_pls` is also downloaded from [Saved Pseudo-Labels](https://github.com/huggingface/transformers/blob/main/examples/research_projects/seq2seq-distillation/precomputed_pseudo_labels.md). It was generated by sshleifer/pegasus-cnn-ft-v2 and corresponds to version "2.0.0". It includes only the train split.

## Updates

- 03/16/2023
  1. Removed "(CNN)" from the beginning of articles.
Provider:
yuyang
Summary of original information

Distilled CNN/DailyMail Dataset Overview

Dataset composition

  • cnn_bart_pl:

    • Source: Saved Pseudo-Labels
    • Generating model: facebook/bart-large-cnn
    • Version: "1.0.0"
    • Contents: train, validation, and test splits
  • pegasus_cnn_cnn_pls:

    • Source: Saved Pseudo-Labels
    • Generating model: sshleifer/pegasus-cnn-ft-v2
    • Version: "2.0.0"
    • Contents: train split only
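
The composition above can be written down as a small lookup table, which makes it easy to check whether a split exists before requesting it. This is only an illustrative sketch: the `SUBSETS` dictionary and the `has_split` helper are hypothetical names and are not part of the dataset's loading script.

```python
# Hypothetical summary of the two subsets listed above.
# The names SUBSETS and has_split are illustrative assumptions,
# not identifiers from the dataset's own loading script.
SUBSETS = {
    "cnn_bart_pl": {
        "model": "facebook/bart-large-cnn",
        "version": "1.0.0",
        "splits": ("train", "validation", "test"),
    },
    "pegasus_cnn_cnn_pls": {
        "model": "sshleifer/pegasus-cnn-ft-v2",
        "version": "2.0.0",
        "splits": ("train",),
    },
}


def has_split(subset: str, split: str) -> bool:
    """Return True if the named subset ships the requested split."""
    return split in SUBSETS[subset]["splits"]
```

For example, `has_split("pegasus_cnn_cnn_pls", "validation")` is false, reflecting that version "2.0.0" includes only the train split.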

Update log

  • March 16, 2023
    • Removed the "(CNN)" marker from the beginning of articles.
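
A cleanup like the one described in the update log might look like the sketch below. It is not the maintainer's actual preprocessing code: the function name `strip_cnn_prefix` is hypothetical, and it assumes the marker appears literally as "(CNN)" at the very start of the article, optionally followed by whitespace.

```python
import re

# Assumption: the marker is the literal text "(CNN)" at the start of the
# article, possibly followed by whitespace. Occurrences elsewhere are kept.
_CNN_PREFIX = re.compile(r"^\(CNN\)\s*")


def strip_cnn_prefix(article: str) -> str:
    """Remove a leading "(CNN)" marker; leave the rest of the text as-is."""
    return _CNN_PREFIX.sub("", article, count=1)
```

Anchoring the pattern with `^` keeps any later mention of "(CNN)" inside the article body untouched.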