MateuszW/spoiler_generation

Source: Hugging Face. Updated: 2023-07-08. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/MateuszW/spoiler_generation
Resource description:
---
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- spoiler generation
- clickbait spoiling
---

# Datasets used for spoiler generation task

## Dataset Description

This dataset collects several datasets used for the spoiler generation task of the Clickbait spoiling competition. They support training models that treat the task as question answering, text generation, or learning to rank.

## Dataset Structure

This dataset has 5 main directories:

- clf_data - used to train a classifier that, given two generated spoilers, decides which one better matches a clickbait post. The classifier is binary: class 1 means the first spoiler is "better" than the second, and class 0 means the opposite.
- clickbait_spoiling_data - the original dataset from the Clickbait spoiling competition.
- generated_questions - questions generated for clickbait posts by the Vicuna model.
- models_output - generated spoilers from the best-selected models.
- regressor_data - used to train a model that estimates the BLEU score of a generated spoiler without access to the reference spoiler.
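To illustrate how a pairwise classifier like the one trained on clf_data could be used, here is a minimal sketch that picks one spoiler from several candidates by repeated pairwise comparison. The dataset card does not describe how the authors applied the classifier, so the selection loop is an assumption. The names `pick_best_spoiler`, `classify_pair`, and `toy_classifier` are hypothetical, and the toy classifier is a word-overlap stand-in for a trained model.

```python
from typing import Callable, Sequence


def pick_best_spoiler(
    post: str,
    candidates: Sequence[str],
    classify_pair: Callable[[str, str, str], int],
) -> str:
    """Return the candidate that survives a chain of pairwise comparisons.

    classify_pair(post, first, second) is a hypothetical stand-in for a
    classifier trained on clf_data: it returns 1 when `first` is the better
    spoiler and 0 when `second` is better (the label convention from the card).
    """
    if not candidates:
        raise ValueError("need at least one candidate spoiler")
    best = candidates[0]
    for challenger in candidates[1:]:
        # Class 0 means the second spoiler (the challenger) wins.
        if classify_pair(post, best, challenger) == 0:
            best = challenger
    return best


def toy_classifier(post: str, first: str, second: str) -> int:
    """Toy stand-in (assumption): prefer the spoiler sharing more words with the post."""
    post_words = set(post.lower().split())
    overlap_first = len(post_words & set(first.lower().split()))
    overlap_second = len(post_words & set(second.lower().split()))
    return 1 if overlap_first >= overlap_second else 0


if __name__ == "__main__":
    post = "You won't believe which fruit is the healthiest"
    candidates = ["It is a car", "The healthiest fruit is the apple"]
    print(pick_best_spoiler(post, candidates, toy_classifier))
```

A single pass keeps the number of classifier calls linear in the number of candidates. A full round-robin over all pairs would be more robust to inconsistent predictions, at quadratic cost.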
Provider:
MateuszW
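Summary of original information

Dataset overview

Dataset purpose

  • Training models for the clickbait spoiling task, framed as question answering, text generation, or learning to rank.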

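Dataset structure

The dataset contains the following five main directories:

  1. clf_data: used to train a classifier that decides which of two generated spoilers better matches a clickbait post. The classification is binary: class 1 means the first spoiler is "better", and class 0 means the opposite.
  2. clickbait_spoiling_data: the original dataset from the Clickbait spoiling competition.
  3. generated_questions: questions generated for clickbait posts by the Vicuna model.
  4. models_output: spoilers generated by the best-selected models.
  5. regressor_data: used to train a model that estimates the BLEU score of a generated spoiler without knowing the reference spoiler.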
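The regressor_data entry refers to BLEU as the quantity the regressor estimates. As a rough illustration of what such a target looks like, the sketch below computes a simplified sentence-level BLEU with add-one smoothing and a brevity penalty. The source does not say which BLEU variant or tokenization was used, so this variant, whitespace tokenization, and the names `ngrams` and `simple_bleu` are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU (assumed variant, not the authors' exact metric).

    Uses lowercase whitespace tokens, clipped n-gram precision for n = 1..max_n
    with add-one smoothing, and the standard brevity penalty.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        # Add-one smoothing keeps the logarithm finite when nothing matches.
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    # Penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the healthiest fruit is the banana"
    for spoiler in ["the healthiest fruit is the banana",
                    "the healthiest fruit is the apple",
                    "nobody knows"]:
        print(f"{simple_bleu(spoiler, reference):.3f}  {spoiler}")
```

In a setup like the one described, a score of this kind would be computed offline against the reference spoiler and used as the training label, while the regressor itself only sees inputs available at inference time.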

Dataset language

  • English (en)

Dataset tags

  • spoiler generation
  • clickbait spoiling