MateuszW/spoiler_generation
Source: Hugging Face. Updated 2023-07-08; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/MateuszW/spoiler_generation
Resource description:
---
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- spoiler generation
- clickbait spoiling
---
# Datasets used for the spoiler generation task
## Dataset Description
This repository bundles several datasets used for the spoiler generation task of the Clickbait Spoiling competition.
They are intended for training models that treat the task as question answering, text generation, or learning to rank.
## Dataset Structure
This dataset has 5 main directories:
- clf_data - used to train a classifier that compares two generated spoilers and decides
which one better matches a clickbait post. The classifier is binary:
class 1 means the first spoiler is "better" than the second,
and class 0 means the opposite.
- clickbait_spoiling_data - the original dataset from the Clickbait Spoiling competition.
- generated_questions - questions generated for clickbait posts by the Vicuna model.
- models_output - spoilers generated by the best selected models.
- regressor_data - used to train a model that estimates the BLEU score of a generated
spoiler without access to the reference spoiler.
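The label convention described for clf_data can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `make_pairwise_examples` and `pick_best`, the row layout, the example post, and the numeric quality scores are all hypothetical, and the description does not say how "better" was decided or how ties were handled. The sketch only assumes that each candidate spoiler has some quality score, and shows how class 1 / class 0 pairs and a pairwise selection step could look.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# Hypothetical row layout: (post, first_spoiler, second_spoiler, label)
Row = Tuple[str, str, str, int]


def make_pairwise_examples(post: str, scored: List[Tuple[str, float]]) -> List[Row]:
    """Build ordered spoiler pairs; label 1 if the first spoiler scores higher, else 0.

    Ties are skipped (an assumption; the source does not describe tie handling).
    """
    rows: List[Row] = []
    for (first, s1), (second, s2) in permutations(scored, 2):
        if s1 == s2:
            continue
        rows.append((post, first, second, 1 if s1 > s2 else 0))
    return rows


def pick_best(
    post: str,
    candidates: List[str],
    first_is_better: Callable[[str, str, str], bool],
) -> str:
    """Keep the current best candidate unless the comparator prefers the challenger."""
    best = candidates[0]
    for challenger in candidates[1:]:
        if not first_is_better(post, best, challenger):
            best = challenger
    return best


# Made-up post and scores, for illustration only.
post = "You won't believe what this dog did"
scored = [("It opened the fridge", 0.8), ("A dog did something", 0.2)]
rows = make_pairwise_examples(post, scored)

# Stand-in for a trained classifier: compares the made-up scores directly.
scores = dict(scored)
best = pick_best(post, [s for s, _ in scored], lambda p, a, b: scores[a] > scores[b])
print(rows)
print(best)
```

In practice the lambda would be replaced by a trained pairwise classifier; emitting both orderings of each pair keeps classes 1 and 0 balanced.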
Provider:
MateuszW
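Summary of original information
Dataset overview
Dataset purpose
- Training models for the clickbait spoiling task, framed as question answering, text generation, or learning to rank.
Dataset structure
The dataset contains the following five main directories:
- clf_data: used to train a classifier that decides which of two generated spoilers better matches a clickbait post. The classification is binary: class 1 means the first spoiler is "better", and class 0 means the opposite.
- clickbait_spoiling_data: the original dataset from the Clickbait Spoiling competition.
- generated_questions: questions generated for clickbait posts by the Vicuna model.
- models_output: spoilers generated by the best selected models.
- regressor_data: used to train a model that estimates the BLEU score of a generated spoiler without access to the reference spoiler.
Dataset language
- English (en)
Dataset tags
- spoiler generation
- clickbait spoiling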



