Advertising Copy Generation Dataset (AdvertiseGen)
Source: ModelScope community; updated 2026-05-16; indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/lvjianjin/AdvertiseGen
Resource overview:
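## Dataset Description
AdvertiseGen is built from the correspondence between the tags on product web pages and the advertising copy on those pages, and is a typical open-ended generation task. When a model generates open-ended copy from key-value input, factual consistency between the copy and the input information deserves particular attention.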
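As a rough illustration of what checking factual consistency could look like, the sketch below counts how many input keywords reappear verbatim in a piece of copy. It is an assumption made for illustration, not a metric defined by the dataset: the helper name `keyword_coverage` is hypothetical, and exact substring matching is a crude proxy that misses paraphrases (for example, 牛仔布 expressed as 牛仔).

```python
def keyword_coverage(content: str, summary: str) -> tuple[float, list[str]]:
    """Return the share of input keywords found verbatim in the copy, plus the misses.

    Hypothetical helper: substring matching is only a crude proxy for
    factual consistency and is not an official metric of the dataset.
    """
    keywords = content.split()  # keywords are space-separated in the sample record
    if not keywords:
        return 1.0, []
    missing = [kw for kw in keywords if kw not in summary]
    return 1 - len(missing) / len(keywords), missing


# The sample record shown in this dataset card.
content = "上衣 牛仔布 白色 简约 刺绣 外套 破洞"
summary = (
    "简约而不简单的牛仔外套,白色的衣身十分百搭。"
    "衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。"
    "衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。"
)
coverage, missing = keyword_coverage(content, summary)
print(f"coverage={coverage:.2f}, missing={missing}")
```

On the sample record, five of the seven keywords appear verbatim; the two misses (上衣 and 牛仔布) are implied by the copy rather than contradicted by it, which shows why a check like this can only flag candidates for review.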
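## Data Preview
Task description: given a list of keywords and attributes describing a product (kv-list), generate advertising copy (adv) suited to that product;
Data scale: 114k training examples, 1k validation examples, 3k test examples;
Data source: CoAI group, Tsinghua University;
Data example (the record is in Chinese; `content` holds the keyword list and `summary` holds the advertising copy):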
```
{
"content": "上衣 牛仔布 白色 简约 刺绣 外套 破洞",
"summary": "简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。"
}
```
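To work with records of this shape, a minimal parsing sketch might look like the following. It assumes the data is stored as JSON Lines (one JSON object per line), which this card does not confirm; `parse_records` is a hypothetical helper name, and the in-memory text stands in for a real data file.

```python
import io
import json
from typing import Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[tuple[list[str], str]]:
    """Yield (keywords, ad_copy) pairs from JSON Lines text.

    Assumption: one JSON object per line with "content" and "summary" keys,
    as in the sample record; the on-disk layout is not stated in this card.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record["content"].split(), record["summary"]


# In-memory stand-in for a data file, built from a shortened sample record.
sample = io.StringIO(
    json.dumps(
        {
            "content": "上衣 牛仔布 白色 简约 刺绣 外套 破洞",
            "summary": "简约而不简单的牛仔外套,白色的衣身十分百搭。",
        },
        ensure_ascii=False,
    )
    + "\n"
)
pairs = list(parse_records(sample))
keywords, ad_copy = pairs[0]
print(len(keywords), keywords[:3])
```

Splitting `content` on whitespace gives the keyword list that serves as model input, while `summary` is the target text to generate.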
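## Baseline System
The baseline system provided with this dataset is built on ERNIE-UNIMO, the unified-modal pre-training framework proposed by Baidu. For the three text generation tasks in this competition, the baseline uses UNIMO-text, a model pre-trained on text data within the ERNIE-UNIMO framework.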
Provider:
maas
Created:
2022-12-20
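Collected Summary

Background and Challenges
Background Overview
The advertising copy generation dataset (AdvertiseGen) targets an open-ended generation task. It is built from the correspondence between product web page tags and advertising copy, and contains 114k training, 1k validation, and 3k test examples. The task is to generate advertising copy from a list of product keywords and attributes, and the baseline system is based on the ERNIE-UNIMO framework.
The summary above was collected and generated by 遇见数据集.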



