Advertising Copy Generation Dataset

ModelScope Community | Updated: 2026-05-16 | Indexed: 2024-05-15
Download link:
https://modelscope.cn/datasets/lvjianjin/AdvertiseGen
Resource overview:
## Dataset Description

AdvertiseGen is built from the correspondence between the tags on product web pages and the advertising copy on those pages. It is a typical open-ended generation task: when a model generates copy from key-value input, factual consistency between the generated text and the input deserves particular attention.

## Data Preview

- Task description: given a list of keywords and attributes describing a product (the kv-list), generate advertising copy (adv) suited to that product.
- Data scale: 114k training examples, 1k validation examples, 3k test examples.
- Data source: the CoAI group at Tsinghua University.
- Data sample:

```
{ "content": "上衣 牛仔布 白色 简约 刺绣 外套 破洞", "summary": "简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。" }
```

English rendering of the same sample:

```
{ "content": "Top, denim, white, minimalist, embroidery, jacket, ripped", "summary": "A minimalist denim jacket that is far from ordinary. The white body is extremely versatile. Multiple worn and ripped designs are added to the jacket, breaking the monotony and adding a touch of visual appeal. Interesting embroidery decorations are placed on the back of the jacket, enriching the layering and showcasing a unique fashion sense." }
```

## Baseline System

The baseline system provided with this dataset is built on ERNIE-UNIMO, the unified-modal pre-training framework proposed by Baidu. For the three text generation tasks in this competition, the baseline uses UNIMO-text, a model pre-trained on text data within the ERNIE-UNIMO framework.
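
As a rough illustration of the factual-consistency concern raised in the dataset description, the sketch below checks which input keywords appear verbatim in the generated copy for the sample record. It assumes that `content` is a whitespace-separated keyword list, as in the sample shown here. The helper name `keyword_coverage` is hypothetical and is not part of the dataset or the baseline. Verbatim substring matching is only a crude proxy for factual consistency, not the evaluation method used by the dataset authors.

```python
import json

# Sample record copied from the data preview; other records may be formatted differently.
record_json = """
{
  "content": "上衣 牛仔布 白色 简约 刺绣 外套 破洞",
  "summary": "简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。"
}
"""


def keyword_coverage(record):
    """Hypothetical helper: split input keywords into those found verbatim
    in the summary and those that are not, plus the covered fraction."""
    # Assumption: keywords are separated by whitespace, as in the sample.
    keywords = record["content"].split()
    summary = record["summary"]
    covered = [kw for kw in keywords if kw in summary]
    missing = [kw for kw in keywords if kw not in summary]
    ratio = len(covered) / len(keywords) if keywords else 0.0
    return covered, missing, ratio


record = json.loads(record_json)
covered, missing, ratio = keyword_coverage(record)
print("covered:", covered)
print("missing:", missing)
print(f"coverage: {ratio:.2f}")
```

For this record, the keywords 上衣 (top) and 牛仔布 (denim fabric) are reported as missing even though the copy mentions 牛仔外套 (denim jacket). This shows why exact matching undercounts paraphrased attributes; a more careful check would need fuzzy or semantic matching.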
Provider:
maas
Created:
2022-12-20
Dataset Introduction
Background Overview
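The Advertising Copy Generation Dataset targets an open-ended generation task. It is built from the correspondence between product web page tags and the advertising copy on those pages, and it is split into training, validation, and test sets. The task is to generate advertising copy from a list of product keywords and attributes. The baseline system is built on the ERNIE-UNIMO framework.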
The summary above was collected and generated by 遇见数据集.