content_rephrasing

ModelScope community, updated 2025-11-27, indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/facebook/content_rephrasing
Description:
## Message Content Rephrasing Dataset

Introduced by Einolghozati et al. in "Sound Natural: Content Rephrasing in Dialog Systems": https://aclanthology.org/2020.emnlp-main.414/

We introduce a new task of rephrasing for a more natural virtual assistant. Currently, virtual assistants work in the paradigm of intent-slot tagging, and the slot values are passed directly, as-is, to the execution engine. However, this setup fails in some scenarios, such as messaging, when the query given by the user needs to be changed before repeating it or sending it to another user. For example, for queries like ‘ask my wife if she can pick up the kids’ or ‘remind me to take my pills’, we need to rephrase the content to ‘can you pick up the kids’ and ‘take your pills’. In this paper, we study the problem of rephrasing with messaging as a use case and release a dataset of 3000 pairs of original query and rephrased query. We show that BART, a pre-trained transformers-based masked language model with auto-regressive decoding, is a strong baseline for the task, and show improvements by adding a copy-pointer and copy loss to it. We analyze different trade-offs of BART-based and LSTM-based seq2seq models, and propose a distilled LSTM-based seq2seq as the best practical model.
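
As a rough illustration of the task, the two examples quoted in the abstract can be written as (original query, rephrased content) pairs. The `RephrasePair` class, its field names, and the `copied_token_ratio` helper are hypothetical and chosen for illustration only; they are not the dataset's actual schema or a metric from the paper. The helper simply counts how many rephrased tokens already appear in the original query, which hints at why a copy mechanism might help.

```python
from dataclasses import dataclass


# Hypothetical container; the field names are assumptions, not the dataset's schema.
@dataclass(frozen=True)
class RephrasePair:
    original: str   # query as spoken to the assistant
    rephrased: str  # content as it should be delivered to the recipient


# The two examples quoted in the abstract.
EXAMPLES = [
    RephrasePair("ask my wife if she can pick up the kids", "can you pick up the kids"),
    RephrasePair("remind me to take my pills", "take your pills"),
]


def copied_token_ratio(pair: RephrasePair) -> float:
    """Fraction of rephrased tokens that also appear in the original query."""
    source_tokens = set(pair.original.split())
    target_tokens = pair.rephrased.split()
    copied = sum(token in source_tokens for token in target_tokens)
    return copied / len(target_tokens)


if __name__ == "__main__":
    for pair in EXAMPLES:
        ratio = copied_token_ratio(pair)
        print(f"{pair.original!r} -> {pair.rephrased!r} (copied tokens: {ratio:.2f})")
```

In both examples only the pronoun changes ("she" to "you", "my" to "your") and the rest of the rephrased content is copied from the query, which is consistent with the abstract's interest in a copy-pointer.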

Provider:
maas
Created:
2025-05-20