Not explicitly named

Name: Not explicitly named
Creator: Department of Computer Science, University of Minnesota
Published: 2023-05-25 05:36:15
License: No description available

Source: arXiv. Updated 2023-05-25. Indexed 2024-08-06.

Download link:

http://arxiv.org/abs/2305.15582v1

Resource description:

This study uses several publicly available NLP datasets, including Grammarly's Yahoo Answers Formality Corpus, the EmoBank dataset, the Sentiment140 dataset, and the Wiki Neutrality Corpus, to build and test multi-style text style transfer models. Together these datasets cover four micro-styles (formality, sentiment, arousal, and bias) and are used to train and validate the models' ability to transfer several styles at once. The study constructs a pseudo-parallel dataset by adjusting the style distribution of the training samples so that the training data is balanced, with the goal of improving multi-style transfer quality. The work addresses the data requirements of multi-style text style transfer, in particular how to train such models effectively when high-quality parallel datasets are scarce.
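The description above does not say how the style distribution is adjusted, so the following is only a minimal sketch of one common balancing approach: downsampling every style combination to the size of the smallest one. The function name `balance_by_style`, the field names `formality` and `sentiment`, and the toy counts are all hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def balance_by_style(samples, style_keys, seed=0):
    """Downsample so every observed style combination has the same count.

    Assumption: each sample is a dict holding one label per style key.
    This is an illustrative balancing rule, not the study's procedure.
    """
    # Group samples by their combination of style labels.
    groups = defaultdict(list)
    for sample in samples:
        combo = tuple(sample[key] for key in style_keys)
        groups[combo].append(sample)

    # The smallest group sets the per-combination target size.
    target = min(len(group) for group in groups.values())

    rng = random.Random(seed)
    balanced = []
    for combo in sorted(groups):
        # Draw `target` samples without replacement from each group.
        balanced.extend(rng.sample(groups[combo], target))
    return balanced


# Toy data with a deliberately skewed style distribution (hypothetical).
toy_counts = {
    ("formal", "positive"): 5,
    ("formal", "negative"): 2,
    ("informal", "positive"): 3,
    ("informal", "negative"): 4,
}
toy = [
    {"text": f"{f}-{s}-{i}", "formality": f, "sentiment": s}
    for (f, s), n in toy_counts.items()
    for i in range(n)
]

balanced = balance_by_style(toy, ["formality", "sentiment"])
print(len(toy), "->", len(balanced))
```

Downsampling discards data from the larger groups. Oversampling the smaller groups would keep every sample at the cost of repeats, and which trade-off suits a given model is not something the description settles.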

Provider:

Department of Computer Science, University of Minnesota

Created:

2023-05-25
