Not explicitly named
arXiv; updated 2023-05-25; indexed 2024-08-06
Download link:
http://arxiv.org/abs/2305.15582v1
Resource description:
This study uses several publicly available NLP datasets, including Grammarly's Yahoo Answers Formality Corpus, the EmoBank dataset, the Sentiment140 dataset, and the Wiki Neutrality Corpus, to build and test multi-style text style transfer models. Together these datasets cover several micro-styles (formality, sentiment, arousal, and bias) and are used to train and validate the models' ability to transfer multiple styles at once. To balance the training data, the study adjusts the style distribution of the training samples and constructs a pseudo-parallel dataset, with the aim of improving multi-style transfer performance. The work addresses the data requirements of multi-style text style transfer, in particular how to train style transfer models effectively when high-quality parallel datasets are scarce.
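The description does not spell out how the style distribution is adjusted. As a rough illustration only, one common way to balance training data is to downsample every combination of style labels to the size of the rarest combination. The sketch below assumes that approach; the function name `balance_by_style` and the field names `formality` and `sentiment` are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def balance_by_style(samples, style_keys, seed=0):
    """Downsample so each observed style-label combination is equally frequent.

    Assumption: each sample is a dict holding one label per key in style_keys.
    This is an illustrative balancing scheme, not the study's actual procedure.
    """
    rng = random.Random(seed)

    # Group samples by their combination of style labels.
    groups = defaultdict(list)
    for sample in samples:
        combo = tuple(sample[key] for key in style_keys)
        groups[combo].append(sample)

    if not groups:
        return []

    # Every combination is cut down to the size of the rarest one.
    target = min(len(group) for group in groups.values())

    balanced = []
    for combo in sorted(groups):
        balanced.extend(rng.sample(groups[combo], target))

    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Toy data with hypothetical labels: 5 + 3 + 2 samples across three combinations.
    data = (
        [{"text": f"a{i}", "formality": "formal", "sentiment": "positive"} for i in range(5)]
        + [{"text": f"b{i}", "formality": "formal", "sentiment": "negative"} for i in range(3)]
        + [{"text": f"c{i}", "formality": "informal", "sentiment": "negative"} for i in range(2)]
    )
    balanced = balance_by_style(data, ["formality", "sentiment"])
    print(len(balanced))
```

Downsampling discards data from the frequent combinations; upsampling or reweighting are alternatives when the rarest combination is very small.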
Provided by:
Department of Computer Science, University of Minnesota
Created:
2023-05-25



