UNIGEN

Name: UNIGEN
Creator: Chung-Ang University (Republic of Korea)
Published: 2024-05-03 09:20:28
License: No description provided

Source: arXiv, 2024-05-03; updated 2024-06-24

Download link:

https://anonymous.4open.science/r/unigen-ARR/

Resource description:

UNIGEN is a universal domain generalization dataset for sentiment classification, developed by Chung-Ang University, Republic of Korea. It is built with zero-shot dataset generation. The goal is to let small task-specific models generalize to any domain that shares the same label space, which makes the dataset-generation paradigm more practical. The dataset contains 1 million entries generated by large pre-trained language models (PLMs) without any human-annotated data. It is suited to a range of natural language processing (NLP) tasks and is particularly advantageous for domain generalization problems.

Provided by:

Chung-Ang University

Created:

2024-05-02
