UNIGEN
arXiv · Updated 2024-05-03 · Indexed 2024-06-24
Download link:
https://anonymous.4open.science/r/unigen-ARR/
Resource overview:
UNIGEN is a universal domain generalization dataset for sentiment classification, developed by Chung-Ang University, Republic of Korea. It is built with zero-shot dataset generation: a large pre-trained language model (PLM) produces the data, so no human-annotated data is required. The goal is to let a small task-specific model generalize to any domain that shares the same label space, which makes the data-generation paradigm more practical to apply. The dataset contains 1 million entries. It is suitable for a variety of natural language processing (NLP) tasks and offers clear advantages for domain generalization problems in particular.
Provider:
Chung-Ang University, Republic of Korea
Created:
2024-05-02



