
ZEROGEN

Source: arXiv · Updated 2022-10-22 · Indexed 2024-06-21
Download link:
https://github.com/HKUNLP/ZeroGen
Description:
ZEROGEN is a dataset created by the Shanghai AI Laboratory for zero-shot learning research. It contains roughly 200,000 samples generated entirely by large pre-trained language models (PLMs), with no manual annotation. To build it, task-specific prompts guide a PLM to generate training data, which is then used to train a small task model such as an LSTM. ZEROGEN targets natural language processing and addresses the zero-shot setting, in which a model is evaluated on tasks it has not encountered during training.
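
The generate-then-train flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the prompt wording, the `toy_plm` function (a deterministic stand-in for sampling from a real PLM), and the word-count classifier (a stand-in for a small task model such as an LSTM). It is not the dataset authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Illustrative label-conditioned prompts (assumed wording, not from the dataset).
PROMPTS: Dict[str, str] = {
    "positive": "The movie review in positive sentiment is:",
    "negative": "The movie review in negative sentiment is:",
}

def toy_plm(prompt: str, i: int) -> str:
    """Hypothetical stand-in for a PLM call; a real PLM would sample text."""
    openers = ["the film was", "the acting felt", "the plot seemed"]
    if "positive" in prompt:
        tones = ["great", "moving", "brilliant"]
    else:
        tones = ["dull", "clumsy", "boring"]
    # Cycle through all opener/tone combinations as the sample index grows.
    return f"{openers[i % 3]} {tones[(i // 3) % 3]}"

def synthesize(
    prompts: Dict[str, str],
    generate: Callable[[str, int], str],
    n_per_label: int,
) -> List[Tuple[str, str]]:
    """Build a pseudo-labeled dataset: the prompt's label becomes the target."""
    return [
        (generate(prompt, i), label)
        for label, prompt in prompts.items()
        for i in range(n_per_label)
    ]

def train_word_counts(data: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tiny stand-in task model: per-label word frequencies."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in data:
        counts[label].update(text.split())
    return counts

def predict(counts: Dict[str, Counter], text: str) -> str:
    """Pick the label whose training text shares the most word mass with `text`."""
    words = text.split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

dataset = synthesize(PROMPTS, toy_plm, n_per_label=9)
model = train_word_counts(dataset)
print(predict(model, "the film was brilliant"))  # positive
print(predict(model, "the plot seemed dull"))    # negative
```

The point of the sketch is the division of labor: the labels come from the prompts rather than from annotators, and only the small task model is trained. Swapping `toy_plm` for a real text-generation call would leave `synthesize` unchanged.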
Provider:
Shanghai AI Laboratory
Created:
2022-02-16