
SIRIS-Lab/erc-classification-dataset

Source: Hugging Face. Updated 2025-07-24. Indexed 2025-08-09.
Download link:
https://hf-mirror.com/datasets/SIRIS-Lab/erc-classification-dataset
Description:
The ERC Panel Classification Dataset is designed for fine-tuning a multi-label classifier that predicts one or more ERC (European Research Council) panels from the title and abstract of a research paper. The dataset has three splits: the training set, test-panels, and test-humans. The training set was built by pseudolabeling with the outputs of three different large language models (LLMs) and contains multi-label panel assignments based on the content of each paper's title and abstract. The test-panels split consists of ERC projects, each with a single panel assignment. The test-humans split was created with Argilla from training-set cases where the LLMs disagreed: human annotators reviewed those documents and assigned final labels by majority agreement.
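The description does not say how the individual label sets are combined, so the following is only a minimal sketch of one plausible reading of "disagreement between the LLMs" and "majority agreement". The function names, the strict-majority threshold, and the panel codes in the sample data are illustrative assumptions, not the dataset authors' procedure.

```python
from collections import Counter


def majority_labels(annotations, min_votes=None):
    """Keep each panel label chosen by a strict majority of annotators.

    `annotations` is a list of label lists, one per annotator (or per LLM).
    The strict-majority threshold is an assumption made for this sketch.
    """
    if min_votes is None:
        min_votes = len(annotations) // 2 + 1
    # Count each label at most once per annotator.
    counts = Counter(label for labels in annotations for label in set(labels))
    return sorted(label for label, votes in counts.items() if votes >= min_votes)


def labelers_disagree(predictions):
    """Return True when the predicted label sets are not all identical."""
    return len({frozenset(p) for p in predictions}) > 1


# Hypothetical label sets for a single paper (illustrative panel codes).
sample_outputs = [["PE6", "PE7"], ["PE6"], ["PE6", "LS2"]]

if labelers_disagree(sample_outputs):
    # Under the assumption above, such a paper would go to human review.
    print(majority_labels(sample_outputs))
```

Under this reading, a paper is flagged for review whenever the label sets differ at all; a looser rule (for example, flagging only when no label reaches a majority) would be an equally valid assumption.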
Provider:
SIRIS-Lab