SIRIS-Lab/erc-classification-dataset
Source: Hugging Face. Updated: 2025-07-24. Indexed: 2025-08-09.
Download link:
https://hf-mirror.com/datasets/SIRIS-Lab/erc-classification-dataset
Description:
The ERC Panel Classification Dataset is designed for fine-tuning a multi-label classifier that predicts one or more ERC (European Research Council) panels from the title and abstract of a research paper. The dataset consists of three splits: the training set, test-panels, and test-humans. The training set is pseudolabeled using the outputs of three different large language models (LLMs) and contains multi-label panel assignments based on the content of each paper's title and abstract. The test-panels split consists of ERC projects, each with a single panel assignment. The test-humans split was created with Argilla from training-set documents on which the LLMs disagreed: human annotators reviewed these documents and assigned the final labels by majority agreement.
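The majority-agreement step described above can be illustrated with a minimal sketch. It is not the dataset authors' implementation: the function names, the strict-majority threshold, and the example panel codes are assumptions made for illustration only.

```python
from collections import Counter


def has_disagreement(annotations):
    """Return True if the annotators did not all assign the same label set."""
    return len({frozenset(labels) for labels in annotations}) > 1


def majority_labels(annotations, min_votes=None):
    """Keep each label chosen by at least `min_votes` annotators.

    `annotations` is a list with one list of labels per annotator.
    By default a strict majority is required (assumption, not from the source).
    """
    if not annotations:
        return []
    if min_votes is None:
        min_votes = len(annotations) // 2 + 1
    # Count each label once per annotator, even if it was listed twice.
    counts = Counter(label for labels in annotations for label in set(labels))
    return sorted(label for label, n in counts.items() if n >= min_votes)


# Hypothetical votes from three annotators for one document.
votes = [["PE6", "LS2"], ["PE6"], ["PE6", "SH4"]]
if has_disagreement(votes):
    final_labels = majority_labels(votes)
```

Here only a label selected by at least two of the three annotators survives, so a document can still end up with several panels, or with none if no label reaches a majority.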
Provider:
SIRIS-Lab



