TRECCOVID-RF-Aspire

Name: TRECCOVID-RF-Aspire
Creator: figshare
Published: 2025-05-01 07:13:43
License: No description available

DataCite Commons; updated 2025-05-01; indexed 2024-07-29

Download link:

https://figshare.com/articles/dataset/TRECCOVID-RF-Aspire/19425515/1

Description:

This is a copy of the TRECCOVID-RF dataset used in the paper "Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity" by Sheshera Mysore, Arman Cohan, and Tom Hope. The TRECCOVID dataset is an ad-hoc search dataset. The original dataset versions used are the query topics, the relevance annotations, and the paper metadata obtained from the 2021-06-21 release of the CORD-19 dataset. The dataset released here converts the original TRECCOVID dataset into a reformulated form, TRECCOVID-RF, which is used in the paper. For further details of the paper, how this dataset was compiled, and how it was used, see https://github.com/allenai/aspire

The contents of the dataset are as follows:

<code>abstracts-treccovid.jsonl</code>: <code>jsonl</code> file containing the paper-id, abstracts, and titles for the queries and candidates which are part of the dataset.

<code>treccovid-queries-release.csv</code>: Metadata associated with every query.

<code>test-pid2anns-treccovid.json</code>: JSON file with the query paper-id and candidate paper-ids for every query paper in the dataset. Use this file in conjunction with <code>abstracts-treccovid.jsonl</code> to generate files for use in model evaluation.

<code>treccovid-evaluation_splits.json</code>: Paper-ids for the splits to use in reporting evaluation numbers.

<code>aspire/src/evaluation/ranking_eval.py</code>, included in the GitHub repo accompanying this dataset, implements the evaluation protocol and computes evaluation metrics. Please see the paper for descriptions of the experimental protocol we recommend for reporting evaluation metrics.
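The description says the candidate file is meant to be used together with the abstracts file. The following is a minimal Python sketch of one way that join might look; it is not taken from the aspire repository. The key names <code>paper_id</code> and <code>cands</code>, and the assumption that the JSON file maps each query paper-id to an object holding a list of candidate paper-ids, are guesses for illustration only. Inspect the downloaded files or the repository to confirm the actual layout before relying on it.

```python
import json


def load_abstracts(path, id_key="paper_id"):
    """Read a JSON Lines file into a dict keyed by paper id.

    `id_key` is an assumed field name; adjust it to match the real file.
    """
    papers = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            papers[record[id_key]] = record
    return papers


def load_candidates(path, cands_key="cands"):
    """Read the query -> candidates JSON file.

    Assumes a top-level object mapping each query paper-id to an object
    whose `cands_key` entry is a list of candidate paper-ids (hypothetical).
    """
    with open(path, encoding="utf-8") as f:
        pid2anns = json.load(f)
    return {qpid: anns[cands_key] for qpid, anns in pid2anns.items()}


def join_query_candidates(papers, query2cands):
    """Yield (query_record, [candidate_records]) pairs.

    Queries and candidates without a record in `papers` are skipped.
    """
    for qpid, cand_pids in query2cands.items():
        if qpid not in papers:
            continue
        yield papers[qpid], [papers[c] for c in cand_pids if c in papers]
```

With the files downloaded locally, a call such as <code>load_abstracts("abstracts-treccovid.jsonl")</code> followed by <code>load_candidates("test-pid2anns-treccovid.json")</code> would give the two inputs for <code>join_query_candidates</code>. For the actual evaluation protocol and metrics, the repository's <code>ranking_eval.py</code> remains the reference.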

Provider:

figshare

Created:

2022-03-26
