Benchmark for Pairs of Papers in Semantic Scholar: 1 hop vs. 2-4 hops version 0.0
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8222853
Description:
Benchmark for Pairs of Papers in Semantic Scholar: 1 hop vs. 2-4 hops (version 0.0)
There are two files: valid.txt and test.txt; both files use the same format.
Columns 2 and 3 are corpus ids from Semantic Scholar.
Column 1 is the distance between the two papers in the citation index.
Columns 4 and 5 are the bins of the two papers, respectively. Each bin is a number between 0 and 100. Papers are sorted by publication date and split into bins of about 2M papers each, with the oldest papers in bin 0 and the newest papers in bin 99.
Bin 100 is a catch-all for papers with unknown publication dates.
head valid.txt
1 248518397 1041744 97 51
2 248518397 23848439 97 21
3 248518397 4235810 97 12
4 248518397 82079949 97 11
1 3374228 140728989 79 0
1 68334187 36144275 58 34
2 68334187 7008060 58 4
1 205881482 94036919 77 72
2 205881482 95069173 77 53
3 205881482 53480264 77 52
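A minimal sketch of reading this format in Python, assuming the five columns are separated by whitespace as in the sample above (the delimiter is not stated explicitly). The names `PaperPair` and `parse_line` are illustrative and not part of the dataset.

```python
from typing import NamedTuple


class PaperPair(NamedTuple):
    """One row of valid.txt or test.txt (field names are hypothetical)."""
    distance: int      # column 1: distance between the two papers in the citation index
    corpus_id_a: int   # column 2: Semantic Scholar corpus id of the first paper
    corpus_id_b: int   # column 3: Semantic Scholar corpus id of the second paper
    bin_a: int         # column 4: publication-date bin of the first paper
    bin_b: int         # column 5: publication-date bin of the second paper


def parse_line(line: str) -> PaperPair:
    """Parse one whitespace-separated row into a PaperPair."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    return PaperPair(*(int(f) for f in fields))


# The first three rows of the sample shown above.
sample = """\
1 248518397 1041744 97 51
2 248518397 23848439 97 21
3 248518397 4235810 97 12
"""

pairs = [parse_line(line) for line in sample.splitlines() if line.strip()]
for pair in pairs:
    print(pair)
```

To read a whole file, the same function can be applied to each line of `open("valid.txt")`.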
Each row is assigned to a bin, B, where B = max(col4, col5).
Task: the task is to distinguish pairs of papers with distance == 1 from pairs of papers with distance > 1.
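The row bin and the binary label described above can be derived from a row as in the following sketch. The function names `row_bin` and `is_one_hop` are hypothetical, and rows are represented as plain tuples in column order.

```python
def row_bin(bin_a: int, bin_b: int) -> int:
    """Bin of a row: B = max(col4, col5)."""
    return max(bin_a, bin_b)


def is_one_hop(distance: int) -> bool:
    """Gold label for the task: True when distance == 1, False when distance > 1."""
    return distance == 1


# Rows taken from the sample above: (distance, id_a, id_b, bin_a, bin_b).
rows = [
    (1, 248518397, 1041744, 97, 51),
    (2, 248518397, 23848439, 97, 21),
    (1, 3374228, 140728989, 79, 0),
]

for distance, _id_a, _id_b, bin_a, bin_b in rows:
    print(row_bin(bin_a, bin_b), is_one_hop(distance))
```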
Test/Train splits: For each threshold T_{train}, 0 <= T_{train} <= 99, train a model on the rows in bins 0 through T_{train} (inclusive). Test each model on the rows in every bin T_{test}, 0 <= T_{test} <= 99. Report average accuracy for all combinations of T_{train} and T_{test}.
Average Accuracy is defined as: mean(Predict(row) == 1, Gold(row) == 1)
The means are computed over rows in a test bin.
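The accuracy formula above is terse. The sketch below assumes it means the fraction of rows in a test bin for which the predicted label agrees with the gold label (distance == 1). That reading is an assumption, as are the helper names `rows_up_to` and `accuracy_by_test_bin` and the trivial baseline predictor.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int, int, int, int]  # distance, id_a, id_b, bin_a, bin_b


def row_bin(row: Row) -> int:
    """Bin of a row: B = max(col4, col5)."""
    return max(row[3], row[4])


def rows_up_to(rows: List[Row], t_train: int) -> List[Row]:
    """Rows whose bin lies between 0 and t_train, inclusive."""
    return [r for r in rows if row_bin(r) <= t_train]


def accuracy_by_test_bin(rows: List[Row],
                         predict: Callable[[Row], bool]) -> Dict[int, float]:
    """Per-bin accuracy, assumed to be the mean of (prediction == gold)."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for row in rows:
        b = row_bin(row)
        gold = row[0] == 1            # gold label: distance == 1
        totals[b] += 1
        hits[b] += int(predict(row) == gold)
    return {b: hits[b] / totals[b] for b in totals}


# Rows from the sample above.
sample: List[Row] = [
    (1, 248518397, 1041744, 97, 51),
    (2, 248518397, 23848439, 97, 21),
    (3, 248518397, 4235810, 97, 12),
    (4, 248518397, 82079949, 97, 11),
    (1, 3374228, 140728989, 79, 0),
]


def always_one_hop(row: Row) -> bool:
    """Hypothetical baseline that ignores its input and predicts distance == 1."""
    return True


train_subset = rows_up_to(sample, t_train=79)   # only the bin-79 row qualifies
scores = accuracy_by_test_bin(sample, always_one_hop)
print(len(train_subset), scores)
```

In a full run, a model would be fitted on `rows_up_to(..., t_train)` for each T_{train}, and `accuracy_by_test_bin` would supply one accuracy per T_{test}, giving the grid of (T_{train}, T_{test}) scores.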
Created:
2023-08-12



