
Benchmark for Pairs of Papers in Semantic Scholar: 1 hop vs. 2-4 hops version 0.0

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/8222853
Resource description:
There are two files, valid.txt and test.txt; both use the same format. Column 1 is the distance between the two papers in the citation index. Columns 2 and 3 are corpus ids from Semantic Scholar. Columns 4 and 5 are the bins of the two papers, respectively.

The bin is a number between 0 and 100. Papers are sorted by publication date. There are about 2M papers per bin, with the oldest papers in bin 0 and the newest papers in bin 99. Bin 100 is a catch-all for papers with unknown publication dates.

head valid.txt
1       248518397       1041744         97      51
2       248518397       23848439        97      21
3       248518397       4235810         97      12
4       248518397       82079949        97      11
1       3374228         140728989       79      0
1       68334187        36144275        58      34
2       68334187        7008060         58      4
1       205881482       94036919        77      72
2       205881482       95069173        77      53
3       205881482       53480264        77      52

Each row is assigned to a bin, B, where B = max(col4, col5).

Task: distinguish pairs of papers with distance == 1 from pairs of papers with distance > 1.

Test/Train splits: For every threshold 0 <= T_{train} <= 99, train a model on rows in bins 0 through T_{train} (inclusive). Test each model on rows in every bin 0 <= T_{test} <= 99. Report average accuracy for all combinations of T_{train} and T_{test}.

Average accuracy is defined as: mean(Predict(row) == 1, Gold(row) == 1). The means are computed over the rows in a test bin.
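The row format, the bin assignment, and the per-bin scoring can be sketched in Python as follows. This is a minimal illustration, not the dataset authors' evaluation code. The names Row, parse_rows, accuracy_by_bin, and always_one_hop are hypothetical, and the accuracy formula is read here as the fraction of rows in a test bin where the prediction (distance == 1 or not) agrees with the gold label, which is an assumption about the definition above. The sample rows are the ones shown by head valid.txt.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    """One line of valid.txt / test.txt (hypothetical helper type)."""
    distance: int  # column 1: distance in the citation index
    id_a: int      # column 2: Semantic Scholar corpus id
    id_b: int      # column 3: Semantic Scholar corpus id
    bin_a: int     # column 4: publication-date bin of the first paper
    bin_b: int     # column 5: publication-date bin of the second paper

    @property
    def row_bin(self) -> int:
        # B = max(col4, col5), as stated in the description.
        return max(self.bin_a, self.bin_b)

    @property
    def gold(self) -> bool:
        # Positive class: the two papers are exactly one hop apart.
        return self.distance == 1


def parse_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Parse whitespace-separated rows, skipping lines without 5 fields."""
    for line in lines:
        fields = line.split()
        if len(fields) == 5:
            yield Row(*map(int, fields))


def accuracy_by_bin(rows: Iterable[Row],
                    predict: Callable[[Row], bool]) -> Dict[int, float]:
    """Assumed reading of the metric: per test bin, the fraction of rows
    where the predicted label equals the gold label."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for row in rows:
        totals[row.row_bin] += 1
        hits[row.row_bin] += int(predict(row) == row.gold)
    return {b: hits[b] / totals[b] for b in totals}


SAMPLE = """\
1 248518397 1041744 97 51
2 248518397 23848439 97 21
3 248518397 4235810 97 12
4 248518397 82079949 97 11
1 3374228 140728989 79 0
1 68334187 36144275 58 34
2 68334187 7008060 58 4
1 205881482 94036919 77 72
2 205881482 95069173 77 53
3 205881482 53480264 77 52
"""

rows = list(parse_rows(SAMPLE.splitlines()))

# Training subset for one example threshold, T_train = 79 (inclusive).
t_train = 79
train_rows = [r for r in rows if r.row_bin <= t_train]


def always_one_hop(row: Row) -> bool:
    # Trivial baseline standing in for a trained model.
    return True


acc = accuracy_by_bin(rows, always_one_hop)
for test_bin in sorted(acc):
    print(test_bin, round(acc[test_bin], 3))
```

To cover the full protocol, the same scoring would be repeated for each T_{train} from 0 to 99, replacing always_one_hop with a model trained on the corresponding train_rows.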
Created:
2023-08-12