PMOA-CITE dataset
Source: figshare.com, updated 2023-06-04, indexed 2025-03-26
Download link:
https://figshare.com/articles/dataset/PMOA-CITE_dataset/12547574/1
Description:
The dataset used in the experiments of the paper "Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models".

There are one million sentences in total, split into training, validation, and test sets in proportions of 60%, 20%, and 20%, respectively. For the pre-processing of the dataset, please refer to the paper.

The data are stored in JSONL format (each row is a JSON object). A few rows are listed here as examples:

{"sec_name":"introduction","cur_sent_id":"12213838@0#3$0","next_sent_id":"12213838@0#3$1","cur_sent":"All three spectrin subunits are essential for normal development.","next_sent":"βH, encoded by the karst locus, is an essential protein that is required for epithelial morphogenesis .","cur_scaled_len_features":{"type":1,"values":[0.17716535433070865,0.13513513513513514]},"next_scaled_len_features":{"type":1,"values":[0.32677165354330706,0.35135135135135137]},"cur_has_citation":0,"next_has_citation":1}
{"sec_name":"results","prev_sent_id":"12230634@1@1#0$2","cur_sent_id":"12230634@1@1#0$3","next_sent_id":"12230634@1@1#0$4","prev_sent":"μIU/ml at the 2.0-h postprandial time point.","cur_sent":"Statistically significant differences between the mean plasma insulin levels of dogs treated with 50 mg/kg of GSNO, and those treated with 50 mg/kg GSNO and vitamin C (50 mg/kg) were observed at the 1.0-h and 1.5-h time points (P < 0.05).","next_sent":"The mean plasma insulin concentrations in the dogs treated with 50 mg/kg of vitamin C and 50 mg/kg of GSNO, or 50 mg/kg of GSNO was significantly altered compared to those of controls or captopril-treated dogs (P < 0.05).","prev_scaled_len_features":{"type":1,"values":[0.09448818897637795,0.08108108108108109]},"cur_scaled_len_features":{"type":1,"values":[0.8582677165354331,1.0]},"next_scaled_len_features":{"type":1,"values":[0.7913385826771654,0.9459459459459459]},"prev_has_citation":0,"cur_has_citation":0,"next_has_citation":0}
{"sec_name":"results","prev_sent_id":"12213837@1@0#3$3","cur_sent_id":"12213837@1@0#3$4","next_sent_id":"12213837@1@0#3$5","prev_sent":"Cleavage of VAMP2 by BoNT/D releases the NH2-terminal 59 amino acids from the protein and eliminates exocytosis.","cur_sent":"However, in this case, exocytosis cannot be recovered by addition of the cleaved fragment .","next_sent":"Peptides that exactly correspond to the BoNT/D cleavage site (VAMP2 aa 25–59 and 60–94-cys) were equally efficient at mediating liposome fusion (unpublished data).","prev_scaled_len_features":{"type":1,"values":[0.36220472440944884,0.35135135135135137]},"cur_scaled_len_features":{"type":1,"values":[0.2795275590551181,0.2972972972972973]},"next_scaled_len_features":{"type":1,"values":[0.562992125984252,0.5135135135135135]},"prev_has_citation":0,"cur_has_citation":1,"next_has_citation":0}

For the code that uses this dataset to model citation worthiness, please refer to https://github.com/sciosci/cite-worthiness
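As an illustration of the row layout described above, the following is a minimal Python sketch for reading such rows. It is not the authors' loader (see the linked repository for that). The names `Example`, `parse_rows`, and `load_jsonl` are hypothetical, the file name mentioned in the comment is a placeholder because the listing does not name the files, and treating `cur_has_citation` as the prediction target is an assumption based on the paper's topic. The sample rows show that the `prev_*` keys can be absent (the first row has none), so the sketch reads the neighbouring sentences with `dict.get`.

```python
import json
from typing import Iterable, Iterator, Optional, TypedDict


class Example(TypedDict):
    """Hypothetical flattened view of one row (not part of the dataset)."""
    section: str
    prev_sent: Optional[str]
    cur_sent: str
    next_sent: Optional[str]
    label: int


def parse_rows(lines: Iterable[str]) -> Iterator[Example]:
    """Yield one Example per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        yield {
            "section": row["sec_name"],
            # Neighbouring sentences may be missing, e.g. at the start
            # of a paragraph, so fall back to None.
            "prev_sent": row.get("prev_sent"),
            "cur_sent": row["cur_sent"],
            "next_sent": row.get("next_sent"),
            # Assumption: the current sentence's flag is the target label.
            "label": int(row["cur_has_citation"]),
        }


def load_jsonl(path: str) -> Iterator[Example]:
    """Stream examples from a file, e.g. load_jsonl("train.jsonl"),
    where the file name is a placeholder."""
    with open(path, encoding="utf-8") as fh:
        yield from parse_rows(fh)


# First example row from the description, copied verbatim.
SAMPLE = (
    '{"sec_name":"introduction","cur_sent_id":"12213838@0#3$0",'
    '"next_sent_id":"12213838@0#3$1",'
    '"cur_sent":"All three spectrin subunits are essential for normal development.",'
    '"next_sent":"βH, encoded by the karst locus, is an essential protein '
    'that is required for epithelial morphogenesis .",'
    '"cur_scaled_len_features":{"type":1,"values":[0.17716535433070865,0.13513513513513514]},'
    '"next_scaled_len_features":{"type":1,"values":[0.32677165354330706,0.35135135135135137]},'
    '"cur_has_citation":0,"next_has_citation":1}'
)

examples = list(parse_rows([SAMPLE]))
print(examples[0]["section"], examples[0]["label"], examples[0]["prev_sent"])
```

Streaming line by line avoids holding all one million rows in memory at once; whether that matters depends on how the rows are consumed downstream.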
Provider: figshare.com



