
Dataset - Templates Recommendation in the Open Research Knowledge Graph

Indexed from NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/6607164
Description:
This dataset was created to implement a content-based recommender system for the Open Research Knowledge Graph (ORKG). The recommender system accepts a research paper's title and abstract as input and recommends existing ORKG templates that are semantically relevant to the given paper.

Two approaches have been trained on this dataset in the context of this master's thesis: a Natural Language Inference (NLI) approach based on SciBERT embeddings and an unsupervised approach based on ElasticSearch.

This publication therefore consists of one general dataset, two training sets (one for each approach), a validation set for the supervised approach, and a test set for both approaches.

dataset.json

The main JSON object consists of a list of templates and a list of neutral papers. Each template object has an ID, a label, a list of research fields, a list of properties, and a list of papers using that template; each paper object has an ID, a label, a DOI, a research field, and an abstract. Each neutral paper object has the same schema as a paper object that uses a template. See an example instance below.

{
  "templates": [
    {
      "id": "R138668",
      "label": "Psychiatric Disorders AI Overview",
      "research_fields": [
        {
          "id": "http://orkg.org/orkg/resource/R133",
          "label": "Artificial Intelligence"
        },
        ...
      ],
      "properties": [
        "Study cohort",
        ...
      ],
      "papers": [
        {
          "id": "R138698",
          "label": "Application of Autoencoder in Depression Diagnosis",
          "doi": "10.12783/dtcse/csma2017/17335",
          "research_field": {
            "id": "R104",
            "label": "Bioinformatics"
          },
          "abstract": "Major depressive disorder (MDD) is a mental disorder characterized by at least two weeks of low mood which is present across most situations. Diagnosis of MDD using rest-state functional magnetic resonance imaging (fMRI) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability. No method can automatically extract discriminative features from the origin time series in fMRI images for MDD diagnosis. In this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge. An autoencoder was used to learn pre-training parameters of a dimensionality reduction process using 3-D convolution network. Through comparison with the other three feature extraction methods, our method achieved the best classification performance. This method can be used not only in MDD diagnosis, but also other similar disorders."
        },
        ...
      ]
    },
    ...
  ],
  "neutral_papers": [
    {
      "id": "R109377",
      "label": "Structural basis of SARS-CoV-2 3CLpro and anti-COVID-19 drug discovery from medicinal plants",
      "doi": "10.1016/j.jpha.2020.03.009",
      "research_field": {
        "id": "R104",
        "label": "Bioinformatics"
      },
      "abstract": "Abstract The recent outbreak of coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 in December 2019 raised global health concerns. The viral 3-chymotrypsin-like cysteine protease (3CLpro) enzyme controls coronavirus replication and is essential for its life cycle. 3CLpro is a proven drug discovery target in the case of severe acute respiratory syndrome coronavirus (SARS-CoV) and middle east respiratory syndrome coronavirus (MERS-CoV). Recent studies revealed that the genome sequence of SARS-CoV-2 is very similar to that of SARS-CoV. Therefore, herein, we analysed the 3CLpro sequence, constructed its 3D homology model, and screened it against a medicinal plant library containing 32,297 potential anti-viral phytochemicals/traditional Chinese medicinal compounds. Our analyses revealed that the top nine hits might serve as potential anti- SARS-CoV-2 lead molecules for further optimisation and drug development process to combat COVID-19."
    },
    ...
  ]
}

All other files

The main JSON object consists of a list of entailments, a list of contradictions and a list of neutrals.
Each object in these lists has the same schema: an instance_id created by concatenating the template_id (when one exists) with the paper_id, a template_id, a paper_id, a premise (in the example below, the template's label followed by its properties), a hypothesis (in the example below, the paper's title followed by its abstract), their concatenation in sequence, and the target class. See an example instance below.

{
  "entailments": [
    {
      "instance_id": "R138668xR138698",
      "template_id": "R138668",
      "paper_id": "R138698",
      "premise": "psychiatric disorders ai overview study cohort outcome assessment aims performance findings used models data",
      "hypothesis": "application of autoencoder in depression diagnosis major depressive disorder (mdd) is a mental disorder characterized by at least two weeks of low mood which is present across most situations diagnosis of mdd using rest state functional magnetic resonance imaging (fmri) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability no method can automatically extract discriminative features from the origin time series in fmri images for mdd diagnosis in this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge an autoencoder was used to learn pre training parameters of a dimensionality reduction process using 3 d convolution network through comparison with the other three feature extraction methods, our method achieved the best classification performance this method can be used not only in mdd diagnosis, but also other similar disorders",
      "sequence": "[CLS] psychiatric disorders ai overview study cohort outcome assessment aims performance findings used models data [SEP] application of autoencoder in depression diagnosis major depressive disorder (mdd) is a mental disorder characterized by at least two weeks of low mood which is present across most situations diagnosis of mdd using rest state functional magnetic resonance imaging (fmri) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability no method can automatically extract discriminative features from the origin time series in fmri images for mdd diagnosis in this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge an autoencoder was used to learn pre training parameters of a dimensionality reduction process using 3 d convolution network through comparison with the other three feature extraction methods, our method achieved the best classification performance this method can be used not only in mdd diagnosis, but also other similar disorders [SEP]",
      "target": "entailment"
    },
    ...
  ],
  "contradictions": [ ... ],
  "neutrals": [ ... ]
}

Statistics

Class          Training (supervised)  Validation (supervised)  Training (unsupervised)  Test
Entailment     180                    20                       200                      52
Neutral        180                    20                       200                      64
Contradiction  736                    84                       0                        0
Total          1096                   124                      400                      116
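
The entailment instance schema described above can be sketched in Python as follows. This is a minimal illustration, not the authors' preprocessing code: the helper names (normalize, build_instance, entailments) are hypothetical, and the normalization rules (lowercasing, hyphens turned into spaces, periods dropped) are only inferred from the example instance and may differ from what was actually used. Neutral and contradiction instances are left out because the description does not say how they were paired or sampled.

```python
import re


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, hyphens to spaces, drop periods,
    collapse whitespace. Inferred from the example instance only."""
    text = text.lower().replace("-", " ").replace(".", "")
    return re.sub(r"\s+", " ", text).strip()


def build_instance(template: dict, paper: dict, target: str) -> dict:
    """Build one instance in the schema used by the training/test files."""
    # Premise: template label followed by its property names.
    premise = normalize(" ".join([template["label"], *template["properties"]]))
    # Hypothesis: paper title (its "label") followed by its abstract.
    hypothesis = normalize(f'{paper["label"]} {paper["abstract"]}')
    return {
        "instance_id": f'{template["id"]}x{paper["id"]}',
        "template_id": template["id"],
        "paper_id": paper["id"],
        "premise": premise,
        "hypothesis": hypothesis,
        "sequence": f"[CLS] {premise} [SEP] {hypothesis} [SEP]",
        "target": target,
    }


def entailments(dataset: dict) -> list:
    """Pair every template with each paper that uses it."""
    return [
        build_instance(template, paper, "entailment")
        for template in dataset["templates"]
        for paper in template["papers"]
    ]


if __name__ == "__main__":
    # Toy input following the dataset.json schema; the abstract is a
    # shortened placeholder, not the real record.
    toy = {
        "templates": [
            {
                "id": "R138668",
                "label": "Psychiatric Disorders AI Overview",
                "properties": ["Study cohort"],
                "papers": [
                    {
                        "id": "R138698",
                        "label": "Application of Autoencoder in Depression Diagnosis",
                        "abstract": "Rest-state fMRI data. A 3-D network.",
                    }
                ],
            }
        ],
        "neutral_papers": [],
    }
    print(entailments(toy)[0]["sequence"])
```

For the real file, the toy dictionary would be replaced by the result of json.load on dataset.json. Note that dropping every period also alters decimal numbers inside abstracts, so a production pipeline would likely need a more careful rule.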
Created:
2022-06-03