Reactome modified for tracing ArangoDB version
Source: NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/10103767
Resource description:
Reactome database download and customization
The Reactome database [1,2] (version 75) was downloaded as a Neo4j graph database from https://reactome.org/download-data. It is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. A series of database queries was then used to generate a version suited to graph data science. The full procedure is documented in the accompanying Jupyter notebook, which is also available in the SBRG/GDS-Public repository on GitHub at GDS-Public/notebooks/reactome/Reactome GDS.ipynb (branch main).
Nodes, labels and relationships not required for graph-algorithmic analyses were removed. These included, for instance, person, affiliation and taxon nodes, as well as all nodes representing entities from organisms other than Homo sapiens. The subcellular locations (compartments) of biological entities were stored as node properties. To improve graph traversal, selected relationships were reversed or added. Because currency metabolites such as ATP, NAD(P)H and H+ can artificially connect metabolic reactions and pathways in network analyses [3,4], these compounds, together with the regulatory protein ubiquitin, were given a dedicated label and thereby excluded from all analyses. Finally, the database was converted into an ArangoDB graph database with 1,703,054 nodes and 3,368,926 edges.
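The preprocessing steps described above can be sketched on a toy in-memory graph. This is a minimal illustration under stated assumptions, not the authors' notebook: the node IDs, the `species` and `name` attributes, the relationship types, the choice of `input` as the reversed relationship, and the ArangoDB collection name `entities` are all hypothetical. The example shows why the currency label matters: without the exclusion, glucose appears connected to an unrelated product purely through a shared H+.

```python
import json
from collections import deque

HUMAN = "Homo sapiens"
# Node types named in the text; the exact label spellings are assumed.
DROP_LABELS = {"Person", "Affiliation", "Taxon"}
# Currency compounds and ubiquitin named in the text (examples only).
CURRENCY_NAMES = {"ATP", "NAD(P)H", "H+", "ubiquitin"}

# Hypothetical toy graph: IDs, attributes and relationship types are illustrative.
nodes = {
    "glc": {"labels": {"SimpleEntity"}, "name": "glucose"},
    "atp": {"labels": {"SimpleEntity"}, "name": "ATP"},
    "h": {"labels": {"SimpleEntity"}, "name": "H+"},
    "g6p": {"labels": {"SimpleEntity"}, "name": "glucose-6-phosphate"},
    "x": {"labels": {"SimpleEntity"}, "name": "unrelated product"},
    "r1": {"labels": {"Reaction"}, "species": HUMAN},
    "r2": {"labels": {"Reaction"}, "species": HUMAN},  # shares only H+ with r1
    "m1": {"labels": {"Reaction"}, "species": "Mus musculus"},
    "cyto": {"labels": {"Compartment"}, "name": "cytosol"},
    "p1": {"labels": {"Person"}},
}
edges = [
    ("r1", "input", "glc"), ("r1", "input", "atp"),
    ("r1", "output", "g6p"), ("r1", "output", "h"),
    ("r2", "input", "h"), ("r2", "output", "x"),
    ("m1", "input", "glc"), ("p1", "author", "r1"),
    ("glc", "compartment", "cyto"), ("g6p", "compartment", "cyto"),
]

def prune(nodes, edges):
    """Drop unneeded node types and non-human entities; nodes without a species are kept."""
    kept = {
        nid: {**n, "labels": set(n["labels"])}  # copy so later steps leave the input intact
        for nid, n in nodes.items()
        if not (n["labels"] & DROP_LABELS) and n.get("species", HUMAN) == HUMAN
    }
    return kept, [(s, r, d) for s, r, d in edges if s in kept and d in kept]

def compartments_to_properties(nodes, edges):
    """Fold compartment nodes into a 'compartments' property and drop them."""
    rest = []
    for s, r, d in edges:
        if r == "compartment":
            nodes[s].setdefault("compartments", []).append(nodes[d]["name"])
        else:
            rest.append((s, r, d))
    return {k: v for k, v in nodes.items() if "Compartment" not in v["labels"]}, rest

def reverse_inputs(edges):
    # Which relationships were actually reversed is not stated; flipping
    # "input" (reaction -> entity becomes entity -> reaction) is an assumption.
    return [(d, "input_of", s) if r == "input" else (s, r, d) for s, r, d in edges]

def label_currency(nodes):
    """Tag currency compounds so traversals can skip them."""
    for n in nodes.values():
        if n.get("name") in CURRENCY_NAMES:
            n["labels"].add("Currency")

def reachable(nodes, edges, start, skip_currency=True):
    """Breadth-first search over directed edges, optionally skipping Currency nodes."""
    adj = {}
    for s, _, d in edges:
        adj.setdefault(s, []).append(d)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj.get(queue.popleft(), []):
            if skip_currency and "Currency" in nodes[nxt]["labels"]:
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def to_arango(nodes, edges, vertex_collection="entities"):
    # ArangoDB edge documents reference vertices as "<collection>/<_key>";
    # the collection name "entities" is hypothetical.
    vertex_lines = [
        json.dumps({"_key": k, **{a: sorted(v) if isinstance(v, set) else v
                                  for a, v in n.items()}})
        for k, n in nodes.items()
    ]
    edge_lines = [
        json.dumps({"_from": f"{vertex_collection}/{s}",
                    "_to": f"{vertex_collection}/{d}", "type": r})
        for s, r, d in edges
    ]
    return vertex_lines, edge_lines

n, e = prune(nodes, edges)
n, e = compartments_to_properties(n, e)
e = reverse_inputs(e)
label_currency(n)
v_lines, e_lines = to_arango(n, e)
print(sorted(reachable(n, e, "glc", skip_currency=False)))
# ['g6p', 'glc', 'h', 'r1', 'r2', 'x']  (glucose "reaches" x through H+)
print(sorted(reachable(n, e, "glc")))
# ['g6p', 'glc', 'r1']
```

Labelling the currency nodes rather than deleting them keeps them in the database while letting each traversal decide to skip them. The vertex and edge strings are JSON Lines documents of the kind ArangoDB's import tooling reads; the steps actually applied to Reactome are those in the notebook referenced above.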
References
1. Gillespie, M. et al. The Reactome pathway knowledgebase 2022. Nucleic Acids Research 50, D687–D692 (2022).
2. Fabregat, A. et al. Reactome graph database: Efficient access to complex pathway data. PLoS Computational Biology 14 (2018).
3. Ma, H. & Zeng, A.-P. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics 19 (2003). https://academic.oup.com/bioinformatics/article/19/2/270/372721
4. Martínez, V. S. et al. The topology of genome-scale metabolic reconstructions unravels independent modules and high network flexibility. PLoS Computational Biology 18 (2022).
Created:
2023-11-10



