Data and code supplementary files for “A genomic medicine approach to identifying novel drugs” PhD thesis
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Data_and_code_supplementary_files_for_A_genomic_medicine_approach_to_identifying_novel_drugs_PhD_thesis/27073753
Description:
This study was conducted entirely in silico.
Most of the analysis code for this study was developed as a set of Jupyter notebooks, stored as plain Python files with a ‘.py’ extension. These can be opened and run in a Jupyter environment by following the README.md file in the unzipped Supplementary File 5 (SF5-code_archive.zip), which covers all chapters of the thesis. With the relevant data in place (Supplementary File 6: SF6-appendix-B_archive.zip and Supplementary File 7: SF7-results-data_archive.tar.gz) and the software dependencies installed, the analyses can be repeated and the thesis generated as a PDF using the script generate_thesis.sh. The vast majority of graphs, charts and tables in the thesis can be reproduced, investigated and tested in this way, so that bugs and issues can be reviewed and identified, with the aim of a more robust and accurate analysis.
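As a rough illustration of the "data in place" step, the following Python sketch unpacks whichever of the three archives have been downloaded into one working directory. The single-directory layout and the function name `unpack_supplementary_files` are assumptions made for this example; the README.md inside SF5-code_archive.zip remains the authoritative guide to where the data must sit.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_supplementary_files(download_dir, work_dir):
    """Extract the SF5-SF7 archives found in download_dir into work_dir.

    Hypothetical helper: the target layout is an assumption, not taken
    from the thesis README.md.
    """
    download_dir, work_dir = Path(download_dir), Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    archives = [
        "SF5-code_archive.zip",
        "SF6-appendix-B_archive.zip",
        "SF7-results-data_archive.tar.gz",
    ]
    extracted = []
    for name in archives:
        path = download_dir / name
        if not path.exists():
            continue  # skip archives that have not been downloaded
        if name.endswith(".zip"):
            with zipfile.ZipFile(path) as zf:
                zf.extractall(work_dir)
        else:
            with tarfile.open(path, "r:gz") as tf:
                tf.extractall(work_dir)
        extracted.append(name)
    return extracted
```

Once the archives are unpacked, the remaining steps (installing dependencies and running generate_thesis.sh) should follow the README.md rather than this sketch.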
Contents:
Supplementary File 3: SF3-DUGGIE_DGI_DB.zip - The DUGGIE (DrUG-Gene IntEractions) drug-gene interaction database, consisting of a list of 1,323 approved drugs identified by ATC code, each with a list of 5 or more gene targets. The data set contains 5,600 unique gene targets in 64,312 unique interactions with drugs, collated from the freely available online datasets STITCH, T3DB, GtoPdb, DrugBank, DSigDB, TTD and DGIdb.
Supplementary File 4: SF4-STITCH_DGI_DB.zip - The STITCH drug-gene interaction database, the largest contributing database to DUGGIE, formatted and quality controlled in an identical manner to DUGGIE for comparison purposes.
Supplementary File 5: SF5-code_archive.zip - Archive of bash scripts and Python 3 Jupyter notebook code used to conduct this project.
Supplementary File 6: SF6-appendix-B_archive.zip - Archive of scripts and results supporting the mini analysis in Appendix B of the thesis, investigating permutation issues encountered with the MAGMA gene set analysis tool.
Supplementary File 7: SF7-results-data_archive.tar.gz - Archive of all result data, sufficient to recreate the thesis document using the original thesis Jupyter notebooks found in Supplementary File 5.
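The DUGGIE inclusion rule described above (drugs keyed by ATC code, each with 5 or more gene targets) can be sketched in Python as follows. This is a minimal illustration on made-up data, not the thesis code: the file format inside SF3-DUGGIE_DGI_DB.zip is not described in this record, so the `(atc_code, gene)` pair layout, the function names and the toy identifiers are all assumptions.

```python
from collections import defaultdict


def filter_by_min_targets(interactions, min_targets=5):
    """Keep drugs with at least min_targets distinct gene targets.

    interactions: iterable of (atc_code, gene) pairs (assumed layout).
    Returns a dict mapping atc_code -> sorted list of unique genes.
    """
    targets = defaultdict(set)
    for atc_code, gene in interactions:
        targets[atc_code].add(gene)  # sets drop duplicate pairs
    return {
        drug: sorted(genes)
        for drug, genes in targets.items()
        if len(genes) >= min_targets
    }


def summarise(drug_targets):
    """Count drugs, unique gene targets and unique drug-gene interactions."""
    genes = set().union(*drug_targets.values()) if drug_targets else set()
    n_interactions = sum(len(g) for g in drug_targets.values())
    return {
        "drugs": len(drug_targets),
        "genes": len(genes),
        "interactions": n_interactions,
    }


# Toy, made-up data (not taken from DUGGIE)
toy = [("DRUG_A", f"GENE{i}") for i in range(1, 7)]   # 6 targets, kept
toy += [("DRUG_B", f"GENE{i}") for i in range(4, 9)]  # 5 targets, kept
toy += [("DRUG_C", "GENE1"), ("DRUG_C", "GENE2")]     # 2 targets, dropped
toy.append(("DRUG_A", "GENE1"))                       # duplicate pair, ignored

kept = filter_by_min_targets(toy)
print(summarise(kept))  # {'drugs': 2, 'genes': 8, 'interactions': 11}
```

The same three counts correspond to the figures quoted for DUGGIE (1,323 drugs, 5,600 unique gene targets, 64,312 unique interactions), though reproducing those numbers requires the actual data and the quality-control steps documented in the thesis.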
Licences:
Documentation and Thesis © Copyright 2023 Mark Einon, licensed under the Creative Commons Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0) license. See documentation-license.txt in Supplementary File 5.
Software © Copyright 2023 Cardiff University, licensed under the GNU AFFERO GENERAL PUBLIC LICENSE (AGPL). See software-license.txt in Supplementary File 5.
DUGGIE contributing data licences:
STITCH: https://creativecommons.org/licenses/by-nc-sa/4.0/
T3DB: "T3DB is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors and explicit acknowledgment of the source material (T3DB) and the original publication (see below). We ask that users who download significant portions of the database cite the T3DB paper in any resulting publications." http://www.t3db.ca/downloads
GtoPdb: Contents https://creativecommons.org/licenses/by-sa/4.0/
DrugBank: https://creativecommons.org/licenses/by-nc/4.0/
DSigDB: "DSigDB is freely accessible: http://tanlab.ucdenver.edu/DSigDB." - User manual
TTD: Unclear. Website states "All Rights Reserved" but resource structure and description in 2002 publication indicate "open-access".
DGIdb: "The data used in DGIdb is all open access and where possible made available as raw data dumps in the downloads section." (https://www.dgidb.org/browse/sources)
Created:
2025-09-20



