
PheKnowLator Human Disease Knowledge Graphs - Build Data (Original) | Disease knowledge graph dataset | Biomedical data integration dataset

Mendeley Data. Updated 2024-05-10; indexed 2024-06-28.
Disease knowledge graph
Biomedical data integration
Download link:
https://zenodo.org/records/7026640
Resource description:
RELEASE V2.1.0 KNOWLEDGE GRAPH: ORIGINAL DATA SOURCES Release: v2.1.0 The goal of this build was to create a knowledge graph that represented human disease mechanisms and included the central dogma. The data sources utilized in this release include many of the sources used in the initial release, as well as some new data made available by the Comparative Toxicogenomics Database and experimental data from the Human Protein Atlas. Data sources are listed by type (Ontology and Data not represented in an ontology [Database Sources]). Additional details are provided for each data source below. Please see documentation on the primary release (https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) for additional details on each data source as well as citation information. Data Access: https://console.cloud.google.com/storage/browser/pheknowlator/archived_builds/release_v2.1.0/build_01MAY2021 ONTOLOGIES Cell Ontology Cell Line Ontology Chemical Entities of Biological Interest (ChEBI) Ontology Gene Ontology Human Phenotype Ontology Mondo Disease Ontology Pathway Ontology Protein Ontology Relations Ontology Sequence Ontology Uber-Anatomy Ontology Vaccine Ontology Cell Ontology (CL) Homepage: GitHub Citation: Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biology. 2005;6(2):R21 Usage: Utilized to connect transcripts and proteins to cells. Additionally, the edges between this ontology and its dependencies are utilized: ChEBI GO PATO PRO RO UBERON Cell Line Ontology (CLO) Homepage: http://www.clo-ontology.org/ Citation: Sarntivijai S, Lin Y, Xiang Z, Meehan TF, Diehl AD, Vempati UD, Schürer SC, Pang C, Malone J, Parkinson H, Liu Y. CLO: the cell line ontology. Journal of Biomedical Semantics. 2014;5(1):37 Usage: Utilized this ontology to map cell lines to transcripts and proteins. 
Additionally, the edges between this ontology and its dependencies are utilized: CL DOID NCBITaxon UBERON Chemical Entities of Biological Interest (ChEBI) Homepage: https://www.ebi.ac.uk/chebi/ Citation: Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research. 2015;44(D1):D1214-9 Usage: Utilized to connect chemicals to complexes, diseases, genes, GO biological processes, GO cellular components, GO molecular functions, pathways, phenotypes, reactions, and transcripts. Gene Ontology (GO) Homepage: http://geneontology.org/ Citations: Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25 The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 2018;47(D1):D330-8 Usage: Utilized to connect biological processes, cellular components, and molecular functions to chemicals, pathways, and proteins. Additionally, the edges between this ontology and its dependencies are utilized: CL NCBITaxon RO UBERON Other Gene Ontology Data Used: goa_human.gaf.gz Human Phenotype Ontology (HPO) Homepage: https://hpo.jax.org/ Citation: Köhler S, Carmody L, Vasilevsky N, Jacobsen JO, Danis D, Gourdine JP, Gargano M, Harris NL, Matentzoglu N, McMurry JA, Osumi-Sutherland D. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Research. 2018;47(D1):D1018-27 Usage: Utilized to connect phenotypes to chemicals, diseases, genes, and variants. 
Additionally, the edges between this ontology and its dependencies are utilized: CL ChEBI GO UBERON Files Other Human Phenotype Ontology Data Used: phenotype.hpoa Mondo Disease Ontology (Mondo) Homepage: https://mondo.monarchinitiative.org/ Citation: Mungall CJ, McMurry JA, Köhler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research. 2017;45(D1):D712-22 Usage: Utilized to connect diseases to chemicals, phenotypes, genes, and variants. Additionally, the edges between this ontology and its dependencies are utilized: CL NCBITaxon GO HPO UBERON Pathway Ontology (PW) Homepage: rgd.mcw.edu Citation: Petri V, Jayaraman P, Tutaj M, Hayman GT, Smith JR, De Pons J, Laulederkind SJ, Lowry TF, Nigam R, Wang SJ, Shimoyama M. The pathway ontology–updates and applications. Journal of Biomedical Semantics. 2014;5(1):7. Usage: Utilized to connect pathways to GO biological processes, GO cellular components, GO molecular functions, Reactome pathways. Several steps are taken in order to connect Pathway Ontology identifiers to Reactome pathways and GO biological processes. To connect Pathway Ontology identifiers to Reactome pathways, we use ComPath Pathway Database Mappings developed by Daniel Domingo-Fernández (PMID:30564458). Files Downloaded Mapping Data curated_mappings.txt kegg_reactome.csv Generated Mapping Data REACTOME_PW_GO_MAPPINGS.txt Protein Ontology (PRO) Homepage: https://proconsortium.org/ Citation: Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, Drabkin HJ, D’Eustachio P, Evsikov AV, Huang H, Nchoutmboube J. The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Research. 
2010;39(suppl_1):D539-45 Usage: Utilized to connect proteins to chemicals, genes, anatomy, catalysts, cell lines, cofactors, complexes, GO biological processes, GO cellular components, GO molecular functions, pathways, proteins, reactions, and transcripts. Additionally, the edges between this ontology and its dependencies are utilized: ChEBI DOID GO Notes: A partial, human-only version of this ontology was used. Details on how this version of the ontology was generated can be found under the Protein Ontology section of the Data_Preparation.ipynb Jupyter Notebook. Files Generated Human Version Protein Ontology (PRO) human_pro.owl (closed with hermit reasoner) Other PRO Data Used: promapping.txt Generated Mapping Data Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl Ensembl Transcript-PRO Identifier Mapping: ENSEMBL_TRANSCRIPT_PROTEIN_ONTOLOGY_MAP.txt Entrez Gene-PRO Identifier Mapping: ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt UniProt Accession-PRO Identifier Mapping: UNIPROT_ACCESSION_PRO_ONTOLOGY_MAP.txt STRING-PRO Identifier Mapping: STRING_PRO_ONTOLOGY_MAP.txt Relations Ontology (RO) Homepage: GitHub Citation: Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in biomedical ontologies. Genome Biology. 2005;6(5):R46. Usage: Utilizing this ontology to connect all data sources in knowledge graph. Additionally, the ontology is queried prior to building the knowledge graph to identify all relations, their inverse properties, and their labels. Files Generated RO Data INVERSE_RELATIONS.txt RELATIONS_LABELS.txt Sequence Ontology (SO) Homepage: GitHub Citation: Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology. 2005;6(5):R44 Usage: Utilized to connect transcripts and other genomic material like genes and variants. 
Files Generated Mapping Data genomic_sequence_ontology_mappings.xlsx SO_GENE_TRANSCRIPT_VARIANT_TYPE_MAPPING.txt Uber-Anatomy Ontology (Uberon) Homepage: GitHub Citation: Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biology. 2012;13(1):R5 Usage: Utilized to connect tissues, fluids, and cells to proteins and transcripts. Additionally, the edges between this ontology and its dependencies are utilized: ChEBI CL GO PRO Vaccine Ontology (VO) Homepage: http://www.violinet.org/vaccineontology/ Citations: He Y, Racz R, Sayers S, Lin Y, Todd T, Hur J, Li X, Patel M, Zhao B, Chung M, Ostrow J. Updates on the web-based VIOLIN vaccine database and analysis system. Nucleic Acids Research. 2013;42(D1):D1124-32 Xiang Z, Todd T, Ku KP, Kovacic BL, Larson CB, Chen F, Hodges AP, Tian Y, Olenzek EA, Zhao B, Colby LA. VIOLIN: vaccine investigation and online information network. Nucleic Acids Research. 2007;36(suppl_1):D923-8 Usage: Utilized the edges between this ontology and its dependencies: ChEBI DOID GO PRO UBERON DATABASE SOURCES BioPortal ClinVar Comparative Toxicogenomics Database DisGeNET Ensembl GeneMANIA Genotype-Tissue Expression Project Human Genome Organisation Gene Nomenclature Committee Human Protein Atlas National Center for Biotechnology Information Gene Reactome Pathway Database Search Tool for Recurring Instances of Neighbouring Genes Database Universal Protein Resource Knowledgebase BioPortal Homepage: BioPortal Citation: BioPortal. Lexical OWL Ontology Matcher (LOOM) Ghazvinian A, Noy NF, Musen MA. Creating mappings for ontologies in biomedicine: simple methods work. In AMIA Annual Symposium Proceedings 2009 (Vol. 2009, p. 198). 
American Medical Informatics Association
Usage: BioPortal was utilized to obtain mappings between MeSH identifiers and ChEBI identifiers for chemicals-diseases, chemicals-genes, chemical-GO biological processes, chemicals-GO cellular components, chemicals-GO molecular functions, chemicals-phenotypes, chemicals-proteins, and chemicals-transcripts. Additional information on how this data was processed can be obtained from the NCBO_rest_api.py GitHub Gist script.

⭐ ALTERNATIVE METHOD ⭐
Since the above approach can take over two days to process, we have developed an alternative solution that downloads the mesh2021.nt data file directly from MeSH and the Flat_file_tab_delimited/names.tsv.gz file directly from ChEBI. Using these files, we have recapitulated the LOOM algorithm implemented by BioPortal when creating mappings between these resources. The procedure is relatively straightforward and utilizes the following information from each resource:
  For all MeSH SCR Chemicals, obtain the following information:
    Identifiers: MeSH identifiers
    Labels: string labels using the RDFS:label object property
    Synonyms: track down all synonyms using the vocab:concept and vocab:preferredConcept object properties
  For all ChEBI classes, obtain the following information:
    Labels: string labels using the RDFS:label object property
    Synonyms: track down all synonyms using all synonym object properties
Files
  Generated Data: MESH_CHEBI_MAP.txt

ClinVar
Homepage: https://www.ncbi.nlm.nih.gov/clinvar/
Citation: Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Jang W, Karapetyan K. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Research. 2017;46(D1):D1062-7
Usage: ClinVar was utilized to create variant-gene, variant-disease, and variant-phenotype edges.
The original data is filtered such that only records meeting the following criteria were included:
  Assembly = "GRCh38"
  ClinSigSimple = 1 (1 = "at least one current record submitted with an interpretation of Likely pathogenic or Pathogenic (independent of whether that record includes assertion criteria and evidence)")
  ReviewStatus in ["criteria provided, multiple submitters, no conflicts", "reviewed by expert panel", "practice guideline"]
Files
  Downloaded Data: variant_summary.txt.gz, var_citations.txt, allele_gene.txt.gz
  Generated Edge Data: CLINVAR_VARIANT_GENE_DISEASE_PHENOTYPE_EDGES.txt

Comparative Toxicogenomics Database (CTD)
Homepage: http://ctdbase.org/
Citations:
  Curated [chemical–gene interactions|chemical-go interactions|chemical–disease interactions|gene–pathway interactions] data were retrieved from the Comparative Toxicogenomics Database (CTD), MDI Biological Laboratory, Salisbury Cove, Maine, and NC State University, Raleigh, North Carolina. World Wide Web (URL: http://ctdbase.org/)
  Davis AP, Grondin CJ, Johnson RJ, Sciaky D, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The comparative toxicogenomics database: update 2019. Nucleic Acids Research. 2018;47(D1):D948-54
Usage: Comparative Toxicogenomics Database (CTD) was utilized to create chemical-disease, chemical-gene, chemical-GO biological process, chemical-GO cellular components, chemical-GO molecular functions, chemical-phenotype, chemical-protein, chemical-rna, and gene-pathway edges.
The original data is filtered such that only records meeting the following criteria were included:
  chemical-disease: DirectEvidence != ""
  chemical-gene: Organism == "Homo sapiens", GeneForms == "gene", and affects not in InteractionActions
  chemical-GO biological process: PhenotypeName == "Biological Process" and Interaction <= "1.04e-47" (10th percentile)
  chemical-GO cellular components: PhenotypeName == "Cellular Component" and Interaction <= "1.04e-47" (10th percentile)
  chemical-GO molecular functions: PhenotypeName == "Molecular Function" and Interaction <= "1.04e-47" (10th percentile)
  chemical-phenotype: DirectEvidence != ""
  chemical-protein: Organism == "Homo sapiens", GeneForms == "protein", and affects not in InteractionActions
  chemical-rna: Organism == "Homo sapiens", GeneForms == "mRNA", and affects and activity not in InteractionActions
  gene-pathway edges: PathwayName == R-HSA-
Files
  Downloaded Data
    Chemical-Gene Relations: CTD_chem_gene_ixns.tsv.gz
    Chemical-Disease/Phenotype Relations: CTD_chemicals_diseases.tsv.gz
    Chemical-GO Relations: CTD_chem_go_enriched.tsv.gz
    Gene-Pathway Relations: CTD_genes_pathways.tsv.gz

DisGeNET
Homepage: https://www.disgenet.org/
Citation:
  Gene-disease association data retrieved from DisGeNET v6.0 (http://www.disgenet.org/), Integrative Biomedical Informatics Group GRIB/IMIM/UPF. [December, 2019].
  Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research. 2019.
Usage: DisGeNET was utilized to create gene-disease, and gene-phenotype edges. The original data is filtered such that only records meeting the following criteria were included: EI >= "1.0" (90th percentile).
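As an illustration only, the DisGeNET evidence index (EI) threshold above could be applied roughly as follows. This is a minimal sketch and not the project's actual code: the sample rows are invented, the column names geneId and diseaseId are assumptions about the layout of curated_gene_disease_associations.tsv.gz, and the helper filter_by_evidence_index is hypothetical. The sketch compares EI numerically, whereas the criterion above is written against the string "1.0".

```python
import csv
import io

# Invented miniature stand-in for curated_gene_disease_associations.tsv;
# the column names geneId and diseaseId are assumptions for illustration.
SAMPLE_TSV = (
    "geneId\tdiseaseId\tEI\n"
    "1\tC0001\t1.0\n"
    "2\tC0002\t0.75\n"
    "3\tC0003\t\n"
    "4\tC0004\t1.000\n"
)


def filter_by_evidence_index(handle, threshold=1.0):
    """Return (geneId, diseaseId) pairs whose EI is numeric and >= threshold."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            ei = float(row["EI"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric EI: the record is dropped
        if ei >= threshold:
            kept.append((row["geneId"], row["diseaseId"]))
    return kept


edges = filter_by_evidence_index(io.StringIO(SAMPLE_TSV))
print(edges)
```

Records with an empty EI are dropped here; whether the original build treats missing values the same way is not stated in this description.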
Additionally, data from this source was used to create mappings between different types of disease and phenotype identifiers, including: OMIM, ORPHA, UMLS, ICD ➞ DOID OMIM, ORPHA, UMLS, ICD ➞ HPO Files Downloaded Data Disease/Phenotype-Gene Relations: curated_gene_disease_associations.tsv.gz Disease Identifier Mapping: disease_mappings.tsv.gz Generated Mapping Data Disease Identifier Mapping: PHENOTPYE_HPO_MAP.txt Phenotype Identifier Mapping: DISEASE_DOID_MAP.txt Ensembl Homepage: https://uswest.ensembl.org/index.html Citation: Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, Billis K, Cummins C, Gall A, Girón CG, Gil L. Ensembl 2018. Nucleic Acids Research. 2017;46(D1):D754-61 Usage: Ensembl data was utilized to create mappings between Ensembl genes, transcripts, and proteins with NCBI Gene identifiers, HUGO gene symbols, UniProt Accession identifiers, and Protein Ontology identifiers in the knowledge graph (for additional details on the processing of these data, see Data_Preparation.ipynb): Ensembl Transcript IDs ➞ PRO IDs Gene Ensembl IDs ➞ Entrez Gene IDs Gene Ensembl IDs ➞ PRO IDs Gene Symbols ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ PRO IDs Protein Ensembl IDs ➞ UniProt Protein Accession STRING IDs ➞ PRO IDs UniProt Protein Accession ➞ Entrez Gene IDs Files Downloaded Data Homo_sapiens.GRCh38.102.gtf Homo_sapiens.GRCh38.102.uniprot.tsv.gz Homo_sapiens.GRCh38.102.entrez.tsv.gz Generated Mapping Data Cleaned Ensembl Gene Set: ensembl_identifier_data_cleaned.txt Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl Ensembl Transcript-PRO Identifier Mapping: ENSEMBL_TRANSCRIPT_PROTEIN_ONTOLOGY_MAP.txt Gene Symbol-Ensembl Transcript Identifier Mapping: GENE_SYMBOL_ENSEMBL_TRANSCRIPT_MAP.txt Entrez Gene-Ensembl Transcript Identifier Mapping: ENTREZ_GENE_ENSEMBL_TRANSCRIPT_MAP.txt Entrez Gene-PRO Identifier Mapping: ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt Ensembl Gene-Entrez Gene Identifier Mapping: 
ENSEMBL_GENE_ENTREZ_GENE_MAP.txt GeneMANIA Homepage: https://genemania.org/ Citation: Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT, Maitland A. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Research. 2010;38(suppl_2):W214-20 Usage: GeneMANIA was utilized to create gene-gene edges. Files Downloaded Data: COMBINED.DEFAULT_NETWORKS.BP_COMBINING.txt Genotype-Tissue Expression Project (GTEx) Homepage: https://gtexportal.org/home/ Citation: Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N, Foster B. The genotype-tissue expression (GTEx) project. Nature Genetics. 2013;45(6):580 Usage: The Genotype-Tissue Expression (GTEx) Project was utilized to create edges between protein-cell, protein-anatomy, rna-cell and rna-anatomy entities. The original data were filtered such that only those edges where the median TPM was >=1.0 and genes were of any type other than protein-coding were included. It should also be noted that we chose to use the RNASeQC file over the RSEM file as advised by the GTEx website. The RSEM estimates are based on combining isoform-level estimates, which adds uncertainty to the resulting gene-level values (the isoform-level estimates are highly inaccurate in some cases). The file contains 54 unique tissue and/or cell types. GTEx provides mappings from tissue types to UBERON and EFO. These provided mappings were verified and extended, such that all samples which referenced a cell type were also mapped to the Cell and the Cell Line ontologies. This resulted in a total of 56 mappings (1.04 mappings/concepts). Files Downloaded Data: GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct Mapping Results: zooma_tissue_cell_mapping_04JAN2020.xlsx Generated Data The final mapping set was combined with terms from the Human Protein Atlas, see here for more information. 
All HPA tissue and cell type strings: HPA_tissues.txt Final Term Mapping: HPA_GTEx_TISSUE_CELL_MAP.txt Final RNA, Gene, Protein-Tissues and Cell Types Relations: HPA_GTEX_RNA_GENE_PROTEIN_EDGES.txt Human Genome Organisation Gene Nomenclature Committee (HUGO) Homepage: https://www.genenames.org/ Citations: HGNC Database, HUGO Gene Nomenclature Committee (HGNC), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom www.genenames.org Yates B, Braschi B, Gray K, Seal R, Tweedie S, Bruford E. Genenames.org: the HGNC and VGNC Resources in 2017. Nucleic Acids Research. 2017;45(D1):D619-625 Usage: The Human Genome Organisation (HUGO) data was utilized to obtain mappings between NCBI Gene identifiers, HUGO gene symbols, UniProt Accession identifiers, and Protein Ontology identifiers. For additional details on the processing of these data, see Data_Preparation.ipynb: Ensembl Transcript IDs ➞ PRO IDs Gene Ensembl IDs ➞ Entrez Gene IDs Gene Ensembl IDs ➞ PRO IDs Gene Symbols ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ PRO IDs Protein Ensembl IDs ➞ UniProt Protein Accession STRING IDs ➞ PRO IDs UniProt Protein Accession ➞ Entrez Gene IDs Files Downloaded Data: hgnc_complete_set.txt Generated Data Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl Gene Symbol-Ensembl Transcript Identifier Mapping: GENE_SYMBOL_ENSEMBL_TRANSCRIPT_MAP.txt Human Protein Atlas (HPA) Homepage: https://www.proteinatlas.org/ Citation: Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson Å, Kampf C, Sjöstedt E, Asplund A, Olsson I. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419 Usage: The Human Protein Atlas (HPA) was utilized to create rna-cell, rna-anatomy, protein-cell, and protein-anatomy edges. 
Evidence between gene and RNA expression in specific tissue types was derived by HPA, such that the consensus normalized expression was >=1.0. Zooma was utilized to automatically annotate the 153 unique tissues and cell types from Human Protein Atlas for all human protein-coding genes in the Human Proteome to the Cell Ontology, Cell Line Ontology, and the Uber-Anatomy Ontology. To best represent each concept, the automatic mappings from Zooma were extended through manual mapping efforts to ensure each concept cell type was matched to a Cell Ontology, Cell Line Ontology, and UBERON ontology term. This resulted in a total of 281 mappings (1.84 mappings per concept).
Files
  Downloaded Data: proteinatlas_search.tsv
  Mapping Results: zooma_tissue_cell_mapping_04JAN2020.xlsx
  Generated Data
    Final Term Mapping: HPA_GTEx_TISSUE_CELL_MAP.txt
    Final RNA, Gene, Protein-Tissues and Cell Types Relations: HPA_GTEX_RNA_GENE_PROTEIN_EDGES.txt

National Center for Biotechnology Information (NCBI) Entrez Gene
Homepage: https://www.ncbi.nlm.nih.gov/gene/
Citation: Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research. 2005;33(suppl_1):D54-8.
Usage: The National Center for Biotechnology Information (NCBI) Gene data was utilized to obtain mappings between NCBI Gene identifiers, HUGO gene symbols, UniProt Accession identifiers, and Protein Ontology identifiers.
For additional details on the processing of these data, see Data_Preparation.ipynb: Ensembl Transcript IDs ➞ PRO IDs Gene Ensembl IDs ➞ Entrez Gene IDs Gene Ensembl IDs ➞ PRO IDs Gene Symbols ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ PRO IDs Protein Ensembl IDs ➞ UniProt Protein Accession STRING IDs ➞ PRO IDs UniProt Protein Accession ➞ Entrez Gene IDs Files Downloaded Data: Homo_sapiens.gene_info.gz Generated Data Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl Entrez Gene-Ensembl Transcript Identifier Mapping: ENTREZ_GENE_ENSEMBL_TRANSCRIPT_MAP.txt Entrez Gene-PRO Identifier Mapping: ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt Ensembl Gene-Entrez Gene Identifier Mapping: ENSEMBL_GENE_ENTREZ_GENE_MAP.txt Uniprot Accession-Entrez Gene Identifier Mapping: UNIPROT_ACCESSION_ENTREZ_GENE_MAP.txt Reactome Pathway Database Homepage: https://reactome.org/ Citation: Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, Haw R, Jassal B, Korninger F, May B, Milacic M. The reactome pathway knowledgebase. Nucleic Acids Research. 2017;46(D1):D649-55 Usage: The Reactome Database was utilized to create chemical-pathway, GO Biological process-pathway, pathway-GO Cellular component, GO Molecular function-pathway, and protein-pathway edges. 
The original data is filtered such that only records meeting the following criteria were included:
  chemical-pathway: column[5] == "Homo sapiens"
  GO Biological process-pathway: column[5] startswith "REACTOME", column[8] == "P", and column[12] == "taxon:9606"
  pathway-GO Cellular component: column[5] startswith "REACTOME", column[8] == "C", and column[12] == "taxon:9606"
  GO Molecular function-pathway: column[5] startswith "REACTOME", column[8] == "F", and column[12] == "taxon:9606"
  protein-pathway: column[5] == "Homo sapiens"
Files
  Downloaded Data
    Chemical-Pathway Relations: ChEBI2Reactome_All_Levels.txt
    Pathway-GO Relations: gene_association.reactome
    Protein-Pathway Relations: UniProt2Reactome_All_Levels.txt

Search Tool for Recurring Instances of Neighbouring Genes (STRING) Database
Homepage: string-db.org
Citation: Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research. 2018;47(D1):D607-13
Usage: The Search Tool for Recurring Instances of Neighbouring Genes (STRING) Database was utilized to create protein-protein edges. The original data is filtered such that only records meeting the following criteria were included: combined_score >= "700" (>90th percentile).
Files
  Downloaded Data: 9606.protein.links.v11.0.txt.gz
  Generated Data: STRING-PRO Identifier Mapping: STRING_PRO_ONTOLOGY_MAP.txt

Universal Protein Resource (UniProt) Knowledgebase
Homepage: https://www.uniprot.org/
Citation: UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic acids research.
2018;47(D1):D506-15 Usage: The Universal Protein Resource (UniProt) Knowledgebase was utilized to obtain cofactor/catalyst-protein and protein-coding gene-protein edges as well as mappings between NCBI Gene identifiers, HUGO gene symbols, Universal Protein Resource (UniProt) Accession identifiers, and Protein Ontology identifiers. For additional details on the processing of these data, see Data_Preparation.ipynb: Ensembl Transcript IDs ➞ PRO IDs Gene Ensembl IDs ➞ Entrez Gene IDs Gene Ensembl IDs ➞ PRO IDs Gene Symbols ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ Transcript Ensembl IDs Entrez Gene IDs ➞ PRO IDs Protein Ensembl IDs ➞ UniProt Protein Accession STRING IDs ➞ PRO IDs UniProt Protein Accession ➞ Entrez Gene IDs Files Downloaded Data Cofactor and Catalyst relations: Cofactor/Catalyst Query Results UniProt Identifier Mapping: UniProt Identifier Query Results Generated Data Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl Protein-Cofactor Relations: UNIPROT_PROTEIN_COFACTOR.txt Protein-Catalyst Relations: UNIPROT_PROTEIN_CATALYST.txt UniProt Accession-PRO Identifier Mapping: UNIPROT_ACCESSION_PRO_ONTOLOGY_MAP.txt UniProt Accession-Entrez Gene Identifier Mapping: UNIPROT_ACCESSION_ENTREZ_GENE_MAP.txt This project is licensed under Apache License 2.0 - see the LICENSE.md file for details. If you intend to use any of the information on this Wiki, please provide the appropriate attribution by citing this repository: @misc{callahan_tj_2019_3401437, author = {Callahan, TJ}, title = {PheKnowLator}, month = mar, year = 2019, doi = {10.5281/zenodo.3401437}, url = {https://doi.org/10.5281/zenodo.3401437} }
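As a worked illustration of the record-level filters described in the ClinVar section above, the three criteria (Assembly, ClinSigSimple, ReviewStatus) can be combined into a single predicate. This is a hedged sketch, not the code used for the release: the sample rows are invented, the VariationID column is an assumption about variant_summary.txt.gz, and the helper keep_record is hypothetical.

```python
import csv
import io

# Review statuses accepted by the filter, copied from the criteria listed
# in the ClinVar section.
ACCEPTED_REVIEW_STATUS = {
    "criteria provided, multiple submitters, no conflicts",
    "reviewed by expert panel",
    "practice guideline",
}

# Invented rows for illustration; VariationID is an assumed column name.
SAMPLE_TSV = (
    "VariationID\tAssembly\tClinSigSimple\tReviewStatus\n"
    "101\tGRCh38\t1\treviewed by expert panel\n"
    "102\tGRCh37\t1\treviewed by expert panel\n"
    "103\tGRCh38\t0\tpractice guideline\n"
    "104\tGRCh38\t1\tcriteria provided, single submitter\n"
    "105\tGRCh38\t1\tcriteria provided, multiple submitters, no conflicts\n"
)


def keep_record(row):
    """True when a row satisfies all three ClinVar inclusion criteria."""
    return (
        row["Assembly"] == "GRCh38"
        and row["ClinSigSimple"] == "1"
        and row["ReviewStatus"] in ACCEPTED_REVIEW_STATUS
    )


reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
kept_ids = [row["VariationID"] for row in reader if keep_record(row)]
print(kept_ids)
```

Values are compared as strings because csv.DictReader does not convert types. Any preprocessing of the real file's header line is left out of this sketch.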
Created:
2023-06-28