scbirlab/evo-pharm-atlas-1

Source: Hugging Face. Updated 2026-05-09. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/scbirlab/evo-pharm-atlas-1
Resource description:
---
license: cc-by-4.0
tags:
- biology
- chemistry
- antibiotics
pretty_name: Evolutionary Pharmacology Atlas 1
size_categories:
- 100k<n<1M
configs:
- config_name: conservation
  default: true
  data_files: "orthology/_all.rbh.tsv.gz"
  sep: "\t"
- config_name: canonical-target-info
  data_files: "umap/_all.umap-targets-taxon.csv.gz"
  sep: ","
- config_name: species-info
  data_files: "orthology/_all.rbh_m.coldata.tsv.gz"
  sep: "\t"
- config_name: inhibitors
  data_files: "inhibitors/all.tsv.gz"
  sep: "\t"
---

# **evo-pharm-atlas-1**

Calculated protein drug target conservation for 10,584 targets from ChEMBL v36 across 8,207 bacterial proteomes from UniProt KB.

## Dataset Details

### Dataset Description

- **Curated by:** [@eachanjohnson](https://huggingface.co/eachanjohnson)
- **Funded by:** The Francis Crick Institute; UKRI
- **License:** CC-BY-4.0

### Dataset Sources

<!-- - **Repository:** https://doi.org/10.5281/zenodo.8136904 -->
- **Paper:** https://doi.org/10.1038/s41589-023-01349-8

## Dataset Structure

The data are separated into four configs.

### conservation

Conservation of canonical targets across bacterial proteomes.

- **target_taxon_id**: NCBI taxonomy ID of source species of canonical drug target
- **target_taxon_l1**: Approximate taxonomic kingdom of canonical drug target (from ChEMBL)
- **target_taxon_l2**: Approximate taxonomic class of canonical drug target (from ChEMBL)
- **target_taxon_l3**: Approximate taxonomic family of canonical drug target (from ChEMBL)
- **target_uniprot_id**: UniProt KB ID of canonical drug target protein
- **ortholog_uniprot_id**: UniProt KB ID of ortholog protein
- **target_accession**: Full UniProt accession ID of canonical drug target protein
- **ortholog_accession**: Full UniProt accession ID of ortholog protein
- **target_length**: Number of amino acids in canonical drug target
- **ortholog_length**: Number of amino acids in ortholog protein
- **alignment_length**: Length of BLASTp alignment (from Diamond)
- **target_ortholog_identity**: Amino acid identity between drug target and ortholog (from Diamond)
- **gap_openings**: Number of gaps in alignment (from Diamond)
- **mismatches**: Number of mismatches in alignment (from Diamond)
- **e_value**: Expectation value of alignment (from Diamond)
- **bit_score**: Score of alignment (from Diamond)
- **target_ortholog_coverage**: Alignment length divided by query length
- **target_organism_name**: Source species name of canonical drug target (from ChEMBL)
- **target_is_species_group**: Flag indicating whether the target is annotated as being from a group of species, e.g. "Bacteria" (from ChEMBL)
- **target_gene_symbol**: Gene symbol of canonical drug target (from ChEMBL)
- **target_ec_number**: Enzyme Commission number of canonical drug target (from ChEMBL)
- **target_go_process_id**: Semicolon-separated list of Gene Ontology process IDs of canonical drug target (from ChEMBL)
- **target_go_process_name**: Semicolon-separated list of Gene Ontology process names of canonical drug target (from ChEMBL)
- **target_chembl_id**: Canonical drug target ChEMBL ID
- **target_name**: Canonical drug target name
- **pLI**: For human targets, Probability of Loss-of-function Intolerance (from gnomAD)
- **LOEUF**: For human targets, Loss-of-function Observed/Expected Upper bound Fraction (from gnomAD)
- **ortholog_target_name**: Ortholog protein name
- **ortholog_target_locus**: Ortholog gene locus tag
- **ortholog_taxon_id**: NCBI taxonomy ID of source species of ortholog protein
- **target_is_human**: Flag indicating whether the canonical drug target is a human protein
- **target_is_bacteria**: Flag indicating whether the canonical drug target is a bacterial protein

### canonical-target-info

Information on canonical drug targets and their aggregate conservation across bacterial proteomes.

- **target_uniprot_id**: UniProt KB ID of canonical drug target protein
- **target_taxon_id**: NCBI taxonomy ID of source species of canonical drug target
- **target_taxon_l1**: Approximate taxonomic kingdom of canonical drug target (from ChEMBL)
- **target_taxon_l2**: Approximate taxonomic class of canonical drug target (from ChEMBL)
- **target_taxon_l3**: Approximate taxonomic family of canonical drug target (from ChEMBL)
- **target_accession**: Full UniProt accession ID of canonical drug target protein
- **target_length**: Number of amino acids in canonical drug target
- **target_organism_name**: Source species name of canonical drug target (from ChEMBL)
- **target_is_species_group**: Flag indicating whether the target is annotated as being from a group of species, e.g. "Bacteria" (from ChEMBL)
- **target_gene_symbol**: Gene symbol of canonical drug target (from ChEMBL)
- **target_ec_number**: Enzyme Commission number of canonical drug target (from ChEMBL)
- **target_go_process_id**: Semicolon-separated list of Gene Ontology process IDs of canonical drug target (from ChEMBL)
- **target_go_process_name**: Semicolon-separated list of Gene Ontology process names of canonical drug target (from ChEMBL)
- **target_chembl_id**: Canonical drug target ChEMBL ID
- **target_name**: Canonical drug target name
- **target_is_human**: Flag indicating whether the canonical drug target is a human protein
- **target_is_bacteria**: Flag indicating whether the canonical drug target is a bacterial protein
- **entropy**: Shannon entropy of target amino acid identity across all bacterial strains
- **sparsity**: Sparsity index of target amino acid identity across all bacterial strains
- **mean_conservation**: Mean target amino acid identity across all bacterial strains
- **median_conservation**: Median target amino acid identity across all bacterial strains
- **UMAP ...**: Target x bacteria amino acid identity matrix projected into two dimensions using UMAP.

### species-info

Information on all bacterial strains in the dataset.

- **ortholog_taxon_id**: NCBI taxonomy ID of bacterial strain
- **ortholog_domain**: Domain of bacterial strain
- **ortholog_kingdom**: Kingdom of bacterial strain
- **ortholog_phylum**: Phylum of bacterial strain
- **ortholog_class**: Class of bacterial strain
- **ortholog_order**: Order of bacterial strain
- **ortholog_family**: Family of bacterial strain
- **ortholog_genus**: Genus of bacterial strain
- **ortholog_species**: Species of bacterial strain
- **ortholog_subspecies**: Subspecies of bacterial strain
- **ortholog_strain**: Bacterial strain

### inhibitors

Information on ChEMBL inhibitors that engage conserved drug targets.

- **molecule_chembl_id**: Molecule ChEMBL ID of chemical
- **molecule_name**: Name of chemical (from ChEMBL)
- **molecule_smiles**: SMILES of chemical (from ChEMBL)
- **molecule_inchikey**: InChI Key of chemical (from ChEMBL)
- **is_oral**: Is an orally administered drug (from ChEMBL)
- **is_topical**: Is a topical drug (from ChEMBL)
- **is_parenteral**: Is a parenterally administered drug (from ChEMBL)
- **is_orphan**: Is an orphan drug (from ChEMBL)
- **is_natural_product**: Is a natural product (from ChEMBL)
- **is_chemical_probe**: Is a chemical probe (from ChEMBL)
- **has_black_box**: For drugs, has an FDA black box warning (from ChEMBL)
- **max_phase**: Maximum clinical phase (from ChEMBL)
- **molecule_chembl_id_2**: ChEMBL ID of chemical (from UniChem, redundant)
- **chembl_url**: URL to ChEMBL web page for chemical (from UniChem)
- **pubchem_id**: PubChem ID of chemical (from UniChem)
- **pubchem_url**: PubChem URL of chemical (from UniChem)
- **drugbank_id**: For drugs, DrugBank ID (from UniChem)
- **drugbank_url**: For drugs, DrugBank URL (from UniChem)
- **vendor_...**: Vendor catalog numbers (from UniChem)
- **..._url**: Vendor URLs (from UniChem)
- **target_taxon_id**: NCBI taxonomy ID of source species of canonical drug target
- **target_organism_name**: Source species name of canonical drug target (from ChEMBL)
- **target_chembl_id**: Target ChEMBL ID of canonical drug target
- **target_name**: Name of canonical drug target
- **assay_chembl_id**: Assay ChEMBL ID for assay indicating interaction of chemical with target
- **assay_type**: F = functional; B = binding
- **assay_target_confidence_score**: ChEMBL confidence in accurate target annotation
- **molecule_target_pchembl**: Quantitative assay outcome (usually -log10(IC50 in molar)).

## Dataset Creation

### Curation Rationale

To identify bacterial orthologous targets of existing drugs for rapid and rational repurposing.

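Two of the derived columns described above, `target_ortholog_coverage` and `molecule_target_pchembl`, are simple enough to illustrate directly. The following is a minimal Python sketch of the stated definitions, not the pipeline's own code. It assumes that the "query" in the coverage definition is the canonical drug target, and the function names and example numbers are hypothetical.

```python
import math


def coverage(alignment_length: int, query_length: int) -> float:
    """Alignment length divided by query length.

    Assumption: the query is the canonical drug target, so query_length
    would correspond to the target_length column.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return alignment_length / query_length


def pchembl_from_ic50(ic50_molar: float) -> float:
    """Negative log10 of an IC50 expressed in molar units."""
    if ic50_molar <= 0:
        raise ValueError("ic50_molar must be positive")
    return -math.log10(ic50_molar)


# Hypothetical values for illustration; not taken from the dataset.
example_coverage = coverage(alignment_length=180, query_length=200)
example_pchembl = pchembl_from_ic50(10e-9)  # 10 nM expressed in molar
```

On this scale a higher pChEMBL value means a more potent compound: each unit corresponds to a tenfold lower IC50.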
#### Data Collection and Processing

Data were generated using the [Repurposing by Orthology (RepOrt) pipeline](https://github.com/scbirlab/nf-report), which uses [Diamond](https://github.com/bbuchfink/diamond) for high-throughput BLASTp, and the ChEMBL and UniChem APIs.

#### Personal and Sensitive Information

None.

## Dataset Card Contact

[@eachanjohnson](https://huggingface.co/eachanjohnson)
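
The card's YAML header declares four configs, each backed by a gzip-compressed delimited file with its own separator. As a rough sketch of how such a file could be read once downloaded, the Python below uses only the standard library. The `CONFIGS` mapping is copied from the header; the helper names `read_rows` and `split_semicolon_list`, and the demo row, are hypothetical and contain no real dataset values.

```python
import csv
import gzip
import io

# File paths and separators copied from the card's YAML header.
CONFIGS = {
    "conservation": ("orthology/_all.rbh.tsv.gz", "\t"),
    "canonical-target-info": ("umap/_all.umap-targets-taxon.csv.gz", ","),
    "species-info": ("orthology/_all.rbh_m.coldata.tsv.gz", "\t"),
    "inhibitors": ("inhibitors/all.tsv.gz", "\t"),
}


def read_rows(source, sep):
    """Yield one dict per data row from a gzip-compressed delimited file.

    `source` may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter=sep)


def split_semicolon_list(value):
    """Split a semicolon-separated field, dropping empty entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


# Build a tiny in-memory stand-in for the conservation file (made-up values).
_, sep = CONFIGS["conservation"]
header = ["target_uniprot_id", "ortholog_uniprot_id", "target_go_process_id"]
row = ["EXAMPLE_T", "EXAMPLE_O", "GO:A;GO:B"]
raw = (sep.join(header) + "\n" + sep.join(row) + "\n").encode("utf-8")
demo_file = io.BytesIO(gzip.compress(raw))

rows = list(read_rows(demo_file, sep))
go_ids = split_semicolon_list(rows[0]["target_go_process_id"])
```

With a real download, the same function would take the local path instead of the in-memory buffer, for example `read_rows("orthology/_all.rbh.tsv.gz", "\t")`. Streaming rows through a generator avoids holding the whole table in memory.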
Provided by:
scbirlab