Protein and Ligand Dataset for Drug Repositioning in Childhood Acute Lymphoblastic Leukemia (ALL)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/r5ftnf4j9f
Description:
This dataset includes two main components (proteins and ligands), which can be used in computational research focused on drug repositioning in Childhood Acute Lymphoblastic Leukemia (ALL):
1. Protein Sequences (proteins.txt): This file contains the amino acid sequences of proteins selected for a study that aims to identify novel therapeutic candidates for Childhood Acute Lymphoblastic Leukemia (ALL) through drug repositioning. The sequences were extracted from the UniProt database and belong to proteins that are known or predicted to be associated with ALL pathogenesis or treatment procedures, or to be immunologically relevant. The file is structured in a JSON-like format, with UniProt IDs as keys and amino acid sequences as values; each entry corresponds to one protein.
Additional Metadata:
- Data Type: Amino acid sequence data (FASTA-like JSON format)
- Unique Proteins: 8479
- Average Sequence Length: 529.81 amino acids
- Maximum Sequence Length: 14507 amino acids
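The summary figures above can be recomputed from the file itself. The following is a minimal sketch, not the authors' code: it assumes that "JSON-like" means either strict JSON or a Python-style dictionary literal, and the helper names (`parse_mapping`, `sequence_stats`) and the sample entries are hypothetical.

```python
import ast
import json


def parse_mapping(text):
    """Parse an ID -> string mapping from JSON-like text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: "JSON-like" may mean a single-quoted Python dict literal.
        data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("expected a mapping of IDs to strings")
    return data


def sequence_stats(proteins):
    """Return entry count, mean length, and maximum length of the sequences."""
    lengths = [len(seq) for seq in proteins.values()]
    return {
        "unique_proteins": len(proteins),
        "average_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
    }


# Made-up sample entries for illustration; not taken from the dataset.
sample = '{"P00001": "MKTAYIAK", "P00002": "MSTN"}'
stats = sequence_stats(parse_mapping(sample))
print(stats)
```

For the real file, the text passed to `parse_mapping` would be read from proteins.txt; if the format assumption holds, the resulting counts should be comparable to the metadata listed above.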
2. Ligand Data (ligands.txt): A collection of drug-like small molecules represented as SMILES strings or in similar formats. The ligands were selected for their therapeutic potential and relevance to ALL-related targets. The data were sourced from databases such as ChEMBL and DrugBank; some data on FDA-approved drugs were added manually.
Additional Metadata:
- Data Type: SMILES strings in JSON-like format (key-value pairs)
- Number of ligands: ~220,000
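Because the ligands come from several sources and include manually added entries, a quick syntactic screen before modeling may be useful. The sketch below is only a rough filter under stated assumptions: it treats the file as strict JSON, the ligand keys are hypothetical, and the check (non-empty, no whitespace, balanced parentheses and brackets) is not a chemical validation; a cheminformatics toolkit would be needed for that.

```python
import json


def looks_like_smiles(s):
    """Rough screen: non-empty, no whitespace, balanced () and []."""
    if not s or any(ch.isspace() for ch in s):
        return False
    depth = 0           # nesting depth of branch parentheses
    in_bracket = False  # inside a [...] atom specification
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "[":
            if in_bracket:
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
    return depth == 0 and not in_bracket


def split_ligands(ligands):
    """Separate entries that pass the screen from the IDs that fail it."""
    ok = {k: v for k, v in ligands.items() if looks_like_smiles(v)}
    rejected = sorted(set(ligands) - set(ok))
    return ok, rejected


# Made-up keys for illustration; the third string is deliberately malformed.
sample = json.loads(
    '{"LIG1": "CC(=O)Oc1ccccc1C(=O)O", "LIG2": "C1=CC=CC=C1", "LIG3": "CC(C"}'
)
ok, rejected = split_ligands(sample)
print(len(ok), rejected)
```

Entries that fail the screen are reported rather than silently dropped, so they can be inspected by hand.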
The combined dataset is designed to support computational biology, bioinformatics, and artificial intelligence research, particularly work on leukemia biology, drug discovery, drug-target interaction modeling, leukemia-specific therapeutic targeting, and systems pharmacology.
Created:
2025-04-21



