Datasets for KGDDP-biomarker
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13920083
Description:
The following Gene Expression Omnibus (GEO) accessions were used: GSE169568 and GSE126124. Drug-target interaction (DTI) data were sourced from DrugBank, protein-protein interaction (PPI) and Gene Ontology (GO) data were obtained from UniProt, and pathway data were retrieved from Reactome.
1. **Knowledge Graph Data** (`kg`)
   - **File**: `kg_del_selfloop.csv`
   - **Description**: The knowledge graph of relationships between biological entities, including drugs, proteins, and diseases. It encodes how these entities are interconnected and supports the model's predictions.
2. **Negative Pathway-Protein Relationships** (`pro_path_neg_sp`)
   - **File**: `human_neg_pathpro.csv`
   - **Description**: Negative relationships between pathways and proteins. These help identify potentially irrelevant or inhibitory connections that may affect disease diagnosis.
3. **Negative Disease-Protein Interactions** (`dpi_neg`)
   - **File**: `neg_dpi_df_t10.csv`
   - **Description**: Negative interactions between diseases and proteins. These help refine the model by excluding misleading associations that do not contribute to accurate predictions.
4. **Feature Profiles** (`fp_df`)
   - **File**: `bdki_db_gdsc_fp.csv`
   - **Description**: Feature profiles of the samples used to train the model, covering a range of biomarker data needed for accurate disease prediction.
5. **Expression Triples** (`exp_triples`)
   - **File**: `exp_triples.csv`
   - **Description**: Triples relating genes to their expression levels. They capture the expression profiles of samples and their role in disease pathology.
6. **Expression Graph Triples** (`exp_triples_graph`)
   - **File**: `exp_graph_triples.csv`
   - **Description**: Graph triples describing relationships within the expression data, used to build the graph representation required by graph-based analysis.
7. **Expression Input Data** (`exp_input`)
   - **File**: `se_exp_input.csv`
   - **Description**: Input to the expression-data model, containing the information needed to make predictions from gene expression levels.
8. **Sample Information** (`dls`)
   - **File**: `sample_info.csv`
   - **Description**: Sample metadata, including diagnosis details. It is used to filter out samples that lack a diagnosis and is central to training and validating the model.
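The file list above can be turned into a loader roughly as follows. This is a minimal sketch, not the KGDDP authors' code: it assumes the CSV files from the Zenodo record sit together in one directory, each has a header row, and `sample_info.csv` stores the diagnosis in a column named `diagnosis` (a hypothetical name; check the real header). The variable keys and file names are taken from the list; `load_datasets` and `filter_diagnosed` are illustrative helpers.

```python
import csv
from pathlib import Path

# Variable keys and file names as listed in the dataset description.
DATASET_FILES = {
    "kg": "kg_del_selfloop.csv",
    "pro_path_neg_sp": "human_neg_pathpro.csv",
    "dpi_neg": "neg_dpi_df_t10.csv",
    "fp_df": "bdki_db_gdsc_fp.csv",
    "exp_triples": "exp_triples.csv",
    "exp_triples_graph": "exp_graph_triples.csv",
    "exp_input": "se_exp_input.csv",
    "dls": "sample_info.csv",
}


def read_csv_rows(path):
    """Read a CSV file with a header row into a list of dicts (one per row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_datasets(data_dir):
    """Load every listed file from data_dir, keyed by its variable name.

    Assumes all files were unpacked into the same directory (hypothetical layout).
    """
    data_dir = Path(data_dir)
    return {key: read_csv_rows(data_dir / name) for key, name in DATASET_FILES.items()}


def filter_diagnosed(samples, diagnosis_col="diagnosis"):
    """Keep only samples whose diagnosis field is present and non-empty.

    `diagnosis_col` is a hypothetical column name, not confirmed by the source.
    """
    return [row for row in samples if (row.get(diagnosis_col) or "").strip()]
```

For example, `filter_diagnosed(load_datasets("data")["dls"])` would return only the sample rows that carry a diagnosis, mirroring the filtering role described for `sample_info.csv`. The standard-library `csv` module keeps the sketch dependency-free; a real pipeline would more likely read these files into pandas DataFrames.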
Created: 2024-10-11



