Datasets used in the INTREPPPID manuscript
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10594149
Description:
INTREPPPID Manuscript Datasets
The enclosed archive holds all the datasets used in the INTREPPPID manuscript. See the INTREPPPID documentation for details on the format of the HDF5 files.
Files are organised as follows:
[FORMAT]/seed_[SEED]/[TAXON]/[DATASET_NAME].h5
Where:
FORMAT is whether the HDF5 is in the RAPPPID or INTREPPPID format.
SEED is the random seed used to generate the dataset. The seeds are all phone numbers found in songs.
TAXON is the NCBI Taxon ID of the organism from which the dataset was generated.
DATASET_NAME is the name of the dataset.
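As a minimal sketch of how the layout above could be handled programmatically, the following Python splits a path into its four components. The function name, the `DatasetPath` type, and the example path (including its format directory spelling, seed value, and dataset name) are assumptions made for illustration and are not taken from the archive itself.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DatasetPath(NamedTuple):
    """Components of a [FORMAT]/seed_[SEED]/[TAXON]/[DATASET_NAME].h5 path."""
    fmt: str    # format directory, e.g. RAPPPID or INTREPPPID (spelling assumed)
    seed: str   # kept as a string so the seed is preserved exactly as written
    taxon: int  # NCBI Taxon ID, e.g. 9606 for Human
    name: str   # dataset name without the .h5 extension


def parse_dataset_path(path: str) -> DatasetPath:
    """Split a dataset path into its components (hypothetical helper)."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 path components, got: {path!r}")
    # Only the last four components matter; any leading directories are ignored.
    fmt, seed_dir, taxon, filename = parts[-4:]
    if not seed_dir.startswith("seed_"):
        raise ValueError(f"seed directory must start with 'seed_': {seed_dir!r}")
    if not filename.endswith(".h5"):
        raise ValueError(f"dataset file must end with '.h5': {filename!r}")
    return DatasetPath(
        fmt=fmt,
        seed=seed_dir[len("seed_"):],
        taxon=int(taxon),
        name=filename[: -len(".h5")],
    )


# Hypothetical path, used only to illustrate the layout.
example = parse_dataset_path("intrepppid/seed_8675309/9606/example.h5")
```

Keeping the seed as a string avoids assuming anything about how the seeds are written in the directory names.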
"Why are there only Human (9606) datasets in the INTREPPPID format?"
In the manuscript, we use the INTREPPPID format to train the model on Human data, and then test the model using datasets in the RAPPPID format. INTREPPPID can only be trained on datasets with orthology data, but it can be tested on datasets without it, since the orthologous locality loss is only used during training.
Created:
2024-02-09



