
Synthetic gene expression data with underlying gene network

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8242660
Description:
These are the synthetic gene expression data, together with the underlying gene network, used in the simulation studies of Hu and Szymczak (2023) to evaluate network-guided random forest. This dataset covers the setting with 1000 genes and 1000 samples each in the training and testing sets. Each file contains a list of 100 replications of one scenario, which can be identified from the file name. There are 6 scenarios, which differ in the number of disease modules and in how the effects of disease genes are distributed within a disease module. For scenarios that include disease genes, 3 levels of effect size are also considered. The binary responses are generated via a logistic regression model. More details on the scenarios and the data generation mechanism can be found in Hu and Szymczak (2023). The data were generated by the function gen_data in the R package networkRF, available at https://github.com/imbs-hl/networkRF. To obtain the datasets with 3000 genes, which form the other part of the data used in the simulation studies of Hu and Szymczak (2023), modify the num.var argument of gen_data. Further details on the implementation and the output format can be found in the help page of the R package.
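The description states that binary responses come from a logistic regression model. The following is a minimal Python sketch of that general idea only, not the networkRF implementation. The function name simulate_binary_response, the independent standard normal expression values, the equal effect size per disease gene, and the zero intercept are all assumptions made for illustration; the sketch also ignores the gene network and module structure that the actual data are built on.

```python
import math
import random


def simulate_binary_response(n_samples, n_genes, disease_genes, effect_size, seed=0):
    """Draw expression values and binary labels from a logistic model.

    Assumptions (hypothetical, not taken from networkRF): expression values
    are independent standard normal draws, every disease gene has the same
    effect size, and the intercept is zero.
    """
    rng = random.Random(seed)
    expression, labels = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        # Linear predictor: sum of effects over the disease genes only.
        eta = sum(effect_size * row[g] for g in disease_genes)
        # The logistic link turns the linear predictor into P(y = 1).
        prob = 1.0 / (1.0 + math.exp(-eta))
        labels.append(1 if rng.random() < prob else 0)
        expression.append(row)
    return expression, labels


# Small toy run; the real dataset uses 1000 genes and 1000 samples per set.
X, y = simulate_binary_response(
    n_samples=200, n_genes=50, disease_genes=[0, 1, 2], effect_size=1.0
)
print(len(X), len(X[0]), sum(y))
```

With an empty disease_genes list the linear predictor is always zero, so each label is a fair coin flip. This loosely mirrors a scenario without disease genes, although the correspondence to the scenarios of Hu and Szymczak (2023) is an assumption.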
Created:
2023-08-15