SynGen6: Synthetic Genomic Dataset with Diverse Ancestry
Source: DataCite Commons; updated 2024-10-11; indexed 2025-04-16
Download link:
https://ieee-dataport.org/documents/syngen6-synthetic-genomic-dataset-diverse-ancestry
Description:
SynGen6 is a synthetic genomic dataset that encompasses six distinct populations. We used Principal Component Analysis (PCA) and ϵ-local differential privacy (LDP) to generate synthetic samples. We then simulated phenotype vectors associated with significant SNPs, mirroring real-world gene-disease associations. We also generated synthetic SNPs to watermark the dataset, enabling verification of outsourced computations. Lastly, we created synthetic relatives to support research on kinship inference and family-based genomic analyses. The actual SynGen6 data can be created by running our scripts in the All of Us Research Hub WorkBench. Here, we provide a toy example based on the public 1000 Genomes dataset.
Provider:
IEEE DataPort
Created:
2024-10-11



