
TMC-SNPdb: An Indian germline variant dataset derived from whole exome sequence

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP015935
Description:
Cancer is predominantly a somatic disease. A mutant allele found in a cancer cell genome is considered somatic when it is absent from the paired normal genome and from dbSNP, the most comprehensive public SNP database. However, dbSNP inadequately represents several non-Caucasian populations, including those of the Indian subcontinent, which limits cancer genomic analyses of data from these populations. We present “TMC-SNPdb” as the first open-source, freely accessible (through ANNOVAR), flexible and upgradable SNP database built from whole exome data of 62 normal samples derived from cancer patients of Indian origin, representing 114,309 unique germline variants. TMC-SNPdb comes with a companion subtraction tool that can be run from the command line or through an easy-to-use graphical user interface (GUI), and that can deplete additional Indian population-specific SNPs over and above what is possible with the dbSNP and 1000 Genomes databases. Using an institutionally generated whole exome data set of 132 samples of Indian origin, we demonstrate that “TMC-SNPdb” reduced the false positive somatic events remaining after dbSNP depletion by 42%, 33% and 28% in Indian-origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate that “TMC-SNPdb” will be useful in studies of several Mendelian germline diseases.
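The subtraction idea in the description can be illustrated with a minimal Python sketch. It is not the TMC-SNPdb companion tool or the ANNOVAR interface: the `Variant` type, the `subtract_known` and `percent_reduction` helpers, and all coordinates are hypothetical, and it assumes a variant counts as "known" only on an exact match of chromosome, position, reference allele and alternate allele.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: exact match on all four fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_known(candidates: Iterable[Variant],
                   *known_sets: Set[Variant]) -> List[Variant]:
    """Keep only candidates absent from every supplied germline set."""
    known = set().union(*known_sets)
    return [v for v in candidates if v not in known]


def percent_reduction(before: int, after: int) -> float:
    """Percentage of calls removed by an additional depletion step."""
    if before <= 0:
        raise ValueError("'before' must be a positive count")
    return 100.0 * (before - after) / before


# Toy data with made-up coordinates, for illustration only.
tumour_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr3", 3000, "G", "A"),
    Variant("chr4", 4000, "T", "C"),
    Variant("chr5", 5000, "A", "T"),
]
dbsnp_like = {Variant("chr1", 1000, "A", "G")}
population_specific = {Variant("chr2", 2000, "C", "T")}

# Step 1: deplete variants found in the general-purpose database.
after_dbsnp = subtract_known(tumour_calls, dbsnp_like)
# Step 2: deplete further with the population-specific database.
after_both = subtract_known(after_dbsnp, population_specific)

reduction = percent_reduction(len(after_dbsnp), len(after_both))
print(f"{len(after_dbsnp)} -> {len(after_both)} candidate somatic calls "
      f"({reduction:.0f}% fewer after the second depletion)")
```

The reduction here is measured against the calls that remain after the first depletion, which is how the percentages in the description read. A real pipeline would also need to normalize variant representation (for example left-aligning indels) before comparing keys, which this sketch omits.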
Created:
2023-10-13