
htSNP finder - PCA Based Methods

Source: simtk.org. Updated: 2004-09-30. Indexed: 2025-03-22.
Download link:
https://simtk.org/projects/htsnp-pca
Resource description:
The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred to as "haplotype tagging SNPs" (htSNPs), that captures most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public-domain databases.

We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct the genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs needed to achieve a 90% reconstruction precision.

This project includes the following software/data packages:

eigen2htSNP (https://simtk.org/frs?group_id=1245#pack_1908): Source code and datasets. Methods for selecting haplotype tagging SNPs (htSNPs) using Principal Components Analysis (PCA); Lin and Altman, Finding Haplotype Tagging SNPs by Use of Principal Components Analysis, American Journal of Human Genetics 2004 Nov;75(5):850-61. Epub 2004 Sep 23. [PMID: 15389393 PMCID: PMC1182114]
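To make the two-stage idea concrete (find eigenSNPs, then map them back to actual SNPs), here is a minimal sketch in Python with NumPy. It is not the eigen2htSNP code. It assumes a numeric genotype matrix with samples in rows and at least two SNPs in columns, uses a Jolliffe-style discard heuristic as a stand-in for the "greedy discard" mapping, and chooses the number of eigenSNPs with a 90% explained-variance cutoff, which is an illustrative choice and not the reconstruction-precision criterion described above. The function name `select_tag_snps` and the parameter `variance_kept` are hypothetical.

```python
import numpy as np


def select_tag_snps(genotypes, variance_kept=0.9):
    """Return column indices of candidate tag SNPs (illustrative sketch only).

    genotypes: array of shape (samples, snps) with numeric genotype codes.
    variance_kept: fraction of total variance the retained components
        (the "eigenSNPs") should explain; an assumption of this sketch.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column

    # PCA via eigendecomposition of the SNP-by-SNP covariance matrix.
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Number of components needed to reach the requested variance fraction.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, variance_kept)) + 1, X.shape[1])

    # Discard step: walk the unretained components from the smallest
    # eigenvalue upward, each time dropping the remaining SNP with the
    # largest absolute loading on that component.
    remaining = list(range(X.shape[1]))
    for comp in range(X.shape[1] - 1, k - 1, -1):
        loadings = np.abs(eigvecs[remaining, comp])
        remaining.pop(int(np.argmax(loadings)))

    return sorted(remaining)


if __name__ == "__main__":
    # Toy data: three independent SNPs, each duplicated once.
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    b = [0, 0, 1, 1, 0, 0, 1, 1]
    c = [0, 1, 0, 1, 0, 1, 0, 1]
    toy = np.array([a, a, b, b, c, c]).T
    print(select_tag_snps(toy))
```

On the toy matrix, three components carry all of the variance, so the sketch keeps three of the six SNP columns. Which member of a duplicated pair survives depends on how the eigensolver resolves tied eigenvalues, so the exact indices should not be relied on. A varimax-rotation mapping, the second strategy named above, would replace the discard loop and is not shown here.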

Provider:
SimTK