A prediction-based resampling method for estimating the number of clusters in a dataset
PubMed Central, updated 2002-06-25, indexed 2026-05-16
Download link:
https://pmc.ncbi.nlm.nih.gov/articles/PMC126241/
Resource description:
BACKGROUND: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are estimating the number of clusters, if any, in a dataset, and allocating tumor samples to these clusters while assessing the confidence of cluster assignments for individual samples. Here we address the first of these problems.
RESULTS: We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study.
CONCLUSIONS: Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters.
Provider:
BMC
Created:
2002-06-25



