Bayesian Inference for Growth Mixture Models with an Unknown Number of Classes

Name: Bayesian Inference for Growth Mixture Models with an Unknown Number of Classes
Creator: University of Notre Dame
Published: 2024-11-11 18:14:09
License: Not specified

Source: DataCite Commons. Updated: 2024-11-11. Indexed: 2025-04-17.

Download link:

https://curate.nd.edu/articles/dataset/Bayesian_Inference_for_Growth_Mixture_Models_with_an_Unknown_Number_of_Classes/26761573/1

Description:

Growth mixture models (GMMs) have been widely used to capture the distinct growth trajectories of unobserved subpopulations (latent classes). The traditional GMM determines the optimal number of classes through a process called class enumeration, which involves fitting a sequence of models with an increasing number of classes and then selecting the best-fitting model using statistical criteria. Despite its popularity, class enumeration has long been criticized for the substantial subjectivity involved in comparing the fitted models. Bayesian nonparametric (BNP) mixture modeling offers an alternative approach to detecting latent classes. The BNP approach circumvents the subjectivity inherent in class enumeration by placing a prior on the mixing distribution, which indirectly induces a prior on the number of classes. Consequently, the number of classes can be inferred directly from the data. However, the BNP approach remains understudied in the context of GMMs. To address this gap, the dissertation aims to: 1) propose two BNP-GMMs using the Dirichlet process mixture and the mixture of finite mixtures models; 2) compare the performance of the two proposed models in determining the number of classes K with that of the traditional GMM; and 3) evaluate the performance of the two proposed models in choosing K when using the posterior mode versus when using a loss function called variation of information (VI). Based on Monte Carlo simulations, Study 1 compares the proposed models and the traditional GMM in choosing K when there is no model misspecification, while Study 2 compares them in choosing K when there is model misspecification in the latent mean structure.
Overall, simulation results showed that: 1) the proposed models were more accurate when using VI than when using the mode; 2) when the population was homogeneous (comprising only one class), the proposed models using VI yielded the highest accuracy in choosing K, whereas when the population was heterogeneous (consisting of three classes), the proposed models using VI achieved superior accuracy in choosing K when class separation was large; and 3) the proposed models using VI were robust to the exacerbated overfitting caused by model misspecification. For illustration, the proposed BNP-GMMs were applied to data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99.
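
The contrast between choosing K by the mode and choosing K by VI can be illustrated with a small sketch. The code below is not taken from the dissertation or its dataset. It assumes that posterior draws of class labels are available as plain Python lists, and it simplifies the VI-based estimate by searching only among the sampled partitions for the one with the smallest average VI to all draws. The function names `variation_of_information`, `k_by_mode`, and `k_by_vi` are hypothetical.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """VI distance between two partitions of the same items (natural log).

    VI(A, B) = H(A) + H(B) - 2 * I(A, B), computed from the contingency
    counts as -sum_ij p_ij * (log(p_ij / p_i) + log(p_ij / p_j)).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same number of items")
    n = len(labels_a)
    if n == 0:
        return 0.0
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        vi -= (n_ab / n) * (
            math.log(n_ab / count_a[a]) + math.log(n_ab / count_b[b])
        )
    return vi


def k_by_mode(samples):
    """Most frequent number of occupied classes across posterior draws."""
    k_counts = Counter(len(set(labels)) for labels in samples)
    return k_counts.most_common(1)[0][0]


def k_by_vi(samples):
    """Number of classes in the sampled partition with the smallest total VI.

    Assumption: the search is restricted to the sampled partitions, which
    is a simplification of minimizing the posterior expected VI loss.
    """
    best = min(
        samples,
        key=lambda cand: sum(variation_of_information(cand, s) for s in samples),
    )
    return len(set(best))


# Hypothetical posterior draws for six individuals: three draws split off a
# different single individual each time, and two draws use one class only.
draws = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print("K by mode:", k_by_mode(draws))
print("K by VI:", k_by_vi(draws))
```

In this toy input the extra class is a singleton that changes membership from draw to draw, so the two summaries can disagree: the mode counts occupied classes per draw, while the VI-based estimate weighs how consistently individuals are grouped together.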

Provided by:

University of Notre Dame

Created:

2024-08-20
