Bayesian Modelling of Sequential Discoveries
Source: DataCite Commons. Updated: 2022-04-11. Indexed: 2024-07-29.
Download link:
https://tandf.figshare.com/articles/dataset/Bayesian_Modelling_of_Sequential_Discoveries/19576293
Summary:
We aim to model the appearance of distinct tags in a sequence of labelled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarised via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects. We propose a novel Bayesian method for species sampling modelling that directly specifies the probability of a new discovery, thereby allowing for flexible specifications. The asymptotic behaviour and finite-sample properties of this approach are studied extensively. Interestingly, our enlarged class of sequential processes includes highly tractable special cases. We present a subclass of models with appealing theoretical and computational properties, including one that has the same discovery probability as the Dirichlet process. Moreover, owing to strong connections with logistic regression models, this subclass can naturally account for covariates. Finally, we test our proposal on both synthetic and real data, with special emphasis on a large fungal biodiversity study in Finland.
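The summary describes the approach only in words. The sketch below is an illustration under stated assumptions, not the authors' code. It computes an accumulation curve from a sequence of labels, simulates discoveries from a directly specified discovery probability, and checks a standard identity: the Dirichlet process probability that object n + 1 carries a new tag, alpha / (alpha + n), can be written as a logistic function of log(n). All function names and the logistic parameterisation `b0 + b1 * log(n)` are hypothetical choices made for this example; the paper's own specification, priors and covariate-dependent version are not reproduced here.

```python
import math
import random
from itertools import accumulate


def accumulation_curve(labels):
    """Count the distinct labels seen after each of the first n objects."""
    seen = set()
    curve = []
    for label in labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


def dp_discovery_prob(n, alpha):
    """Dirichlet-process probability that object n + 1 carries a new tag."""
    return alpha / (alpha + n)


def logistic_discovery_prob(n, b0, b1):
    """Hypothetical logistic specification in log(n), valid for n >= 1.

    With b0 = log(alpha) and b1 = -1 it reduces to alpha / (alpha + n).
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(n))))


def simulate_discoveries(n_objects, prob, seed=0):
    """Draw discovery indicators D_1..D_N; the first object is always new."""
    if n_objects <= 0:
        return []
    rng = random.Random(seed)
    indicators = [1]
    for n in range(1, n_objects):
        # prob(n) is the chance that object n + 1 is a new tag
        indicators.append(1 if rng.random() < prob(n) else 0)
    return indicators


def expected_curve(n_objects, prob):
    """Expected accumulation curve: running sum of discovery probabilities."""
    if n_objects <= 0:
        return []
    return list(accumulate([1.0] + [prob(n) for n in range(1, n_objects)]))


if __name__ == "__main__":
    print(accumulation_curve(["a", "b", "a", "c", "b", "d"]))  # [1, 2, 2, 3, 3, 4]
    alpha = 5.0
    draws = simulate_discoveries(1000, lambda n: dp_discovery_prob(n, alpha))
    print(list(accumulate(draws))[-1])  # simulated number of distinct tags
    print(expected_curve(1000, lambda n: dp_discovery_prob(n, alpha))[-1])
```

Setting `b0 = log(alpha)` and `b1 = -1` in `logistic_discovery_prob` recovers the Dirichlet process discovery probability exactly; other values of `b1` change how quickly new discoveries become rare. Because the discovery probability here depends only on n, the expected accumulation curve is simply the running sum of those probabilities, which is what `expected_curve` returns.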
Provider:
Taylor & Francis
Created:
2022-04-11