Data from: A practical introduction to random forest for genetic association studies in ecology and evolution
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.k55hh8f
Description:
Large genomic studies are becoming increasingly common with advances in
sequencing technology, and our ability to understand how genomic variation
influences phenotypic variation between individuals has never been
greater. The exploration of such relationships first requires the
identification of associations between molecular markers and phenotypes.
Here we explore the use of Random Forest (RF), a powerful machine learning
algorithm, in genomic studies to discern loci underlying both discrete and
quantitative traits, particularly when studying wild or non-model
organisms. RF is increasingly used in ecological and population
genetics because, unlike traditional methods, it can efficiently analyze
thousands of loci simultaneously and account for non-additive
interactions. However, understanding both the power and limitations of
Random Forest is important for its proper implementation and the
interpretation of results. We therefore provide a practical introduction
to the algorithm and its use for identifying associations between
molecular markers and phenotypes, discussing such topics as data
limitations, algorithm initiation and optimization, as well as
interpretation. We also provide short R tutorials as examples, with the
aim of providing a guide to the implementation of the algorithm. Topics
discussed here are intended to serve as an entry point for molecular
ecologists interested in employing Random Forest to identify trait
associations in genomic data sets.
Provider:
Dryad
Created:
2018-03-01



