
Selecting a window size for the analysis of whole genome alignments using AIC

Source: DataONE. Updated: 2025-09-03. Indexed: 2025-09-13.
Download link:
https://search.dataone.org/view/sha256:6220df4ab18d296a064cdf8c7ec0aabb96a0771cd04aed077c2311e388984a67
Resource description:
The variation of evolutionary histories along the genome presents a challenge for phylogenomic methods, which must identify the non-recombining regions and reconstruct the phylogenetic tree for each region. To address this problem, many studies have used the non-overlapping window approach, often with an arbitrary choice of fixed window size that potentially includes intra-window recombination events. In this study, we proposed an information-theoretic approach to select a window size that best reflects the underlying histories of the alignment. First, we simulated chromosome alignments that reflected the key characteristics of an empirical dataset and found that the AIC is a good predictor of how accurately a given window size recovers the tree topologies of the alignment. Because empirical datasets contain missing data, we then designed a stepwise non-overlapping window approach and applied this method to the genomes of erato-sara Heliconius butterflies and great apes. We found that the ...

# Selecting a window size for the analysis of whole genome alignments using AIC

[https://doi.org/10.5061/dryad.jdfn2z3ng](https://doi.org/10.5061/dryad.jdfn2z3ng)

## Description of the data and file structure

All datasets analysed in the manuscript were either simulated using the code provided in `SimNOW-main.zip` or are available online. The attached `.tsv` files were generated by running the code on different datasets.

### Files and variables

* `simulation_lowILS.tsv`: summary table for simulations with low ILS
  * `simulation`: simulation run
  * `r`: recombination rate in `ms`
  * `window_size`: window size being analysed
  * `accuracy`: percentage of sites that correctly recover the true topology
  * `rmse`: RMSE of the topology distribution from the window trees compared to the true distribution
  * `aic`: the AIC score from the window trees
  * `bic`: the BIC score from the window trees
* `simulation_midILS.tsv`: summary table for simulations with medium ILS
* `simulation_h...
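The column layout above suggests one straightforward way to inspect a summary table: for each simulation run and recombination rate, pick the window size with the lowest AIC (lower AIC indicates a better trade-off between fit and complexity). The following is a minimal sketch, not the authors' code. It assumes the `.tsv` files are tab-separated with a header row that uses the column names listed above, and that `window_size` is numeric. The helper names `read_summary` and `best_window_by_aic` are hypothetical, and the toy values are invented purely for illustration; they are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def read_summary(handle):
    """Parse a tab-separated summary table into a list of dicts.

    Assumes a header row with the columns described in the README
    (simulation, r, window_size, accuracy, rmse, aic, bic).
    """
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "simulation": rec["simulation"],
            "r": float(rec["r"]),
            # int(float(...)) tolerates values written as "1000" or "1000.0"
            "window_size": int(float(rec["window_size"])),
            "accuracy": float(rec["accuracy"]),
            "aic": float(rec["aic"]),
        })
    return rows


def best_window_by_aic(rows):
    """For each (simulation, r) pair, return the row with the lowest AIC."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["simulation"], row["r"])].append(row)
    return {key: min(group, key=lambda x: x["aic"]) for key, group in groups.items()}


# Toy table: values are invented for illustration, NOT from the dataset.
TOY_TSV = (
    "simulation\tr\twindow_size\taccuracy\trmse\taic\tbic\n"
    "1\t0.0\t1000\t80.0\t0.05\t1200.0\t1500.0\n"
    "1\t0.0\t5000\t90.0\t0.03\t1100.0\t1300.0\n"
    "1\t0.0\t10000\t85.0\t0.04\t1150.0\t1250.0\n"
    "2\t0.0\t1000\t95.0\t0.02\t900.0\t1000.0\n"
    "2\t0.0\t5000\t88.0\t0.03\t950.0\t980.0\n"
)

rows = read_summary(io.StringIO(TOY_TSV))
best = best_window_by_aic(rows)
for (sim, r), row in sorted(best.items()):
    print(sim, r, row["window_size"], row["accuracy"])
```

To run this on a real file, replace `io.StringIO(TOY_TSV)` with an opened file handle, for example `open("simulation_lowILS.tsv", newline="")`. Comparing the `accuracy` of the AIC-selected window against the other window sizes in the same group is one way to check how well AIC tracks topology recovery.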
Created:
2025-09-04