Simulated nucleotide sequences for testing alignment-free genome distance estimates
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4022499
Description:
This repository contains 6,000 (12 × 500) pairs of nucleotide sequences simulated for testing alignment-free genome distance estimates, as described in Criscuolo (2019). For each evolutionary distance d from 0.05 to 0.60 (step = 0.05), the program SeqGen was used to simulate the evolution of 500 nucleotide sequence pairs with d substitution events per character under the GTR+Γ evolutionary model.
For each of the 12 evolutionary distances d = 0.05, 0.10, ..., 0.60, an XZ-compressed file containing 500 lines is available. Each line contains 18 fields separated by blank spaces:
[1] seed value used during simulation,
[2] true evolutionary distance d between the two simulated sequences,
[3] total number of simulated characters,
[4] number of non-indel characters with nucleotide mismatch,
[5] number of non-indel characters,
[6-9] A, C, G, T frequencies used during simulation,
[10-15] GTR parameters used during simulation,
[16] Γ distribution parameter used during simulation,
[17-18] two simulated sequences with indel events as gaps.
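The 18-field layout above can be read with a few lines of Python. This is a minimal sketch, not part of the repository: the names `SimulatedPair`, `parse_line` and `read_pairs` are hypothetical, fields [1] and [3-5] are assumed to be written as plain integers, and the observed mismatch proportion is computed here simply as field [4] divided by field [5].

```python
import lzma
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class SimulatedPair:
    seed: int                # field [1]
    true_distance: float     # field [2]
    n_characters: int        # field [3]
    n_mismatches: int        # field [4]
    n_non_indel: int         # field [5]
    base_freqs: List[float]  # fields [6-9]: A, C, G, T
    gtr_params: List[float]  # fields [10-15]
    gamma_shape: float       # field [16]
    seq1: str                # field [17], gaps mark indel events
    seq2: str                # field [18], gaps mark indel events

    @property
    def p_distance(self) -> float:
        """Observed proportion of mismatches among non-indel characters."""
        return self.n_mismatches / self.n_non_indel


def parse_line(line: str) -> SimulatedPair:
    """Split one whitespace-separated record into its 18 fields."""
    fields = line.split()
    if len(fields) != 18:
        raise ValueError(f"expected 18 fields, found {len(fields)}")
    return SimulatedPair(
        seed=int(fields[0]),
        true_distance=float(fields[1]),
        n_characters=int(fields[2]),
        n_mismatches=int(fields[3]),
        n_non_indel=int(fields[4]),
        base_freqs=[float(x) for x in fields[5:9]],
        gtr_params=[float(x) for x in fields[9:15]],
        gamma_shape=float(fields[15]),
        seq1=fields[16],
        seq2=fields[17],
    )


def read_pairs(path: str) -> Iterator[SimulatedPair]:
    """Stream records from one XZ-compressed file, skipping empty lines."""
    with lzma.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

Comparing `p_distance` with `true_distance` for each record gives a quick sanity check of how far the observed mismatch proportion falls below the simulated distance as d grows.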
Of note, each pair of aligned sequences without gaps can be regenerated using SeqGen v1.3.4 with parameters from fields [1,3,6-16] and the following two-leaf model tree:
(t1:d,t2:0.000);
where d is given in field [2].
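As a sketch of the regeneration step, the helpers below build the two-leaf tree string and a candidate argument list from the fields of one record. Several points are assumptions rather than statements from the repository: the executable name `seq-gen`, the option letters (`-m`, `-l`, `-a`, `-f`, `-r`, `-z`, as recalled from the Seq-Gen usage text), and the order of the six GTR values in fields [10-15] matching what `-r` expects. These should be checked against the SeqGen v1.3.4 documentation before use. The function names are hypothetical.

```python
from typing import List, Sequence


def two_leaf_tree(d: str) -> str:
    """Return the two-leaf model tree with branch length d from field [2]."""
    return f"(t1:{d},t2:0.000);"


def seqgen_args(fields: Sequence[str]) -> List[str]:
    """Assemble an assumed Seq-Gen argument list from fields [1,3,6-16].

    `fields` holds the whitespace-split record as strings, so the numeric
    values are passed through exactly as written in the data file.
    """
    if len(fields) < 16:
        raise ValueError("need at least the first 16 fields of a record")
    seed = fields[0]      # field [1]
    length = fields[2]    # field [3]
    freqs = fields[5:9]   # fields [6-9]: A, C, G, T
    rates = fields[9:15]  # fields [10-15]: GTR parameters
    alpha = fields[15]    # field [16]: gamma shape
    return [
        "seq-gen", "-mGTR",
        "-l", length,
        "-a", alpha,
        "-f", *freqs,
        "-r", *rates,
        "-z", seed,
    ]
```

The tree string would be written to a file or piped to the program on standard input, for example with `subprocess.run(seqgen_args(fields), input=two_leaf_tree(fields[1]), text=True, capture_output=True)`, assuming the executable is installed and reads the tree from standard input.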
___
Criscuolo A (2019) A fast alignment-free bioinformatics procedure to infer accurate distance-based phylogenetic trees from genome assemblies. Research Ideas and Outcomes, 5:e36178. doi:10.3897/rio.5.e36178
Created:
2020-09-11



