
Simulated nucleotide sequences for testing alignment-free genome distance estimates

Indexed in NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4022499
Resource description:
This repository contains 6,000 pairs of nucleotide sequences (12 distances × 500 pairs) that were simulated for testing alignment-free genome distance estimates, as described in Criscuolo (2019). For each evolutionary distance d from 0.05 to 0.60 (step = 0.05), the program SeqGen was used to simulate the evolution of 500 nucleotide sequence pairs separated by d substitution events per character under the GTR+Γ evolutionary model. For each of the 12 evolutionary distances d = 0.05, 0.10, ..., 0.60, an XZ-compressed file containing 500 lines (one per sequence pair) is available. Each line contains 18 fields separated by blank spaces:

[1] seed value used during simulation
[2] true evolutionary distance d between the two simulated sequences
[3] total number of simulated characters
[4] number of non-indel characters with a nucleotide mismatch
[5] number of non-indel characters
[6-9] A, C, G, T frequencies used during simulation
[10-15] GTR parameters used during simulation
[16] Γ distribution parameter used during simulation
[17-18] the two simulated sequences, with indel events shown as gaps

Of note, each pair of aligned sequences without gaps can be regenerated using SeqGen v1.3.4 with the parameters from fields [1], [3], and [6-16] and the following two-leaf model tree, where d is given in field [2]:

(t1:d,t2:0.000);

___
Criscuolo A (2019) A fast alignment-free bioinformatics procedure to infer accurate distance-based phylogenetic trees from genome assemblies. Research Ideas and Outcomes, 5:e36178. doi:10.3897/rio.5.e36178
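The 18-field line format lends itself to a small reader. The Python sketch below is illustrative only and rests on several assumptions: each file is a plain XZ stream readable with the standard `lzma` module; fields [1], [3], [4], and [5] are written as integer literals; and the six GTR values in fields [10-15] are stored in the order SeqGen expects. The names `SimulatedPair`, `parse_line`, `read_pairs`, `jc69_distance`, and `seqgen_args` are hypothetical. The SeqGen option letters (`-m`, `-l`, `-n`, `-z`, `-f`, `-r`, `-a`) reflect our reading of the SeqGen manual and should be checked against the v1.3.4 documentation before use. The Jukes-Cantor correction is included only as a simple baseline to compare against the true d in field [2]. It ignores the GTR+Γ parameters and is not the estimator from Criscuolo (2019).

```python
import lzma
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class SimulatedPair:
    """One line (18 blank-separated fields) of a simulated-pair file."""
    seed: int                 # [1] seed value
    true_distance: float      # [2] true evolutionary distance d
    n_sites: int              # [3] total number of simulated characters
    n_mismatches: int         # [4] non-indel characters with a mismatch
    n_non_indel: int          # [5] non-indel characters
    base_freqs: List[float]   # [6-9] A, C, G, T frequencies
    gtr_rates: List[float]    # [10-15] GTR parameters (order assumed)
    gamma_alpha: float        # [16] Gamma distribution parameter
    seq1: str                 # [17] first sequence, gaps mark indels
    seq2: str                 # [18] second sequence, gaps mark indels

    def p_distance(self) -> float:
        """Observed proportion of mismatches among non-indel characters."""
        return self.n_mismatches / self.n_non_indel


def parse_line(line: str) -> SimulatedPair:
    """Split one line on whitespace and convert its 18 fields.

    Assumes fields [1] and [3]-[5] are integer literals.
    """
    f = line.split()
    if len(f) != 18:
        raise ValueError(f"expected 18 fields, got {len(f)}")
    return SimulatedPair(
        seed=int(f[0]),
        true_distance=float(f[1]),
        n_sites=int(f[2]),
        n_mismatches=int(f[3]),
        n_non_indel=int(f[4]),
        base_freqs=[float(x) for x in f[5:9]],
        gtr_rates=[float(x) for x in f[9:15]],
        gamma_alpha=float(f[15]),
        seq1=f[16],
        seq2=f[17],
    )


def read_pairs(path: str) -> Iterator[SimulatedPair]:
    """Stream records from one XZ-compressed file, skipping blank lines."""
    with lzma.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor correction of a p-distance (simple baseline only)."""
    if p >= 0.75:
        return math.inf  # correction is undefined at or beyond saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def seqgen_args(pair: SimulatedPair) -> Tuple[List[str], str]:
    """Build an assumed SeqGen argument list and the two-leaf tree.

    The tree (t1:d,t2:0.000); would be supplied to SeqGen on standard
    input. The option syntax is an assumption; verify it in the manual.
    """
    tree = f"(t1:{pair.true_distance},t2:0.000);"
    args = [
        "seq-gen",
        "-mGTR",                                            # model
        f"-l{pair.n_sites}",                                # field [3]
        "-n1",                                              # one dataset
        f"-z{pair.seed}",                                   # field [1]
        "-f" + ",".join(str(x) for x in pair.base_freqs),   # fields [6-9]
        "-r" + ",".join(str(x) for x in pair.gtr_rates),    # fields [10-15]
        f"-a{pair.gamma_alpha}",                            # field [16]
    ]
    return args, tree
```

For example, iterating over `read_pairs(path)` for one of the 12 files and comparing `jc69_distance(pair.p_distance())` with `pair.true_distance` gives a quick sanity check of how far a naive correction drifts from d as d grows. The file names are not listed on this page, so `path` has to be supplied by the reader. Keeping the parser separate from the SeqGen helper means the data can be used without SeqGen installed.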
Created: 2020-09-11