Interpreting alignment-free sequence comparison: what makes a score a good score
Source: DataCite Commons. Updated 2025-05-07; indexed 2024-07-03.
Download link:
https://research.aber.ac.uk/en/datasets/3e9573af-363b-4ff7-9f27-72eb84440b68
Description:
An example set of Linux and Python scripts for protein (aa, for amino acid) and DNA sequences, including data and the KAST executable.
The scripts run KAST, evaluate the output with an objective function, make score-frequency histograms, generate likelihood scores from the histograms, and annotate KAST output with the likelihood scores.
For proteins (aa), the available data (in the Data subdirectory of the protein example) includes FASTA files containing the protein sequences for the yeasts system and the fly-worm system, and the associated DIOPT files with the ortholog mappings.
For DNA, the available data (in the Data subdirectory of the DNA example) includes FASTA files containing the DNA sequences for the strain (query) and species (ref) data sets and the associated file with the NCBI taxonomic mappings. These data sets are relatively large.
Also available are the scripts to make the figures in the paper, plus a set of histogram data files for both proteins (the yeasts system) and DNA that explore additional parameter sets and may be used with some of the provided scripts.
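The histogram-to-likelihood workflow described above could be sketched roughly as follows. This is not the code shipped with the dataset. It assumes, purely for illustration, that each hit carries a numeric score and a true/false verdict from the objective function, and that the likelihood for a score bin is the fraction of hits in that bin judged correct. The function names, the bin edges, and the sample values are all hypothetical.

```python
from bisect import bisect_right


def bin_index(score, bin_edges):
    """Return the histogram bin for a score, clamping out-of-range values."""
    n_bins = len(bin_edges) - 1
    i = bisect_right(bin_edges, score) - 1
    return min(max(i, 0), n_bins - 1)


def score_histograms(scored_hits, bin_edges):
    """Count correct hits and all hits per score bin.

    scored_hits: iterable of (score, is_correct) pairs, where is_correct is
    the verdict of some objective function (assumed here to be a boolean).
    """
    n_bins = len(bin_edges) - 1
    correct = [0] * n_bins
    total = [0] * n_bins
    for score, is_correct in scored_hits:
        i = bin_index(score, bin_edges)
        total[i] += 1
        if is_correct:
            correct[i] += 1
    return correct, total


def bin_likelihoods(correct, total):
    """Assumed definition: fraction of correct hits per bin (None if empty)."""
    return [c / t if t else None for c, t in zip(correct, total)]


def annotate(scores, bin_edges, likelihoods):
    """Pair each new score with the likelihood of the bin it falls into."""
    return [(s, likelihoods[bin_index(s, bin_edges)]) for s in scores]


# Hypothetical training hits: (score, judged correct?)
hits = [(0.1, False), (0.2, False), (0.6, True),
        (0.7, False), (0.9, True), (0.95, True)]
edges = [0.0, 0.5, 1.0]

correct, total = score_histograms(hits, edges)
likelihoods = bin_likelihoods(correct, total)
annotated = annotate([0.3, 0.8], edges, likelihoods)
```

With real data, the hits would come from parsed KAST output and the verdicts from the ortholog or taxonomic mappings. The provided scripts define the actual file formats and the actual likelihood calculation.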
Provider:
Prifysgol Aberystwyth | Aberystwyth University
Created:
2022-02-01



