
Datasets and Software from Wang and Liu BMC Genomics 2016

DataCite Commons | Updated 2024-02-05 | Indexed 2024-07-27
Download link:
https://figshare.com/articles/dataset/Datasets_and_Software_from_Wang_and_Liu_BMC_Genomics_2016/7874612
Resource description:
Datasets and software used in paper "A performance study of the impact of recombination on species tree analysis"
=================================================================================================================

The dataset and software are provided in a compressed file. The compressed file has been split into multiple pieces to facilitate file hosting. To reassemble and decompress the compressed file, run the following commands (assuming that the files are stored on a Linux computer):

    cd local_directory_where_the_compressed_file_pieces_are_located
    cat wang-and-liu-2016-data.tar.gz.split-* > wang-and-liu-2016-data.tar.gz
    tar xzf wang-and-liu-2016-data.tar.gz

=================================================================================================================

Dataset and software files are arranged in the following directory structure. All the trees are in Newick format.

1. ./simulated-model-trees:
   - contains the 8-taxon, 15-taxon, and 25-taxon simulated model trees.
2. ./alternate-model-trees:
   - contains the species trees (species_trees.nex) used in Lanier and Knowles' study and the empirical tree (empirical_tree.nex) used in our study.
3. ./empirical-SNPs:
   - chr#.meta: the meta file for chromosome #, which maps the loci of the SNPs to their actual positions in the full sequence alignment.
   - chr#.fasta: the SNPs of chromosome # in FASTA format.
   - map: maps the taxa in chr#.fasta to the species.
4. ./software:
   - ms: used to simulate coalescent trees. [Download][ms_link]
   - msHOT: used to simulate coalescent trees with recombination hotspots. [Download][mshot_link]
   - Seq-gen: used to simulate DNA sequence alignments along the coalescent trees. [Download][seq-gen_link]
   - ASTRAL: used to infer the model species tree from coalescent trees. [Download][astral_link]
   - FastTree: used to infer a tree from a DNA sequence alignment. [Download][fasttree_link]
   - LRScan_con.py: implementation of the LRScan algorithm.
5. The sequence data is contained in the following subdirectories:
   - ./sequence_data/EP: contains the sequence data simulated from the empirical tree.
   - ./sequence_data/LN: contains the sequence data simulated from the species trees used in Lanier and Knowles' study.
   - ./sequence_data/HOT: contains the sequence data simulated from the 8-taxon model trees under the deterministic approach (hot1) and the non-deterministic approach (hot2).
   - ./sequence_data/three-case: contains the sequence data simulated from the 8-taxon, 15-taxon, and 25-taxon model trees.
   - Within each subdirectory, the folders "100", "200", and "1000" indicate the recombination rate used during the sequence generation process.

=================================================================================================================

[ms_link]:http://home.uchicago.edu/rhudson1/source/mksamples.html
[mshot_link]:http://home.uchicago.edu/~rhudson1/source/mksamples.html
[seq-gen_link]:http://tree.bio.ed.ac.uk/software/seqgen/
[astral_link]:https://github.com/smirarab/ASTRAL
[fasttree_link]:http://www.microbesonline.org/fasttree/
[seq_link]:https://www.dropbox.com/s/e48j2bcnyxdg2yi/sequence_data.tar.gz?dl=0

=================================================================================================================

You can redistribute the data and software and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This data and software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
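
For readers without a Unix shell, the reassembly step described in the README (`cat` on the split pieces followed by `tar xzf`) could be approximated in Python. This is a minimal sketch, not part of the distributed software: the function name `reassemble_and_extract` is hypothetical, and it assumes the piece suffixes sort lexicographically into the correct order, as they would under the shell glob.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble_and_extract(piece_dir, archive_name="wang-and-liu-2016-data.tar.gz"):
    """Concatenate the split pieces in sorted order, then extract the archive.

    Hypothetical helper mirroring the README's `cat ... > ...` and `tar xzf`.
    """
    piece_dir = Path(piece_dir)
    # Same pattern as the README's `wang-and-liu-2016-data.tar.gz.split-*`.
    # Assumption: sorting the names reproduces the shell's expansion order.
    pieces = sorted(piece_dir.glob(archive_name + ".split-*"))
    if not pieces:
        raise FileNotFoundError(f"no split pieces found in {piece_dir}")

    archive_path = piece_dir / archive_name
    # Equivalent of `cat pieces > archive`: append each piece byte for byte.
    with open(archive_path, "wb") as out:
        for piece in pieces:
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, out)

    # Equivalent of `tar xzf archive`, extracting next to the pieces.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=piece_dir)
    return archive_path
```

Calling `reassemble_and_extract("path/to/pieces")` writes the joined archive into that directory and unpacks it there. As with any downloaded archive, it is worth extracting only files from a source you trust.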

Provider:
figshare
Created:
2019-03-21