Comparative and phylogenomic analysis of nuclear and organelle genes in cryptic Coelastrella vacuolata MACC-549 green algae
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4720667
Resource description:
Phylogenetic analysis was carried out using 3 nucleotide sequences (18S, ITS2 and tufA) and 7 protein sequences (PetA, PsbD, PsaC, PsbB, RbcL, AtpA and PsaB, all encoded on the chloroplast genome). Most of the sequences used were downloaded from a previous study (Wang et al. 2019). Multiple alignments of the separate loci were created using the L-INS-i iterative refinement method in MAFFT v.7.427. The multiple alignments were then manually curated to correct false base mismatches introduced by the automatic alignment algorithm. We also provide these multiple sequence alignments herewith so that anyone can combine them with their own dataset to phylogenetically classify members of the Sphaeropleales.
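The alignment step can be sketched as follows. This is a minimal illustration, not the authors' script: in MAFFT, L-INS-i corresponds to the options --localpair --maxiterate 1000, but the input filename and the thread count below are hypothetical, and the unaligned input files are not part of this deposit.

```python
import shlex
import subprocess


def mafft_linsi_command(input_fasta, threads=1):
    """Build the argument list for an L-INS-i alignment in MAFFT.

    L-INS-i is MAFFT's name for --localpair combined with --maxiterate 1000.
    """
    return [
        "mafft",
        "--localpair",
        "--maxiterate", "1000",
        "--thread", str(threads),
        input_fasta,
    ]


def run_alignment(input_fasta, output_fasta, threads=1):
    """Run MAFFT (must be installed and on PATH) and write the alignment to a file."""
    # MAFFT writes the alignment to standard output, so redirect it.
    with open(output_fasta, "w") as out:
        subprocess.run(mafft_linsi_command(input_fasta, threads), stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical unaligned input; only the command line is printed here.
    print(shlex.join(mafft_linsi_command("my_18s_sequences.fasta")))
```

The alignment written by run_alignment would still need the manual curation described above before any downstream step.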
After alignment, highly variable and non-informative sites were removed from the alignments using Gblocks (v0.91b). The minimum number of sequences for a conserved position and for a flank position were both set to 50% of the number of sequences plus one, the maximum number of contiguous non-conserved positions was set to 20, the minimum length of a block was set to 2, and gap positions were allowed in all sequences. The 3 nucleotide sequences were concatenated into a single aligned sequence, referred to as the “Conventional loci alignment”. We also assembled a 10 loci alignment by concatenating the 3 nucleotide and 7 protein sequences into one alignment. In total, two phylogenetic trees were inferred with IQ-TREE 2, one from each concatenated alignment. These trees are also provided herewith for anyone who would like to use them.
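The Gblocks settings described above map onto its command-line options -b1 to -b5. The sketch below only assembles the argument list; it assumes that "50% of the number of sequences plus one" is rounded down to an integer, and the sequence count used in the example is hypothetical.

```python
def gblocks_args(alignment_path, n_sequences, seq_type):
    """Assemble a Gblocks command line matching the settings described above.

    seq_type is "d" for DNA alignments or "p" for protein alignments.
    """
    if seq_type not in ("d", "p"):
        raise ValueError("seq_type must be 'd' (DNA) or 'p' (protein)")
    if n_sequences < 2:
        raise ValueError("an alignment needs at least two sequences")

    # Assumption: half the number of sequences is rounded down before adding one.
    min_seqs = n_sequences // 2 + 1

    return [
        "Gblocks",
        alignment_path,
        f"-t={seq_type}",   # sequence type
        f"-b1={min_seqs}",  # min. sequences for a conserved position
        f"-b2={min_seqs}",  # min. sequences for a flank position
        "-b3=20",           # max. contiguous non-conserved positions
        "-b4=2",            # min. length of a block
        "-b5=a",            # gap positions allowed in all sequences
    ]


if __name__ == "__main__":
    # 40 sequences is a made-up count used only for illustration.
    print(" ".join(gblocks_args("petA_aligned.seqs", 40, "p")))
```

Because -b1 and -b2 depend on the number of sequences, they need to be recomputed whenever new sequences are added to an alignment.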
FILE DESCRIPTIONS:
The files ending in *.fasta (sequence_18s_v2_aligned.fasta, sequence_fixedITS2_newnames_aligned_edited.fasta and sequence_tufA_v2_aligned.fasta) contain the individual alignments for the 3 nucleotide sequences. The files ending in *.seqs (atpA_aligned.seqs, petA_aligned.seqs, psaB_aligned.seqs, psaC_aligned.seqs, psbB_aligned.seqs, psbD_aligned.seqs, rbcL_aligned.seqs) contain the individual alignments for the 7 protein sequences.
The nucleotide sequences are concatenated in Traditional_tree_concatenated_v2.fas and the protein sequences are concatenated in concatenated_proteins_v2.faa. The mixed_loci.tre file is the tree inferred from the combined nucleotide and protein sequences. The traditional_loci.tre file is the tree inferred from the three nucleotide sequences only. Finally, the NEXUS files required for IQ-TREE 2 runs are provided as partitions_mixed.txt (use with concatenated_proteins_v2.faa) and partitions_traditional_tree.txt (use with Traditional_tree_concatenated_v2.fas).
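For readers rebuilding the concatenated files after adding their own sequences, concatenation with partition coordinates can be sketched as below. This is an illustrative approach, not necessarily the procedure used to produce the deposited files: the function names are hypothetical, and filling taxa that lack a locus with gap characters is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {taxon_name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in records.items()}


def concatenate(loci):
    """Concatenate aligned loci given as a list of (locus_name, {taxon: sequence}).

    Returns the concatenated alignment and a list of
    (locus_name, start, end) tuples with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for _, aln in loci for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus_name, aln in loci:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus_name}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from this locus is padded with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((locus_name, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


if __name__ == "__main__":
    # Toy data, not taken from the deposited alignments.
    toy = [
        ("18S", parse_fasta(">A\nACGT\n>B\nAC-T\n")),
        ("tufA", parse_fasta(">A\nGG\n>C\nGA\n")),
    ]
    alignment, partitions = concatenate(toy)
    for taxon, seq in alignment.items():
        print(f">{taxon}\n{seq}")
    print(partitions)
```

The returned coordinates can be compared against the ranges in the provided partition files to check that a rebuilt alignment keeps the same locus boundaries.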
We recommend that anyone who would like to use these trees start with the individual alignments, add their sequences of interest, and proceed with the analysis as described in the paper.
Created:
2021-04-28



