Phylogenomic analyses shed lights into the adaptation to aquatic environments in Alismatales
Source: DataCite Commons. Updated 2025-04-01; indexed 2024-11-05.
Download link:
https://figshare.com/articles/dataset/Integrating_transcriptomes_to_investigate_genes_associated_with_adaptation_to_water_environments_and_assess_phylogenetic_conflict_and_whole_genome_duplications_in_Alismatales/16967767/8
Description:
- The folder '1_all_cds_pep' contains the CDS and PEP sequences for the 95 samples from Alismatales and outgroups. (Please contact Lingyun Chen at lychen83@qq.com for these sequences.)
- '2_MO_alignment_trees' contains the 1005 nuclear orthologs, alignment, concatenated ML tree, and ASTRAL tree.
- '3_chloroplast_alignment_trees' contains chloroplast genes for 92 samples, alignment, concatenated ML tree, and ASTRAL tree.
- '4_3492_extracted_clades' contains the 3492 clusters, which were used for whole genome duplication analyses. The 3492 clusters were mapped to ASTRAL species tree to count the number of duplicated genes at each node.
- '5_phylogenetic_conflict' contains data related to phylogenetic conflict analyses.
- '6_divergence_time' contains the data matrix for the BEAST analyses and the output:
- 'alismatales_40genes_1000M_generations_fix_hypothesis1.xml' is the input for BEAST.
- 'alismatales_40genes_1000M_generations_fix_hypothesis1.10percent.tre' is the summary tree generated with TreeAnnotator. It is also the tree in Supplementary Fig. S4.
- '7_whole_genome_duplication' contains the Ks values used for the Ks plots.
- '8_gene_evolution_ko_analyses' contains the sequence alignments and phylogenetic trees of gene families. It also includes a matrix containing the KEGG information.
- 'gene_orthologs' contains the alignments and individual gene trees:
'mafft_file_for_tree_build' - 20 alignment files for tree building
'ortholog_tree' - files of the final gene family trees
- 'KEGG_orthologs' contains the number of genes annotated with each KEGG ortholog (KO):
'matrix_for_enrichment_test.tsv' - gene copy number matrix for all 4687 KOs and 95 species
- '9_Plant_Photos_Figshare' contains original plant photos.
- 'taxon_list_nov52021' contains abbreviations of species names and full names
<br>
The analyses for the folders starting with 1, 2, 3, and 4 follow the methods at https://bitbucket.org/yanglab/phylogenomic_dataset_construction/src/master/
<br>
Instructions for the analyses of phylogenetic conflict, corresponding to the folder '5_phylogenetic_conflict':
5.1. Quartet Sampling analyses (in folder: /5_phylogenetic_conflict/5.1_fig_s2_alismatales_input_quartet_sampling)
quartet_sampling.py --tree MO_1005_astral_speciestree --align ortholog_MO_1005_concatinated.fa.phy --reps 100 --threads 6 --lnlike 2
<br>
5.2. PhyloNet (in folder: /5_phylogenetic_conflict/5.2_fig3_phylonet)
The folder '5.2_fig3_phylonet' contains two kinds of files: files ending with '.nex' are the inputs, and files ending with 'output' are the outputs.
Execute the following command for each input file:
java -jar -Xmx140G /PATH_TO_PHYLONET/PhyloNet_3.8.2.jar .nex > _output
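The per-file step above could be wrapped in a loop. The following is a minimal sketch, not the authors' script: the naming rule (input 'X.nex' gives output 'X_output') is an assumption based on the file endings described above, and `PHYLONET_JAR`, `phylonet_output_name`, and `run_phylonet_all` are hypothetical names introduced here for illustration.

```shell
# Assumed naming rule (not confirmed by the source): "X.nex" -> "X_output".
phylonet_output_name() {
    printf '%s_output\n' "${1%.nex}"
}

# Run PhyloNet on every .nex file in the current directory.
# PHYLONET_JAR is a hypothetical variable; set it to your own jar location.
run_phylonet_all() {
    jar="${PHYLONET_JAR:-/PATH_TO_PHYLONET/PhyloNet_3.8.2.jar}"
    for nex in *.nex; do
        [ -e "$nex" ] || continue   # skip when no .nex files are present
        java -jar -Xmx140G "$jar" "$nex" > "$(phylonet_output_name "$nex")"
    done
}
```

Call `run_phylonet_all` from inside '5.2_fig3_phylonet' after adjusting the jar path and the memory limit to your machine.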
<br>
5.3. Consel (in folder: /5_phylogenetic_conflict/5.3_fig4a_all_consel_feb19/estimate_site_wise_log_likelihood_values)
First, estimate the branch length (in folder: /5.3_fig4a_all_consel_feb19/estimate_site_wise_log_likelihood_values)
for a in *aln-cln; do raxml -T 10 -f d -s $a -m GTRGAMMA -g alismatales_10species_topology1.tre -n $a"_output_topology1.cons" -p 123456 -N 10 -o atri291; done &
for a in *aln-cln; do raxml -T 10 -f d -s $a -m GTRGAMMA -g alismatales_10species_topology2.tre -n $a"_output_topology2.cons" -p 123456 -N 10 -o atri291; done &
for a in *aln-cln; do raxml -T 10 -f d -s $a -m GTRGAMMA -g alismatales_10species_topology3.tre -n $a"_output_topology3.cons" -p 123456 -N 10 -o atri291; done &
<br>
The above commands generate three tree files ending with '.cons'. Combine the three files into one and rename it to match '*_output_3topologies'.
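One possible way to do the combining step is sketched below. It assumes that the trees to combine are the 'RAxML_bestTree.*' files that RAxML writes for each run name, which the description does not state explicitly; `combine_topologies` is a hypothetical helper name.

```shell
# For each alignment, concatenate the trees from the three constrained
# searches into a single multi-tree file named <alignment>_output_3topologies.
# Assumption: the relevant trees are RAxML's RAxML_bestTree.<run name> files.
combine_topologies() {
    for aln in *aln-cln; do
        [ -e "$aln" ] || continue            # skip when nothing matches
        out="${aln}_output_3topologies"
        : > "$out"                           # start from an empty file
        for k in 1 2 3; do
            cat "RAxML_bestTree.${aln}_output_topology${k}.cons" >> "$out"
        done
    done
}
```

Run `combine_topologies` in the folder that holds the alignments and the RAxML output; the order of trees in the combined file follows topology 1, 2, 3.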
<br>
Then, estimate site-wise log-likelihood values
for a in *alismatales.fa.aln-cln
do raxml -T 16 -f G -z $a'_output_3topologies' -s $a -r $a'_output_3topologies' -m GTRGAMMA -n $a"_sitelh"
done
<br>
Use the files 'RAxML_perSiteLLs*sitelh' generated from the last step for further analyses.
Rename the files generated in the last step, because the inputs of seqmt and makermt must share the same base name; only the extension may differ. For example, change 'RAxML_perSiteLLs.cluster939_1.ortho.fa_alismatales.fa.aln-cln.sitelh' to 'cluster939_1.ortho.fa_alismatales.fa.aln-cln.sitelh'.
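The renaming can be done in bulk. This is a minimal sketch that strips the 'RAxML_perSiteLLs.' prefix, as in the example above; `strip_raxml_prefix` is a hypothetical helper name, not part of the dataset.

```shell
# Remove the leading "RAxML_perSiteLLs." from every per-site likelihood file
# in the current directory, so each file shares its base name with the
# corresponding alignment.
strip_raxml_prefix() {
    for f in RAxML_perSiteLLs.*.sitelh; do
        [ -e "$f" ] || continue              # skip when nothing matches
        mv -- "$f" "${f#RAxML_perSiteLLs.}"
    done
}
```

Call `strip_raxml_prefix` in the folder containing the 'RAxML_perSiteLLs*sitelh' files before running seqmt.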
<br>
Then execute:
for filename in *.sitelh; do seqmt --puzzle "$filename"; done
for filename in *.aln-cln; do makermt "$filename"; done
for filename in *.rmt; do consel "$filename"; done
for filename in *.pv; do catpv "$filename" > "$filename.out"; done
<br>
Use shell scripts to extract the scores from the files ending with '.out' and generate a file similar to 'Congruent_au_test.csv'.
Then execute 'python au.py'. This command shows the results.
<br>
5.4. Counting RAxML likelihood scores (in folder: /5_phylogenetic_conflict/5.4_fig4b_likelihood_raxml_output)
Extract the ML scores from the 'RAxML_info.RAxML_bestTree*' files using the following commands:
for i in $(ls RAxML_info*);
do
echo $i >> ../all_ln_consel.txt
grep "Tree" $i >> ../all_ln_consel.txt
done
<br>
Then use further shell scripts to generate a file with a format similar to 'all_ln_consel.txt'. The script 'extract_raxml_infor_ln.py' can assist with this step.
Then execute 'python ln_counts.py'. This command shows the results.
<br>
5.5. Trees used to count support for the three hypotheses of Alismatales (in folder: /5_phylogenetic_conflict/5.5_fig4c_798trees)
<br>
5.6. Output from the polytomy test (in folder: /5_phylogenetic_conflict/5.6_polytomy_test)
<br>
<br>
If you have any questions, please do not hesitate to contact me at lychen83@qq.com.
<br>
Lingyun Chen
<br>
Provider:
figshare
Created:
2024-10-10



