
Supplementary data for: Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression

Source: Mendeley Data. Updated: 2024-04-12. Indexed: 2024-06-27.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.rfj6q577d
Resource description:
***********************************************
ALIGNMENTS

Aligned single-copy orthologs from primate datasets obtained via NCBI. Alignments were performed using GUIDANCE2 with MAFFT. An example alignment script is below.

#!/bin/bash
#PBS -k o
#PBS -l nodes=1:ppn=8,vmem=40gb,walltime=45:00:00
#PBS -M ddvanderpool@gmail.com
#PBS -m abe
#PBS -N 1_CODON_AlignQSUB
#PBS -j oe

cd /N/u/danvand/Carbonate/work_Primates/ALIGN_all_PRIMATE_GROUPS_4TAX/CODON_UNALIGNED

for file in *unalign.fa; do
    dest="/N/u/danvand/Carbonate/work_Primates/ALIGN_all_PRIMATE_GROUPS_4TAX/CODON_UNALIGNED/"
    f=${file%\_TRANS_unalign.fa}

    /N/u/danvand/Carbonate/src/guidance.v2.02/www/Guidance/guidance.pl \
        --seqFile "$dest""$file" --dataset "$f" --msaProgram MAFFT \
        --outOrder as_input --proc_num 10 --seqCutoff 0.93 --colCutoff 0.95 \
        --seqType codon --bootstraps 60 --outDir "$PWD"
    ## The above line runs the file through GUIDANCE2, which generates column and sequence scores using 60 bootstrap replicates from MAFFT codon alignments.

    /N/u/danvand/Carbonate/bin/maskLowScoreResidues.pl "$f".MAFFT.aln.With_Names "$f".MAFFT.Guidance2_res_pair_res.scr "$f".GUIDANCE.MASK_SEQ.fa 0.93 nuc
    ## The above line uses the scores file to mask low-confidence residues.

    /N/u/danvand/Carbonate/bin/trimal -in "$dest""$f".GUIDANCE.MASK_SEQ.fa -out "$dest""$f".GUIDANCE.MASK_SEQ_TRIMal.fa -gt 0.5 -cons 50
    ## The above line runs trimAl to cut out sites not present in at least 50% of the taxa or conserved in 50%.

    /N/u/danvand/Carbonate/pythonscripts/Clean.N.py "$dest""$f".GUIDANCE.MASK_SEQ_TRIMal.fa
    ## The above line is just an ad hoc fix because I realized trimAl was counting the masked residues and not trimming them. The easiest fix was to convert the Ns to gaps.

    /N/u/danvand/Carbonate/pythonscripts/remove_aln_gaps_update.pl "$dest""$f".GUIDANCE.MASK_SEQ_TRIMal.fa.NoNs.fa .5 .9 200 > "$dest""$f"_NO_GAP.fa
    ## The above line takes the new "gapped" sequence and deletes a site if more than 1/2 of the taxa don't have it, and masks the whole sequence if more than 10% of the sequence is missing or if it is under 200 bp.

    /N/u/danvand/Carbonate/pythonscripts/ReplaceString.py "$f"_NO_GAP.fa "_.5_.9_200" "" "$f"_TRANS_GUIDANCE_TRIMAL_NoNcol_CODON.fa
    ## The above line just deletes the FASTA header-line annotation left by the previous script.

    /N/u/danvand/Carbonate/pythonscripts/Remove_All_N_Taxa.py "$f"_TRANS_GUIDANCE_TRIMAL_NoNcol_CODON.fa
    ## The above line removes taxa composed only of Ns; this turns out to matter and is more of a double check for all sequences before they are finalized.

    mv "$dest"*GUIDANCE* "$dest"../CODON_ALIGNED
    rm "$dest"*MAFFT* "$dest"*Seqs* "$dest"COS* "$dest"END* "$dest"Sample* "$dest"log "$dest"*NO_GAP.fa
    ## The above lines just cut down on file build-up and keep only the most relevant last couple of steps, in case you want to change something, so you don't have to realign everything.
done
***********************************************
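The helper scripts called in the loop (Clean.N.py, remove_aln_gaps_update.pl, Remove_All_N_Taxa.py) are not reproduced in this description, so the following is only a minimal Python sketch of the filtering logic as the script comments describe it. All function names and defaults here are hypothetical. It assumes that ".5 .9 200" means: drop a column when more than half of the taxa have a gap, and mask a sequence (replace it with Ns) when less than 90% of it is present or it has fewer than 200 non-gap bases. It also works one column at a time and does not try to preserve codon frame, which the real scripts may handle differently.

```python
def ns_to_gaps(seq: str) -> str:
    """Convert masked residues (N/n) to gap characters."""
    return seq.replace("N", "-").replace("n", "-")


def drop_sparse_columns(aln: dict, max_missing: float = 0.5) -> dict:
    """Delete every column in which more than max_missing of the taxa have a gap."""
    if not aln:
        return {}
    names = list(aln)
    length = len(aln[names[0]])
    keep = [
        i for i in range(length)
        if sum(aln[n][i] == "-" for n in names) / len(names) <= max_missing
    ]
    return {n: "".join(aln[n][i] for i in keep) for n in names}


def mask_poor_sequences(aln: dict, min_present: float = 0.9, min_len: int = 200) -> dict:
    """Replace a sequence with all Ns if too much of it is missing or it is too short."""
    out = {}
    for name, seq in aln.items():
        present = len(seq) - seq.count("-")
        too_sparse = not seq or present / len(seq) < min_present
        if too_sparse or present < min_len:
            out[name] = "N" * len(seq)  # assumed meaning of "masks the whole sequence"
        else:
            out[name] = seq
    return out


def remove_all_n_taxa(aln: dict) -> dict:
    """Drop taxa whose sequence consists only of Ns (or is empty)."""
    return {n: s for n, s in aln.items() if s.strip("Nn")}


if __name__ == "__main__":
    # Toy alignment; real input would be the trimAl output FASTA.
    toy = {"Homo": "ATGNNNAAA", "Pan": "ATG---AAA", "Gorilla": "ATGCCCAAA"}
    gapped = {name: ns_to_gaps(seq) for name, seq in toy.items()}
    trimmed = drop_sparse_columns(gapped, max_missing=0.5)
    print(trimmed)  # the middle three columns are gaps in 2 of 3 taxa, so they are dropped
    print(remove_all_n_taxa(mask_poor_sequences(trimmed, 0.9, 200)))  # all under 200 bp
```

With the 200 bp minimum, every sequence in the toy alignment is masked and then removed, which illustrates why the final all-N check matters for short loci.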
Created:
2023-06-28