codon_usage_by_gene_presence-ALL-V3.1.pl
Source: DataCite Commons. Updated 2024-06-22; indexed 2024-07-13.
Download link:
http://datos.uchile.cl/file.xhtml?persistentId=doi:10.34691/UCHILE/AYDRZL/LDOG5E
Description:
The program makes histograms and boxplots for the distribution of values of each column of several files.
The list of files must be given in a list file.
Its format must be:
Lines beginning with # are ignored.
The first line not starting with # must name a single file containing the codon usages in all genomes, formatted like the file average-b.txt.
The following lines (not starting with #) must have three fields separated by tabs:
First field : name of the gene
Second field: first file, has data for genomes with the gene
Third field : second file, has data for genomes withOUT the gene
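The list-file layout described above can be illustrated with a minimal Python sketch. This is not the script's own Perl code: the function name `parse_gene_list`, the gene name `geneA`, and the example file names other than average-b.txt are hypothetical, and skipping blank lines is an assumption the description does not state.

```python
import io

def parse_gene_list(handle):
    """Return (all_genomes_file, [(gene, with_file, without_file), ...])."""
    all_genomes_file = None
    entries = []
    for raw in handle:
        line = raw.rstrip("\n")
        # Lines starting with '#' are ignored; blank lines are skipped too
        # (the latter is an assumption, not stated in the description).
        if not line.strip() or line.startswith("#"):
            continue
        if all_genomes_file is None:
            # First non-comment line: the file with codon usages for all genomes.
            all_genomes_file = line.strip()
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"expected 3 tab-separated fields, got {len(fields)}: {line!r}"
            )
        gene, with_file, without_file = (f.strip() for f in fields)
        entries.append((gene, with_file, without_file))
    return all_genomes_file, entries

# Hypothetical list file, held in memory for illustration.
example = io.StringIO(
    "# list of comparisons\n"
    "average-b.txt\n"
    "geneA\tgeneA_with.txt\tgeneA_without.txt\n"
)
all_file, genes = parse_gene_list(example)
print(all_file)
print(genes)
```

Raising an error on a malformed line is a design choice of this sketch; how the original script reacts to such lines is not documented here.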
All data files need to be in the same format, with data divided into tab-separated columns.
The first line starts with "#" and gives the name of each field.
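As an illustration of the data-file format, the following Python sketch reads a tab-separated table whose first line starts with "#" and names the fields. The function name `read_codon_table` and the column names (`genome`, `AAA`, `AAC`) are hypothetical. Treating non-numeric cells as labels and skipping them is an assumption of this sketch, not documented behaviour of the script.

```python
import io

def read_codon_table(handle):
    """Return {column_name: [float values]} from a '#'-headed, tab-separated table."""
    header = None
    columns = {}
    for raw in handle:
        line = raw.rstrip("\n")
        if not line:
            continue
        if header is None:
            if not line.startswith("#"):
                raise ValueError("first line must start with '#'")
            # Drop the leading '#' and split the field names on tabs.
            header = line.lstrip("#").strip().split("\t")
            columns = {name: [] for name in header}
            continue
        for name, value in zip(header, line.split("\t")):
            try:
                columns[name].append(float(value))
            except ValueError:
                # Non-numeric cells (e.g. a genome label) are skipped: an assumption.
                pass
    return columns

# Hypothetical two-genome table for illustration.
example = io.StringIO(
    "#genome\tAAA\tAAC\n"
    "g1\t0.25\t0.75\n"
    "g2\t0.5\t0.5\n"
)
table = read_codon_table(example)
print(table["AAA"], table["AAC"])
```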
The gene name corresponds to the gene that has been analyzed and will be used in plot names.
Results files will be constructed based on the and the names of each column and
A file with all plots will be saved to a file that additionally contains
Results will be saved in the output directory. Its path must end with "/".
Additionally, the program adds a histogram. For that, the size of the bins, the minimum value to count, and the number of bins to include must also be given on the command line. If the minimum value is set to zero, please use 0.0 instead of 0.
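To make the three histogram parameters concrete, here is a minimal Python sketch of fixed-width binning driven by a bin size, a minimum value, and a number of bins. The function name `bin_counts` and the sample values are hypothetical. Ignoring values that fall below the minimum or beyond the last bin is an assumption; the script may handle them differently.

```python
def bin_counts(values, bin_size, minimum, n_bins):
    """Count values into n_bins bins of width bin_size, starting at minimum."""
    counts = [0] * n_bins
    for v in values:
        if v < minimum:
            continue  # below the first bin: ignored (assumption)
        index = int((v - minimum) / bin_size)
        if index < n_bins:
            counts[index] += 1
        # Values beyond the last bin are ignored as well (assumption).
    return counts

# Hypothetical values; bins are [0.0, 0.1), [0.1, 0.2), ..., [0.4, 0.5).
counts = bin_counts([0.05, 0.12, 0.18, 0.31, 0.95],
                    bin_size=0.1, minimum=0.0, n_bins=5)
print(counts)
```

Values lying exactly on a bin edge can land in the neighbouring bin because of floating-point rounding, so the sample values above deliberately avoid the edges.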
This script requires the following modules
Statistics::Descriptive
Statistics::Ttest
Please note that this script uses a t-test to assess the statistical significance of differences. This assumes that the data are normally distributed. If your data are not normally distributed, please use codon_usage_by_gene_presence-ALL-Wilcoxon-V3.1.pl instead. The p-value threshold is set to 0.00005. If a different value is required, please change the variable $pvalue accordingly.
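For readers unfamiliar with the comparison involved, the following Python sketch computes a two-sample t statistic for the "with gene" and "without gene" groups. It uses Welch's unequal-variance form, which is an assumption: the variant applied by the Perl module the script relies on may differ. The function name `welch_t` and the sample data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical values of one column in genomes with and without the gene.
with_gene = [1.0, 2.0, 3.0, 4.0, 5.0]
without_gene = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(with_gene, without_gene)
print(t_stat, dof)
```

Turning the statistic into a p-value for comparison against the 0.00005 threshold requires the cumulative distribution of Student's t, which the Python standard library does not provide, so that step is left out of this sketch.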
Usage: perl codon_usage_by_gene_presence-ALL-V3.1.pl
Provided by:
Repositorio de datos de investigación de la Universidad de Chile
Created:
2024-03-06



