
Phylogenetic statistics for loci that BLAST to one of three expressed sequence databases (‘EST’) relative to an equal-sized subsample of loci that do not (‘non-EST’) as well as the full m10wRE dataset (‘Full’).

NIAID Data Ecosystem, indexed 2026-03-08
Download link:
https://figshare.com/articles/dataset/_Phylogenetic_statistics_for_loci_that_BLAST_to_one_of_three_expressed_sequence_databases_8216_EST_8217_relative_to_an_equal_sized_subsample_of_loci_that_do_not_8216_non_EST_8217_as_well_as_the_full_m10wRE_dataset_8216_Full_8217_/988113
Description:
Statistics were calculated for three kinds of dataset: the full m10wRE dataset; the ‘EST’ dataset, composed only of the 4,715 loci that BLASTed to at least one of three expressed sequence databases with E-value < 1E–15; and 100 ‘non-EST’ datasets, each of 4,715 loci drawn at random from the loci that did not BLAST at any level to the expressed sequence databases. For both the ‘EST’ dataset and the ‘non-EST’ subsamples, loci were drawn from the subset of loci with aligned length between 50 bp and 55 bp, inclusive. Statistics for each dataset were calculated on the maximum likelihood tree for that dataset. P-values approximate the type-I error rate under the null hypothesis that the ‘EST’ loci are drawn from the ‘non-EST’ pool of loci. Each P-value is calculated as two times the proportion of random subsamples whose statistic is more extreme than the statistic observed on the ‘EST’ tree.
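The P-value rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' script: the function name `two_sided_empirical_p` is hypothetical, "more extreme" is read as "further out in the same tail as the observed value" using strict inequalities, and the result is capped at 1. The numbers in the usage lines are invented for the example and are not taken from the dataset.

```python
from typing import Sequence


def two_sided_empirical_p(observed: float, null_values: Sequence[float]) -> float:
    """Two times the proportion of null values more extreme than `observed`.

    Assumption: "more extreme" means lying beyond the observed value in
    whichever tail the observed value falls, so the smaller tail is doubled.
    """
    n = len(null_values)
    if n == 0:
        raise ValueError("need at least one null (subsample) statistic")
    # Count subsample statistics strictly above and strictly below the observation.
    greater = sum(1 for v in null_values if v > observed)
    less = sum(1 for v in null_values if v < observed)
    # Double the smaller tail proportion and cap at 1.
    tail = min(greater, less) / n
    return min(1.0, 2.0 * tail)


# Invented example: 100 placeholder "non-EST" statistics, 0.00 through 0.99.
null_stats = [i / 100 for i in range(100)]
# An observed "EST" statistic with only two subsample values above it.
p_value = two_sided_empirical_p(0.975, null_stats)
```

With 100 subsamples, the smallest nonzero P-value this rule can return is 0.02 (one subsample more extreme), and it returns 0 when no subsample is more extreme than the observed statistic.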

Created:
2014-04-04