Statistical analyses of the overlaps between A. ceylanicum genes encoding protein domains (or other traits) and A. ceylanicum genes encoding ES proteins.
Source: Figshare. Updated: 2026-03-17. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_p_Statistical_analyses_of_the_overlaps_between_i_A_ceylanicum_i_genes_encoding_protein_domains_or_other_traits_and_i_A_ceylanicum_i_genes_encoding_ES_proteins_p_/31794270
Description:
Each category of genes (e.g., genes encoding a particular protein domain or having a particular trait) was compared with a set of ES genes for its degree of overlap, and the non-randomness of this overlap was assessed with a two-tailed Fisher test. In situations where many categories were compared at once (for instance, an entire set of protein domains from Pfam, InterPro, or Phobius), the p-values from Fisher testing were used to compute q-value significance scores that correct for multiple hypothesis testing. Statistics are provided for three ES gene sets: the 565 ES genes described in this paper (labeled “UMass_ES”); the 955 adult ES genes described by Uzoechi et al. [58] or Wong et al. (labeled “WashU_ES”); and the 511 adult ES genes described only by Uzoechi et al. or Wong et al., but not by data in this paper (labeled “WashU-only ES”). For each ES set, spreadsheets are given for analyses of overlaps with the following traits: signal-peptide or transmembrane domain organizations predicted by Phobius; protein domains in Pfam; protein domains in InterPro; and various binary comparisons with other gene traits (“Various”). In addition, two spreadsheets compare the statistical significances (q-values) of overrepresented Pfam and InterPro domains between our set of 565 ES genes and the WashU-only set of 511 adult ES genes. In these comparisons, only domains with a q-value ≤ 0.1 in at least one gene set were considered, and for each domain the two q-values were compared by computing their ratio. A q-value ratio very far from 1 indicates that the domain is much more significantly overrepresented in one gene set than in the other.
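The multiple-testing step and the q-value ratio comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions: the exact q-value estimator is given only by citation [77,78], so the sketch substitutes Benjamini-Hochberg adjusted p-values, which equal Storey-style q-values with the null proportion π0 fixed at 1 (the most conservative setting). The function names `bh_qvalues` and `qvalue_ratios`, the dictionary inputs, and the ratio direction (first set divided by second set) are hypothetical illustrations, not the authors' code.

```python
import math


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (Storey q-values with pi0 = 1).

    Stand-in for the cited q-value method; an assumption, not the source's code.
    """
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def qvalue_ratios(q_a, q_b, threshold=0.1):
    """Return {domain: q_a / q_b} for domains with q <= threshold in either set.

    Hypothetical helper: q_a and q_b map domain IDs to q-values for two ES sets.
    A ratio well below 1 means the domain is more significant in set A.
    """
    ratios = {}
    # Only domains scored in both gene sets can be compared
    for domain in sorted(set(q_a) & set(q_b)):
        qa, qb = q_a[domain], q_b[domain]
        if qa <= threshold or qb <= threshold:
            ratios[domain] = qa / qb if qb > 0 else math.inf
    return ratios
```

In practice, `q_a` could hold the UMass_ES q-values and `q_b` the WashU-only ES q-values for Pfam (or InterPro) domains. Reporting the ratio rather than a difference keeps the comparison meaningful across q-values that span many orders of magnitude.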
Each analysis of overrepresentation in a gene set reports the following columns:
“Motif”: the specific protein domain (or other gene category) whose genes are being tested for nonrandomly high (or low) overlap with the ES gene set.
“All_genes”: the total number of v2.1 protein-coding genes within which overlaps were tested.
“Motif_genes”: the number of genes annotated with the protein domain (or other trait) being tested for overlap.
“Class_genes”: the number of ES genes being tested for overlap.
“Motif.Class_overlap”: the observed number of genes falling into both categories.
“Exp_rand_overlap”: the number of overlapping genes that would be expected purely by chance.
“Enrichment”: the ratio of observed to randomly expected overlap (this ratio can be lower than 1, and can be as low as 0).
“p-value”: the initial statistical significance of the observed overlap, computed by a two-tailed Fisher test.
“q-value”: for cases where many comparisons were made at once (e.g., testing all Pfam domains simultaneously), a significance score that conservatively corrects for multiple hypothesis testing [77,78].
No q-values were computed for the simple binary comparisons of gene traits listed in “Various”; these traits are annotated for A. ceylanicum genes in S1 Table and defined in its table legend: Intestine-biased; Non-intestine-biased; Pos.intest.immunoreg; Neg.intest.immunoreg; Pos.non-intest.immunoreg; Neg.non-intest.immunoreg; Male-biased; Female-biased; Uzoechi_any_ES; Uzoechi_male.only_ES; Uzoechi_female.only_ES; and Uzoechi_both_ES. Note again that highly significant overlaps can be either higher or lower than the randomly expected genome-wide background rate. (XLSX)
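The Exp_rand_overlap, Enrichment, and p-value columns all follow from the four counts All_genes, Motif_genes, Class_genes, and Motif.Class_overlap. The sketch below assumes the standard hypergeometric model behind a 2×2 Fisher test, with expected overlap Motif_genes × Class_genes / All_genes, and uses the common two-tailed definition (sum the probabilities of all overlaps no more likely than the observed one). The function name `overlap_stats` is hypothetical, and edge-case behavior may differ from whatever software the authors actually used.

```python
from math import comb


def overlap_stats(all_genes, motif_genes, class_genes, overlap):
    """Return (Exp_rand_overlap, Enrichment, two-tailed Fisher p-value).

    Hypothetical helper: models the ES set as a random draw of class_genes
    genes from all_genes, of which motif_genes carry the motif or trait.
    """
    # Overlap expected if the ES genes were a random sample of all genes
    expected = motif_genes * class_genes / all_genes
    enrichment = overlap / expected if expected > 0 else float("nan")

    # Range of overlaps that are possible given the row and column totals
    k_min = max(0, motif_genes + class_genes - all_genes)
    k_max = min(motif_genes, class_genes)
    total = comb(all_genes, class_genes)

    # Exact hypergeometric probability of every possible overlap k
    # (Python integers keep comb() exact even for genome-sized counts)
    probs = [
        comb(motif_genes, k) * comb(all_genes - motif_genes, class_genes - k) / total
        for k in range(k_min, k_max + 1)
    ]
    p_obs = probs[overlap - k_min]

    # Two-tailed p: sum over all overlaps no more likely than the observed one;
    # the small relative tolerance absorbs floating-point ties
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return expected, enrichment, min(p_value, 1.0)
```

Because the p-value sums both tails, an overlap far below Exp_rand_overlap (Enrichment near 0) can be just as significant as one far above it, which is why significant overlaps in these spreadsheets can lie on either side of the genome-wide background rate.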
Created:
2026-03-17



