Aberrant gene expression prediction benchmark based on GTEx v8
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/8427311
Description:
This repository contains the aberrant gene expression prediction benchmark data, along with the expected gene expression across tissues and the tissue-specific isoform contribution scores required for AbExp prediction.
The aberrant gene expression prediction benchmark data (aberrant_expression_prediction_benchmark.parquet) contains the following columns:
individual: GTEx individual
gene: Ensembl gene identifier
tissue: GTEx tissue
tissue_type: GTEx tissue type
mu: OUTRIDER-estimated expected gene expression
theta: OUTRIDER-estimated gene dispersion
counts: Raw gene expression count
normalized_counts: OUTRIDER-normalized gene expression count
l2fc: log2 fold change between observed and expected gene expression count
zscore: z-score of gene expression, obtained by quantile-mapping the OUTRIDER-estimated distribution to the standard normal distribution
nominal_pvalue: OUTRIDER-estimated p-value of being an expression outlier
FDR: FDR-adjusted p-value of being an expression outlier
is_in_benchmark: Whether this observation is part of the aberrant gene expression prediction benchmark
is_underexpressed_outlier: Whether this observation is an underexpression outlier at FDR < 5%. This is the benchmark prediction label.
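As an illustration of how the columns above fit together, the following is a minimal sketch (not part of the repository) that keeps only the observations flagged by is_in_benchmark and extracts the is_underexpressed_outlier label. It assumes pandas with a Parquet engine such as pyarrow is installed and that both flag columns are stored as booleans; the helper names load_benchmark and benchmark_labels, and the toy rows, are hypothetical.

```python
import pandas as pd

# File name as given in the description; the local path is an assumption.
BENCHMARK_FILE = "aberrant_expression_prediction_benchmark.parquet"

KEY_COLUMNS = ["individual", "gene", "tissue"]
FLAG_COLUMNS = ["is_in_benchmark", "is_underexpressed_outlier"]


def load_benchmark(path=BENCHMARK_FILE):
    """Read only the columns needed for labels (requires pyarrow or fastparquet)."""
    return pd.read_parquet(path, columns=KEY_COLUMNS + FLAG_COLUMNS)


def benchmark_labels(df):
    """Return the boolean prediction label for rows that are part of the benchmark."""
    in_benchmark = df[df["is_in_benchmark"].astype(bool)]
    labels = in_benchmark.set_index(KEY_COLUMNS)["is_underexpressed_outlier"]
    return labels.astype(bool)


# Toy rows with made-up identifiers, standing in for load_benchmark().
toy = pd.DataFrame(
    {
        "individual": ["IND_1", "IND_2", "IND_3"],
        "gene": ["GENE_A", "GENE_A", "GENE_B"],
        "tissue": ["tissue_1", "tissue_1", "tissue_2"],
        "is_in_benchmark": [True, True, False],
        "is_underexpressed_outlier": [False, True, False],
    }
)

labels = benchmark_labels(toy)
outlier_fraction = labels.mean()  # share of positive labels among benchmark rows
```

With the real file, `benchmark_labels(load_benchmark())` would be the equivalent call. Reading only the needed columns keeps memory use down, since the table has one row per individual, gene, and tissue.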
The isoform proportions table (gtex_v8_isoform_proportions.tsv) contains the following columns:
gene: Ensembl gene identifier
tissue_type: GTEx tissue type
tissue: GTEx tissue
transcript: Ensembl transcript identifier
mean_transcript_proportions: mean transcript proportions across individuals in GTEx v8
median_transcript_proportions: median transcript proportions across individuals in GTEx v8
sd_transcript_proportions: standard deviation of transcript proportions across individuals in GTEx v8
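One possible use of this table is to look up the dominant transcript of a gene in a tissue. The sketch below uses only the Python standard library and picks, per (gene, tissue) pair, the transcript with the highest median_transcript_proportions. It assumes the file is tab-separated with a header row matching the column names above; the helper names and the toy identifiers are hypothetical.

```python
import csv
import io

# File name as given in the description; the local path is an assumption.
ISOFORM_FILE = "gtex_v8_isoform_proportions.tsv"


def read_tsv(handle):
    """Iterate over the rows of a tab-separated file as dictionaries."""
    return csv.DictReader(handle, delimiter="\t")


def dominant_transcripts(rows):
    """Map (gene, tissue) to the (transcript, proportion) with the highest median."""
    best = {}
    for row in rows:
        value = row["median_transcript_proportions"]
        if not value:  # skip rows with a missing proportion
            continue
        key = (row["gene"], row["tissue"])
        proportion = float(value)
        if key not in best or proportion > best[key][1]:
            best[key] = (row["transcript"], proportion)
    return best


# Toy table with made-up identifiers, standing in for the real file.
toy_tsv = (
    "gene\ttissue_type\ttissue\ttranscript\tmean_transcript_proportions\t"
    "median_transcript_proportions\tsd_transcript_proportions\n"
    "GENE_A\ttype_1\ttissue_1\tTX_A1\t0.35\t0.3\t0.1\n"
    "GENE_A\ttype_1\ttissue_1\tTX_A2\t0.65\t0.7\t0.1\n"
    "GENE_B\ttype_1\ttissue_1\tTX_B1\t1.0\t1.0\t0.0\n"
)

dominant = dominant_transcripts(read_tsv(io.StringIO(toy_tsv)))
```

For the real file, the same function could be fed from `open(ISOFORM_FILE, newline="")` instead of the in-memory string.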
The expected gene expression table (gtex_v8_expected_expression.tsv) contains the following columns:
gene: Ensembl gene identifier
tissue_type: GTEx tissue type
tissue: GTEx tissue
gene_is_expressed: Whether the gene is expressed in the tissue
median_expression: median OUTRIDER-estimated expected gene expression (mu) across individuals
expression_dispersion: OUTRIDER-estimated gene dispersion (theta)
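To show how this table might be consumed, here is a minimal standard-library sketch that builds a lookup from (gene, tissue) to the median expected expression and dispersion, restricted to genes marked as expressed. How gene_is_expressed is encoded in the TSV is not stated in the description, so the parse_bool helper accepting "True"/"true"/"1" is an assumption, as are the helper names and the toy identifiers.

```python
import csv
import io

# File name as given in the description; the local path is an assumption.
EXPECTED_EXPRESSION_FILE = "gtex_v8_expected_expression.tsv"


def parse_bool(value):
    """Interpret a text flag; the accepted spellings are an assumption."""
    return value.strip().lower() in {"true", "1"}


def expressed_gene_parameters(rows):
    """Map (gene, tissue) to (median_expression, expression_dispersion) for expressed genes."""
    parameters = {}
    for row in rows:
        if not parse_bool(row["gene_is_expressed"]):
            continue
        key = (row["gene"], row["tissue"])
        parameters[key] = (
            float(row["median_expression"]),
            float(row["expression_dispersion"]),
        )
    return parameters


# Toy table with made-up identifiers and values, standing in for the real file.
toy_tsv = (
    "gene\ttissue_type\ttissue\tgene_is_expressed\tmedian_expression\t"
    "expression_dispersion\n"
    "GENE_A\ttype_1\ttissue_1\tTrue\t120.5\t15.0\n"
    "GENE_A\ttype_2\ttissue_2\tFalse\t0.5\t2.0\n"
    "GENE_B\ttype_1\ttissue_1\tTrue\t40.0\t8.0\n"
)

rows = csv.DictReader(io.StringIO(toy_tsv), delimiter="\t")
parameters = expressed_gene_parameters(rows)
```

For the real file, the rows could instead come from `csv.DictReader(open(EXPECTED_EXPRESSION_FILE, newline=""), delimiter="\t")`.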
Created:
2024-10-18



