
Aberrant gene expression prediction benchmark based on GTEx v8

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/8427311
Resource description:
This repository contains the aberrant gene expression prediction benchmark data as well as the necessary expected gene expression across tissues and tissue-specific isoform contribution scores for AbExp prediction.

The aberrant gene expression prediction benchmark data (aberrant_expression_prediction_benchmark.parquet) contains the following columns:
- individual: GTEx individual
- gene: Ensembl gene identifier
- tissue: GTEx tissue
- tissue_type: GTEx tissue type
- mu: OUTRIDER-estimated expected gene expression
- theta: OUTRIDER-estimated gene dispersion
- counts: Raw gene expression count
- normalized_counts: OUTRIDER-normalized gene expression count
- l2fc: log2 fold change between observed and expected gene expression count
- zscore: z-score of gene expression, obtained by quantile-mapping the OUTRIDER-estimated distribution to the standard normal distribution
- nominal_pvalue: OUTRIDER-estimated p-value of being an expression outlier
- FDR: FDR-adjusted p-value of being an expression outlier
- is_in_benchmark: Whether this observation is part of the aberrant gene expression prediction benchmark
- is_underexpressed_outlier: Whether this observation is an underexpression outlier at FDR < 5%. This is the benchmark prediction label.
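
As a rough illustration of how the is_in_benchmark flag and the is_underexpressed_outlier label might be used together, the sketch below restricts a table to benchmark observations and computes the fraction of underexpression outliers per tissue. The helper names (benchmark_view, outlier_rate_by_tissue) and the toy rows are hypothetical and not part of the repository; only the column names come from the description above. With the real file, the table would presumably be loaded with pandas.read_parquet, which needs a Parquet engine such as pyarrow.

```python
import pandas as pd


def benchmark_view(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the observations flagged as part of the benchmark."""
    return df[df["is_in_benchmark"]].copy()


def outlier_rate_by_tissue(df: pd.DataFrame) -> pd.Series:
    """Fraction of benchmark observations labelled as underexpression outliers, per tissue."""
    bench = benchmark_view(df)
    return bench.groupby("tissue")["is_underexpressed_outlier"].mean()


# Synthetic toy rows for illustration only; these are NOT values from the dataset.
# With the real data one would start from something like:
#   df = pd.read_parquet("aberrant_expression_prediction_benchmark.parquet")
toy = pd.DataFrame(
    {
        "individual": ["A", "A", "B", "B", "C"],
        "gene": ["gene_1", "gene_1", "gene_1", "gene_1", "gene_2"],
        "tissue": ["Liver", "Lung", "Liver", "Lung", "Liver"],
        "is_in_benchmark": [True, True, True, False, True],
        "is_underexpressed_outlier": [True, False, False, True, False],
    }
)

rates = outlier_rate_by_tissue(toy)
print(rates)
```

Filtering on is_in_benchmark before computing any statistic keeps observations outside the benchmark from leaking into the label counts.
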
The isoform proportions table (gtex_v8_isoform_proportions.tsv) contains the following columns:
- gene: Ensembl gene identifier
- tissue_type: GTEx tissue type
- tissue: GTEx tissue
- transcript: Ensembl transcript identifier
- mean_transcript_proportions: mean transcript proportions across individuals in GTEx v8
- median_transcript_proportions: median transcript proportions across individuals in GTEx v8
- sd_transcript_proportions: standard deviation of transcript proportions across individuals in GTEx v8

The expected gene expression table (gtex_v8_expected_expression.tsv) contains the following columns:
- gene: Ensembl gene identifier
- tissue_type: GTEx tissue type
- tissue: GTEx tissue
- gene_is_expressed: Whether the gene is expressed in the tissue
- median_expression: median OUTRIDER-estimated expected gene expression (mu) across individuals
- expression_dispersion: OUTRIDER-estimated gene dispersion (theta)
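
The two annotation tables share the gene and tissue columns, so they can plausibly be combined on that pair. The sketch below, under that assumption, picks the transcript with the highest median proportion for each gene and tissue and attaches it to the genes marked as expressed. The helper names (dominant_transcripts, expressed_with_dominant) and the toy rows are hypothetical; whether AbExp itself uses the tables this way is not stated in the description. The real files would presumably be read with pandas.read_csv(path, sep="\t").

```python
import pandas as pd


def dominant_transcripts(isoforms: pd.DataFrame) -> pd.DataFrame:
    """For each (gene, tissue), keep the transcript with the largest median proportion."""
    idx = isoforms.groupby(["gene", "tissue"])["median_transcript_proportions"].idxmax()
    cols = ["gene", "tissue", "transcript", "median_transcript_proportions"]
    return isoforms.loc[idx, cols].reset_index(drop=True)


def expressed_with_dominant(expected: pd.DataFrame, isoforms: pd.DataFrame) -> pd.DataFrame:
    """Left-join the dominant transcript onto genes flagged as expressed in a tissue."""
    expressed = expected[expected["gene_is_expressed"]]
    return expressed.merge(dominant_transcripts(isoforms), on=["gene", "tissue"], how="left")


# Synthetic toy rows for illustration only; these are NOT values from the dataset.
isoforms = pd.DataFrame(
    {
        "gene": ["g1", "g1", "g1", "g1", "g2"],
        "tissue": ["Liver", "Liver", "Lung", "Lung", "Liver"],
        "transcript": ["t1", "t2", "t1", "t2", "t3"],
        "median_transcript_proportions": [0.7, 0.3, 0.2, 0.8, 1.0],
    }
)
expected = pd.DataFrame(
    {
        "gene": ["g1", "g1", "g2"],
        "tissue": ["Liver", "Lung", "Liver"],
        "gene_is_expressed": [True, False, True],
        "median_expression": [100.0, 2.0, 50.0],
    }
)

result = expressed_with_dominant(expected, isoforms)
print(result)
```

Joining on both gene and tissue, rather than on gene alone, matters because the same transcript can have different proportions in different tissues.
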

Created:
2024-10-18