
Intrinsically linked lineage specificity of transposable elements, lncRNA genes, and transcriptional regulation

DataCite Commons. Updated 2026-05-04; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19981090
Description:
Overview

This repository contains code and data associated with the following study:

Lin J, Wu Y, Zeng J, Xiong W, He S, Pontarotti P, Zhu H. Intrinsically linked lineage specificity of transposable elements, lncRNA genes, and transcriptional regulation.

The study systematically investigates the relationship between lineage-specific (LS) transposable elements (TEs), lncRNAs, and transcriptional regulation in humans and mice, with applications to spermatogenesis, Alzheimer's disease, and cross-species expression divergence.

Repository Structure

.
├── eGRAMCode/
├── GTExAnalysisCode/
├── AD_analysis/
├── classifiedcells/
├── datafiles/
├── spermatogenesis-results/
└── HEGHVG/

Contents

1. eGRAMCode/

Source code for eGRAM version 3, a computational tool for identifying transcriptional regulatory modules from single-cell RNA-seq (scRNA-seq) data. eGRAMv3 integrates lncRNA–DNA binding site (DBS) data and gene–gene expression correlation based on the Maximal Information Coefficient (MIC), with adaptive significance thresholding via Gaussian Mixture Model fitting. It supports optional transcription factor (TF) DBS input and performs KEGG and WikiPathways enrichment analysis for each identified module.

2. GTExAnalysisCode/

Scripts for analyzing the impacts of simian/rodent TE-derived exons and HS/MS lncRNAs on gene expression and molecular signaling across tissues. Analyses include:

- Computing median TPM expression of simian-specific and rodent-specific exons from GTEx (human) and MACA (mouse) bulk RNA-seq wiggle files.
- Normalizing exon TPM values to z-scores across tissues.
- Computing per-tissue Spearman correlations between HS lncRNAs and their predicted target genes (human GTEx data) and between MS lncRNAs and their predicted target genes (mouse MACA data).
- Quantifying cross-species transcriptional divergence (|Δz-score|) of KEGG pathway genes across 10 matched human–mouse tissue pairs, with pancreas as the background reference, and generating the ridge plot figure.

3. AD_analysis/

Code and data for the Alzheimer's disease (AD) cross-species regulatory analysis. This folder contains the scripts and associated input/intermediate data files used to:

- Apply eGRAM to RNA-seq data from human AD patients and a mouse AD model expressing humanized amyloid-β.
- Identify species-specific regulatory modules among AD-related KEGG pathway genes using HS and MS lncRNA DBS binding and expression correlation.
- Compute and compare species-specific co-expression within and across modules.

4. classifiedcells/

Spermatogenesis single-cell RNA-seq data extracted from Murat et al. (2023), covering five cell types in humans and mice: Sertoli cells (ST), spermatogonia (SG), spermatocytes (SC), round spermatids (RS), and elongated spermatids (ES). Cells were filtered using interquartile range (IQR) criteria; gene expression is log-normalized. These files serve as the expression input for eGRAM v3 in the spermatogenesis analysis.

Source: Murat F, et al. Reconstruction of ancestral chromosomes and their evolution through mammalian spermatogenesis. Nature 617, 632–638 (2023).

5. datafiles/

The 10 gene expression input files for eGRAM v3, one per cell type per species (5 cell types × 2 species: human and mouse). Each file is a CSV matrix of log-normalized scRNA-seq expression values with genes as rows (annotated by gene symbol and gene type: lncRNA, TF, or marker) and cells as columns. These files are directly used with eGRAMCode/ to reproduce the spermatogenesis regulatory module analysis.

6. spermatogenesis-results/

Result files generated by eGRAM v3 from the spermatogenesis scRNA-seq analysis. For each of the 10 cell type datasets (5 cell types × 2 species), this folder contains:

- main.csv: The main module table listing regulators, regulator sets, target genes, and enriched KEGG/WikiPathways terms for each identified module.
- moduleEdge: Cytoscape-compatible edge list for network visualization.
- sig_lnc_lnc_corr_OR.csv / sig_lnc_lnc_corr_AND.csv: lncRNA–lncRNA significant correlation matrices (MIC-or-TIC and MIC-and-TIC).
- sig_lnc_target_corr_OR.csv / sig_lnc_target_corr_AND.csv: lncRNA–target significant correlation matrices.
- Additional intermediate files documenting regulator sets, target sets, and module structures before and after redundancy removal.

7. HEGHVG/

Data and scripts related to the identification of highly expressed genes (HEGs) and highly variable genes (HVGs) used to define the 395 conserved spermatogenesis markers. This folder contains:

- HEG and HVG gene lists for human and mouse spermatogenic cell types.
- Scripts for extracting the subset of HVGs overlapping with conserved markers and lineage-specific lncRNAs, and for generating multi-panel expression dynamics plots (mean expression across the five cell types, one panel per gene).
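Illustrative sketch (not taken from the repository): the z-score normalization and |Δz-score| divergence steps listed under GTExAnalysisCode/ could look roughly like the following. The function names, the tissue labels, the TPM values, and the use of the population standard deviation are all assumptions made for illustration; the repository's scripts may differ, and the pancreas background reference mentioned in the description is not modeled here.

```python
from statistics import mean, pstdev


def zscores_across_tissues(tpm_by_tissue):
    """Standardize one gene's (or exon's) TPM values across tissues.

    Returns {tissue: z-score}. Uses the population standard deviation,
    which is an assumption; the repository does not state which it uses.
    """
    values = list(tpm_by_tissue.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with constant expression carries no tissue signal.
        return {tissue: 0.0 for tissue in tpm_by_tissue}
    return {tissue: (v - mu) / sd for tissue, v in tpm_by_tissue.items()}


def abs_delta_z(human_tpm, mouse_tpm):
    """Return |z_human - z_mouse| for every tissue present in both species."""
    z_human = zscores_across_tissues(human_tpm)
    z_mouse = zscores_across_tissues(mouse_tpm)
    shared = [tissue for tissue in human_tpm if tissue in mouse_tpm]
    return {tissue: abs(z_human[tissue] - z_mouse[tissue]) for tissue in shared}


# Hypothetical TPM values for one orthologous gene (made up for the example).
human = {"brain": 10.0, "liver": 20.0, "testis": 30.0}
mouse = {"brain": 30.0, "liver": 20.0, "testis": 10.0}

divergence = abs_delta_z(human, mouse)
for tissue, value in divergence.items():
    print(f"{tissue}: |dz| = {value:.3f}")
```

In this toy case the gene peaks in testis in one species and in brain in the other, so brain and testis show a large |Δz-score| while liver shows none. Standardizing within each species first makes the comparison insensitive to overall differences in expression scale between the two datasets.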
Provider:
Zenodo
Created:
2026-05-02