Analysis Products: Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency

Mendeley Data | Updated 2024-05-10 | Indexed 2024-06-29

Download link:

https://zenodo.org/records/8313962

Description:

This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in each directory; they are summarized below. The record contains the following files:

- `clusters.tsv`: cluster ID, name and colour of the clusters used in the paper.
- `scATAC.zip`: analysis products for the single-cell ATAC-seq data. Contains:
  - `cells.tsv`: list of barcodes that pass QC. Columns include:
    - `barcode`
    - `sample`: time point
    - `umap1`
    - `umap2`
    - `cluster`
    - `dpt_pseudotime_fibr_root`: pseudotime values computed with a fibroblast cell as the root
    - `dpt_pseudotime_xOSK_root`: pseudotime values computed with an xOSK cell as the root
  - `peaks.bed`: list of 500 bp peaks across all cell states. The 4th column contains the peak set label. Note that ~5,000 peaks are not assigned to any peak set and are marked as NA.
  - `features.tsv`: 50-dimensional representation of each cell.
  - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using `scipy.io.mmread` in Python or `readMM` in R. Columns correspond to cells in `cells.tsv` (identify a cell by combining `sample` and `barcode`); rows correspond to peaks in `peaks.bed`.
- `scATAC_clusters.zip`: analysis products for cluster pseudo-bulks of the single-cell ATAC-seq data. Contains:
  - `clusters.tsv`: cluster ID, name and colour used in the paper.
  - `peaks`: `overlap_reproducibilty/overlap.optimal_peak` peaks called with the ENCODE bulk ATAC-seq pipeline, in narrowPeak format.
  - `fragments`: per-cluster fragment files.
- `scATAC_scRNA_integration.zip`: analysis products from the integration of scATAC-seq with scRNA-seq. Contains:
  - `peak_gene_links_fdr1e-4.tsv`: peak-gene links passing an FDR of 1e-4. For the analyses in the paper, we further filter to peaks with absolute correlation > 0.45.
  - `harmony.cca.30.feat.tsv`: 30-dimensional co-embedding of scATAC and scRNA cells, obtained by CCA followed by Harmony over assay type.
  - `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. The first column contains the barcode.
- `scRNA.zip`: analysis products for the single-cell RNA-seq data. Contains:
  - `seurat.rds`: Seurat object containing expression data (raw counts, normalized and scaled), reductions (UMAP, PCA), kNN graphs and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples D0, D2, ..., D14, iPSC.
  - `genes.txt`: list of all genes.
  - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
    - `barcode_sample`: barcode with the sample index (1-9 corresponding to D0, D2, ..., D14, iPSC)
    - `sample`: sample name (D0, D2, ..., D14, iPSC)
    - `umap1`
    - `umap2`
    - `nCount_RNA`
    - `nFeature_RNA`
    - `cluster`
    - `percent.mt`: percentage of mitochondrial transcripts in the cell
    - `percent.oskm`: percentage of OSKM transcripts in the cell
  - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using `scipy.io.mmread` in Python or `readMM` in R. Columns correspond to cells in `cells.tsv` (the barcode suffix encodes the sample); rows correspond to genes in `genes.txt`.
  - `pca.tsv`: first 50 PCs of each cell.
  - `oskm_endo_sendai.tsv`: estimated raw counts (`cts`; may not be integers) and log(1 + tp10k) normalized expression (`norm`) for the endogenous and exogenous (Sendai-derived) transcripts of POU5F1 (OCT4), SOX2, KLF4 and MYC. Rows are consistent with `seurat.rds` and `cells.tsv`.
- `multiome.zip`:
  - `multiome/snATAC`: files derived from the integration of nuclei from multiome (D1M and D2M) with cells from day 2 of scATAC-seq (labeled D2). Contains:
    - `cells.tsv`: nuclei barcodes from multiome that pass QC, plus the cell barcodes from D2 of scATAC-seq. Includes:
      - `barcode`
      - `umap1`: coordinates used for the multiome figures in the paper
      - `umap2`: as for `umap1`
      - `sample`: D1M and D2M correspond to multiome; D2 corresponds to day 2 of scATAC-seq
      - `cluster`: for multiome barcodes, labels transferred from scATAC-seq; for D2 scATAC-seq barcodes, the original cluster labels
    - `peaks.bed`: same file as `scATAC/peaks.bed`. List of 500 bp peaks; the 4th column contains the peak set label. Note that ~5,000 peaks are not assigned to any peak set and are marked as NA.
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using `scipy.io.mmread` in Python or `readMM` in R. Columns correspond to cells in `cells.tsv` (identify a cell by combining `sample` and `barcode`); rows correspond to peaks in `peaks.bed`.
    - `features.no.harmony.50d.tsv`: 50-dimensional representation of each cell before running Harmony (used to correct the batch effect between D2 scATAC and D1M/D2M snMultiome). Rows correspond to cells in `cells.tsv`.
    - `features.harmony.10d.tsv`: 10-dimensional representation of each cell after running Harmony. Rows correspond to cells in `cells.tsv`.
  - `multiome/snRNA`:
    - `seurat.rds`: Seurat object containing expression data (raw counts, normalized and scaled), reductions (UMAP, PCA) and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples D1M, D2M. Please use the UMAP/features from `snATAC/` for consistency.
    - `genes.txt`: list of all genes (this differs from the gene list in the scRNA analysis).
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
      - `barcode_sample`: barcode with the sample index (1, 2 corresponding to D1M, D2M respectively)
      - `sample`: sample name (D1M, D2M)
      - `nCount_RNA`
      - `nFeature_RNA`
      - `percent.oskm`: percentage of OSKM transcripts in the cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using `scipy.io.mmread` in Python or `readMM` in R. Columns correspond to cells in `cells.tsv` (the barcode suffix encodes the sample); rows correspond to genes in `genes.txt`.
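The peak count matrices are described as peaks (rows) by cells (columns), even though the file is named `cell_x_peak`. A minimal Python sketch for loading one and checking that orientation is below. It assumes the archive has been unzipped into a directory holding `cells.tsv` (tab-separated, with a header row), `peaks.bed` (no header) and `cell_x_peak.mtx.gz`; the helper names `load_peak_matrix`, `check_alignment` and `counts_for_peak_set` are hypothetical.

```python
from pathlib import Path

import numpy as np
import pandas as pd
import scipy.io
from scipy import sparse


def check_alignment(counts, cells, peaks):
    """Raise if the matrix is not (n_peaks x n_cells), as the README describes."""
    n_rows, n_cols = counts.shape
    if n_rows != len(peaks) or n_cols != len(cells):
        raise ValueError(
            f"matrix is {n_rows} x {n_cols}, expected "
            f"{len(peaks)} peaks x {len(cells)} cells"
        )


def load_peak_matrix(data_dir):
    """Load cells.tsv, peaks.bed and cell_x_peak.mtx.gz from one directory.

    Assumption: cells.tsv has a header row and peaks.bed has none.
    """
    data_dir = Path(data_dir)
    cells = pd.read_csv(data_dir / "cells.tsv", sep="\t")
    # Keep chrom, start, end and the peak set label (4th column).
    # pandas reads the literal label "NA" as a missing value (NaN).
    peaks = (
        pd.read_csv(data_dir / "peaks.bed", sep="\t", header=None)
        .iloc[:, :4]
        .set_axis(["chrom", "start", "end", "peak_set"], axis=1)
    )
    # mmread handles the .gz compression itself; CSR makes peak (row) slicing cheap.
    counts = sparse.csr_matrix(scipy.io.mmread(str(data_dir / "cell_x_peak.mtx.gz")))
    check_alignment(counts, cells, peaks)
    return counts, cells, peaks


def counts_for_peak_set(counts, peaks, label):
    """Total fragments per cell over all peaks carrying one peak set label."""
    rows = np.flatnonzero((peaks["peak_set"] == label).to_numpy())
    return np.asarray(counts[rows].sum(axis=0)).ravel()
```

Usage, assuming the directory name: `counts, cells, peaks = load_peak_matrix("scATAC")`. Because `multiome/snATAC` ships the same three files, the same loader should apply there. Peaks labeled NA never match a label in `counts_for_peak_set`; `peaks["peak_set"].isna().sum()` counts them.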
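The README states that the paper keeps only peak-gene links with absolute correlation above 0.45, on top of the FDR 1e-4 cut already applied in `peak_gene_links_fdr1e-4.tsv`. The column layout of that file is not documented here, so the correlation column name `correlation` in this pandas sketch is a hypothetical placeholder; check the file header and pass the real name.

```python
import pandas as pd


def filter_links(links, corr_col="correlation", min_abs_corr=0.45):
    """Keep links whose absolute correlation strictly exceeds min_abs_corr.

    corr_col is a hypothetical column name; confirm it against the TSV header.
    """
    if corr_col not in links.columns:
        raise KeyError(f"column {corr_col!r} not found; available: {list(links.columns)}")
    keep = links[corr_col].abs() > min_abs_corr
    return links.loc[keep].reset_index(drop=True)


# Usage (assumes the archive was unzipped into the working directory):
# links = pd.read_csv("peak_gene_links_fdr1e-4.tsv", sep="\t")
# strong_links = filter_links(links)
```

Taking the absolute value keeps negatively correlated links as well, which matches the README's wording ("absolute correlation").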
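The `norm` values in `oskm_endo_sendai.tsv` are described as log(1 + tp10k), i.e. counts scaled to 10,000 per cell and then log-transformed: norm = log(1 + 10^4 · cts / N), where N is the cell's total count. The sketch below assumes a natural logarithm (the Seurat `LogNormalize` convention) and that N is the per-cell total such as `nCount_RNA`; neither choice is confirmed by the README. The function names are hypothetical.

```python
import numpy as np


def log1p_tp10k(counts, total_counts, scale=1e4):
    """Return ln(1 + scale * counts / total_counts), element-wise.

    counts may be fractional (the estimated OSKM counts need not be integers).
    Using nCount_RNA as total_counts is an assumption.
    """
    counts = np.asarray(counts, dtype=float)
    total_counts = np.asarray(total_counts, dtype=float)
    if np.any(total_counts <= 0):
        raise ValueError("total counts must be positive")
    return np.log1p(counts / total_counts * scale)


def tp10k_from_log1p(norm):
    """Invert the transform: recover counts per 10k from log(1 + tp10k) values."""
    return np.expm1(np.asarray(norm, dtype=float))
```

Comparing `log1p_tp10k(cts, nCount_RNA)` against the shipped `norm` column is one way to check whether these assumptions hold for the released file.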
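For the multiome snRNA data, the README maps barcode suffix 1 to D1M and 2 to D2M. The sketch below recovers the sample from `barcode_sample` and cross-checks it against the `sample` column. It assumes the sample index is the trailing run of digits in each barcode; the example barcode shapes (`...-1_2`, `...-1`) are hypothetical. The scRNA suffix mapping (1-9) is only partly spelled out above, so the mapping is a parameter rather than hard-coded for that dataset.

```python
import re

import pandas as pd

# Suffix-to-sample mapping stated in the README for multiome/snRNA.
MULTIOME_SUFFIX_TO_SAMPLE = {1: "D1M", 2: "D2M"}

# Assumption: the sample index is the trailing run of digits in the barcode.
_TRAILING_DIGITS = re.compile(r"(\d+)$")


def sample_from_barcode(barcode, mapping=MULTIOME_SUFFIX_TO_SAMPLE):
    """Map a barcode's numeric suffix to a sample name via `mapping`."""
    match = _TRAILING_DIGITS.search(barcode)
    if match is None:
        raise ValueError(f"no numeric suffix in barcode {barcode!r}")
    return mapping[int(match.group(1))]


def suffix_matches_sample(cells, mapping=MULTIOME_SUFFIX_TO_SAMPLE):
    """True if every barcode suffix agrees with the `sample` column of cells.tsv."""
    inferred = cells["barcode_sample"].map(lambda b: sample_from_barcode(b, mapping))
    return bool((inferred == cells["sample"]).all())
```

Usage: `suffix_matches_sample(pd.read_csv("multiome/snRNA/cells.tsv", sep="\t"))`, with the path assuming the archive was unzipped in place. A `False` result suggests the suffix format differs from the assumption above.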

Created: 2023-09-12