Additional Data: Poised PABP-RNA hubs implement signal-dependent mRNA decay in development
Source: NIAID Data Ecosystem. Indexed: 2026-05-01
Download link:
https://zenodo.org/record/10054231
Resource description:
This repository contains processed data from the iCLIP experiments analysed in the following paper: "Poised PABP-RNA hubs implement signal-dependent mRNA decay in development". The paper is published in Nature Structural and Molecular Biology.
Archived data
Data archived in this repository include:
Data derived from the iCLIP experiments targeting LIN28A, PABPC1, and PABPC4 that were analysed in the manuscript (see iCLIP.zip). Raw data is available from ENA, with the accession code PRJEB60519.
Sample descriptions are given in iCLIP-SampleAnnotation.csv
Crosslink files in BED6 format (individual replicates and merged replicates)
Peak files generated with the Clippy peak caller in BED6 format
K-mer enrichment around high-confidence crosslink sites in the 3'-UTRs, calculated by the PEKA software
Expression values (salmon quantfiles) for 3'-seq experiments, specified in "QuantseqExperimentsAnnotation.tsv", are available in "SalmonQuantfiles.zip". Raw data is available from ENA, with the accession code PRJEB60519.
Source code of the nextflow pipeline, which was used on the iMaps webserver to analyse iCLIP data and produce the files archived here (see imaps-nf-0.30.zip).
A list of naive genes that were analysed in the manuscript (see NaiveGeneIds.csv).
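As a rough illustration of how the BED6 crosslink and peak files listed above might be read, the sketch below parses tab-separated BED6 lines in plain Python. It assumes the standard six columns (chrom, start, end, name, score, strand); treating the score column as a numeric value such as a crosslink count is an assumption, and the names `Bed6` and `parse_bed6` are hypothetical. The example lines are made up and are not taken from iCLIP.zip.

```python
from typing import Iterable, Iterator, NamedTuple


class Bed6(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    name: str
    score: float  # assumed numeric (e.g. a crosslink count)
    strand: str


def parse_bed6(lines: Iterable[str]) -> Iterator[Bed6]:
    """Yield one Bed6 record per data line, skipping blank and header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score, strand = line.split("\t")[:6]
        yield Bed6(chrom, int(start), int(end), name, float(score), strand)


# Made-up example lines, for illustration only.
example = [
    "chr1\t100\t101\t.\t3\t+\n",
    "chr1\t250\t251\t.\t1\t-\n",
]
records = list(parse_bed6(example))
total_score = sum(r.score for r in records)
```

For a gzip-compressed file, the same function can be fed from `gzip.open(path, "rt")`, since it only needs an iterable of text lines.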
Details on iCLIP data generation
iCLIP data for LIN28A-WT (in 2iL and FGF2 treated cells), LIN28A-S200A (in FGF2 treated cells), and PABPC1 and PABPC4 (in LIN28A KO cells with and without LIN28A overexpression) were analysed on the iMaps Goodwright server (https://imaps.goodwright.com/). The LIN28A iCLIPs were analysed on 18th of July, 2022; the PABPC iCLIPs were analysed on 26th of December, 2022. The code and settings used in the pipeline (release v0.30) can be viewed at https://github.com/goodwright/imaps-nf and are also archived here (imaps-nf-0.30.zip).
First, reads were demultiplexed using Ultraplex and barcodes were trimmed from the reads. The default Ultraplex settings were applied, as denoted below:
adapter='AGATCGGAAGAGCGGTTCAG'
adapter2='AGATCGGAAGAGCGTCGTG'
barcodes='barcode.csv'
final_min_length=20
fiveprimemismatches=1
ignore_no_match=False
ignore_space_warning=False
inputfastq='MOD4878A1-merged.fastq.gz'
keep_barcode=False
min_trim=3
outputprefix='demux'
phredquality=30
phredquality_5_prime=0
sbatchcompression=False
threads=10
threeprimemismatches=0
ultra=False
TrimGalore was used to run FASTQC and quality trim the reads and remove reads with length less than 10 nt:
trim_galore --fastqc --length 10 -q 20 --cores 8 --gzip file.fastq.gz
Reads were then premapped to rRNA and tRNA sequences (referred to as small RNA, smRNA) from the mouse genome build (GRCm39, GENCODE M28 annotation) with Bowtie v1.3.0 (Langmead et al., 2009):
bowtie --threads 12 --sam -x $INDEX -q --un file.unmapped.fastq -v 2 -m 100 --norc --best --strata file.fq.gz 2
Reads that did not map with Bowtie were then aligned with STAR v2.7.9a (Dobin et al., 2013) to mouse genome build (GRCm39 GENCODE M28 annotation).
STAR \
  --genomeDir star \
  --readFilesIn file.unmapped.fastq.gz \
  --runThreadN 12 \
  --outFileNamePrefix 1_R1. \
  --sjdbGTFfile Homo_sapiens_filtered.gtf \
  --outSAMattrRGline 'ID:1_R1' 'SM:1_R1' \
  --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate \
  --quantMode TranscriptomeSAM \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outSAMattributes All \
  --alignSJoverhangMin 8 \
  --alignSJDBoverhangMin 1 \
  --outFilterType BySJout \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --outFilterScoreMin 10 \
  --alignEndsType Extend5pOfRead1 \
  --twopassMode Basic
PCR-duplicates were removed using UMI-tools (Smith, Heger and Sudbery, 2017)
java -jar /UMICollapse/umicollapse.jar bam \
  -i file.Aligned.sortedByCoord.out.bam \
  -o file.dedup.bam \
  --umi-sep rbc:
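To make the idea of PCR-duplicate removal concrete, the sketch below collapses reads that share the same chromosome, 5' position, strand, and UMI. It is a deliberately simplified, exact-match illustration and not the algorithm used by the deduplication tool in the pipeline, which may additionally tolerate sequencing errors in the UMI. The only detail taken from the command above is the `rbc:` separator that precedes the UMI in the read name; the function names and the example reads are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Read = Tuple[str, str, int, str]  # (read name, chrom, 5' position, strand)


def umi_from_name(read_name: str, sep: str = "rbc:") -> str:
    """Return the UMI, taken as everything after the last `sep` in the read name."""
    return read_name.rsplit(sep, 1)[1]


def collapse_exact(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, position, strand, UMI) combination."""
    seen: Set[Tuple[str, int, str, str]] = set()
    kept: List[Read] = []
    for name, chrom, pos, strand in reads:
        key = (chrom, pos, strand, umi_from_name(name))
        if key not in seen:
            seen.add(key)
            kept.append((name, chrom, pos, strand))
    return kept


# Made-up reads, for illustration only.
reads = [
    ("read1rbc:ACGTA", "chr1", 100, "+"),
    ("read2rbc:ACGTA", "chr1", 100, "+"),  # same position, strand, and UMI as read1
    ("read3rbc:TTGCA", "chr1", 100, "+"),  # same position, different UMI
    ("read4rbc:ACGTA", "chr1", 100, "-"),  # same UMI, opposite strand
]
deduplicated = collapse_exact(reads)
```

In this toy input only `read2` is dropped, because it is the only read that matches an earlier one in position, strand, and UMI.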
The nucleotide preceding each sequencing read was assigned as the crosslink event.
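The crosslink assignment can be sketched as a small coordinate calculation. The function below assumes BED-style read coordinates (0-based start, exclusive end) and returns a 0-based position; this coordinate convention and the function name are assumptions made for illustration, not a description of the pipeline's internal code.

```python
def crosslink_position(start: int, end: int, strand: str) -> int:
    """Return the 0-based position of the nucleotide preceding the read's 5' end.

    `start` and `end` are the aligned read's BED-style coordinates
    (0-based start, exclusive end).
    """
    if strand == "+":
        return start - 1  # one nucleotide upstream of the leftmost aligned base
    if strand == "-":
        return end        # 5' end is at end - 1, so upstream is at end
    raise ValueError(f"unexpected strand: {strand!r}")


# Made-up read spanning chr1:100-140 (0-based, half-open).
plus_site = crosslink_position(100, 140, "+")
minus_site = crosslink_position(100, 140, "-")
```

On the minus strand the read's 5' end is its rightmost genomic base, so the preceding nucleotide lies one position further to the right on the genome.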
Peaks of crosslinking signal were identified with Clippy v1.4.1, using the default settings.
Obtained peaks and crosslink sites were used to run PEKA v1.0.0 (Kuret et al., 2022), using the default settings.
For Clippy and PEKA, the GENCODE primary assembly annotation M28 was filtered to retain only entries with transcript support level 1 or 2, in genes where such transcripts were available, and used to produce a segmentation file with the get_segments function from the iCount tool (Curk, 2019).
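One possible reading of this filtering rule is sketched below: in genes that have at least one transcript with support level 1 or 2, only those transcripts are kept, while genes without any such transcript keep all of their transcripts. This interpretation, the function name, and the gene and transcript identifiers are assumptions for illustration; the support level values follow the GENCODE `transcript_support_level` attribute, which is a string such as "1" to "5" or "NA".

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transcript = Tuple[str, str, str]  # (gene_id, transcript_id, transcript support level)


def filter_by_tsl(transcripts: List[Transcript]) -> List[Transcript]:
    """Keep TSL 1/2 transcripts where a gene has any; otherwise keep the whole gene."""
    by_gene: Dict[str, List[Transcript]] = defaultdict(list)
    for transcript in transcripts:
        by_gene[transcript[0]].append(transcript)

    kept: List[Transcript] = []
    for gene_transcripts in by_gene.values():
        supported = [t for t in gene_transcripts if t[2] in {"1", "2"}]
        kept.extend(supported if supported else gene_transcripts)
    return kept


# Hypothetical identifiers, for illustration only.
annotation = [
    ("geneA", "tx1", "1"),
    ("geneA", "tx2", "5"),
    ("geneB", "tx3", "4"),
    ("geneB", "tx4", "NA"),
]
filtered = filter_by_tsl(annotation)
```

Here `geneA` keeps only `tx1`, whereas `geneB` has no transcript with support level 1 or 2 and therefore keeps both of its transcripts.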
All files generated during data processing are available from the iMaps Goodwright webserver for analysis of CLIP data (see https://imaps.goodwright.com/collections/882635250203/ and https://imaps.goodwright.com/collections/340215254997/ for LIN28A and PABPC1/4 iCLIPs, respectively).
Source data
Raw sequencing reads, from which the data enclosed here were derived, are accessible at ENA (PRJEB60519). The raw sequencing reads and all data produced by the analysis pipeline are also available on the iMaps webserver (see https://imaps.goodwright.com/collections/882635250203/ and https://imaps.goodwright.com/collections/340215254997/ for LIN28A and PABPC1/4 iCLIPs, respectively) and on the updated Flow webserver (see https://app.flow.bio/projects/882635250203/ and https://app.flow.bio/projects/340215254997/ for LIN28A and PABPC1/4 iCLIPs, respectively).
Downstream computational analysis of enclosed data
The code used to analyse the data enclosed here and to train the CNN that predicts transcript stability in the naive-to-primed transition from the 3'UTR nucleotide sequence is available on GitHub (https://github.com/ulelab/LIN28A_RNPreassembly_bioinformatics) and archived on Zenodo (https://zenodo.org/doi/10.5281/zenodo.10054297).
Created:
2024-03-29



