Poplar_Isoform Expression_matrix.zip
Source: DataCite Commons. Updated 2020-08-25; indexed 2024-07-28.
Download link:
https://figshare.com/articles/Poplar_Isoform_Expression_matrix_zip/12091530/1
Description:
A matrix of isoform expression values (FPKM) for each replicate sample from an unstructured population of 268 Populus deltoides individuals. Isoforms were discovered as follows.

Three transcript assembly platforms were used to maximize isoform detection: (i) Cufflinks version 2.2 with parameters “--library-type fr-firststrand -u -F 0.05 --max-intron-length 12000 --no-faux-reads -g”; (ii) StringTie version 1.3.3 with parameters “-f 0.05 -j 2 --rf”; and (iii) Trinity version 2.3.2 in genome-guided mode with parameters “--genome_guided_bam --genome_guided_max_intron 12000 --full_cleanup --SS_lib_type RF --min_contig_length 50”. The Cufflinks and StringTie isoforms detected for each sample were merged with StringTie merge using parameters “-F 1 -f 0.05”. PASA version 2.0.2-r20151207 was used to reconcile this merged assembly with the Trinity assembly using parameters “-C -R -t --cufflinks_gtf -I 12000 --ALT_SPLICE --ALIGNER gmap,blat”.

The assemblies generated by PASA were then filtered by requiring that (i) all splice junctions be supported by at least 2 reads, and (ii) retained introns be supported by a median read coverage of at least 2 (Python scripts stored in github.com/jdLikesPlants/poplar_AS). Requiring minimum read support for retained introns reduces the chance that pre-mRNA sequenced alongside mature transcripts is misidentified as an intron retention event. Finally, the filtered PASA assemblies for each sample were merged with Cuffmerge (Cufflinks version 2.2.1) to generate a master transcriptome that represents all of the potential AS events and transcript isoforms for the population. The resulting assembly was then reformatted and annotated using gffcompare version 0.9.9c (https://github.com/gpertea/gffread). This transcriptome was subjected to a secondary expression-based filtering pipeline to remove artifacts generated during the merge.
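The read-support criteria applied to the PASA assemblies can be illustrated with a minimal sketch. This is not the authors' code (their scripts are in the repository named above); the function name and the input layout (one read count per junction, one per-base coverage list per retained intron) are assumptions made for illustration only.

```python
from statistics import median

# Thresholds taken from the description above.
MIN_JUNCTION_READS = 2       # every splice junction needs at least 2 supporting reads
MIN_INTRON_MEDIAN_COV = 2    # every retained intron needs median coverage of at least 2


def passes_read_support(junction_reads, retained_intron_coverages):
    """Return True if a transcript meets both read-support criteria.

    junction_reads: list of supporting-read counts, one per splice junction
        (hypothetical input layout).
    retained_intron_coverages: list of per-base coverage lists, one per
        retained intron (hypothetical input layout).
    """
    junctions_ok = all(n >= MIN_JUNCTION_READS for n in junction_reads)
    introns_ok = all(
        len(cov) > 0 and median(cov) >= MIN_INTRON_MEDIAN_COV
        for cov in retained_intron_coverages
    )
    return junctions_ok and introns_ok
```

A transcript with no retained introns is judged on its junctions alone, since the intron check is then vacuously true.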
Cufflinks version 2.2 was used in quantification mode (parameters: --library-type fr-firststrand -G -u -F 0.05 --max-intron-length 12000) to measure the expression of the transcripts in the merged assembly in each sample. To minimize the presence of incorrectly assembled transcripts in the merged transcriptome assembly, each transcript was required to be expressed above an FPKM (fragments per kilobase of exon model per million reads mapped) of 3 in at least two of three biological replicates of a given individual, and in at least 3 individuals in the population. This final merged and filtered transcriptome was used in all downstream analyses. An overview of this computational pipeline is depicted in Supplementary Figure 1. Additionally, any individual that did not have at least 15 million reads generated during sequencing in at least two of three replicates, as well as any individual for which only one replicate was sequenced, was removed from analysis, resulting in a final set of 268 individuals.
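The two thresholding rules in this paragraph can be sketched as follows. This is an illustrative reading of the description, not the authors' pipeline: the function names and the dictionary layout (individual ID mapped to a list of per-replicate values) are hypothetical, and "above FPKM 3" is interpreted here as strictly greater than 3.

```python
# Thresholds taken from the description above.
FPKM_THRESHOLD = 3.0
MIN_EXPRESSED_REPLICATES = 2
MIN_EXPRESSED_INDIVIDUALS = 3
MIN_READS = 15_000_000
MIN_DEEP_REPLICATES = 2


def individual_expresses(replicate_fpkms):
    """True if the transcript exceeds the FPKM threshold in enough replicates."""
    expressed = sum(v > FPKM_THRESHOLD for v in replicate_fpkms)
    return expressed >= MIN_EXPRESSED_REPLICATES


def keep_transcript(fpkm_by_individual):
    """True if enough individuals express the transcript.

    fpkm_by_individual: dict mapping an individual ID to its list of
        per-replicate FPKM values (hypothetical layout).
    """
    n = sum(individual_expresses(reps) for reps in fpkm_by_individual.values())
    return n >= MIN_EXPRESSED_INDIVIDUALS


def keep_individual(replicate_read_counts):
    """True if the individual passes the sequencing-depth requirement.

    Individuals with a single sequenced replicate are rejected outright.
    """
    if len(replicate_read_counts) < 2:
        return False
    deep = sum(n >= MIN_READS for n in replicate_read_counts)
    return deep >= MIN_DEEP_REPLICATES
```

For example, a transcript measured at FPKM values of 4.0, 5.0 and 0.0 in one individual counts as expressed in that individual, and it is retained only if at least two further individuals also pass.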
Provider:
figshare
Created:
2020-04-07



