Raw count matrix
NIAID Data Ecosystem; indexed 2026-03-12
Download link:
https://figshare.com/articles/dataset/Raw_count_matrix/12320693
Description:
Reads from BioProject PRJNA628886 were aligned against the reference transcriptome (BioProject PRJNA236528, https://doi.org/10.5061/dryad.11978) with BWA mem (http://bio-bwa.sourceforge.net/bwa.shtml).
Quantification was performed with SAMtools [1] idxstats to generate the quantification matrix.
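As an illustration of the quantification step, the sketch below merges per-sample `samtools idxstats` outputs into one count matrix. It relies on the documented idxstats layout (tab-separated reference name, sequence length, mapped read-segments, unmapped read-segments, with a final `*` line for reads without a reference). The function names and the toy sample data are hypothetical; this is not the authors' script.

```python
import csv
import io


def parse_idxstats(text):
    """Return {contig: mapped_count} from the text output of samtools idxstats."""
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0] == "*":  # the '*' line holds reads with no reference
            continue
        name, _length, mapped, _unmapped = row[:4]
        counts[name] = int(mapped)
    return counts


def build_matrix(per_sample):
    """per_sample maps sample name -> idxstats text.

    Returns (samples, matrix) where matrix maps contig -> list of counts,
    one count per sample, in the order given by `samples`.
    """
    samples = sorted(per_sample)
    parsed = {s: parse_idxstats(per_sample[s]) for s in samples}
    contigs = sorted(set().union(*[p.keys() for p in parsed.values()]))
    matrix = {c: [parsed[s].get(c, 0) for s in samples] for c in contigs}
    return samples, matrix


# Hypothetical toy input standing in for two idxstats outputs.
demo = {
    "sampleA": "contig1\t1000\t12\t0\ncontig2\t500\t0\t0\n*\t0\t0\t7\n",
    "sampleB": "contig1\t1000\t3\t0\ncontig2\t500\t5\t0\n*\t0\t0\t2\n",
}
samples, matrix = build_matrix(demo)
```

Each row of the resulting matrix is a contig and each column a sample, which is the shape assumed for a raw count matrix here.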
The matrix was filtered with edgeR [2], and only contigs with more than 1 CPM (counts per million) in at least one sample were kept, yielding a matrix of 76,550 contigs.
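The filtering rule can be sketched in plain Python as follows. This is not edgeR itself: it assumes that the library size of a sample is the column sum of the raw matrix and that no normalization factors are applied, which may differ from the settings the authors used. The function name `cpm_filter` and the toy counts are hypothetical.

```python
def cpm_filter(matrix, min_cpm=1.0, min_samples=1):
    """Keep contigs whose CPM exceeds min_cpm in at least min_samples samples.

    matrix maps contig -> list of raw counts (one per sample).
    CPM is computed as count / library_size * 1e6, where library_size
    is assumed to be the column sum of the raw counts.
    """
    n_samples = len(next(iter(matrix.values())))
    lib_sizes = [sum(row[j] for row in matrix.values()) for j in range(n_samples)]
    kept = {}
    for contig, row in matrix.items():
        cpms = [1e6 * c / lib if lib else 0.0 for c, lib in zip(row, lib_sizes)]
        if sum(v > min_cpm for v in cpms) >= min_samples:
            kept[contig] = row
    return kept


# Hypothetical toy counts; each sample has a library size of 2,000,000 reads.
toy = {
    "contigA": [3, 0],              # 1.5 CPM in the first sample: kept
    "contigB": [1, 2],              # 0.5 and 1.0 CPM: not above 1, dropped
    "contigC": [1999996, 1999998],  # highly expressed: kept
}
kept = cpm_filter(toy)
```

Note that the threshold is strict ("more than 1 CPM"), so a contig sitting at exactly 1 CPM in its best sample is dropped.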
File PRJNA628886_raw_quantification_206K.tsv.gz is the raw count matrix of the whole transcriptome.
File 76k_ids_list.txt is the identifier list of contigs expressed at 1 CPM in our conditions.
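One possible way to combine the two files is to keep only the rows of the raw matrix whose identifier appears in the list. The sketch below assumes a layout that the description does not confirm: a tab-separated matrix with one header row and the contig identifier in the first column, and an identifier list with one identifier per line. The function name and the toy file contents are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def subset_counts(matrix_path, ids_path):
    """Return (header, rows) for matrix rows whose first field is in the id list.

    Assumes a gzipped TSV with a header row and the contig id in column one,
    and a plain-text id list with one identifier per line.
    """
    with open(ids_path) as fh:
        wanted = {line.strip() for line in fh if line.strip()}
    with gzip.open(matrix_path, "rt", newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        rows = [row for row in reader if row and row[0] in wanted]
    return header, rows


# Toy demonstration with hypothetical file contents written to a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    matrix_path = os.path.join(tmp, "toy_counts.tsv.gz")
    ids_path = os.path.join(tmp, "toy_ids.txt")
    with gzip.open(matrix_path, "wt", newline="") as fh:
        fh.write("contig\tsampleA\tsampleB\nc1\t12\t3\nc2\t0\t0\nc3\t4\t9\n")
    with open(ids_path, "w") as fh:
        fh.write("c1\nc3\n")
    header, rows = subset_counts(matrix_path, ids_path)
```

With the real files, the same call would take the paths to PRJNA628886_raw_quantification_206K.tsv.gz and 76k_ids_list.txt, provided their layout matches the assumptions above.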
[1] SAMtools programs (view, sort, index, idxstats and flagstat): version 1.8, standard parameters.
Ref: Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
[2] edgeR: version 3.26.5.
Ref: Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Related to bioproject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA628886
Created: 2020-10-29



