Additional file 1 of An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics

Figshare; updated 2022-06-20; indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Additional_file_1_of_An_analysis_of_proteogenomics_and_how_and_when_transcriptome-informed_reduction_of_protein_databases_can_enhance_eukaryotic_proteomics/20106526

Description:

Additional file 1:

Table S1. Detailed sample description. For each sample, the table reports the depth of its proteome and transcriptome datasets, the technology used to generate them, data availability in public repositories, and the reference (PMID: PubMed ID).

Table S2. Comparison of PSM scores obtained for the same spectrum in the full database search and in the reduced database search, using the Mascot search engine. The table shows the number of reallocated spectra whose score in the reduced database search is equal to, lower than, or higher than that in the full database search. The score from the reduced database search is never observed to be higher than the score from the full database search, in particular not for reallocations to targets in the reduced database.

Table S3. Comparison of PSM scores obtained for the same spectrum in the full database search and in the reduced database search, using the MS-GF+ search engine. The table shows the number of reallocated spectra whose score in the reduced database search is equal to, lower than, or higher than that in the full database search. The score of reallocations to targets in the reduced database search is never higher than in the full database search.

Table S4. Score cutoffs obtained by target-decoy competition (TDC) for FDR control at 1% for the full (reference Ensembl database) or reduced (transcriptome-informed reduced database) database searches. Database searches were performed using the Mascot search engine.

Table S5. Score cutoffs obtained by TDC for FDR control at 1% for the full (reference Ensembl database) or reduced (transcriptome-informed reduced database) database searches. Database searches were performed using the MS-GF+ search engine.

Table S6. Reallocations that can generate an additional identification in the reduced database search.

Table S7. Additional peptide identifications and corresponding protein identifications.

Table S8. Number of spectra or number of spectra identifying additional peptides exclusively identified in the reduced database search due to: (i) a lower score cutoff at 1% FDR control in the reduced database search compared to the full database search; (ii) pure reallocation. The former are additional identifications from PSMs that pass only the reduced database cutoff and would not be accepted under the full database cutoff. They include cases of identical PSMs in both searches (“no reallocation”) and cases of reallocation from a decoy (“decoy_target”), a target (“target_target*”) or no match (“no match_target”) in the full database search to a target match in the reduced database search. Additional identifications from pure reallocation, by contrast, originate exclusively from reallocation and would also pass the full database cutoff (i.e., they are independent of the lower score cutoff effect).

Table S9. Number of valid targets and decoys from the full or reduced database obtained at 1% FDR using the cutoffs estimated by TDC on the respective database search results (first and last rows). The second row instead simulates the number of valid targets and decoys that would be obtained from the reduced database if the estimated cutoff were the same as for the full database. The associated nominal FDR level is reported, calculated as (d + 1)/t, with d and t being the number of valid decoys and valid targets, respectively.

Table S10. Match in the reduced database search for spectra matching valid targets or valid decoys in the full database search.

Table S11. Score cutoffs obtained by TDC or by the Benjamini-Hochberg (BH) procedure for FDR control for the full or reduced database searches at various FDR levels (0.5%, 1% and 5%).

Table S12. Protein-to-gene ratio in multi-protein CCs.

Table S13. Description of the pipeline for transcriptome generation and analysis.

Table S14. Description of the pipeline for proteome generation and analysis.
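
The nominal FDR formula (d + 1)/t quoted for Table S9, and the idea of a TDC score cutoff at a given FDR level (Tables S4, S5 and S11), can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `nominal_fdr` and `tdc_cutoff`, the `(score, is_decoy)` input format, and the assumption that a higher score means a better match are all hypothetical choices made for illustration. A score where lower is better would need to be negated first.

```python
from typing import Iterable, Optional, Tuple


def nominal_fdr(n_decoys: int, n_targets: int) -> float:
    """Nominal FDR computed as (d + 1) / t, the formula quoted for Table S9."""
    if n_targets == 0:
        return float("inf")
    return (n_decoys + 1) / n_targets


def tdc_cutoff(
    psms: Iterable[Tuple[float, bool]], alpha: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose accepted PSMs satisfy (d + 1) / t <= alpha.

    psms: hypothetical (score, is_decoy) pairs, one best match per spectrum
          after target-decoy competition; higher score is assumed to be better.
    Returns None when no cutoff reaches the requested FDR level.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    best = None
    d = t = 0
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Consume every PSM tied at this score before evaluating the cutoff,
        # because accepting "score >= cutoff" accepts all of them together.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                d += 1
            else:
                t += 1
            i += 1
        if nominal_fdr(d, t) <= alpha:
            best = score
    return best
```

Under these assumptions, calling `tdc_cutoff` once on the full-database PSMs and once on the reduced-database PSMs gives two cutoffs that can be compared. Applying the full-database cutoff to the reduced-database PSMs and passing the resulting counts to `nominal_fdr` mirrors the simulation described for the second row of Table S9.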

Created:

2022-06-20