Additional file 1 of An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics
Source: DataCite Commons; updated 2022-06-21; indexed 2024-07-29
Download link:
https://springernature.figshare.com/articles/dataset/Additional_file_1_of_An_analysis_of_proteogenomics_and_how_and_when_transcriptome-informed_reduction_of_protein_databases_can_enhance_eukaryotic_proteomics/20106526/1
Resource description:
Additional file 1:
Table S1. Detailed sample description. For each sample, the table reports the depth of its proteome and transcriptome datasets, the technology used to generate them, data availability in public repositories, and the reference (PMID: PubMed ID).
Table S2. Comparison of PSM scores obtained for the same spectrum in the full database search and in the reduced database search, using the Mascot search engine. The table shows the number of reallocated spectra whose score in the reduced database search is equal to, lower than, or higher than that in the full database search. The score from searching the reduced database is never observed to be higher than the score from the full database, and in particular not for reallocations on targets in the reduced database.
Table S3. Comparison of PSM scores obtained for the same spectrum in the full database search and in the reduced database search, using the MS-GF+ search engine. The table shows the number of reallocated spectra whose score in the reduced database search is equal to, lower than, or higher than that in the full database search. The score of reallocations on targets in the reduced database search is never higher than in the full database search.
Table S4. Score cutoffs obtained by target-decoy competition (TDC) for FDR control at 1% for the full (reference Ensembl database) or reduced (transcriptome-informed reduced database) database searches. Database searches were performed using the Mascot search engine.
Table S5. Score cutoffs obtained by target-decoy competition for FDR control at 1% for the full (reference Ensembl database) or reduced (transcriptome-informed reduced database) database searches. Database searches were performed using the MS-GF+ search engine.
Table S6. Reallocations that can generate an additional identification in the reduced database search.
Table S7. Additional peptide identifications and corresponding protein identifications.
Table S8. Number of spectra, or number of spectra identifying additional peptides, exclusively identified in the reduced database search due to: i. a lower score cutoff at 1% FDR control in the reduced database search compared to the full database search; ii. pure reallocation. The former are additional identifications from PSMs that pass only the cutoff from the reduced database search and would not be accepted based on the full database cutoff. This category includes cases of identical PSMs in both searches (“no reallocation”) and cases of reallocation from a decoy (“decoy_target”), a target (“target_target*”) or no match (“no match_target”) in the full database search to target matches in the reduced database search. Additional identifications from pure reallocation, instead, are those originating exclusively from reallocation, which would also pass the full database cutoff (i.e., independent of the lower score cutoff effect).
Table S9. Number of valid targets and decoys from the full or reduced database obtained at 1% FDR using the cutoffs estimated by TDC on the respective database search results (first and last rows). The second row instead simulates the number of valid targets and decoys that would be obtained from the reduced database if the estimated cutoff were the same as for the full database. The associated nominal FDR level is reported (calculated as (d+1)/t, with d and t being the number of valid decoys and targets, respectively).
Table S10. Match in the reduced database search for spectra matching valid targets or valid decoys in the full database search.
Table S11. Score cutoffs obtained by TDC or by the Benjamini-Hochberg (BH) procedure for FDR control for the full or reduced database searches at various FDR levels (0.5%, 1% and 5%).
Table S12. Protein-to-gene ratio in multi-protein CCs.
Table S13. Description of the pipeline for transcriptome generation and analysis.
Table S14. Description of the pipeline for proteome generation and analysis.
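The nominal FDR formula quoted for Table S9, (d+1)/t, and the idea of a TDC score cutoff can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `nominal_fdr` and `tdc_cutoff` are hypothetical, and the sketch assumes that a higher score means a better match and that each spectrum contributes one best-scoring PSM labelled as target or decoy.

```python
from typing import Iterable, Optional, Tuple


def nominal_fdr(valid_decoys: int, valid_targets: int) -> float:
    """Nominal FDR as described for Table S9: (d + 1) / t."""
    if valid_targets <= 0:
        raise ValueError("at least one valid target is required")
    return (valid_decoys + 1) / valid_targets


def tdc_cutoff(
    psms: Iterable[Tuple[float, bool]], alpha: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose nominal FDR is <= alpha.

    psms: (score, is_decoy) pairs, one per spectrum (assumption:
    higher score = better match). PSMs with score >= cutoff are accepted.
    Returns None when no cutoff satisfies alpha.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    best = None
    decoys = targets = 0
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Consume every PSM tied at this score before evaluating the cutoff.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and (decoys + 1) / targets <= alpha:
            best = score
    return best
```

For example, 9 valid decoys and 1000 valid targets give a nominal FDR of (9+1)/1000 = 1%. With a fixed set of PSMs, a lower cutoff accepts more PSMs, which is the "lower score cutoff" effect that Table S8 separates from pure reallocation.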
Provider:
figshare
Created:
2022-06-21



