
Protein Identification by Nanopore Peptide Profiling

Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/5205564
Resource description:
This dataset belongs to “Protein Identification by Nanopore Peptide Profiling” and describes the raw data and analysis of tryptic digested peptides translocating through a mutant Fragaceatoxin C nanopore. A Jupyter notebook describing the analysis and the data structure is included in this dataset.

Data description:

Protein Identification by Nanopore Peptide Profiling.ipynb
    Jupyter notebook containing the analysis of the data in data_0.zip and data_1.zip (Python 3.7).

python_scripts.zip
    Supplementary scripts belonging to “Protein Identification by Nanopore Peptide Profiling.ipynb”. See the explanation of the custom classes in the Jupyter notebook.

data_0.zip
    Folder containing raw electrophysiology data and the results after analysis with “Protein Identification by Nanopore Peptide Profiling.ipynb”, with the following subfolders:
    Alpha casein: Tryptic digest of alpha casein
    Beta casein: Tryptic digest of beta casein
    BSA: Tryptic digest of bovine serum albumin
    Control: Tryptic digest of water (no protein, control measurement)
    Cytochrome c: Tryptic digest of cytochrome c
    DHFR_His6: Tryptic digest of dihydrofolate reductase (His6 tagged)
    EFP: Tryptic digest of elongation factor P
    HMW1Act: Tryptic digest of high molecular weight adhesin protein

data_1.zip
    Folder containing raw electrophysiology data, comma-separated MS peptide masses, and the results after analysis with “Protein Identification by Nanopore Peptide Profiling.ipynb”, with the following subfolders:
    Lysozyme: Tryptic digest of lysozyme
    PAN: Tryptic digest of proteasome-activating nucleotidase
    TbpA_Y27A: Tryptic digest of periplasmic binding protein
    Trypsin: Tryptic digest of bovine trypsin
    Mass_spec: csv files containing measured ESI-MS peptides
    Lysozyme synthetic peptides: Synthetic peptides:
        Lys1: TPGSR
        Lys2alk: C(+57.02)ELAAAMK
        Lys3: HGLDNYR
        Lys4alk: WWC(+57.02)NDGR
        Lys5: GTDVQAWIR
        Lys6alk: GYSLGNWVC(+57.02)AAK
        Lys7: FESNFNTQATNR

The structure of the data files in data_1.zip is registered in 'index.csv' (digested proteins) and 'index_peptides.csv' (synthetic peptides), both contained in the data folder. These files list the protein that was measured, the folder location, and the expected baseline and its standard deviation.

Structure of ./data/index.csv:
    Protein (string) | Folder (string) | Baseline (pA) (float) | Baseline Error (pA) (float)

In each folder there is another 'index.csv', which states which files were recorded with protein and which without (blank).
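The top-level index described above can be read with the Python standard library. The following is a minimal sketch, not the notebook's own loader: it assumes that the file has a header row whose column names match the ones listed in this description exactly, and the names ProteinEntry and read_protein_index are hypothetical. The example rows are made up for illustration and are not values from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class ProteinEntry:
    """One row of the top-level index.csv (hypothetical helper type)."""
    protein: str
    folder: str
    baseline_pa: float
    baseline_error_pa: float


def read_protein_index(handle: TextIO) -> List[ProteinEntry]:
    """Parse the top-level index.csv into typed records.

    Assumes a header row with exactly the column names given in the
    dataset description; adjust the keys if the real file differs.
    """
    entries = []
    for row in csv.DictReader(handle):
        entries.append(
            ProteinEntry(
                protein=row["Protein"].strip(),
                folder=row["Folder"].strip(),
                baseline_pa=float(row["Baseline (pA)"]),
                baseline_error_pa=float(row["Baseline Error (pA)"]),
            )
        )
    return entries


# Made-up rows for illustration only; not values from the dataset.
example = io.StringIO(
    "Protein,Folder,Baseline (pA),Baseline Error (pA)\n"
    "Lysozyme,Lysozyme,-100.0,5.0\n"
    "Trypsin,Trypsin,-98.5,4.0\n"
)
entries = read_protein_index(example)
for entry in entries:
    print(entry.protein, entry.folder, entry.baseline_pa, entry.baseline_error_pa)
```

For the real data, open the file with `open("./data/index.csv", newline="")` and pass the handle to the same function.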
Structure of ./data/[protein]/[repeat]/index.csv:
    blank (boolean) | fname (string)

Each folder in data_0.zip and data_1.zip contains a folder for each measured protein, which in turn contains a folder for each repeat. The repeats contain raw Axon binary files (.abf). Each file name encodes the measurement conditions as follows:

    [Date of measurement]_[Pore type]_[Buffer conditions]_[added analyte(s)]_[operator initials]

    e.g.: 20200312_1M_KCl_50mM_Citricacid_50mM_BTP_pH_38_FraC_G13F_neg70mV_20ul_CytC_TrypsinGold_FL_0000

This example was measured on 12 March 2020 in 1 M KCl buffered with citric acid (50 mM) and adjusted to pH 3.8 using bis-tris-propane, using Fragaceatoxin C mutant G13F at an applied potential of -70 mV. 20 µL of cytochrome c was added to the cis compartment. The total volume of the container used for all electrophysiology experiments was 400 µL, and all samples were prepared at a concentration of 1 g/L. A prefix “perf” before the analyte description indicates that the chamber was flushed with approximately 2 mL of fresh buffer prior to analysis. The buffer condition "BTP" stands for bis-tris-propane, which is used to titrate to the exact pH of 3.8.

Each analysed folder contains a results.pkl file with the analysis result as produced by “Protein Identification by Nanopore Peptide Profiling.ipynb” (see the Jupyter notebook).

Each analysed folder also contains results_analysis.xlsx. Its “Blank” sheet holds the excluded currents, standard deviations, dwell times and beta values for the pore without analyte added, and its “Results” sheet holds the same quantities with the analyte added. The parameters used for fitting are in “Parameters”. The “Histograms” sheet shows the raw data of the excluded current spectra.

mass_spec_peaks.zip
    Folder containing mass spectrometry files as analysed by PEAKS Studio. The folder contains a subfolder for each protein measured using electrospray ionisation mass spectrometry (ESI-MS):
    acasein: alpha casein
    b_casein: beta casein
    BSA: bovine serum albumin
    CytC: digested cytochrome c
    DHFR: dihydrofolate reductase
    HMW1_Act: high molecular weight adhesin protein
    PAN: proteasome-activating nucleotidase
    ThBP: periplasmic thiamine binding protein
    Trypsin: bovine trypsin
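The .abf file naming convention described earlier can be unpacked programmatically. The sketch below is an illustration based only on the single example file name given in this description, not a parser shipped with the dataset. Its assumptions: the first token is a YYYYMMDD date, the token after "pH" is the pH with an implied decimal point (38 means 3.8), the potential is written as neg70mV (a "pos" prefix is a guess), the added volume is written as 20ul, and any token starting with "perf" marks a flushed chamber. The names RecordingInfo and parse_recording_name are hypothetical.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class RecordingInfo(NamedTuple):
    """Fields recovered from one .abf file name (hypothetical helper type)."""
    measured_on: date
    ph: Optional[float]
    potential_mv: Optional[int]
    analyte_volume_ul: Optional[int]
    flushed: bool


def parse_recording_name(name: str) -> RecordingInfo:
    """Extract a few measurement conditions from an .abf file name."""
    stem = name[:-4] if name.lower().endswith(".abf") else name
    tokens = stem.split("_")

    # Assumption: the first token is always the date as YYYYMMDD.
    measured_on = datetime.strptime(tokens[0], "%Y%m%d").date()

    # Assumption: "pH_38" encodes pH 3.8 (one implied decimal place).
    ph = None
    if "pH" in tokens:
        i = tokens.index("pH")
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            ph = int(tokens[i + 1]) / 10

    potential_mv = None
    volume_ul = None
    for token in tokens:
        voltage = re.fullmatch(r"(neg|pos)?(\d+)mV", token)
        if voltage:
            sign = -1 if voltage.group(1) == "neg" else 1
            potential_mv = sign * int(voltage.group(2))
        volume = re.fullmatch(r"(\d+)ul", token)
        if volume:
            volume_ul = int(volume.group(1))

    # Assumption: the "perf" prefix starts one of the underscore-separated tokens.
    flushed = any(token.lower().startswith("perf") for token in tokens[1:])

    return RecordingInfo(measured_on, ph, potential_mv, volume_ul, flushed)


info = parse_recording_name(
    "20200312_1M_KCl_50mM_Citricacid_50mM_BTP_pH_38_FraC_G13F_"
    "neg70mV_20ul_CytC_TrypsinGold_FL_0000"
)
print(info)
```

Fields that are absent from a file name come back as None, so the function can be applied to every fname entry of a repeat's index.csv without failing on names that omit the pH or the potential.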

Created:
2021-08-18