
Code and Data for "Real-time dynamic single-molecule protein sequencing on an integrated semiconductor device"

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7017749
Description:
Code to analyze data produced by the Quantum-Si benchtop device and semiconductor chip is provided in the Python library qsi_algo, organized into several submodules:

- rs_caller.py: Algorithm for calling RS segments (also called ROI segments throughout the code).
- rs_caller_controller.py: Framework for executing RS calling and property computation in a distributed manner.
- rs_properties: Code for computing properties of identified RS.
- rs_classifier: Algorithms for identifying peptide states (i.e., residue calls) associated with an RS.
- utils.py: Shared helper code.
- pulse_reader: Reader for the binary pulse file.
- filters: ROI and pulse filtering utilities.
- plotting: Functions for visualizing data relevant to the analyses presented.

Jupyter notebook (.ipynb) files are named after the manuscript figure they correspond to. The analysis code inside uses the provided RS (recognition segment) data to demonstrate the filtering and residue-calling techniques required to replicate the analyses shown in the manuscript figures. Please note: several methods rely on randomization for model initialization and/or data sampling, which can cause small deviations from the equivalent analyses in the published figures.

The raw data produced by the Quantum-Si benchtop device and semiconductor chip for the assays presented in the accompanying study is provided in a pulse-called binary file format. Pulses can be used as input for RS identification and peptide state identification. Pre-segmented (RS-identified) files are included for convenience. The files include:

{run_id}.bin: Binary format for storing pulse info.
The reader provided in qsi_algo.pulse_reader produces the following columns:

- aperture_index: unique aperture index on chip
- start_f: index of first frame in pulse, counted from the beginning of the run
- end_f: index of last frame in pulse, counted from the beginning of the run
- dur_f: duration of pulse in frames
- dur_s: duration of pulse in seconds
- ipd_f: interpulse duration in frames (number of frames since end of preceding pulse)
- ipd_s: interpulse duration in seconds (time in seconds elapsed since end of preceding pulse)
- snr: signal-to-noise ratio (bin1_intensity / bin1_bg_std)
- intensity: intensity of pulse (counts above baseline in bin1)
- bin0_intensity: counts above baseline in bin0
- intensity_display: bin1_intensity + bin1_bg_mean
- binratio: bin0_intensity / bin1_intensity
- bg_mean: bin1 background mean in region of pulse
- bg_std: bin1 background standard deviation in region of pulse
- bin0_bg_mean: bin0 background mean in region of pulse
- bin0_bg_std: bin0 background standard deviation in region of pulse

{run_id}.csv.gz: Compressed comma-separated value file containing RS/ROI properties computed from the raw pulses.bin file by the included RS caller (example in rs_caller.py).
- ap: unique aperture index on chip
- ROI: ordinal ROI number in the aperture, 0-indexed
- start_p: index (.loc) of first pulse in the ROI (inclusive) in pulse dataframe
- end_p: index (.loc) of last pulse in the ROI (inclusive) in pulse dataframe
- start_f: first frame of the first pulse in the ROI (inclusive)
- end_f: last frame of the last pulse in the ROI (exclusive)
- start_s: time (in seconds elapsed from beginning of run) of the start of the ROI
- end_s: time (in seconds elapsed from beginning of run) of the end of the ROI
- dur_f: duration in frames of the ROI
- dur_s: duration in seconds of the ROI
- num_pulses: number of pulses in the ROI (that also passed filtering during ROI-calling)
- pw_mean: mean pulse duration (in seconds) of pulses in the ROI
- ipd_mean: mean inter-pulse duration (in seconds) of pulses in the ROI
- snr_mean: mean signal-to-noise ratio of pulses in the ROI
- intensity_mean: mean intensity above baseline of pulses in the ROI
- binratio_norm: estimated pulse bin ratio of pulses in the ROI, according to the following equation: np.sum(bin0_intensity*dur_f) / np.sum(bin1_intensity*dur_f)
- ROI_score: ROI quality score (0-1, from least to most likely to contain recognizer-peptide recognition pulsing)
- binratio_skew: bin ratio correction factor accounting for binning signal timing differences across the chip. This factor has already been applied to the binratio_norm column.
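To make the relationship between the pulse columns and the ROI columns concrete, the following is a minimal sketch of how several ROI properties could be derived from a slice of pulses. It is not the qsi_algo implementation: the function name `roi_properties`, the toy pulse values, and the use of plain dictionaries instead of a dataframe are all assumptions made for illustration. It also assumes that every pulse in the ROI contributes its `ipd_s` to `ipd_mean`, and it does not apply the `binratio_skew` correction.

```python
from statistics import mean

# Toy pulse table: each dict mimics one row produced by the pulse reader.
# All numbers are invented for illustration; they are not from the dataset.
pulses = [
    {"dur_f": 10, "dur_s": 0.10, "ipd_s": 0.50,
     "intensity": 200.0, "bin0_intensity": 100.0, "bg_std": 20.0},
    {"dur_f": 30, "dur_s": 0.30, "ipd_s": 0.20,
     "intensity": 100.0, "bin0_intensity": 80.0, "bg_std": 20.0},
    {"dur_f": 20, "dur_s": 0.20, "ipd_s": 0.40,
     "intensity": 150.0, "bin0_intensity": 60.0, "bg_std": 25.0},
]


def roi_properties(pulses, start_p, end_p):
    """Summarize the pulses from start_p to end_p (both inclusive).

    Hypothetical helper: mirrors the column definitions in the listing
    above, not the library's actual code.
    """
    # start_p and end_p are inclusive, so the slice needs end_p + 1.
    roi = pulses[start_p:end_p + 1]

    # Duration-weighted bin ratio:
    # sum(bin0_intensity * dur_f) / sum(bin1_intensity * dur_f),
    # where the "intensity" column holds the bin1 intensity.
    weighted_bin0 = sum(p["bin0_intensity"] * p["dur_f"] for p in roi)
    weighted_bin1 = sum(p["intensity"] * p["dur_f"] for p in roi)

    return {
        "num_pulses": len(roi),
        "pw_mean": mean(p["dur_s"] for p in roi),
        # Assumption: every pulse in the ROI contributes its ipd_s.
        "ipd_mean": mean(p["ipd_s"] for p in roi),
        # Per-pulse snr is bin1 intensity over bin1 background std.
        "snr_mean": mean(p["intensity"] / p["bg_std"] for p in roi),
        "intensity_mean": mean(p["intensity"] for p in roi),
        "binratio_norm": weighted_bin0 / weighted_bin1,
    }


props = roi_properties(pulses, start_p=0, end_p=2)
print(props)
```

Weighting by `dur_f` means long pulses dominate the estimate, so `binratio_norm` generally differs from a plain average of the per-pulse `binratio` values.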
Created:
2022-09-27