Wilhelmlab/detectability-proteometools
Hugging Face | Updated 2024-11-03 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/Wilhelmlab/detectability-proteometools
Description:
This dataset was collected from the PRIDE repository under the project identifiers PXD004732, PXD010595, and PXD021013. The underlying data were originally obtained by analyzing pools of approximately 1000 synthetic peptides. RAW data was analyzed in MaxQuant using specific, semi-specific, or unspecific in silico digestion settings, with Trypsin, LysN, or AspN as the specified protease. The dataset is intended for training, fine-tuning, and evaluating models that predict detectability from a peptide sequence. Note that because the peptides were synthesized, models trained on this dataset will be somewhat biased toward synthesizability rather than digestibility. The dataset has two features, Sequences and Classes, which hold the peptide sequences and their class labels, respectively. It is divided into training, validation, and test sets, with the size and number of examples specified for each set.
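To make the digestion settings mentioned above more concrete, the following is a minimal sketch of a fully specific in silico digest. It is not MaxQuant's implementation. It assumes the commonly used simplified rules (trypsin cuts C-terminal to K or R, LysN cuts N-terminal to K, AspN cuts N-terminal to D) and ignores missed cleavages, the proline rule, and semi-specific or unspecific searches. The names `RULES`, `cleavage_sites`, and `digest`, and the example sequence, are hypothetical.

```python
# Simplified cleavage rules (an assumption for illustration, not MaxQuant internals).
# Each entry maps a protease to (target residues, side of the residue that is cut):
# "C" cuts after the residue, "N" cuts before it.
RULES = {
    "Trypsin": ("KR", "C"),
    "LysN": ("K", "N"),
    "AspN": ("D", "N"),
}


def cleavage_sites(protein, protease):
    """Return the internal cut positions; position i lies between protein[i-1] and protein[i]."""
    residues, side = RULES[protease]
    sites = []
    for i in range(1, len(protein)):
        # For a C-terminal cutter inspect the residue before the cut,
        # for an N-terminal cutter inspect the residue after it.
        residue = protein[i - 1] if side == "C" else protein[i]
        if residue in residues:
            sites.append(i)
    return sites


def digest(protein, protease):
    """Fully specific digest with no missed cleavages."""
    bounds = [0] + cleavage_sites(protein, protease) + [len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Made-up sequence used only to show how the three proteases differ.
example = "MKDAGRLDK"
tryptic = digest(example, "Trypsin")
lysn = digest(example, "LysN")
aspn = digest(example, "AspN")
```

A semi-specific search would additionally accept peptides in which only one terminus falls on such a cut position, and an unspecific search would accept any substring. Both enlarge the search space considerably.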
Provided by:
Wilhelmlab
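The Sequences and Classes features described above can be inspected with a short script. This is a hedged sketch: it assumes the Hugging Face `datasets` library is installed and that the repository loads with its default configuration. The helper names `load_detectability` and `class_counts` are hypothetical, and the split names and label values are not assumed here but read from whatever the loaded data contains.

```python
from collections import Counter


def load_detectability(repo_id="Wilhelmlab/detectability-proteometools"):
    """Download the dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so class_counts below can be used without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


def class_counts(splits, label_column="Classes"):
    """Count how often each class label occurs in every split.

    `splits` is any mapping from split name to an iterable of row dicts,
    which is also how a loaded DatasetDict behaves.
    """
    return {
        name: Counter(row[label_column] for row in rows)
        for name, rows in splits.items()
    }
```

Calling `class_counts(load_detectability())` would then return one counter per split, which is a quick way to check class balance before training or fine-tuning a detectability model.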



