Whole proteome-level GPS predictions (part 2)
Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-30.
Download link:
https://figshare.com/articles/dataset/Whole_proteome-level_GPS_predictions_part_2_/13228379
Description:
This is the expanded set of all GPS predictions, run on the entire reference proteome, including sites not known to be phosphorylated. The dataset is used to perform a fast update when new phosphosites are discovered.
Uncompressing the archive yields a large CSV file with predictions in list format (i.e. one line per kinase-substrate prediction). Columns, in this order:
substrate_id - unique substrate (accession_site) ID
substrate_acc - UniProt accession of the substrate protein
substrate_name - name of the protein
site - amino acid type and position (e.g. S5 means serine at position 5)
pep - 15-amino-acid sequence centered on the site of phosphorylation
score - prediction algorithm score
Kinase Name - name of the kinase in our controlled ontology (found in this project)
Each entry identifies a site on a protein (by position and peptide) and gives the predicted score for the given kinase.
For the full raw GPS output, you must combine this with the first zip part. Be sure to download both parts into the same directory before unzipping. The final uncompressed file is 24 GB.
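Because the uncompressed file is 24 GB, it is usually impractical to load it into memory at once. The sketch below streams the CSV row by row and keeps only the predictions for one substrate accession. It is a minimal illustration, not part of the dataset: the column order follows the description above, but the presence of a header row, the helper names (`parse_site`, `iter_predictions`), and the sample rows (synthetic accessions, peptides, and kinase names) are all assumptions.

```python
import csv
import io
import re

# Column order as listed in the dataset description.
COLUMNS = ["substrate_id", "substrate_acc", "substrate_name",
           "site", "pep", "score", "Kinase Name"]

SITE_RE = re.compile(r"^([A-Z])(\d+)$")


def parse_site(site):
    """Split a site label such as 'S5' into ('S', 5)."""
    m = SITE_RE.match(site)
    if m is None:
        raise ValueError(f"unrecognized site label: {site!r}")
    return m.group(1), int(m.group(2))


def iter_predictions(handle, accession, has_header=True):
    """Yield one dict per prediction for `accession`, reading line by line.

    `has_header` is an assumption: set it to False if the real file
    starts directly with data rows.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row
    for fields in reader:
        row = dict(zip(COLUMNS, fields))
        if row["substrate_acc"] == accession:
            yield row


# Synthetic rows for demonstration only; none of these values come from the dataset.
SAMPLE = (
    "substrate_id,substrate_acc,substrate_name,site,pep,score,Kinase Name\n"
    "X00001_S8,X00001,Synthetic protein A,S8,MKTAGLLSPRKEDLQ,3.2,KinaseA\n"
    "X00001_S8,X00001,Synthetic protein A,S8,MKTAGLLSPRKEDLQ,0.7,KinaseB\n"
    "X00002_T8,X00002,Synthetic protein B,T8,AAAAAAATAAAAAAA,1.5,KinaseB\n"
)

for row in iter_predictions(io.StringIO(SAMPLE), "X00001"):
    residue, position = parse_site(row["site"])
    # In a 15-residue window centered on the site, the site sits at index 7.
    centered = len(row["pep"]) == 15 and row["pep"][7] == residue
    print(row["Kinase Name"], residue, position, float(row["score"]), centered)
```

For the real file, replace the `io.StringIO(SAMPLE)` handle with `open(path, newline="")`, where `path` points to the CSV obtained after unzipping both parts. The centered-residue check is only a sanity test; how windows near the protein termini are represented is not stated in the description.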
Created:
2024-01-31



