
Heisenburger2000/pUniFind

Source: Hugging Face. Updated 2026-03-12. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Heisenburger2000/pUniFind
Resource description:
# pUniFind: Unified large pretrained deep learning model pushing the limit of mass spectra interpretation

[![arXiv](https://img.shields.io/badge/arXiv-2507.00087-B31B1B)](https://arxiv.org/abs/2507.00087) [![Windows Executable](https://img.shields.io/badge/Windows-GUI-green)](https://github.com/pFindStudio/pUniFind/releases) [![Bohrium](https://img.shields.io/badge/Web%20server-Bohrium%20App-00BFFF)](https://bohrium.dp.tech/apps/punifind)

This is the official repository for **pUniFind**, the most powerful zero-shot open peptide-spectrum scoring model, surpassing other SOTA search engines, and the first zero-shot open de novo sequencing deep learning model supporting over 1300 modifications. Developed by the [pFind group](https://pfind.net/) and [DP Technology](https://www.dp.tech/en).

For the source code, please see [GitHub](https://github.com/pFindStudio/pUniFind/tree/main).

## 🚀 Key Features

🔥 **Powerful open scoring performance.** Surpasses all previous SOTA search engines, including open-pFind and MSFragger with MSBooster, while supporting over 1300 modifications.

🔥 **High accuracy.** Comprehensive experimental results show that the model does not significantly overfit to either the target or the decoy peptides in the training data, and that it maintains high accuracy across different evaluation scenarios. More detailed evaluations are available in our preprint.

🔥 **Zero-shot open de novo.** The first open de novo sequencing deep learning method that requires no fine-tuning, supporting over 1300 modifications.

🔥 **Reliable de novo result filtering and a user-friendly result file.** Using a range of deep learning features, the model can effectively filter out unreliable results, which is extremely useful in real-world usage. The result file also reports the end-to-end score, cosine similarity, mass difference, and missing fragment ion sites, which help users evaluate the reliability of each result. The result file also supports visualization.

## 📣 News

- **2026/2/28** Added memory protection to prevent leaks; optimized preprocessing for speed; added a unit test with demo data; refined default parameters; updated the license; fixed bugs.
- **2026/2/1** pUniFind has released a more user-friendly version of its Linux source code! Please refer to the updated 'Linux Usage Instructions' in the User Guide for more details. We are also working on a fix for Windows-specific issues; stay tuned.
- **2025/11/29** pUniFind released a Linux source code preview! 🚀 (This is only a preview version of the Linux source code; we will optimize the user experience in the near future.)
- **2025/6/24** pUniFind supports timsTOF open de novo sequencing.
- **2025/5/25** Initial release of the pUniFind repository 🚀.

## 🛠️ Technical Support

If you encounter technical issues, have suggestions, observe suboptimal performance, or find inconsistencies between pUniFind results and our evaluation metrics, we welcome your feedback 🙏. We are looking for bad cases to further refine our model. We can improve performance in 50% of poor cases using our proprietary, complex methods, which is why we have not released them publicly. If you have any suggestions about our software, please do not hesitate to contact us. We are **actively** updating and refining our software, since the main author is **far** from graduation :(.

**For technical inquiries:**

1. **GitHub Issues**: [Open a new issue](https://github.com/pFindStudio/pUniFind/issues) with:
   - A description of the data.
   - Error logs and environment.
   - A description of the uploaded folder.
2. **pFind Studio user support WeChat group**:
   - Please add my WeChat, `JL_Zhao2000`, and I will invite you to our user support group. (WeChat group invitations expire after one week.)

**For collaboration requests:**

📧 **Contact info**: Jiale Zhao. Email: [zhaojiale22z@ict.ac.cn](mailto:zhaojiale22z@ict.ac.cn) or [marshmallowzjl@gmail.com](mailto:marshmallowzjl@gmail.com).

## 📅 Roadmap

**Starring** and **watching** our repo will notify you of our updates. We will keep optimizing our model.

| Milestone | Status |
|-------------------|--------|
| Supporting EThcD | 🚄 Highest priority (maybe 2 months) |
| Integrating pUniFind into open-pFind | 🚧 Preparing |
| User-defined new PTM tuning | 📝 Planning |
| Improving the performance and speed of scoring and de novo sequencing | 📝 Long-term |

## 🤝 Citation <a name="-citation"></a>

If you find our software useful and it has helped your research, **please cite** us 🙏:

```bibtex
@misc{zhao2025punifindunifiedlargepretrained,
  title={pUniFind: a unified large pre-trained deep learning model pushing the limit of mass spectra interpretation},
  author={Jiale Zhao and Pengzhi Mao and Kaifei Wang and Yiming Li and Yaping Peng and Ranfei Chen and Shuqi Lu and Xiaohong Ji and Jiaxiang Ding and Xin Zhang and Yucheng Liao and Weinan E and Weijie Zhang and Han Wen and Hao Chi},
  year={2025},
  eprint={2507.00087},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.00087},
}
```

Every citation will motivate the main author to make pUniFind more user-friendly and powerful. The main author needs your valuable citations and stars to find a job after graduation 😫.

## DOI

[10.5281/zenodo.18887523](https://zenodo.org/records/18976195)
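As a note on the result file fields listed under Key Features: cosine similarity and mass difference are standard quantities in peptide-spectrum matching, and the sketch below shows how they are commonly computed. This is an illustration only, not pUniFind's implementation. The function names, the use of pre-aligned intensity vectors, and the choice of ppm as the mass-difference unit are all assumptions, since the README does not specify them.

```python
import math


def cosine_similarity(predicted, observed):
    """Cosine similarity between two aligned fragment-intensity vectors.

    Hypothetical helper: assumes both vectors list intensities for the
    same fragment ions in the same order.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        # No signal in one of the vectors: treat as no similarity.
        return 0.0
    return dot / (norm_p * norm_o)


def mass_difference_ppm(observed_mass, theoretical_mass):
    """Relative mass difference in parts per million (unit is an assumption)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6
```

A cosine similarity near 1 means the predicted and observed fragment intensities have nearly the same pattern, and a mass difference near 0 ppm means the observed precursor mass closely matches the candidate peptide's theoretical mass.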

Provider:
Heisenburger2000