
Foundation-Peptidomimetics-Language-Model

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14720107
Description:
This dataset aims to advance the systematic study and design of peptidomimetics by leveraging non-canonical elements. We extracted over 17,000 non-canonical elements, including non-canonical amino acids and terminal modifications, from peptidomimetics available in the ChEMBL database. These elements have been standardized so that they can be represented in a sequence-like format, providing a foundation for consistent analysis and design.

We developed a foundational language model, GPepT, trained on peptides and peptidomimetics. The model is hosted on HuggingFace (https://huggingface.co/Playingyoyo/GPepT) and allows users to design novel peptidomimetics. Together, the standardized dataset and GPepT make it easier to explore, analyze, and generate new peptidomimetic sequences.

Files included:

dictionary.txt: A comprehensive dictionary of elements (amino acids and terminal modifications). Each entry has a standardized ID (canonical amino acids follow the one-letter code; non-canonical amino acids start with "X", terminal modifications with "Z"), a SMILES representation, tautomeric SMILES, peptide bond sites, functional groups, some physicochemical properties, and frequency.

datasetP.txt: A dataset of peptidomimetics extracted from ChEMBL, encoded using the standardized vocabulary. Each entry contains the ChEMBL ID, sequence representation, SMILES representation, length, and some physicochemical properties.

peptidomimetics_wetlab.txt: Sequences (Pep1 to Pep5) generated by GPepT that were used for experimental validation.

Pep1_activity.txt: Antimicrobial activity of Pep1 against E. coli.
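The ID convention described for dictionary.txt (one-letter codes for canonical amino acids, an "X" prefix for non-canonical amino acids, a "Z" prefix for terminal modifications) can be illustrated with a minimal sketch. The description does not specify the exact ID layout or the file's column format, so the example IDs "X12" and "Z3" and the helper names `classify_element` and `summarize` are hypothetical and only show the prefix rule.

```python
from collections import Counter

# The 20 canonical amino acids in one-letter code.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def classify_element(element_id: str) -> str:
    """Classify a standardized element ID by the prefix rule in the description.

    Assumption: IDs are plain strings; anything matching none of the three
    rules is reported as "unknown" rather than guessed.
    """
    if not element_id:
        raise ValueError("empty element ID")
    if len(element_id) == 1 and element_id in CANONICAL:
        return "canonical"
    if element_id.startswith("X"):
        return "non-canonical"
    if element_id.startswith("Z"):
        return "terminal modification"
    return "unknown"


def summarize(element_ids):
    """Count how many IDs fall into each class."""
    return dict(Counter(classify_element(e) for e in element_ids))


if __name__ == "__main__":
    # Hypothetical IDs, used only to exercise the prefix rule.
    print(summarize(["A", "G", "X12", "Z3"]))
```

The same rule could be applied to the ID column of dictionary.txt once its actual delimiter and column order have been checked against the file.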
Created:
2025-01-22