Foundation-Peptidomimetics-Language-Model

Indexed in the NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/14720107

Description:

This dataset aims to advance the systematic study and design of peptidomimetics by leveraging non-canonical elements. We extracted over 17,000 non-canonical elements, including non-canonical amino acids and terminal modifications, from peptidomimetics available in the ChEMBL database. These elements have been standardized so that they can be represented in a sequence-like format, which provides a foundation for consistent analysis and design. We also developed GPepT, a foundational language model trained on peptides and peptidomimetics. The model, hosted on HuggingFace (https://huggingface.co/Playingyoyo/GPepT), allows users to design novel peptidomimetics efficiently. Together, the standardized dataset and GPepT make it easier to explore, analyze, and generate new peptidomimetic sequences.

Files included:

dictionary.txt: A comprehensive dictionary of elements (amino acids and terminal modifications) with the following features:
- standardized IDs (canonical amino acids use the one-letter code; non-canonical amino acids start with "X"; terminal modifications start with "Z")
- SMILES representation
- tautomeric SMILES
- peptide bond sites
- functional groups
- selected physicochemical properties
- frequency

datasetP.txt: A dataset of peptidomimetics extracted from ChEMBL, encoded using the standardized vocabulary. It contains:
- ChEMBL ID
- sequence representation
- SMILES representation
- length
- selected physicochemical properties

peptidomimetics_wetlab.txt: Sequences (Pep1 to Pep5) generated by GPepT that were used for experimental validation.

Pep1_activity.txt: Antimicrobial activity of Pep1 against E. coli.
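The ID convention in dictionary.txt (canonical one-letter codes, "X" prefix for non-canonical amino acids, "Z" prefix for terminal modifications) is enough to tell what kind of element each token in an encoded sequence is. The following is a minimal sketch, assuming an encoded sequence has already been split into a list of element IDs; the delimiter actually used in datasetP.txt is not stated here, and the IDs "Z1", "X42" and "Z7" in the usage line are hypothetical placeholders, not real dictionary entries.

```python
from collections import Counter

# The 20 canonical amino acids in one-letter code.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def classify_element(element_id: str) -> str:
    """Return the element category implied by the standardized ID prefix."""
    if element_id in CANONICAL:
        return "canonical amino acid"
    if element_id.startswith("X"):
        return "non-canonical amino acid"
    if element_id.startswith("Z"):
        return "terminal modification"
    raise ValueError(f"unrecognized element ID: {element_id!r}")


def summarize(tokens):
    """Count how many tokens fall into each category.

    Assumption: `tokens` is an already-tokenized sequence (a list of IDs).
    """
    return dict(Counter(classify_element(tok) for tok in tokens))


# Hypothetical encoded sequence: N-terminal mod, two canonical residues,
# one non-canonical residue, C-terminal mod.
print(summarize(["Z1", "A", "X42", "K", "Z7"]))
```

Keying the classification on the prefix alone keeps the sketch independent of how many entries the dictionary holds; in practice one would also check each ID against the entries in dictionary.txt.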

Created: 2025-01-22