MichelNivard/proteinLM-mixed-pretraining-v1
Source: Hugging Face | Updated: 2025-03-31 | Indexed: 2025-08-30
Download link:
https://hf-mirror.com/datasets/MichelNivard/proteinLM-mixed-pretraining-v1
Description:
A pretraining mix for protein language models, combining proteins from three sources: MG_Prot50, UniRef50, and UniRef90-mammals. It totals nearly 4 billion amino acids across 15 million proteins and is stored in 10 files. The dataset is intended for pretraining protein language models that will later be fine-tuned for prediction, structure, or interaction tasks involving human or other mammalian proteins.
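The headline figures imply an average of roughly 4 × 10^9 / 1.5 × 10^7 ≈ 267 residues per protein. The following is a minimal sketch for checking that kind of summary on a sample of the data. It assumes the Hugging Face `datasets` package is installed, that the dataset exposes a `train` split, and that sequences live in a column named `sequence`. None of these details are confirmed by this listing, and the helper names `sequence_stats` and `stream_sequences` are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def sequence_stats(sequences: Iterable[str]) -> Dict[str, float]:
    """Count proteins and residues in an iterable of amino-acid strings."""
    n_proteins = 0
    n_residues = 0
    for seq in sequences:
        n_proteins += 1
        n_residues += len(seq)  # one character per residue
    mean_len = n_residues / n_proteins if n_proteins else 0.0
    return {"proteins": n_proteins, "residues": n_residues, "mean_length": mean_len}


def stream_sequences(limit: int = 1000, column: str = "sequence") -> Iterator[str]:
    """Yield up to `limit` sequences from the Hub without downloading all files.

    ASSUMPTION: the split name ("train") and the column name ("sequence")
    are guesses and may need adjusting to the dataset's actual schema.
    """
    from datasets import load_dataset  # imported lazily; requires `datasets`

    ds = load_dataset(
        "MichelNivard/proteinLM-mixed-pretraining-v1",
        split="train",    # assumed split name
        streaming=True,   # iterate lazily instead of fetching all 10 files
    )
    for i, row in enumerate(ds):
        if i >= limit:
            break
        yield row[column]
```

For example, `sequence_stats(stream_sequences(limit=10_000))` would give a sample-based mean length to compare with the ≈267-residue figure. Because the files may be grouped by source, the first rows of a stream are not necessarily representative of the full mix.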
Provided by:
MichelNivard



