MichelNivard/proteinLM-mixed-pretraining-v1

Name: MichelNivard/proteinLM-mixed-pretraining-v1
Creator: MichelNivard
Published: 2025-03-31 12:50:32
License: not specified

Source: Hugging Face. Updated: 2025-03-31. Indexed: 2025-08-30.

Download link:

https://hf-mirror.com/datasets/MichelNivard/proteinLM-mixed-pretraining-v1
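A minimal sketch of how the first few records might be inspected, assuming the third-party Hugging Face `datasets` library is installed and the repository loads with its default configuration. The helper name `peek` and the `split="train"` default are assumptions for illustration; the listing does not name the splits or the columns, so the sketch only returns raw records.

```python
from itertools import islice

# Dataset id as given in the listing.
REPO_ID = "MichelNivard/proteinLM-mixed-pretraining-v1"


def peek(n=3, split="train"):
    """Stream the first n records without downloading every file.

    `peek` is a hypothetical helper, and `split="train"` is an assumption:
    the listing does not say which splits the repository exposes.
    """
    # Imported here so the module loads even without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))


# Example use (requires network access):
#     for record in peek():
#         print(record.keys())
```

Streaming is used here so that a quick look at the schema does not require fetching all of the files.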

Description:

A pretraining mix for protein language models, combining proteins from three sources: MG_Prot50, UniRef50, and UniRef90-mammals. It totals nearly 4 billion amino acids across 15 million proteins, stored in 10 files. The dataset is intended for pretraining protein language models that will later be fine-tuned for prediction, structure, or interaction tasks involving human or other mammalian proteins.
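The headline figures imply a rough scale per protein and per file. The sketch below works through that arithmetic, assuming "nearly 4 billion" can be rounded to 4 billion and that the 10 files are about equal in size; the listing states neither, so the results are order-of-magnitude estimates only.

```python
# Figures taken from the description; the 4 billion value is a rounded
# upper bound for "nearly 4 billion amino acids".
TOTAL_AMINO_ACIDS = 4_000_000_000
TOTAL_PROTEINS = 15_000_000
NUM_FILES = 10

# Average sequence length implied by the totals.
mean_length = TOTAL_AMINO_ACIDS / TOTAL_PROTEINS

# Per-file estimates, ASSUMING the files are equally sized
# (the listing does not say how proteins are split across files).
proteins_per_file = TOTAL_PROTEINS // NUM_FILES
residues_per_file = TOTAL_AMINO_ACIDS // NUM_FILES

print(f"mean protein length: about {mean_length:.0f} amino acids")
print(f"proteins per file:   about {proteins_per_file:,}")
print(f"residues per file:   about {residues_per_file:,}")
```

Under these assumptions the average sequence is roughly 267 amino acids long, and each file holds on the order of 1.5 million proteins.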

Provided by:

MichelNivard