
vlccek/MT_dataset

Source: Hugging Face. Updated 2026-04-15; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/vlccek/MT_dataset
Description:
# Protein Mutation Fitness Dataset

This dataset contains protein sequences and their corresponding fitness scores, along with structural classifications (CATH).

## Files

- `train.parquet`: Training set containing 1,560,811 samples.
- `validation.parquet`: Validation set containing 178,050 samples.

## Column Descriptions

- `wt_sequence`: Wild-type protein sequence.
- `mut_sequence`: Mutated protein sequence.
- `mutation`: Specific mutation details.
- `fitness`: Mutation impact score (+1 = maximally stabilizing, -1 = maximally destabilizing).
- `cath_*`: Structural classification based on the CATH database (Class, Architecture, Topology, Homology).
- `data_source`: Origin of the data.
- `reverse`: Indicator for reverse mutation data.

## Usage

The files are in Apache Parquet format and can be easily loaded using pandas:

```python
import pandas as pd

df = pd.read_parquet('train.parquet')
```
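Once a split is loaded, a first look at the data might follow the sketch below. It relies only on the column names listed under Column Descriptions. The helper names (`cath_columns`, `summarize_by_source`, `split_by_sign`) are hypothetical and not part of the dataset. Treating `fitness > 0` as stabilizing and `fitness < 0` as destabilizing is an assumption drawn from the stated +1/-1 endpoints.

```python
import pandas as pd


def cath_columns(df: pd.DataFrame) -> list:
    """Return the names of the CATH classification columns (prefix `cath_`)."""
    return [col for col in df.columns if col.startswith("cath_")]


def summarize_by_source(df: pd.DataFrame) -> pd.DataFrame:
    """Count samples and average the fitness score for each `data_source`."""
    return df.groupby("data_source")["fitness"].agg(["count", "mean"])


def split_by_sign(df: pd.DataFrame):
    """Split rows by the sign of `fitness`.

    Assumption: positive scores are stabilizing and negative scores are
    destabilizing; rows with a score of exactly 0 land in neither group.
    """
    stabilizing = df[df["fitness"] > 0]
    destabilizing = df[df["fitness"] < 0]
    return stabilizing, destabilizing
```

For example, `summarize_by_source(pd.read_parquet('train.parquet'))` would give one row per data source. Since the training split holds over 1.5 million rows, passing `columns=[...]` to `pd.read_parquet` to load only the needed columns may help keep memory use down.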
Provider:
vlccek