
Dataset 4 - Membrane Protein Types

Source: Mendeley Data. Updated: 2024-06-25. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/dbzdybks82
Description:
To establish a high-quality benchmark dataset for developing a predictor that identifies the functional types of membrane proteins, sequences were collected from the UniProtKB/Swiss-Prot release 2018_04 at http://www.uniprot.org/ according to the following steps (Lin et al. 2013). Proteins belonging to all eight types were collected. Proteins annotated with "fragment" were removed, and proteins with sequences shorter than 50 residues were also excluded to avoid the influence of fragments. Sequences annotated with ambiguous or uncertain terms, such as "potential," "probable," "probably," "maybe," or "by similarity," were removed from further consideration. Dataset 4 is divided into a training dataset and a testing dataset containing 1332 and 1033 sequences, respectively.
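The three exclusion rules described above can be sketched in Python as follows. This is a minimal illustration, not the authors' actual pipeline: the `ProteinRecord` class, its `annotation` free-text field, and the function names are hypothetical, and real UniProtKB entries carry annotations in structured fields that would need their own parsing.

```python
from dataclasses import dataclass

# Uncertainty terms listed in the dataset description.
AMBIGUOUS_TERMS = ("potential", "probable", "probably", "maybe", "by similarity")
# Sequences shorter than this many residues are excluded.
MIN_LENGTH = 50


@dataclass
class ProteinRecord:
    """Hypothetical container for one protein entry."""
    accession: str
    sequence: str
    annotation: str  # assumed free-text annotation, for illustration only


def keep_record(record: ProteinRecord) -> bool:
    """Return True if the record passes all three exclusion rules."""
    text = record.annotation.lower()
    if "fragment" in text:                  # rule 1: drop fragments
        return False
    if len(record.sequence) < MIN_LENGTH:   # rule 2: drop short sequences
        return False
    if any(term in text for term in AMBIGUOUS_TERMS):  # rule 3: drop uncertain annotations
        return False
    return True


def filter_records(records):
    """Keep only the records that pass keep_record, preserving order."""
    return [r for r in records if keep_record(r)]
```

The substring match is a deliberate simplification: it would also reject an annotation containing a word such as "potentially", so a stricter implementation might match whole terms only.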

Created:
2024-01-23
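Background overview
This is a benchmark dataset for predicting the functional types of membrane proteins. It comprises a training set (1332 entries) and a test set (1033 entries), drawn from UniProtKB/Swiss-Prot and strictly filtered to remove fragment proteins, short sequences, and ambiguously annotated sequences.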