Simulated Data for Cas-Related Protein Sequence Prediction

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2026-02-28
Download link:
https://nbsdc.cn/general/dataDetail?id=69a1bf95195d261dfe7849c0&type=1
Resource description:
This dataset targets functional prediction and homology analysis of CRISPR-Cas protein sequences and was built with a contrastive learning framework enhanced by a Trusted Execution Environment (TEE). It contains approximately 100,000 core Cas-related protein sequences from four collaborating institutions, each with a standardized identifier, an amino acid sequence, and functional annotations. It also integrates general protein background data on the scale of millions of sequences to support extended training. All data were generated and processed with Python scripts and the PyTorch framework on Ubuntu. The same model was run in three environments (CPU, GPU, and TEE), and the prediction results, classification accuracy, recall, response time, and throughput were recorded for each. Times are recorded with millisecond precision, and the data support distributed, collaborative processing across institutions. Data quality was validated through integrity checks, distribution consistency analysis, and comparison against public databases, and meets research-grade standards. The dataset provides a realistic, reproducible multi-environment performance benchmark for privacy-preserving protein machine learning, and serves as a reference for secure computation in bioinformatics, cross-institution collaborative modeling, and the construction of high-performance bioinformatics analysis platforms.
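The description names the recorded metrics but not the file layout, so the following is only a minimal sketch of how accuracy, recall, mean response time, and throughput could be summarized per environment. The `RunRecord` fields and the `summarize` helper are hypothetical names chosen for illustration, not the dataset's actual schema, and the throughput figure assumes sequences are processed one at a time.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical fields: the real column names are not given in the description.
    environment: str      # "CPU", "GPU", or "TEE"
    true_label: str       # annotated functional class
    predicted_label: str  # class predicted by the model
    response_ms: float    # per-sequence response time in milliseconds


def summarize(records, positive_label):
    """Per environment: accuracy, recall for positive_label, mean latency, throughput."""
    by_env = {}
    for r in records:
        by_env.setdefault(r.environment, []).append(r)

    summary = {}
    for env, rows in by_env.items():
        correct = sum(r.true_label == r.predicted_label for r in rows)
        positives = [r for r in rows if r.true_label == positive_label]
        true_pos = sum(r.predicted_label == positive_label for r in positives)
        total_ms = sum(r.response_ms for r in rows)
        summary[env] = {
            "accuracy": correct / len(rows),
            "recall": true_pos / len(positives) if positives else float("nan"),
            "mean_response_ms": total_ms / len(rows),
            # Assumption: sequential processing, so throughput = count / total time.
            "throughput_per_s": len(rows) / (total_ms / 1000.0) if total_ms > 0 else float("nan"),
        }
    return summary


# Toy records for illustration only; these are not values from the dataset.
demo = [
    RunRecord("CPU", "Cas9", "Cas9", 10.0),
    RunRecord("CPU", "Cas9", "Cas12", 20.0),
    RunRecord("CPU", "Cas12", "Cas12", 10.0),
    RunRecord("CPU", "Cas12", "Cas12", 10.0),
    RunRecord("TEE", "Cas9", "Cas9", 25.0),
    RunRecord("TEE", "Cas12", "Cas12", 25.0),
]
result = summarize(demo, positive_label="Cas9")
```

Grouping by environment before computing the metrics keeps the CPU, GPU, and TEE figures directly comparable, which is the point of running one model in all three settings.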
Provider:
Zhejiang Lab (之江实验室)