GROQ-seq Fitness and Function Measurements for TEV Protease Homolog Library

Source: DataCite Commons. Updated: 2026-05-06. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19926757
Description:
This dataset contains GROQ-seq functional measurements of a sequence-diverse TEV protease library comprising 4,022 natural homologs identified by iterative JackHMMER searches of UniRef100, BFD, and MGnify, together with 422 AI-generated "shrunken" variants produced by the SCISOR discrete-diffusion model. After synthesis-introduced variation, the library yields 11,722 unique amino-acid sequences spanning Levenshtein distances up to 245 and a mean pairwise identity of 41.8% to TEV protease S219V. Fitness is measured with the same split-DHFR system as the TEV protease pilot, with the canonical TEV substrate (ENLYFQS) embedded as the linker; protease cleavage destroys DHFR and reduces trimethoprim resistance, translating activity into differences in cellular growth. Function is reported at the assay condition with the highest dynamic range (high TEV expression, low split-DHFR expression). This run of the assay was performed at the Living Measurements Systems Foundry (LMSF) at the National Institute of Standards and Technology. The dataset is part of a broader effort to generate sequence → function data across broad evolutionary sequence space, supporting machine-learning models of sequence-structure-function relationships and informing protein engineering across diverse protease families.
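For readers unfamiliar with the Levenshtein distance quoted in the description, the following is a minimal sketch of the standard edit-distance calculation between two amino-acid sequences. The function name and the toy strings are illustrative assumptions, not part of the dataset or its processing pipeline; the second string is a hypothetical single-residue change to the substrate motif, used only to show the metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue insertions, deletions,
    or substitutions needed to turn sequence a into sequence b."""
    # Keep the shorter sequence in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, residue_a in enumerate(a, start=1):
        current = [i]
        for j, residue_b in enumerate(b, start=1):
            substitution_cost = 0 if residue_a == residue_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


# Toy example (hypothetical strings, not dataset entries).
distance = levenshtein("ENLYFQS", "ENLYFQG")
```

A distance of 245 therefore means that at least 245 such single-residue edits separate the two sequences being compared.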
Provider:
Zenodo
Created:
2026-05-06