GROQ-seq Fitness and Function Measurements for TEV Protease Homolog Library
Source: DataCite Commons | Updated: 2026-05-06 | Indexed: 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.19926757
Description:
This dataset contains GROQ-seq functional measurements of a sequence-diverse TEV protease library. The library comprises 4,022 natural homologs, identified by iterative JackHMMER searches of UniRef100, BFD, and MGnify, together with 422 AI-generated "shrunken" variants produced by the SCISOR discrete-diffusion model. After accounting for variation introduced during synthesis, the library yields 11,722 unique amino-acid sequences, spanning Levenshtein distances of up to 245 and a mean pairwise identity of 41.8% to TEV protease S219V.

Fitness is measured with the same split-DHFR system used in the TEV protease pilot, with the canonical TEV substrate (ENLYFQS) embedded as the linker. Protease cleavage of the linker destroys DHFR activity and reduces trimethoprim resistance, so protease activity translates into differences in cellular growth. Function is reported at the assay condition with the highest dynamic range (high TEV expression, low split-DHFR expression).
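The Levenshtein distances quoted above count the minimum number of single-residue insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below is a minimal illustration of the standard dynamic-programming computation, assuming unit cost for every edit; it is not necessarily how the dataset authors computed their distances, and the short peptide strings are toy inputs rather than sequences from the library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue insertions, deletions and
    substitutions (each with cost 1) that turn sequence a into sequence b."""
    # Only the previous row of the DP table is kept, so memory is O(len(b)).
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from the first i residues of a to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete residue ca from a
                curr[j - 1] + 1,     # insert residue cb into a
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            ))
        prev = curr
    return prev[-1]


# Toy inputs: the canonical TEV cleavage site and two single-edit variants.
print(levenshtein("ENLYFQS", "ENLYFQG"))  # one substitution
print(levenshtein("ENLYFQS", "ENLFQS"))   # one deletion
```

Keeping a single rolling row instead of the full table is enough when only the distance is needed; recovering the actual alignment would require the full matrix or a traceback structure.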
This run of the assay was performed at the Living Measurements Systems Foundry (LMSF) at the National Institute of Standards and Technology. The dataset is part of a broader effort to generate sequence → function data across broad evolutionary sequence space, supporting machine-learning models of sequence-structure-function relationships and informing protein engineering across diverse protease families.
Provider:
Zenodo
Created:
2026-05-06



