BAR-CAT: Targeted Recovery of Synthetic Genes via Barcode-Directed CRISPR-dCas9 Enrichment

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://figshare.com/articles/dataset/BAR-CAT_Targeted_Recovery_of_Synthetic_Genes_via_Barcode-Directed_CRISPR-dCas9_Enrichment/29428988

Resource description:

Abstract: Modern gene-synthesis platforms let us probe protein function and genome biology at unprecedented scale. Yet in large, diverse gene libraries the proportion of error-free constructs decreases with length because oligo synthesis errors propagate into the assembled genes. To rescue these rare, error-free molecules we developed BAR-CAT (Barcode-Assisted Retrieval CRISPR-Activated Targeting), an in-vitro enrichment method that couples a unique PAM-adjacent 20-nt barcode to each library member and uses multiplexed dCas9-sgRNA complexes to fish out the barcodes corresponding to perfect assemblies. After a single 15-min reaction and an optimized wash regime (BAR-CAT v1.0), three low-abundance targets in a 300,000-member test library were enriched 600-fold, greatly reducing downstream requirements. When applied to 384x and 1,536x member DropSynth gene libraries, BAR-CAT achieved up to 122-fold enrichment for 12 targets and revealed practical limits imposed by sgRNA competition and library complexity, which now guide ongoing protocol scaling. By eliminating laborious clone-by-clone validation and working directly on plasmid libraries, BAR-CAT provides a versatile platform for recovering perfect synthetic genes, subsetting large libraries, and ultimately lowering the cost of functional genomics at scale.

This dataset contains processed enrichment data for all spacers, with the following fields:

sequence - spacer sequence
status - whether the sequence was an enrichment target or not
log2enrich - log2 fold enrichment score
reads.ini - initial reads before enrichment
reads.postenrich - reads after enrichment

The raw Illumina MiSeq reads of the target gene libraries and the nanopore sequence reads for the enriched libraries are available on the NCBI Sequence Read Archive under BioProject accession PRJNA1273454 (https://www.ncbi.nlm.nih.gov/bioproject/1273454).
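The relationship between the read-count fields and a log2 fold enrichment score can be illustrated with a minimal Python sketch. The listing does not state how log2enrich was actually computed, so the normalization below (each spacer's share of total reads, with an optional pseudocount) is an assumption, and the function name, the output key `log2enrich_sketch`, the `status` labels, and the toy spacer sequences are all hypothetical.

```python
import math

def log2_enrichment_table(rows, pseudocount=1.0):
    """Return copies of rows with an added 'log2enrich_sketch' value.

    Assumption: enrichment is the ratio of a spacer's read fraction after
    enrichment to its read fraction before enrichment. The dataset's own
    log2enrich column may have been computed differently.
    """
    n = len(rows)
    # Totals are padded by the same pseudocount mass added to each row,
    # so the smoothed fractions still sum to 1.
    total_ini = sum(r["reads.ini"] for r in rows) + pseudocount * n
    total_post = sum(r["reads.postenrich"] for r in rows) + pseudocount * n
    result = []
    for r in rows:
        frac_ini = (r["reads.ini"] + pseudocount) / total_ini
        frac_post = (r["reads.postenrich"] + pseudocount) / total_post
        result.append({**r, "log2enrich_sketch": math.log2(frac_post / frac_ini)})
    return result

# Toy rows with made-up sequences and counts, not taken from the dataset.
toy_rows = [
    {"sequence": "ACGTACGTACGTACGTACGT", "status": "target",
     "reads.ini": 10, "reads.postenrich": 800},
    {"sequence": "TTGCATTGCATTGCATTGCA", "status": "non-target",
     "reads.ini": 990, "reads.postenrich": 200},
]

for row in log2_enrichment_table(toy_rows):
    print(row["status"], round(row["log2enrich_sketch"], 2))
```

With `pseudocount=0.0`, the toy target rises from 1% to 80% of reads, an 80-fold change (log2 of about 6.32). The default pseudocount of 1 keeps the score finite when a spacer has zero reads before or after enrichment, at the cost of slightly shrinking large ratios.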

Created:

2025-06-27
