High diversity gene libraries facilitate machine learning guided exploration of fluorescent protein sequence space
Figshare | Updated 2025-11-11 | Indexed 2026-04-08
Download link:
https://figshare.com/articles/dataset/High_diversity_gene_libraries_facilitate_machine_learning_guided_exploration_of_fluorescent_protein_sequence_space/30585419/1
Resource description:
These files contain the NGS mapping files (linking barcodes to gene variants), alignments, and potential dial-out PCR primers for the two parental libraries (C1P and C2P), which contain fluorescent proteins from FPBase.com encoded in two different codon versions. The Plesa lab has made these libraries available to the community through Addgene.

The libraries are described in the publication:
A. Benabbas†, P. Kearns†, A. Billo, L. Chisholm, C. Plesa. High diversity gene libraries facilitate machine learning guided exploration of fluorescent protein sequence space. 2025

File descriptions:
C1P_map.all.csv - mapping file for library C1P (Codon1).
C1P_map.perfects.csv - same as C1P_map.all.csv but filtered to only include perfect genes (no mutants).
C2P_map.all.csv - mapping file for library C2P (Codon2).
C2P_map.perfects.csv - same as C2P_map.all.csv but filtered to only include perfect genes (no mutants).
FP.C1.v2.genes - the DNA level gene reference file for library C1P, used for bbmap alignments.
FP.C2.v2.genes - the DNA level gene reference file for library C2P, used for bbmap alignments.
FPBase.proteins - the protein level reference file for both libraries (C1P and C2P).

Each of the csv files contains the following columns:
bc - the barcode sequence
dna - the DNA sequence of this gene variant. Includes the stop codon TAA at the end. Excludes the ATG start codon. The sequence lies between the NdeI (CATATG) site and the KpnI (GGTACC) site on the pEVBC1 plasmid.
aatrim - the translated protein sequence (until the first stop codon)
mutID - a unique ID for each protein variant
mutations - the number of amino acid mutations this protein variant has relative to the closest designed parental fluorescent protein
bbmap_Parent - the reference file ID of the bbmap DNA level alignment
bbmap_POS - the 1-based leftmost mapping position (POS) of the bbmap DNA level alignment
bbmap_MAPQ - the mapping quality (MAPQ) of the bbmap DNA level alignment
bbmap_CIGAR - the CIGAR string from the bbmap DNA level alignment
total_reads - the total number of reads seen for this barcode
consensus_call - how the consensus was determined for this barcode. Majority reads is the highest confidence.
forward_primer - potential FWD primer for dial-out PCR of this variant
reverse_primer - potential REV primer for dial-out PCR of this variant
forward_tm - FWD primer Tm
reverse_tm - REV primer Tm
forward_hairpin_dG - FWD primer hairpin deltaG calculated with primer3
reverse_hairpin_dG - REV primer hairpin deltaG calculated with primer3
forward_homodimer_3p_run - no more than 4 contiguous complementary bases at the 3' end for FWD self-dimers
reverse_homodimer_3p_run - no more than 4 contiguous complementary bases at the 3' end for REV self-dimers
heterodimer_3p_run - no more than 4 contiguous complementary bases at the 3' ends for this hetero-dimer pair
note - primer design notes

Any questions regarding this data can be directed to Calin Plesa.
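
As a rough illustration of how the columns described above might be used, the sketch below reads a mapping file, keeps barcodes with zero amino acid mutations and a minimum read count, and rebuilds the full open reading frame by prepending the ATG that the dna column omits. The helper names (load_map, zero_mutation_rows, full_orf, translate) and the min_reads threshold are hypothetical and not part of the dataset. It assumes the mutations and total_reads columns parse as integers, which the description does not state explicitly. The demo rows are synthetic. Selecting rows with mutations equal to 0 may not reproduce the perfects files exactly, since the description does not say whether those were filtered at the DNA or the protein level.

```python
import csv
import io
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}


def load_map(handle):
    """Read a mapping CSV into a list of dicts, casting the count columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["mutations"] = int(row["mutations"])      # assumption: integer-valued
        row["total_reads"] = int(row["total_reads"])  # assumption: integer-valued
        rows.append(row)
    return rows


def zero_mutation_rows(rows, min_reads=1):
    """Keep barcodes whose variant has no amino acid mutations and enough reads."""
    return [r for r in rows if r["mutations"] == 0 and r["total_reads"] >= min_reads]


def full_orf(dna):
    """Prepend the ATG start codon that the dna column leaves out."""
    return "ATG" + dna.upper()


def translate(orf):
    """Translate codon by codon, stopping before the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X marks a codon with non-ACGT bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Synthetic two-row example (not real library data).
demo = io.StringIO(
    "bc,dna,mutations,total_reads\n"
    "AAAACCCC,GTGAGCAAGTAA,0,12\n"
    "GGGGTTTT,GTGAGCAGGTAA,1,3\n"
)
rows = load_map(demo)
kept = zero_mutation_rows(rows, min_reads=5)
print([r["bc"] for r in kept])
print(translate(full_orf(kept[0]["dna"])))
```

With a real file, the StringIO object would be replaced by something like open("C1P_map.all.csv", newline=""). The description does not say whether aatrim includes the leading methionine, so the sketch does not compare its translation against that column.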
Provided by:
Billo, Avery; Chisholm, Lauren; Benabbas, Anissa; Kearns, Phillip; Plesa, Calin
Created:
2025-11-11



