High diversity gene libraries facilitate machine learning guided exploration of fluorescent protein sequence space
Source: Figshare. Updated 2025-11-11. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/High_diversity_gene_libraries_facilitate_machine_learning_guided_exploration_of_fluorescent_protein_sequence_space/30585419
Description:
These files contain the NGS mapping files (linking barcodes and gene variants), alignments, and potential dial-out PCR primers for the two parental libraries (C1P and C2P) containing fluorescent proteins from FPBase.com encoded in two different codon versions. These libraries have been made available by the Plesa lab to the community through Addgene.

The libraries are described in the publication:
A. Benabbas†, P. Kearns†, A. Billo, L. Chisholm, C. Plesa. High diversity gene libraries facilitate machine learning guided exploration of fluorescent protein sequence space. 2025

File descriptions:
C1P_map.all.csv - mapping file for library C1P (Codon1)
C1P_map.perfects.csv - same as C1P_map.all.csv but filtered to only include perfect genes (no mutants)
C2P_map.all.csv - mapping file for library C2P (Codon2)
C2P_map.perfects.csv - same as C2P_map.all.csv but filtered to only include perfect genes (no mutants)
FP.C1.v2.genes - the DNA level gene reference file for library C1P, used for bbmap alignments
FP.C2.v2.genes - the DNA level gene reference file for library C2P, used for bbmap alignments
FPBase.proteins - the protein level reference file for both libraries (C1P and C2P)

Each of the csv files contains the following columns:
bc - the barcode sequence
dna - the DNA sequence of this gene variant. Includes the stop codon TAA at the end. Excludes the ATG start codon. Sequence is between the NdeI (CATATG) site and the KpnI (GGTACC) site on the pEVBC1 plasmid.
aatrim - the translated protein sequence (until the first stop codon)
mutID - a unique ID for each protein variant
mutations - the number of amino acid mutations this protein variant has relative to the closest designed parental fluorescent protein
bbmap_Parent - the reference file ID of the bbmap DNA level alignment
bbmap_POS - the 1-based leftmost mapping POSition of the bbmap DNA level alignment
bbmap_MAPQ - the MAPping Quality of the bbmap DNA level alignment
bbmap_CIGAR - the CIGAR string from the bbmap DNA level alignment
total_reads - the total number of reads seen for this barcode
consensus_call - how consensus was determined for this barcode. Majority reads is the highest confidence.
forward_primer - potential FWD primer for dial-out PCR of this variant
reverse_primer - potential REV primer for dial-out PCR of this variant
forward_tm - FWD primer Tm
reverse_tm - REV primer Tm
forward_hairpin_dG - FWD primer hairpin deltaG calculated with primer3
reverse_hairpin_dG - REV primer hairpin deltaG calculated with primer3
forward_homodimer_3p_run - no more than 4 contiguous complementary bases at the 3' end for FWD self-dimers
reverse_homodimer_3p_run - no more than 4 contiguous complementary bases at the 3' end for REV self-dimers
heterodimer_3p_run - no more than 4 contiguous complementary bases at the 3' ends for this hetero-dimer pair
note - primer design notes

Any questions regarding this data can be directed to Calin Plesa.
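
As an illustration of how the mapping-file columns described above might be used, here is a minimal Python sketch that builds a barcode-to-variant lookup and keeps only rows under a mutation limit and above a read-count threshold. It assumes the files are comma-delimited with a header row matching the column names listed above, and that mutations and total_reads hold integers; neither assumption has been checked against the actual files. The function name load_barcode_map, its parameters, and the three example rows are hypothetical.

```python
import csv
import io


def load_barcode_map(handle, max_mutations=0, min_reads=1):
    """Return {barcode: row} for rows that pass both filters.

    Assumption: the file has a header row with the column names
    'bc', 'mutations' and 'total_reads', and the last two are integers.
    """
    selected = {}
    for row in csv.DictReader(handle):
        if int(row["mutations"]) > max_mutations:
            continue  # too many amino acid changes vs. the closest parent
        if int(row["total_reads"]) < min_reads:
            continue  # too few reads supporting this barcode
        selected[row["bc"]] = row
    return selected


# Made-up rows for demonstration only; not taken from the dataset.
example = io.StringIO(
    "bc,dna,mutID,mutations,total_reads\n"
    "AAAA,GTGTAA,v1,0,12\n"
    "CCCC,GTCTAA,v2,1,30\n"
    "GGGG,GTGTAA,v1,0,2\n"
)
hits = load_barcode_map(example, max_mutations=0, min_reads=5)
print(sorted(hits))
```

With a downloaded file, the same function could be called on a handle opened with open("C1P_map.all.csv", newline=""). The read threshold is an arbitrary choice for the sketch and not a recommendation from the authors.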
Created:
2025-11-11



