
Exploring antibiotic resistance in diverse homologs of the dihydrofolate reductase protein family through broad mutational scanning

DataCite Commons | Updated: 2025-10-28 | Indexed: 2026-04-25
Download link:
https://figshare.com/articles/dataset/Exploring_antibiotic_resistance_in_diverse_homologs_of_the_dihydrofolate_reductase_protein_family_through_broad_mutational_scanning/30470525/1
Resource description:
This compressed zip file contains the DHFR fitness scores from the publication: Exploring antibiotic resistance in diverse homologs of the dihydrofolate reductase protein family through broad mutational scanning. Karl J. Romanowicz et al., Sci. Adv. 11, eadw9178 (2025). DOI: 10.1126/sciadv.adw9178

----------------------------------------------
Description of contents:

BCs15_map.csv - Data for each unique mapped barcode in the Codon 1 library (lib15), with columns:

BC = barcode sequence
mutID = a unique ID for each sequence. It starts with the NCBI accession of the closest homolog in the library (provided in column IDalign). If the sequence is perfect (0 mutations), only the NCBI accession is used (e.g. NP_065309). If a sequence has fewer than 5 mutations, the name is built as "NCBI accession"_"mutation1"_"mutation2"_..., where each mutation is given as initial residue, residue number, final residue (e.g. NP_065309_H114X_I115X_S116P). If a mutant has 5 or more mutations, the SHA256 hash of the sequence is appended to the NCBI accession, separated by an underscore: "NCBI accession"_SHA256(sequence) (e.g. NP_065309_9935c165f8cdd6a17078a94eeeda2dd762fbc548cbbe4b30a7ad8ec5bd177b70).
IDalign = the NCBI accession of the closest DHFR homolog in the library. Note that some of these accessions have been pruned in NCBI due to sequence redundancy.
mutations = the number of mutations relative to the closest DHFR homolog in the library.
cigar = the CIGAR string from the DNA alignment.
numBCs = the total number of barcodes observed for this sequence (before QC filters).
seq = the observed protein sequence for this barcode. The leading M is omitted.
pct_ident = the fractional identity of this sequence relative to the closest DHFR homolog in the library.
D01 = read counts for this barcode in LB
D03 = read counts for this barcode in M9+supp
D05 = read counts for this barcode in M9-supp
D06 = read counts for this barcode in 0.058 TMP
D07 = read counts for this barcode in 0.5 TMP
D08 = read counts for this barcode in 1.0 TMP
D09 = read counts for this barcode in 10 TMP
D10 = read counts for this barcode in 50 TMP
D11 = read counts for this barcode in 200 TMP
D01n = normalized read counts for this barcode in LB
D03n = normalized read counts for this barcode in M9+supp
D05n = normalized read counts for this barcode in M9-supp
D06n = normalized read counts for this barcode in 0.058 TMP
D07n = normalized read counts for this barcode in 0.5 TMP
D08n = normalized read counts for this barcode in 1.0 TMP
D09n = normalized read counts for this barcode in 10 TMP
D10n = normalized read counts for this barcode in 50 TMP
D11n = normalized read counts for this barcode in 200 TMP
D03D01fc = M9+supp/LB [log2 fold change]
D05D03fc = M9-supp/M9+supp (complementation) [log2 fold change]
D06D03fc = 0.058 TMP/M9+supp [log2 fold change]
D07D03fc = 0.5 TMP/M9+supp (MIC) [log2 fold change]
D08D03fc = 1.0 TMP/M9+supp [log2 fold change]
D09D03fc = 10 TMP/M9+supp [log2 fold change]
D10D03fc = 50 TMP/M9+supp [log2 fold change]
D11D03fc = 200 TMP/M9+supp (400x MIC) [log2 fold change]

----------------------------------------------
BCs16_map.csv - Data for each unique mapped barcode in the Codon 2 library (lib16), with columns:

BC = barcode sequence
mutID = a unique ID for each sequence. It starts with the NCBI accession of the closest homolog in the library (provided in column IDalign). If the sequence is perfect (0 mutations), only the NCBI accession is used (e.g. NP_065309). If a sequence has fewer than 5 mutations, the name is built as "NCBI accession"_"mutation1"_"mutation2"_..., where each mutation is given as initial residue, residue number, final residue (e.g. NP_065309_H114X_I115X_S116P). If a mutant has 5 or more mutations, the SHA256 hash of the sequence is appended to the NCBI accession, separated by an underscore: "NCBI accession"_SHA256(sequence) (e.g. NP_065309_9935c165f8cdd6a17078a94eeeda2dd762fbc548cbbe4b30a7ad8ec5bd177b70).
IDalign = the NCBI accession of the closest DHFR homolog in the library. Note that some of these accessions have been pruned in NCBI due to sequence redundancy.
mutations = the number of mutations relative to the closest DHFR homolog in the library.
cigar = the CIGAR string from the DNA alignment.
numBCs = the total number of barcodes observed for this sequence (before QC filters).
seq = the observed protein sequence for this barcode. The leading M is omitted.
pct_ident = the fractional identity of this sequence relative to the closest DHFR homolog in the library.
D02 = read counts for this barcode in LB
D04 = read counts for this barcode in M9+supp
D12 = read counts for this barcode in M9-supp
E01 = read counts for this barcode in 0.058 TMP
E02 = read counts for this barcode in 0.5 TMP
E03 = read counts for this barcode in 1.0 TMP
E04 = read counts for this barcode in 10 TMP
E05 = read counts for this barcode in 50 TMP
E06 = read counts for this barcode in 200 TMP
D02n = normalized read counts for this barcode in LB
D04n = normalized read counts for this barcode in M9+supp
D12n = normalized read counts for this barcode in M9-supp
E01n = normalized read counts for this barcode in 0.058 TMP
E02n = normalized read counts for this barcode in 0.5 TMP
E03n = normalized read counts for this barcode in 1.0 TMP
E04n = normalized read counts for this barcode in 10 TMP
E05n = normalized read counts for this barcode in 50 TMP
E06n = normalized read counts for this barcode in 200 TMP
D04D02fc = M9+supp/LB [log2 fold change]
D12D04fc = M9-supp/M9+supp (complementation) [log2 fold change]
E01D04fc = 0.058 TMP/M9+supp [log2 fold change]
E02D04fc = 0.5 TMP/M9+supp (MIC) [log2 fold change]
E03D04fc = 1.0 TMP/M9+supp [log2 fold change]
E04D04fc = 10 TMP/M9+supp [log2 fold change]
E05D04fc = 50 TMP/M9+supp [log2 fold change]
E06D04fc = 200 TMP/M9+supp (400x MIC) [log2 fold change]

----------------------------------------------
mutIDinfo15.csv - Fitness data for all observed protein sequences in the Codon 1 library (lib15), with columns:

mutID = a unique ID for each sequence. It starts with the NCBI accession of the closest homolog in the library (provided in column IDalign). If the sequence is perfect (0 mutations), only the NCBI accession is used (e.g. NP_065309). If a sequence has fewer than 5 mutations, the name is built as "NCBI accession"_"mutation1"_"mutation2"_..., where each mutation is given as initial residue, residue number, final residue (e.g. NP_065309_H114X_I115X_S116P). If a mutant has 5 or more mutations, the SHA256 hash of the sequence is appended to the NCBI accession, separated by an underscore: "NCBI accession"_SHA256(sequence) (e.g. NP_065309_9935c165f8cdd6a17078a94eeeda2dd762fbc548cbbe4b30a7ad8ec5bd177b70).
fitD03D01 = M9+supp/LB [log2 fitness score]
fitD05D03 = M9-supp/M9+supp (complementation) [log2 fitness score]
fitD06D03 = 0.058 TMP/M9+supp [log2 fitness score]
fitD07D03 = 0.5 TMP/M9+supp (MIC) [log2 fitness score]
fitD08D03 = 1.0 TMP/M9+supp [log2 fitness score]
fitD09D03 = 10 TMP/M9+supp [log2 fitness score]
fitD10D03 = 50 TMP/M9+supp [log2 fitness score]
fitD11D03 = 200 TMP/M9+supp (400x MIC) [log2 fitness score]
numprunedBCs = the total number of barcodes used in the calculation (passing QC filters). The higher the number, the lower the uncertainty in the fitness calculation.
IDalign = the NCBI accession of the closest DHFR homolog in the library. Note that some of these accessions have been pruned in NCBI due to sequence redundancy.
numBCs = the total number of barcodes observed for this sequence (before QC filters).
mutations = the number of mutations relative to the closest DHFR homolog in the library.
seq = the observed protein sequence. The leading M is omitted.
pct_ident = the fractional identity of this sequence relative to the closest DHFR homolog in the library.

----------------------------------------------
mutIDinfo16.csv - Fitness data for all observed protein sequences in the Codon 2 library (lib16), with columns:

mutID = a unique ID for each sequence. It starts with the NCBI accession of the closest DHFR homolog in the library (provided in column IDalign). If the sequence is perfect (0 mutations), only the NCBI accession is used (e.g. NP_065309). If a sequence has fewer than 5 mutations, the name is built as "NCBI accession"_"mutation1"_"mutation2"_..., where each mutation is given as initial residue, residue number, final residue (e.g. NP_065309_H114X_I115X_S116P). If a mutant has 5 or more mutations, the SHA256 hash of the sequence is appended to the NCBI accession, separated by an underscore: "NCBI accession"_SHA256(sequence) (e.g. NP_065309_9935c165f8cdd6a17078a94eeeda2dd762fbc548cbbe4b30a7ad8ec5bd177b70).
fitD04D02 = M9+supp/LB [log2 fitness score]
fitD12D04 = M9-supp/M9+supp (complementation) [log2 fitness score]
fitE01D04 = 0.058 TMP/M9+supp [log2 fitness score]
fitE02D04 = 0.5 TMP/M9+supp (MIC) [log2 fitness score]
fitE03D04 = 1.0 TMP/M9+supp [log2 fitness score]
fitE04D04 = 10 TMP/M9+supp [log2 fitness score]
fitE05D04 = 50 TMP/M9+supp [log2 fitness score]
fitE06D04 = 200 TMP/M9+supp (400x MIC) [log2 fitness score]
numprunedBCs = the total number of barcodes used in the calculation (passing QC filters). The higher the number, the lower the uncertainty in the fitness calculation.
IDalign = the NCBI accession of the closest DHFR homolog in the library. Note that some of these accessions have been pruned in NCBI due to sequence redundancy.
numBCs = the total number of barcodes observed for this sequence (before QC filters).
mutations = the number of mutations relative to the closest DHFR homolog in the library.
seq = the observed protein sequence. The leading M is omitted.
pct_ident = the fractional identity of this sequence relative to the closest DHFR homolog in the library.

----------------------------------------------
perfects15_5BCs.csv - Same as mutIDinfo15.csv (Codon 1, lib15), but filtered to contain only perfect DHFR homolog sequences with at least 5 barcodes after QC filters (mutations == 0 & numprunedBCs > 4).

perfects16_5BCs.csv - Same as mutIDinfo16.csv (Codon 2, lib16), but filtered to contain only perfect DHFR homolog sequences with at least 5 barcodes after QC filters (mutations == 0 & numprunedBCs > 4).

15-HiFi-nATG_wTAA.fasta - The DNA sequences of the designed DHFR homologs in the Codon 1 library (lib15), excluding the start codon (ATG) but including the stop codon (TAA).

16-HiFi-nATG_wTAA.fasta - The DNA sequences of the designed DHFR homologs in the Codon 2 library (lib16), excluding the start codon (ATG) but including the stop codon (TAA).

DHFR.proteins - The protein sequences of the designed DHFR homologs in both libraries, starting with M.

----------------------------------------------
The RMD analysis scripts are available here:
https://github.com/PlesaLab/DHFR
RPubs: Rendered Code
NCBI BioProject: PRJNA1189478 for raw .fastq files
FigShare Repository: DHFR.zip for mapping and count files used in RMD analysis
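
The mutID naming rules described for the CSV files can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name make_mut_id and its arguments are hypothetical, and the exact string that was hashed for sequences with 5 or more mutations (protein or DNA, with or without the leading M) is not stated in the description, so the hash computed here is an assumption and may not reproduce the IDs in the files.

```python
import hashlib


def make_mut_id(accession, mutations, seq):
    """Build a mutID-style name (hypothetical helper, not from the dataset).

    accession : NCBI accession of the closest homolog (the IDalign column)
    mutations : list of strings such as "S116P"
                (initial residue, residue number, final residue)
    seq       : observed sequence; only used when there are 5+ mutations
    """
    if not mutations:
        # Perfect sequence: the accession alone is the ID.
        return accession
    if len(mutations) < 5:
        # Fewer than 5 mutations: accession followed by each mutation.
        return "_".join([accession, *mutations])
    # 5 or more mutations: accession plus SHA256 hex digest.
    # ASSUMPTION: the digest is taken over the ASCII sequence string as given.
    digest = hashlib.sha256(seq.encode("ascii")).hexdigest()
    return f"{accession}_{digest}"


perfect_id = make_mut_id("NP_065309", [], "")
few_id = make_mut_id("NP_065309", ["H114X", "I115X", "S116P"], "")
```

With the inputs above, few_id matches the example given in the description (NP_065309_H114X_I115X_S116P). For the hashed form, the structure (accession, underscore, 64 hexadecimal characters) is what the sketch demonstrates, not the exact digest.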
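
The filter that produces the perfects*_5BCs.csv files from the mutIDinfo*.csv files (mutations == 0 & numprunedBCs > 4) could be reproduced roughly as follows. This sketch uses only the Python standard library; the function name filter_perfects is hypothetical, the demo rows are made up for illustration (only the column names and the accession NP_065309 come from the description), and parsing the counts through float is an assumption in case the files store them as "5.0" rather than "5".

```python
import csv
import io


def filter_perfects(rows, min_bcs=5):
    """Keep perfect homologs (0 mutations) with at least min_bcs pruned barcodes."""
    kept = []
    for row in rows:
        n_mut = int(float(row["mutations"]))
        n_bcs = int(float(row["numprunedBCs"]))
        # Equivalent to: mutations == 0 & numprunedBCs > 4 when min_bcs is 5.
        if n_mut == 0 and n_bcs >= min_bcs:
            kept.append(row)
    return kept


# Made-up demo table using the documented column names (values are not real data).
demo_csv = """mutID,mutations,numprunedBCs
NP_065309,0,12
NP_065309_S116P,1,30
DEMO_ACCESSION_A,0,4
DEMO_ACCESSION_B,0,5
"""

rows = list(csv.DictReader(io.StringIO(demo_csv)))
perfects = filter_perfects(rows)
perfect_ids = [row["mutID"] for row in perfects]
```

To apply the same filter to the real file, replace io.StringIO(demo_csv) with an open handle to mutIDinfo15.csv or mutIDinfo16.csv from the unpacked archive.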
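
The column names in the barcode map files suggest how the fold-change columns relate to the normalized counts, e.g. D05D03fc as the log2 ratio of D05n (M9-supp) to D03n (M9+supp). The sketch below illustrates that reading only; it is an assumption based on the column descriptions, and the description does not say how zero counts or pseudocounts were handled, so the hypothetical helper simply returns None when either count is not positive. The demo values are invented.

```python
import math


def log2_fold_change(selected_norm, reference_norm):
    """log2(selected / reference), or None if either count is not positive.

    ASSUMPTION: no pseudocount is added; the published analysis may differ.
    """
    if selected_norm <= 0 or reference_norm <= 0:
        return None
    return math.log2(selected_norm / reference_norm)


# Invented normalized counts for one barcode (not real data).
demo_row = {"D03n": 2.0, "D05n": 8.0}
d05d03fc = log2_fold_change(demo_row["D05n"], demo_row["D03n"])
```

A positive value under this reading means the barcode was enriched in the selected condition relative to M9+supp, and a negative value means it was depleted.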
Provider:
figshare
Created:
2025-10-28