Source code and processed data for "Dual Randomly Barcoded Transposon Sequencing (Dual Tn-seq) Profiles the Genetic Interaction Landscape in Bacteria"

Name: Source code and processed data for "Dual Randomly Barcoded Transposon Sequencing (Dual Tn-seq) Profiles the Genetic Interaction Landscape in Bacteria"
Creator: figshare
Published: 2025-07-01 16:40:55
License: No description available

DataCite Commons. Updated: 2025-07-01. Indexed: 2025-09-08.

Download link:

https://figshare.com/articles/dataset/Source_code_and_processed_data_for_Dual_Randomly_Barcoded_Transposon_Sequencing_Dual_Tn-seq_Profiles_the_Genetic_Interaction_Landscape_in_Bacteria_/29382974

Description:

Gene redundancy often complicates systematic approaches to characterizing gene functions because single gene deletions may not produce discernible phenotypes. Thus, despite the advent of next-generation sequencing, over 11 million microbial genes in the NCBI reference sequence database have no known function. In this study, we report Dual Tn-seq, a novel platform for comprehensively assaying the fitness of a large pool of double mutants in parallel. Dual Tn-seq couples random barcode transposon-site sequencing (RB Tn-seq) with the Cre-lox system, enabling deep sampling of ~1.4 billion double mutants in the human pathogen Streptococcus pneumoniae.

feba.tar.gz contains an archived copy of the source code tree for RB-TnSeq (https://bitbucket.org/berkeleylab/feba). This includes the key scripts for analyzing RB-TnSeq data, written in Perl or R:

- bin/MapTnSeq.pl to analyze a fastq file of TnSeq data with barcodes
- bin/DesignRandomPool.pl to convert the output of MapTnSeq.pl into a table of barcodes that consistently map to a single insertion location (a "pool" definition)
- bin/MultiCodes.pl to count barcodes from a fastq file
- bin/combineBarSeq.pl to combine tables of barcode counts from MultiCodes.pl with the pool definition from DesignRandomPool.pl
- lib/FEBA.R contains R functions to compute gene fitness values from a table of insertions and counts per sample (from combineBarSeq.pl)
- Essentiality.pl counts insertions per gene or region in the TnSeq data
- comb.R contains the Essentials() function to identify genes that are important for viability (roughly speaking, essential)

as well as dual-TnSeq specific scripts:

- bin/barcodePairs.pl counts pairs of barcodes in a fastq file
- bin/byGenePairs.pl counts the number of strains and the number of reads for each pair of genes, based on the input from bpFilter.pl (see below)

small.zip contains additional scripts:

- bpFilter.pl filters the output of barcodePairs.pl to include only insertions in genes, and adds information about where the insertions are located. (This uses the library pbutils.pm, also included.)
- dblStats.R contains functions to combine multiple dual-TnSeq runs, compute summary statistics per pair of genes, and adjust for chromosomal bias. It also includes functions for one-versus-all experiments (see below).

small.zip also includes various tables of processed data:

- The table of adjusted statistics per pair of genes, when considering insertions in the central 10-90% of each gene, is in genepair_stats.tsv.gz.
- table_S1_dual_tnseq_dataset_full.xlsx has scores for pairs of genes when considering either the central 10-90% of each gene or all insertions within each gene.
- The genome and annotation of Streptococcus pneumoniae D39: the genome sequence is in genome.fna. genes.tab and genes.GC are tables of genes. aaseq has the protein sequences.
- The RB-TnSeq mappings for ML1, ML2, and ML3 are in ML1comb, ML2comb, and ML3comb. The *.withgenes files have additional fields for which gene (locusId) the insertion lies within (if any) and what fraction of the way through the gene the insertion lies (the f field). See the *.stats files for library metrics.
- For unique protein-coding genes (of sufficient length), a prediction as to whether they are essential or not is in esstable.
- Models for mapping the libraries are in modelRBlox.txt – the first line has the expected structure of the read, and the second has the continuation of the read if it is from intact vector, instead of from an insertion in the genome. Each library was mapped 5-6 times and, for each library, 2 of the runs used the "TnSeq3" protocol. For those mapping files, see modelTnSeq3RBlox*.
- For each big Dual-TnSeq run, the results are available as a table of all pairs of mapped insertions (runfiltered.tsv.gz, with rare potentially chimeric pairs included) or a table of #strains and #reads per pair of genes (run_genepairs_min.tsv.gz).
- The results of the chimera test for Dual-TnSeq are in test*_pairs.tsv.

Finally, for comparison, we also include "one-versus-all" RB-TnSeq data where one of the libraries of transposon mutants was transferred into a mutant background where an individual gene was replaced by an antibiotic resistance marker. This data is provided as an R image, 1vsall.image (from R 4.4). Tables in this image:

- genes -- all of the predicted genes (type = 1 if protein-coding)
- ML2 -- mutant library 2, including gene information. (f is the relative location in the gene, relative to the begin and end of the gene, where begin < end and the gene's strand is ignored)
- ML3 -- mutant library 3
- ML2fit -- a list of data frames, each entry being the RB-TnSeq data from that mutant background (from oneVsAllFit() in dblStats.R)
- ML3fit -- similarly for the ML3 experiments
- ML2hits -- a combined data frame of potentially significant effects in the ML2 data; includes the usual metrics from RB-TnSeq data (see GeneFitness() in FEBA.R) as well as the identity of the deleted gene in the background strain and fitMedian, which uses the median of all of the 1-vs-all experiments for this library as an alternative control.
- ML3hits -- similarly for ML3
- weak -- combined weak hits from ML2hits and ML3hits
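As an illustration of two ideas in the description above (the f field and the per-gene-pair counts of strains and reads), here is a minimal Python sketch. It is not the code shipped in feba.tar.gz or small.zip, which is written in Perl and R. The function names, the tuple layout of the input, the inclusive 0.1 to 0.9 bounds, and the choice to treat a gene pair as unordered are all assumptions made for this example.

```python
from collections import defaultdict


def relative_position(pos, begin, end):
    """Fraction of the way through a gene, ignoring strand (requires begin < end)."""
    if not begin < end:
        raise ValueError("expected begin < end")
    return (pos - begin) / (end - begin)


def in_central_region(f, lo=0.1, hi=0.9):
    """True if f lies in the central part of the gene (inclusive bounds are assumed)."""
    return lo <= f <= hi


def count_gene_pairs(pairs, lo=0.1, hi=0.9):
    """Count strains and reads per gene pair.

    `pairs` is a hypothetical input: one tuple per barcode pair (strain),
    laid out as (locus1, f1, locus2, f2, n_reads).
    """
    stats = defaultdict(lambda: {"strains": 0, "reads": 0})
    for locus1, f1, locus2, f2, n_reads in pairs:
        # Keep only strains where both insertions are in the central region.
        if not (in_central_region(f1, lo, hi) and in_central_region(f2, lo, hi)):
            continue
        # Assumption: (A, B) and (B, A) are the same gene pair.
        key = tuple(sorted((locus1, locus2)))
        stats[key]["strains"] += 1
        stats[key]["reads"] += n_reads
    return dict(stats)


# Made-up example data, not taken from the dataset.
example_pairs = [
    ("geneA", 0.50, "geneB", 0.30, 12),
    ("geneB", 0.70, "geneA", 0.20, 5),
    ("geneA", 0.05, "geneB", 0.50, 9),  # dropped: first insertion outside 10-90%
    ("geneA", 0.40, "geneC", 0.95, 3),  # dropped: second insertion outside 10-90%
]
result = count_gene_pairs(example_pairs)
print(result)
```

With the made-up input above, only the first two strains pass the filter, so the single surviving pair (geneA, geneB) has 2 strains and 17 reads. The real tables also include an adjustment for chromosomal bias (see dblStats.R), which this sketch does not attempt.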

Provider:

figshare

Created:

2025-07-01
