Optimization of microhaplotypes for advanced DNA mixture deconvolution

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.k98sf7mmt

Description:

Detection of minor DNA components in biological mixtures has increased as molecular techniques have become more sensitive. Accordingly, mixture deconvolution has become a major concern and topic of debate in the forensic DNA community. Short tandem repeat (STR) profile data generated with capillary electrophoresis (CE) and massively parallel sequencing (MPS) are subject to inherent issues that complicate mixture deconvolution. Deconvolution may be improved by sequencing microhaplotypes, as they are not subject to the amplification noise artifacts and stochastic effects that affect STRs.

Before microhaplotypes can be implemented in casework, the following considerations should be addressed: definition of a consistent panel of microhaplotype loci; expanded population studies to determine relevant haplotype allele frequencies; incorporation of advanced sequencing technologies into forensic laboratories; development of user-friendly bioinformatic analysis and mixture deconvolution methods; and assessment of the infrastructure requirements necessary to build a searchable microhaplotype criminal database.

In two phases, this study will optimize and assess an MPS workflow and analysis package for improved mixture deconvolution using microhaplotypes. Analysis will be performed with NexGenID, a novel software platform optimized for mixture deconvolution and probabilistic genotyping of sequence data. Phase I objectives will include evaluation and down-selection of microhaplotype loci optimal for individualization and mixture deconvolution; construction of a wet-bench target assay; haplotyping of donor samples to obtain expanded population allele frequency data; and assessment of the projected performance of the microhaplotype allele-calling analysis workflow.
Phase II objectives will include evaluating the benefits and limitations of mixture deconvolution and probabilistic genotyping using the microhaplotype wet-bench assay with Illumina sequencing and NexGenID analysis. The workflow will be applied to in vitro mixtures and constructed mock evidence, and outcomes from NexGenID will be compared with analysis of the microhaplotype mixtures using a retrofitted version of EuroForMix. Coupling a highly discriminatory microhaplotype MPS assay with NexGenID allows practitioners to implement microhaplotype analysis efficiently. The proposed microhaplotype workflow has the potential to improve minor-contributor detection relative to STR deconvolution, help solve complex cases, increase the number of samples considered suitable for comparison, and enable retesting of cold cases where a minor contributor was assumed present but was not suitable for comparison.

Methods

An AmpliSeq for Illumina assay was developed to target-amplify and sequence 43 forensically relevant microhaplotype loci, specifically selected for their potential application to complex mixture deconvolution. Target loci were selected from previously published work compiled in the MicroHapDB database (https://microhapdb.readthedocs.io/en/latest/). A set of 240 test samples was identified from a donor collection of nearly 500 individuals previously collected under IRB and housed at GWU. These donors represent eleven biogeographical populations. Additional donors were identified from purchased blood bank samples previously obtained for internal validation projects.

For sensitivity testing of the final assay, 2800M Control DNA (Promega Corp, Madison, WI), NA24385 (NIST RM8391), and two purchased blood samples were serially diluted to test inputs of 2 ng, 1 ng, 0.5 ng, 0.1 ng, 0.05 ng, and 0.025 ng. Each dilution was evaluated in triplicate libraries.

A set of 149 complex mixtures was constructed in vitro from aliquots of the population donor samples described above.
The in vitro mixtures contained between 2 and 5 contributors at disparate contribution proportions, with total DNA amounts of 0.5 ng, 1 ng, or 5 ng. Mixtures were constructed to meet the goals of the following four categories:
Category 1 - minor-contributor detection limits down to 1%;
Category 2 - estimation of the correct number of contributors when first-degree relative pairs are present in the mixture;
Category 3 - improvement in genotype separation over STRs when donors share STR alleles in stutter positions; and
Category 4 - presence of donors with imbalanced degradation patterns induced by UV exposure.

All constructed mixtures were first processed with STR-CE analysis as follows: amplification of a 1 ng input with GlobalFiler full-volume reactions, capillary electrophoresis fragment separation on an Applied Biosystems 3500xL Genetic Analyzer, and data analysis with GeneMapper ID-X following internally validated SOPs. STRmix v2.9 was used to deconvolve and interpret STR-CE mixtures following internally validated SOPs.

To construct the mock touch evidence mixtures, a total of nine participants were identified. Each provided informed consent, and a buccal sample was collected from each to obtain a reference genotype. Trace DNA samples containing 3–5 contributors were made in triplicate by having donors handle items relevant to gun crimes, including handgun frames, handgun magazines, rifle bolts, and brass 9 mm cartridge cases. All substrates were decontaminated prior to handling via UV decontamination and/or bleach cleaning. Donors were instructed to wait two hours after washing their hands with warm water before handling the items with their dominant hand for 20–60 seconds. After handling, the cartridge samples were loaded individually into a gun chamber and fired. Fired cartridge casings were collected and individually packaged prior to extraction. Trace DNA samples from the firearm (non-casing) substrates were collected by wet/dry swabbing with nylon flocked swabs.
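The input amounts quoted above (the 2 ng to 0.025 ng sensitivity series, and mixtures of 0.5 ng, 1 ng, or 5 ng with minor contributors down to 1%) can be translated into approximate template counts. The sketch below is only illustrative: it assumes roughly 3.3 pg of DNA per haploid human genome, a commonly used approximation that is not stated in this description, and the function name `haploid_copies` is hypothetical.

```python
# Assumed mass of one haploid human genome in picograms (approximation,
# not a value taken from the dataset description).
PG_PER_HAPLOID_GENOME = 3.3


def haploid_copies(total_ng: float, fraction: float = 1.0) -> float:
    """Approximate haploid genome copies contributed by one donor.

    total_ng -- total DNA input to the amplification, in nanograms
    fraction -- that donor's share of the input (1.0 for single source)
    """
    return total_ng * 1000.0 * fraction / PG_PER_HAPLOID_GENOME


# Single-source sensitivity series listed in the Methods.
for ng in (2, 1, 0.5, 0.1, 0.05, 0.025):
    print(f"{ng:>6} ng input: ~{haploid_copies(ng):.1f} haploid copies")

# A 1% minor contributor at each total mixture input.
for ng in (0.5, 1, 5):
    print(f"1% of {ng} ng: ~{haploid_copies(ng, 0.01):.1f} haploid copies")
```

Under this assumption, a 1% contributor in a 0.5 ng mixture corresponds to only about 1.5 haploid copies per locus, which illustrates why the lowest-level contributors are the hardest to detect.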
DNA extraction was performed using the Qiagen EZ1&2 DNA Investigator Kit in 500 µl Large Volume reactions following an internally validated SOP. The rinse-and-swab collection method was performed on fired cartridge casings according to Bille et al. (2020; https://doi.org/10.1016/j.fsigen.2020.102238). Cartridge casing samples collected via the rinse-and-swab method were extracted using the modified QIAamp DNA Investigator Kit method described by Bille et al. (2020). DNA extracts were concentrated with Microcon DNA Fast Flow filter units (MilliporeSigma, Burlington, MA) prior to quantification with the Quantifiler Trio DNA Quantification Kit in 11 µl reaction volumes to assess DNA concentration, DNA degradation, and inhibition related to the various substrates. Donor references were amplified with 1 ng DNA. All recovered DNA from mock evidence mixture samples was targeted for amplification and library preparation with the AmpliSeq microhaplotype assay as described below.

In addition to touch evidence samples, a set of inhibited mixture samples was constructed. A 1 ng aliquot of the NIST RGTM S8 3-person mixture was combined with humic acid at 50 ng, 150 ng, and 250 ng to examine amplification with the AmpliSeq reaction buffer in the presence of an inhibitor.

All reference samples, sensitivity samples, constructed mixtures, and mock evidence samples were then amplified using the custom microhaplotype AmpliSeq primer mix and the AmpliSeq Library PLUS for Illumina prep kit, following the manufacturer’s recommendations for AmpliSeq for Illumina Custom Panels with one primer pool. First, DNA samples were target-amplified in 20 µl reactions with the following parameters: 99 °C for 2 minutes; 23 cycles of 99 °C for 15 seconds and 60 °C for 4 minutes; and a final hold at 10 °C for up to 24 hours. Amplicons were then partially digested with 2 µl of FuPa Reagent on a thermal cycler as follows: 10 minutes at 50 °C, 10 minutes at 55 °C, and 20 minutes at 62 °C.
Next, AmpliSeq CD Index i7 and i5 adapters were ligated to the partially digested amplicons as follows: 30 minutes at 22 °C, 5 minutes at 68 °C, and 5 minutes at 72 °C. Libraries then underwent a second amplification (98 °C for 2 minutes; 7 cycles of 98 °C for 15 seconds and 64 °C for 1 minute; and a final hold at 10 °C for up to 24 hours) and were purified with AMPure XP. Finally, libraries were quantified with the Qubit dsDNA HS assay on the Qubit 4 fluorometer and sized using the Agilent TapeStation 4120 and D1000 ScreenTapes (Agilent Technologies, Santa Clara, CA).

For Illumina sequencing, libraries were diluted to 4 nM and pooled in equimolar proportions. All pools were diluted to a loading concentration of 9 pM with a 2% PhiX sequencing control, per the manufacturer’s recommendations. Cluster generation and 2 × 300 paired-end sequencing were performed on the MiSeq FGx system using MiSeq v3 (600-cycle) reagents. Libraries were pooled in groups of no more than 40 to ensure adequate depth of coverage for every donor allele in the sample library.

Genotyping of all donor references and mixture samples was performed in one of two ways:

1) Population samples: sequence data were mapped to the hg38 canonical reference with bwa mem, executed in Galaxy (usegalaxy.org). The resulting .bam files were further processed for microhaplotype genotype calling using mh.jar, a Java-based application previously developed in collaboration with ThermoFisher and adapted for the current microhaplotype assay.
2) Mixture and mock evidence samples: NexGenID (NexGen Forensic Sciences, Columbia, MD) was used to determine haplotypes from raw .fastq files by clustering amplicon sequences by locus, performing a local alignment, and identifying unique alleles based on identical sequence. Analytical and stochastic thresholds are applied to identify unique alleles above noise reads.
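The idea of identifying unique alleles by identical sequence and then filtering out noise reads can be sketched in a few lines. This is not NexGenID's algorithm, which is not specified here beyond the steps listed above. It is a minimal illustration that assumes the reads for one locus have already been clustered and trimmed to the same amplicon span, and it uses a fixed fraction of locus depth as a stand-in analytical threshold. The function name `call_alleles`, the parameter `analytical_fraction`, and the toy reads are all hypothetical.

```python
from collections import Counter


def call_alleles(reads, analytical_fraction=0.02):
    """Collapse identical sequences at one locus and drop low-count noise.

    reads               -- iterable of read sequences already assigned to a locus
    analytical_fraction -- hypothetical threshold, as a fraction of locus depth
    Returns a dict mapping each retained sequence to its read count.
    """
    counts = Counter(reads)                 # identical sequences -> one candidate allele
    depth = sum(counts.values())            # total reads observed at the locus
    min_reads = analytical_fraction * depth
    return {seq: n for seq, n in counts.items() if n >= min_reads}


# Toy locus: two well-supported haplotypes plus a rare sequence (e.g. an error).
toy_reads = ["ACGT"] * 60 + ["ACGA"] * 38 + ["ACTT"] * 2
calls = call_alleles(toy_reads, analytical_fraction=0.05)
print(calls)
```

A fixed depth fraction is a simplification chosen for brevity. The study describes thresholds that are set per sample from the input DNA quantity, which this sketch does not attempt to model.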
The analytical and stochastic thresholds are both sample-specific, driven by the input DNA quantity, which dictates how many templates were initially added for amplification.

Haplotype frequencies were calculated based on phased SNP genotypes obtained from 1000 Genomes Phase 3 sequence data in the UCSC Genome Browser (http://genome.ucsc.edu/) as well as the additional 240 single-source donor samples, for a total of 35 populations evaluated. Effective number of alleles (Ae) values were calculated using the formula Ae = 1/Σ(p_i^2), where p_i is the frequency of allele i, and Informativeness (In), which measures allele frequency differences among populations, was calculated according to Rosenberg et al. (2003; DOI:10.1086/380416).

Mixture deconvolution was performed using the probabilistic genotyping methods of NexGenID and EuroForMix v4.2.5 (https://www.euroformix.com/). Note that genotyping output from NexGenID was converted to a EuroForMix-compatible format. Finally, likelihood ratios were calculated by both software packages under the following hypotheses: Hp: person of interest (POI) included; Hd: all contributors unknown. Each known contributor to a given mixture was evaluated as the POI. The output included quantitative likelihood ratios (logLR) for weight-of-evidence reporting. Additional comparative statistical analyses were performed in JMP® v18.0.1 statistical discovery software.

Usage notes

Allele frequencies, data quality metrics, and mixture deconvolution logLRs are compiled in .csv tables. Sensitivity, in vitro mixture, and mock evidence sample results are provided in separate files. Neither SNP genotypes nor sequence data are provided, in order to maintain donor privacy.
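As a worked illustration of the two locus statistics named in the Methods, the sketch below computes Ae = 1/Σ(p_i^2) for one population and the informativeness In of Rosenberg et al. (2003) across K populations, using In = Σ_j [ -p_j ln(p_j) + Σ_i (p_ij / K) ln(p_ij) ], where p_j is the mean frequency of allele j across populations and 0·ln(0) is taken as 0. The function names and the toy frequencies are hypothetical and are not drawn from the dataset.

```python
import math


def effective_alleles(freqs):
    """Effective number of alleles, Ae = 1 / sum(p_i^2), for one population."""
    return 1.0 / sum(p * p for p in freqs)


def informativeness(pop_freqs):
    """Rosenberg et al. (2003) In for one locus.

    pop_freqs -- list of K lists; each inner list holds the allele
                 frequencies for one population, in the same allele order.
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    total = 0.0
    for j in range(n_alleles):
        p_mean = sum(pop[j] for pop in pop_freqs) / k   # mean frequency of allele j
        if p_mean > 0:
            total -= p_mean * math.log(p_mean)
        for pop in pop_freqs:
            if pop[j] > 0:                              # treat 0 * ln(0) as 0
                total += (pop[j] / k) * math.log(pop[j])
    return total


# Toy example: four equally frequent haplotypes give Ae = 4.
print(effective_alleles([0.25, 0.25, 0.25, 0.25]))

# Toy example: two populations with different haplotype frequencies.
print(informativeness([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))
```

In is 0 when all populations share the same frequencies and reaches ln(K) when each population is fixed for a different allele, so loci with high Ae favor individualization while loci with high In favor ancestry inference.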

Created:

2026-01-22
