Deep Mutational Scanning of HIV tat and rev in a non-overlapped context
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP018159
Description:
This dataset provides allele counts and raw FASTQ files for deep mutational scanning of the HIV-1 genes tat and rev when they are not overlapped with one another (placed in the nef locus), as described in Fernandes et al., "Functional segregation of overlapping genes in HIV", Cell 2016 (in revision). Pre-selection (input) and post-selection (replicates 1 and 2) files are given for every possible point mutant of these two HIV proteins in the NL4-3 background. Tab-delimited files with codon counts across the amplicons are also included and are probably the most useful part for most researchers. The data were used to generate Figures 3, 4, and 7 and may be of general use to people interested in deep mutational scanning, looking for signatures of epistasis in rev or tat, or reanalyzing and mining the data.
FAQ:
1. Why do the ends of each amplicon have such variation?
To increase diversity across the flowcell, I pooled standard primers with primers carrying N, NN, and NNN extensions to throw the amplicons out of phase. When aligning, you should trim the ends or ignore them. This also means that the overlap between paired-end reads can vary by up to 3 nt.
2. Why are the filenames not easy to deal with?
The filenames are tied to separate MiSeq runs. I hope to clean up the nomenclature and update this entry in the future while preserving the run information. You can get a sense of the run structure from the Q-scores: different residues vary in Q-score, and that is mostly tied to the run they were pooled on, not to any interesting biology. While this makes the data a little harder to follow, I think it is good to see that doing this kind of analysis in high-throughput fashion involves a reasonable amount of failure (e.g., failed RNA isolation or RT), which led to repetition until we had good data for every position.
3. Can you help me deal with this dataset?
Yes. Please email me at jferna10@ucsc.edu or contact me on Twitter at @jdf_ev. For reagents, please contact Alan Frankel at frankel@cgl.ucsf.edu.
4. Do you have the analysis scripts you used to process the data?
Yes, they are on GitHub: https://github.com/nbstrauli/allele_frequency_trajectory_sim
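For readers who want to start from the tab-delimited codon counts, the following is a minimal sketch of one common way to turn pre-selection and post-selection counts into per-codon scores: a log2 enrichment ratio normalized to the wild-type codon. It is not taken from the authors' scripts. The column names (`position`, `codon`, `count`), the pseudocount, and the toy numbers are all assumptions made for illustration, so check the actual file headers before adapting it.

```python
import csv
import io
import math
from collections import defaultdict


def read_codon_counts(handle):
    """Parse tab-delimited rows into {position: {codon: count}}.

    ASSUMPTION: the columns are named position, codon, and count.
    The real files may use a different layout.
    """
    counts = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[int(row["position"])][row["codon"].upper()] = int(row["count"])
    return dict(counts)


def log2_enrichment(pre, post, wt_codon, pseudocount=0.5):
    """Score each codon at one position relative to the wild-type codon.

    pre, post: dicts mapping codon -> read count (input and selected library).
    The pseudocount avoids division by zero for codons with no reads.
    """
    def ratio(codon):
        return (post.get(codon, 0) + pseudocount) / (pre.get(codon, 0) + pseudocount)

    wt_ratio = ratio(wt_codon)
    return {c: math.log2(ratio(c) / wt_ratio) for c in set(pre) | set(post)}


# Toy counts, made up for illustration (not values from the dataset).
pre_tsv = "position\tcodon\tcount\n1\tATG\t1000\n1\tATA\t100\n"
post_tsv = "position\tcodon\tcount\n1\tATG\t2000\n1\tATA\t50\n"

pre = read_codon_counts(io.StringIO(pre_tsv))
post = read_codon_counts(io.StringIO(post_tsv))
scores = log2_enrichment(pre[1], post[1], wt_codon="ATG")
print(scores)
```

By construction the wild-type codon scores 0, and codons depleted during selection score below 0. With two post-selection replicates, the same function can be run on each replicate separately and the scores compared to gauge reproducibility.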
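Because the amplicon primers were pooled with N, NN, and NNN extensions, each read may begin with zero to three extra bases before the primer sequence. The sketch below shows one simple way to remove that variable prefix before alignment, under the assumption that the primer can be found by exact match within the first few bases. The primer `GATTACA` and the reads are placeholders, not sequences from this dataset, and real reads may need mismatch-tolerant matching.

```python
def phasing_offset(seq, primer, max_offset=3):
    """Return how many leading bases precede the primer (0..max_offset), or None."""
    for offset in range(max_offset + 1):
        if seq.startswith(primer, offset):
            return offset
    return None


def trim_phasing(seq, qual, primer, max_offset=3):
    """Trim phasing bases from a read and its quality string together."""
    offset = phasing_offset(seq, primer, max_offset)
    if offset is None:
        return None  # primer not found near the 5' end; the caller decides what to do
    return seq[offset:], qual[offset:]


# Placeholder primer and reads, made up for illustration.
PRIMER = "GATTACA"
examples = ["GATTACAGGG", "TGATTACAGGG", "ACGGATTACAGGG", "ACGTGATTACAGGG"]
trimmed = [trim_phasing(s, "I" * len(s), PRIMER) for s in examples]
```

The first three reads, with 0, 1, and 3 phasing bases, all trim to the same sequence. The fourth has four leading bases, which exceeds `max_offset`, so it is returned as `None` and can be discarded or inspected separately.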
Created:
2023-10-13



