
Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples | Malaria research dataset | Genome variation dataset

DataCite Commons | Updated 2022-12-12 | Indexed 2024-07-29
Malaria research
Genome variation
Download link:
https://figshare.com/articles/dataset/Pf7_an_open_dataset_of_Plasmodium_falciparum_genome_variation_in_20_000_worldwide_samples/21674321/1
Description:
Data correct at time of upload (5 December 2022). Data maintained at https://www.malariagen.net/resource/34.

This Figshare project provides information about the Pf7 dataset, which contains genome variation data on over 20,000 worldwide samples of Plasmodium falciparum. The associated publication will be available from the above link once published. You can browse summary data using the Pf7 data exploration tool.

Background and previous releases

This dataset is based on genome variation from the MalariaGEN network, including samples which were previously released through the Pf3k Project, Plasmodium falciparum Community Project and GenRe Mekong Project. It comprises multiple partner studies, each with its own research objectives and led by a local investigator. Genome sequencing is performed centrally, and partner studies are free to analyse and publish the genetic data produced on their own samples, in line with MalariaGEN's guiding principles on equitable data sharing. This new open dataset is almost three times larger than the last dataset release (Pf6, published 2021), and includes samples from a wider geographic reach. The variants and genotypes described in this publication used version 3 of the analysis pipeline. Data produced using an earlier version of the data analysis pipeline can be explored using an interactive web application.

About the version 7 data pipeline

Details of the methods can be found in the accompanying paper.

Content of the data release

This release contains details on contributing partner studies, sample metadata and key sample attributes inferred from genomic data, and genomic data including raw sequence reads. Further details and analytical results can be found in the accompanying data release paper. These data are available open access.
Publications using these data should acknowledge and cite the source of the data using the following format: "This publication uses MalariaGEN data as described in 'Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples', MalariaGEN et al, (doi to be added on publication)."

Study information: details of the 82 contributing partner studies, including description, contact information and key people.
Sample provenance and sequencing metadata: sample information including partner study information, location and year of collection, ENA accession numbers, and QC information for 20,864 samples from 33 countries.
Measure of complexity of infections: characterisation of within-host diversity (FWS) for 16,203 QC pass samples.
Drug resistance marker genotypes: genotypes at known markers of drug resistance for 16,203 samples, containing amino acid and copy number genotypes at six loci: crt, dhfr, dhps, mdr1, kelch13, plasmepsin 2-3.
Inferred resistance status classification: classification of 16,203 QC pass samples into different types of resistance to 10 drugs or combinations of drugs and to RDT detection: chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine-pyrimethamine for treatment of uncomplicated malaria, sulfadoxine-pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine, dihydroartemisinin-piperaquine, hrp2 and hrp3 gene deletions.
Drug resistance markers to inferred resistance status: details of the heuristics utilised to map genetic markers to resistance status classification.
Genetic distances: genetic distance matrix comparing all 20,864 samples.
CRT haplotypes: full crt gene haplotypes for 16,203 QC pass samples.
CSP C-terminal haplotypes: full csp C-terminal haplotypes for 16,203 QC pass samples plus 6 lab strains.
EBA175 calls: eba175 allelic type calls for 16,203 QC pass samples.
Reference genome: the version of the 3D7 reference genome fasta file used for mapping.
Annotation file: the version of the 3D7 reference annotation gff file used for genome annotations.
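
As a rough illustration of how the sample metadata and QC information described above might be used, the following Python sketch tallies QC-pass samples per country from a tab-delimited table. The column names (Sample, Country, QC pass) and the three rows are placeholders invented for this example, not the actual layout or contents of the Pf7 files; check the released metadata file for the real headers before adapting it.

```python
import csv
import io
from collections import Counter

# Synthetic stand-in for a sample metadata table. The column names and
# rows are assumptions made for illustration; they are not taken from Pf7.
EXAMPLE_TSV = (
    "Sample\tCountry\tQC pass\n"
    "S0001\tCountryA\tTrue\n"
    "S0002\tCountryA\tFalse\n"
    "S0003\tCountryB\tTrue\n"
)


def count_qc_pass(handle, country_col="Country", qc_col="QC pass"):
    """Return (per-country QC-pass counts, total number of rows read)."""
    reader = csv.DictReader(handle, delimiter="\t")
    passed = Counter()
    total = 0
    for row in reader:
        total += 1
        # Treat the QC flag as a case-insensitive "true"/"false" string.
        if row[qc_col].strip().lower() == "true":
            passed[row[country_col]] += 1
    return passed, total


passed, total = count_qc_pass(io.StringIO(EXAMPLE_TSV))
qc_fraction = sum(passed.values()) / total
print(dict(passed), f"{qc_fraction:.2f}")
```

Going by the counts stated in the description, the same tally on the full release would be expected to give 16,203 QC-pass samples out of 20,864, or roughly 78%.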
Provider:
figshare
Created:
2022-12-08