five

AmelHap pilot: filter1 data

收藏
Mendeley Data2024-05-10 更新2024-06-27 收录
下载链接:
https://zenodo.org/records/6563399
下载链接
链接失效反馈
资源简介:
Honey bee Apis mellifera drones are typically haploid, developing from an unfertilized egg, inheriting only their queen’s alleles and none from the many drones she mated with. Being haploid, the ordered combination or ‘phase’ of alleles is known, making drones a valuable haplotype resource. We collated whole genome sequence data for 688 drones, including 45 newly sequenced Scottish drones, which collectively represent 13 countries, 7 subspecies and various hybrids strains. After alignment to the reference assembly Amel_Hav3.1, and haploid variant calling, we identified 18.9M variants. Whole-genome sequencing data underpinning the dataset is available from the European Nucleotide Archive (ENA), https://www.ebi.ac.uk/ena, with the project accession codes: PRJEB16533, PRJNA311274, PRJNA363032, PRJNA516678, PRJNA544324, and PRJEB39369. Sequencing reads were aligned to the Amel_HAv3.1 reference genome using BWA-MEM v0.7.17. Reads were sorted with SAMtools v1.9 and duplicates marked (MarkDuplicates) with GATK v4.0.11.0. Variants for each sample were called using GATK’s HaplotypeCaller with the following non-default parameters --ERC GVCF, --sample-ploidy 1 and -A AlleleFraction. Joint variant calling was performed across all samples using GATK’s GenomicDBImport and GenotypeGVCFs with --sample-ploidy 1 and a window size of 2.5 Mb. This dataset is the result of applying filters to exclude variants with 'QD<20 || QD>40 || MQ < 50 || SOR >3' in the raw dataset, leaving 16.6M variants. The code used in filtering is outlined here: https://bitbucket.org/scriptBee/hapmap-pilot.
创建时间:
2023-06-28
用户留言
有没有相关的论文或文献参考?
这个数据集是基于什么背景创建的?
数据集的作者是谁?
能帮我联系到这个数据集的作者吗?
这个数据集如何下载?
点击留言
数据主题
具身智能
数据集  4099个
机构  8个
大模型
数据集  439个
机构  10个
无人机
数据集  37个
机构  6个
指令微调
数据集  36个
机构  6个
蛋白质结构
数据集  50个
机构  8个
空间智能
数据集  21个
机构  5个
5,000+
优质数据集
54 个
任务类型
进入经典数据集
热门数据集

MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students) which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

Papers with Code 收录

UIEB, U45, LSUI

本仓库提供了水下图像增强方法和数据集的实现,包括UIEB、U45和LSUI等数据集,用于支持水下图像增强的研究和开发。

github 收录

MedDialog

MedDialog数据集(中文)包含了医生和患者之间的对话(中文)。它有110万个对话和400万个话语。数据还在不断增长,会有更多的对话加入。原始对话来自好大夫网。

github 收录

Allen Brain Atlas

Allen Brain Atlas 是一个综合性的脑图谱数据库,提供了详细的大脑解剖结构、基因表达数据、神经元连接信息等。该数据集包括了小鼠、人类和其他模式生物的大脑数据,旨在帮助研究人员理解大脑的结构和功能。

portal.brain-map.org 收录

THCHS-30

“THCHS30是由清华大学语音与语言技术中心(CSLT)发布的开放式汉语语音数据库。原始录音是2002年在清华大学国家重点实验室的朱晓燕教授的指导下,由王东完成的。清华大学计算机科学系智能与系统,原名“TCMSD”,意思是“清华连续普通话语音数据库”,时隔13年出版,由王东博士发起,并得到了教授的支持。朱小燕。我们希望为语音识别领域的新研究人员提供一个玩具数据库。因此,该数据库对学术用户完全免费。整个软件包包含建立中文语音识别所需的全套语音和语言资源系统。”

OpenDataLab 收录