Chaos Game Representation (CGR) Images of SARS-CoV-2 Variants (Alpha, Beta, Delta, Gamma, and Omicron)

NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://data.mendeley.com/datasets/2x546shhwk
Description:
Currently available genome sequence classification methods are mostly based on text or sequence alignment techniques. Our aim is to build an image-based genome sequence classifier using deep learning. In 1990, H. J. Jeffrey proposed Chaos Game Representation (CGR), a method that converts long one-dimensional sequences into two-dimensional images. This dataset contains CGR images of genomic sequences from five SARS-CoV-2 variants: alpha, beta, delta, gamma, and omicron. The dataset is divided into three folders named train, test, and validate, and each folder contains five subfolders named alpha, beta, delta, gamma, and omicron. The "train" folder holds 17,500 images (3,500 per subfolder), the "test" folder holds 5,000 images (1,000 per class), and the "validate" folder holds 2,500 images (500 per class). The genomic sequences of these SARS-CoV-2 variants were downloaded from the GISAID database and then converted to CGR images using a Python script.
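
The dataset's own conversion script is not included in this listing, so the following is only a minimal sketch of how a CGR mapping can work, not the authors' implementation. It assumes one common corner convention (A, C, G, T at the four corners of the unit square), starts at the centre, and moves halfway toward the corner of each successive nucleotide. The names CORNERS, cgr_points, and cgr_grid, the grid size, and the choice to skip ambiguous symbols such as N are all illustrative assumptions.

```python
# Assumed corner layout; the script used for the dataset may use another one.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the CGR trajectory of a nucleotide sequence as (x, y) tuples."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # assumption: ignore ambiguous symbols such as N
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def cgr_grid(sequence, size=64):
    """Bin the CGR points into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

The resulting grid of counts can be scaled to pixel intensities and saved as an image with any imaging library; how the dataset's images were rendered (resolution, colour mapping) is not stated in the description.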

Created:
2022-12-12