Worms UCR Archive Dataset

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/11198401
Resource description:
This dataset is part of the UCR Archive maintained by University of Southampton researchers. Please cite a relevant or the latest full archive release if you use the datasets. See http://www.timeseriesclassification.com/.

Caenorhabditis elegans is a roundworm commonly used as a model organism in the study of genetics. The movement of these worms is known to be a useful indicator for understanding behavioural genetics. Brown et al., "A dictionary of behavioral motifs reveals clusters of genes affecting Caenorhabditis elegans locomotion", describe a system for recording the motion of worms on an agar plate and measuring a range of human-defined features. It has been shown that the space of shapes Caenorhabditis elegans adopts on an agar plate can be represented by combinations of four base shapes, or eigenworms. Once the worm outline is extracted, each frame of worm motion can be captured by four scalars representing the amplitudes along each dimension when the shape is projected onto the four eigenworms.

The data relate to 258 traces of worms converted into four "eigenworm" series. The eigenworm series range in length from 17984 to 100674 observations (sampled at 30 Hz, so from 10 minutes to 1 hour) and have four dimensions (eigenworm 1 to 4). There are five classes: N2, goa-1, unc-1, unc-38 and unc-63. N2 is wild-type (i.e. normal); the other four are mutant strains. These datasets use the first dimension only (the first eigenworm).

The problems Worms.arff and WormsTwoClass.arff are series of the first eigenworm averaged down so that all series have length 900 (the single hour-long series is discarded). This smoothing is likely to discard discriminatory information. The Yemini features obtain nearly 100% accuracy, although we have not independently verified this.

We address the problem of classifying individual worms as wild-type or mutant based on the time series of the first eigenworm, down-sampled to second-long intervals. We have 257 cases, which we split 70%/30% into a train and test set. Each series has 900 observations, and each worm is classified as either wild-type (the N2 reference strain, 109 cases) or one of four mutant types: goa-1 (44 cases), unc-1 (35 cases), unc-38 (45 cases) and unc-63 (25 cases). The data were extracted from the C. elegans behavioural database [wormWeb]. The formatted classification problems are available from the website associated with this paper [tscWeb].

Donor: A. Bagnall
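
The preprocessing described above (averaging each first-eigenworm series down to 900 points, collapsing the five strains into wild-type versus mutant, and splitting 70%/30%) can be illustrated with a minimal Python sketch. It is not the archive's own code: the function names `average_down`, `to_two_class` and `split_70_30` are hypothetical, the block-averaging scheme and the per-class (stratified) split are assumptions, and the demo trace is synthetic rather than real worm data.

```python
import random
from statistics import fmean

# Class sizes exactly as listed in the description. They sum to 258,
# whereas the text reports 257 cases; the sketch uses the listed sizes unchanged.
CLASS_COUNTS = {"N2": 109, "goa-1": 44, "unc-1": 35, "unc-38": 45, "unc-63": 25}


def average_down(series, target_len=900):
    """Shrink `series` to `target_len` points by averaging consecutive blocks.

    Assumption: blocks are contiguous and as equal in size as possible.
    """
    n = len(series)
    if n < target_len:
        raise ValueError("series is shorter than the target length")
    reduced = []
    for i in range(target_len):
        start = i * n // target_len
        stop = (i + 1) * n // target_len
        reduced.append(fmean(series[start:stop]))
    return reduced


def to_two_class(label):
    """Map a strain label to the two-class problem (N2 is wild-type)."""
    return "wild-type" if label == "N2" else "mutant"


def split_70_30(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Assumption: the split is stratified by class; the description only
    states the overall 70%/30% proportion.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Synthetic stand-in for a 10-minute trace sampled at 30 Hz.
raw_trace = [float(i % 30) for i in range(18000)]
reduced_trace = average_down(raw_trace)

labels = [name for name, count in CLASS_COUNTS.items() for _ in range(count)]
two_class_labels = [to_two_class(label) for label in labels]
train_idx, test_idx = split_70_30(labels)
print(len(reduced_trace), len(train_idx), len(test_idx))
```

Averaging over blocks acts as a low-pass filter, which is consistent with the remark that this smoothing may discard discriminatory information.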

Created:
2024-06-25