
Wrist-mounted IMU data towards the investigation of free-living smoking behavior - the Smoking Event Detection (SED) and Free-living Smoking Event Detection (SED-FL) datasets

Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4507450
Resource description:
## Introduction

The Smoking Event Detection (SED) and the Free-living Smoking Event Detection (SED-FL) datasets were created by the Multimedia Understanding Group towards the investigation of smoking behavior, both while smoking and in-the-wild. Both datasets contain the triaxial acceleration and angular velocity signals (6 DoF) that originate from a commercial smartwatch (Mobvoi TicWatch E™). The SED dataset consists of \(20\) smoking sessions provided by \(11\) unique subjects, while the SED-FL dataset contains \(10\) all-day recordings provided by \(7\) unique subjects. In addition, the start and end moments of each puff cycle are annotated throughout the SED dataset.

## Description

### SED

A total of \(11\) subjects were recorded while smoking a cigarette in interior or exterior areas. The total duration of the \(20\) sessions sums up to \(161\) minutes, with a mean duration of \(8.08\) minutes. Each participant was free to smoke naturally, the only limitation being not to swap the cigarette between hands during the smoking session. Prior to the recording, the participant was asked to wear the smartwatch on the hand they typically use to smoke in everyday life. A camera was set up in advance facing the participant, with at least the whole length of the arms in its field of view. The purpose of the video recording was to obtain ground truth information for each of the puff cycles that occur during the smoking session. Participants were also asked to perform a hand-clapping movement at both the start and the end of the smoking session for synchronization purposes (this movement is distinctive in the accelerometer signal). No other instructions were given to the participants. It should be noted that the SED dataset does not contain instances of electronic cigarettes (also known as vaping devices) or heated tobacco products.

### SED-FL

SED-FL includes \(10\) in-the-wild sessions that belong to \(7\) unique subjects.
This is achieved by recording the subjects' smoking sessions as a small part of their everyday, unscripted activities. Participants were instructed to wear the smartwatch on the hand of their preference well ahead of any smoking session and to continue wearing it throughout the day until the battery was depleted. In addition, we followed a self-report labeling model: the ground truth is provided by the participants, who documented the start and end moments of their smoking sessions to the best of their abilities, as well as the hand on which they wore the smartwatch. The total duration of the recordings sums up to \(78.3\) hours, with a mean duration of \(7.83\) hours.

For both datasets, the accompanying Python script read_dataset.py will visualize the IMU signals and ground truth for each of the recordings. Information on how to execute the Python script can be found below.

```shell
# The script and the dataset's pickle file must be located in the same directory.
# Tested with Python 3.6.4
# Requirements: Pandas, Pickle and Matplotlib
# Visualize signals and ground truth
python read_dataset.py
```

## Annotation

For all recordings, we annotated the start and end points of each puff cycle (i.e., smoking gesture). The annotation was performed in such a way that the annotated intervals of different smoking gestures do not overlap.

## Technical details

### SED

We provide the SED dataset as a pickle. The file can be loaded using Python in the following way:

```python
import pickle as pkl
import pandas as pd

with open('./SED.pkl','rb') as fh:
    dataset = pkl.load(fh)
```

The dataset variable in the snippet above is a dictionary with \(11\) keys, each corresponding to a unique subject. The subject identifiers in SED are consistent with those in the SED-FL dataset; i.e., the subject with a given id in SED is the same person as the subject with that id in SED-FL.
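Since the subject identifiers are shared between the two pickles, one natural first step is to find which subjects appear in both. The following is a minimal sketch, not part of the dataset's tooling: the helper names `load_dataset` and `shared_subjects` are hypothetical, and it assumes the dictionary keys are numeric strings such as `'8'`, as in the loading examples. Unpickling the files also requires pandas to be installed, because the stored values are DataFrames.

```python
import pickle as pkl


def load_dataset(path):
    """Load one of the pickled datasets (e.g. './SED.pkl') into a dictionary."""
    with open(path, "rb") as fh:
        return pkl.load(fh)


def shared_subjects(sed, sed_fl):
    """Return the subject ids found in both dictionaries, in numeric order.

    Assumption: keys are numeric strings such as '8'.
    """
    return sorted(set(sed) & set(sed_fl), key=int)


# Example usage (requires both pickle files in the working directory):
# sed = load_dataset("./SED.pkl")
# sed_fl = load_dataset("./SED-FL.pkl")
# print(shared_subjects(sed, sed_fl))
```

Sorting with `key=int` keeps `'10'` after `'2'`, which a plain string sort would not.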
Each subject's entry in the dataset is a list whose length equals that subject's number of recorded smoking sessions. For example, assuming that subject 8 has recorded \(N\) smoking sessions, the command:

```python
sessions = dataset['8']
```

would yield a list of length \(N\). Each member of the list is a Pandas DataFrame with dimensions \(M \times 8\), where \(M\) is the length of the recording. The columns of a session's DataFrame are:

- 'T': The timestamps in seconds
- 'AccX': The accelerometer measurements for the x axis in \(m/s^2\)
- 'AccY': The accelerometer measurements for the y axis in \(m/s^2\)
- 'AccZ': The accelerometer measurements for the z axis in \(m/s^2\)
- 'GyrX': The gyroscope measurements for the x axis in \(rad/s\)
- 'GyrY': The gyroscope measurements for the y axis in \(rad/s\)
- 'GyrZ': The gyroscope measurements for the z axis in \(rad/s\)
- 'GT': The manually annotated ground truth for puff cycles

The contents of this DataFrame are essentially the accelerometer and gyroscope sensor streams, resampled at a constant sampling rate and aligned with each other and with their puff cycle ground truth. All sensor streams are transformed so that they reflect all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is consistent with the one applied to the signals in the SED-FL dataset. The ground truth is a signal that takes one value during puff cycles and a different value elsewhere.

No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is still present in the processed acceleration measurements. Researchers can consult the article "Modeling Wrist Micromovements to Measure In-Meal Eating Behavior from Inertial Sensor Data" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth and remove the gravitational component).
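The 'GT' column described above can be turned into a list of annotated intervals by locating where the signal switches on and off. The sketch below is illustrative only: the function name `gt_intervals` is hypothetical, the demo DataFrame is synthetic rather than taken from the dataset, and it assumes that 'GT' is positive during an annotated event and non-positive elsewhere.

```python
import numpy as np
import pandas as pd


def gt_intervals(session):
    """Return (start, end) timestamps in seconds for each run of positive 'GT'.

    Assumption: 'GT' is positive while an event is annotated and
    non-positive elsewhere.
    """
    active = (session["GT"].to_numpy() > 0).astype(int)
    # Pad with zeros so that runs touching either edge are still closed.
    edges = np.diff(np.concatenate(([0], active, [0])))
    starts = np.flatnonzero(edges == 1)       # first active sample of each run
    ends = np.flatnonzero(edges == -1) - 1    # last active sample of each run
    t = session["T"].to_numpy()
    return [(float(t[s]), float(t[e])) for s, e in zip(starts, ends)]


# Synthetic stand-in for one session (not real data from the dataset).
demo = pd.DataFrame({
    "T": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "GT": [-1, 1, 1, -1, -1, 1],
})
print(gt_intervals(demo))
```

Each returned pair gives the timestamps of the first and last sample of one annotated run, so subtracting them yields a per-event duration.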
### SED-FL

Similar to SED, we provide the SED-FL dataset as a pickle. The file can be loaded using Python in the following way:

```python
import pickle as pkl
import pandas as pd

with open('./SED-FL.pkl','rb') as fh:
    dataset = pkl.load(fh)
```

The dataset variable in the snippet above is a dictionary with \(7\) keys, each corresponding to a unique subject. The subject identifiers in SED-FL are consistent with those in the SED dataset; i.e., the subject with a given id in SED-FL is the same person as the subject with that id in SED.

Each subject's entry in the dataset is a list whose length equals that subject's number of recorded daily sessions. For example, assuming that subject 8 has recorded \(2\) daily sessions, the command:

```python
sessions = dataset['8']
```

would yield a list of length \(2\). Each member of the list is a Pandas DataFrame with dimensions \(M \times 8\), where \(M\) is the length of the recording. The columns of a session's DataFrame are exactly the same as the ones in the SED dataset. However, the 'GT' column contains ground truth for the smoking sessions during the day (instead of the puff cycles in SED).

The contents of this DataFrame are essentially the accelerometer and gyroscope sensor streams, resampled at a constant sampling rate of \(50\) Hz and aligned with each other and with their smoking session ground truth. All sensor streams are transformed so that they reflect all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is consistent with the one applied to the signals in the SED dataset. The ground truth is a signal with value \(+1\) during smoking sessions and \(-1\) elsewhere.

No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is still present in the processed acceleration measurements.
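Because gravity is still part of the accelerometer columns, a common first step is to estimate the slowly varying component and subtract it. The sketch below uses a generic exponential low-pass filter chosen purely for illustration; it is not a procedure taken from the dataset authors, and the function name `remove_gravity` and the smoothing factor `alpha` are hypothetical. With `alpha=0.02` at \(50\) Hz the filter has a time constant of roughly one second.

```python
import numpy as np
import pandas as pd

ACC_COLS = ["AccX", "AccY", "AccZ"]


def remove_gravity(session, alpha=0.02):
    """Return a copy of `session` with a low-pass gravity estimate subtracted.

    Assumption: gravity changes slowly compared with hand movement, so an
    exponential moving average of each accelerometer axis approximates it.
    """
    out = session.copy()
    # adjust=False gives the recursive form y[t] = (1 - alpha) * y[t-1] + alpha * x[t].
    gravity = session[ACC_COLS].ewm(alpha=alpha, adjust=False).mean()
    out[ACC_COLS] = session[ACC_COLS] - gravity
    return out


# Synthetic stand-in: a motionless wrist with gravity along the z axis.
still = pd.DataFrame({
    "T": np.arange(100) / 50.0,
    "AccX": 0.0,
    "AccY": 0.0,
    "AccZ": 9.81,
    "GT": -1,
})
linear = remove_gravity(still)
```

The input DataFrame is left untouched, and only the three accelerometer columns differ in the returned copy.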
Researchers can consult the article "Modeling Wrist Micromovements to Measure In-Meal Eating Behavior from Inertial Sensor Data" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth and remove the gravitational component).

## Ethics and funding

Informed consent, including permission for third-party access to anonymized data, was obtained from all subjects prior to their engagement in the study. The work leading to these results has received funding from the EU Commission under Grant Agreement No. 965231, the REBECCA project (H2020).

## Contact

Any inquiries regarding the SED and SED-FL datasets should be addressed to:

Mr. Konstantinos KYRITSIS (Electrical & Computer Engineer, PhD candidate)
Multimedia Understanding Group (MUG)
Department of Electrical & Computer Engineering
Aristotle University of Thessaloniki
University Campus, Building C, 3rd floor
Thessaloniki, Greece, GR54124
Tel: +30 2310 996359, 996365
Fax: +30 2310 996398
E-mail: kokirits [at] mug [dot] ee [dot] auth [dot] gr

Created:
2021-05-03