S3 Dataset

Name: S3 Dataset
Creator: figshare
Published: 2025-06-01 03:14:05
License: No description available

DataCite Commons; updated 2025-06-01; indexed 2024-07-28

Download link:

https://figshare.com/articles/dataset/S3Dataset_zip/14410229/2

Description:

The S3 dataset contains the behavior data (sensors, application statistics, and voice) of 21 volunteers interacting with their smartphones for more than 60 days. The users are diverse: males and females aged 18 to 70 were included when generating the dataset. The wide age range is a key aspect, because age affects smartphone usage. To generate the dataset, the volunteers installed a prototype of the smartphone application on their Android mobile phones. All attributes of the different kinds of data are written in a vector. The dataset contains the following vectors:

Sensors: This type of vector contains data from the smartphone sensors (accelerometer and gyroscope) acquired in a given time window. A vector is obtained every 20 seconds, and the monitored features are:
- Average of accelerometer and gyroscope values.
- Maximum and minimum of accelerometer and gyroscope values.
- Variance of accelerometer and gyroscope values.
- Peak-to-peak (max-min) of X, Y, Z coordinates.
- Magnitude for gyroscope and accelerometer.

Statistics: These vectors contain data about the applications the user has used recently. A statistics vector is calculated every 60 seconds and contains:
- Foreground application counters (number of different and total apps) for the last minute and the last day.
- Most common app ID and the number of usages in the last minute and the last day.
- ID of the currently active app.
- ID of the last active app prior to the current one.
- ID of the application most frequently used prior to the current application.
- Bytes transmitted and received through the network interfaces.

Voice: This kind of vector is generated when the microphone is active in a call or voice note. The speaker vector is an embedding extracted from the audio, and it contains information about the user's identity. This vector is usually named "x-vector" in the Speaker Recognition field, and it is calculated following the steps detailed in "egs/sitw/v2" for the Kaldi library, using the models available for extracting the embedding.

Summary of the collected database:
- Users: 21
- Sensor vectors: 417,128
- App usage statistics vectors: 151,034
- Speaker vectors: 2,720
- Call recordings: 629
- Voice messages: 2,091
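
The sensor features listed in the description can be illustrated with a minimal sketch. This is not the dataset authors' code: the function name `window_features`, the dictionary keys, the use of population variance, and the reading of "magnitude" as the mean Euclidean norm over the window are all assumptions made for illustration. The input is assumed to be the raw (x, y, z) samples of one sensor collected during one 20-second window.

```python
import math
from statistics import fmean, pvariance


def window_features(samples):
    """Summarize one window of (x, y, z) samples from a single sensor.

    Hypothetical helper: the feature names and exact definitions are
    assumptions, not the layout used in the published S3 vectors.
    """
    if not samples:
        raise ValueError("window is empty")

    features = {}
    # zip(*samples) regroups the samples into one sequence per axis.
    for axis, values in zip("xyz", zip(*samples)):
        features[f"mean_{axis}"] = fmean(values)
        features[f"max_{axis}"] = max(values)
        features[f"min_{axis}"] = min(values)
        # Population variance is assumed; the source does not say which kind.
        features[f"var_{axis}"] = pvariance(values)
        # Peak-to-peak is max minus min, as stated in the description.
        features[f"p2p_{axis}"] = max(values) - min(values)

    # Assumption: magnitude is the mean Euclidean norm across the window.
    norms = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    features["magnitude"] = fmean(norms)
    return features


if __name__ == "__main__":
    accelerometer_window = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
    print(window_features(accelerometer_window))
```

Under these assumptions, calling the function once for the accelerometer and once for the gyroscope and concatenating the two results would give one sensor vector per window.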

Provider:

figshare

Created:

2021-04-13

Collection summary

Dataset introduction