Activity recognition from in-the-wild smartwatches (ArWISE)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.jdfn2z3nm
Resource summary:
The Activity recognition from in-the-Wild SmartwatchEs (ArWISE) dataset is based on sensor data and activity labels collected from smartwatches in several studies, for a total of 854 participants across 20 cohorts. The sensor data consist of 10 Hz accelerometer and gyroscope readings plus location information. These were processed into anonymized features computed over one-minute windows: local time, date, and day of week; and the mean and standard deviation of yaw, pitch, roll, x/y/z/total rotation rate, x/y/z/total acceleration, speed, course, distance from home, and bearing from home. Each activity label is one of eat, errands, exercise, hobby, housework, hygiene, relax, sleep, socialize, travel, work, or other. The dataset contains 470M data points in total, of which 37M are labeled.
Methods
We introduce ArWISE (Activity recognition from in-the-Wild SmartwatchEs), a dataset containing labeled and unlabeled data collected by Apple Watches. It contains readings collected in 20 studies across 2 countries over 8 years.
Data Collection
Data collection followed a consistent protocol for each study. Participants were given an Apple Watch to wear each day on their non-dominant arm. While they wore the watch, a custom app collected 3D accelerometer and gyroscope readings at 10 Hz. The app also recorded the person's location every minute, or whenever the magnitude of the acceleration vector exceeded a threshold.
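The location-sampling rule described above can be sketched as a simple trigger check. This is a minimal illustration under stated assumptions, not the study app's code. The one-minute interval comes from the text. The function names and the threshold value `ACCEL_THRESHOLD_G` are hypothetical, because the actual threshold is not reported.

```python
import math

LOCATION_INTERVAL_S = 60.0  # from the text: location is sampled every minute
ACCEL_THRESHOLD_G = 1.5     # hypothetical: the real threshold is not reported


def acceleration_magnitude(ax: float, ay: float, az: float) -> float:
    """Euclidean norm of the 3D acceleration vector."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def should_sample_location(now_s: float, last_fix_s: float,
                           ax: float, ay: float, az: float) -> bool:
    """Request a location fix if a minute has elapsed since the last fix,
    or if the acceleration magnitude exceeds the (hypothetical) threshold."""
    if now_s - last_fix_s >= LOCATION_INTERVAL_S:
        return True
    return acceleration_magnitude(ax, ay, az) > ACCEL_THRESHOLD_G


# 30 s after the last fix, a strong movement (magnitude 1.7) triggers an early fix.
print(should_sample_location(130.0, 100.0, 1.2, 0.9, 0.8))  # True
```

The check is written as a pure function of the current time, the time of the last fix, and one acceleration reading, so it can be tested without a device. A real watch app would typically drive it from its sensor callbacks.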
At random times throughout each day, the smartwatch prompted the participant to select, from a scroll-down list, the activity that best described what they were currently doing. The distribution of user-provided labels across the 12 activity categories is Eat (6.5%), Errands (3.7%), Exercise (4.7%), Hobby (1.1%), Housework (19.7%), Hygiene (1.9%), Other (3.1%), Relax (37.7%), Sleep (3.0%), Socialize (3.7%), Travel (5.6%), Work (9.1%). Each label was applied to the five minutes of sensor readings ending at the time of the participant's response.
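One prompt response can be expanded into labeled one-minute points as in the minimal sketch below. It assumes that timestamps are floored to whole minutes and that each of the five minutes ending at the response receives the label. The function name `propagate_label` and the flooring convention are illustrative assumptions, since the source does not say how partial minutes are handled.

```python
from datetime import datetime, timedelta

LABEL_SPAN_MINUTES = 5  # from the text: five minutes of readings ending at the response

ACTIVITIES = {"eat", "errands", "exercise", "hobby", "housework", "hygiene",
              "other", "relax", "sleep", "socialize", "travel", "work"}


def propagate_label(response_time: datetime, label: str) -> dict[datetime, str]:
    """Assign `label` to each one-minute window in the five minutes ending at the response.

    Windows are keyed by their start time. Flooring the response time to the
    minute is an assumption; the source does not say how partial minutes are handled.
    """
    if label not in ACTIVITIES:
        raise ValueError(f"unknown activity label: {label!r}")
    last_minute = response_time.replace(second=0, microsecond=0)
    return {last_minute - timedelta(minutes=k): label
            for k in range(LABEL_SPAN_MINUTES)}


labels = propagate_label(datetime(2024, 5, 1, 14, 32, 40), "relax")
print(sorted(t.strftime("%H:%M") for t in labels))
# ['14:28', '14:29', '14:30', '14:31', '14:32']
```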
In addition, an external annotator provided much denser labels for the data collected in cohorts 7 and 18. The annotator used a tool that visualized 3D movement data, a map of visited locations, and timestamps over arbitrary time windows.
While the data collection mechanism was the same for all study cohorts, other parameters varied. These include the number of participants, participant demographics, length of data collection, and the clinical variables collected. Table 1 summarizes the study cohort parameters, where HOA = healthy older adult, SCD = subjective cognitive decline, and MCI = mild cognitive impairment.
Table 1. ArWISE Cohorts.
Cohort   Participants   Study/participant characteristics
1        4              Younger adults, self-reported activities
2        185            HOA/SCD/MCI, English and Spanish self-reported activities
3        56             Younger adults, no activity labels
4        46             HOA/SCD/MCI, self-reported activities
5        10             Older adult pairs, no activity labels
6        35             HOA/SCD/MCI, no activity labels
7        37             HOA/SCD/MCI, self-reported activities and expert-annotated activities
8        9              Younger adults, self-reported activities
9        15             Younger adults, self-reported activities
10       13             Younger adults, self-reported activities
11       3              Younger adults, self-reported activities
12       18             Younger adults, self-reported activities
13       10             Younger adults, self-reported activities
14       22             Younger adults, self-reported activities
15       21             HOA/SCD/MCI, no activity labels
16       6              Younger adults, self-reported activities
17       103            HOA/SCD/MCI, self-reported activities
18       16             HOA/SCD/MCI, self-reported activities and expert-annotated activities
19       16             HOA/SCD/MCI, self-reported activities
20       229            HOA/SCD/MCI, no activity labels
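The participant and cohort totals given in the Dataset Characteristics section can be checked against Table 1. The sketch below transcribes the table and treats every cohort with self-reported activities as labeled. It then recomputes the overall and labeled counts, which come out to 854 participants in 20 cohorts and 503 participants in 15 labeled cohorts.

```python
# (cohort, participants, has_activity_labels), transcribed from Table 1
COHORTS = [
    (1, 4, True), (2, 185, True), (3, 56, False), (4, 46, True),
    (5, 10, False), (6, 35, False), (7, 37, True), (8, 9, True),
    (9, 15, True), (10, 13, True), (11, 3, True), (12, 18, True),
    (13, 10, True), (14, 22, True), (15, 21, False), (16, 6, True),
    (17, 103, True), (18, 16, True), (19, 16, True), (20, 229, False),
]

total_participants = sum(n for _, n, _ in COHORTS)
labeled = [(c, n) for c, n, has_labels in COHORTS if has_labels]
labeled_participants = sum(n for _, n in labeled)

print(len(COHORTS), total_participants)    # 20 854
print(len(labeled), labeled_participants)  # 15 503
```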
Dataset Characteristics
The ArWISE dataset differs from the resources typically available for human activity recognition (HAR). Some of the most-analyzed datasets capture movement categories from data collected in controlled settings [1], [2]. More recent wearable sensor datasets, however, capture activities observed in uncontrolled settings. Capture-24 [3] monitors 150 participants for only 24 hours with movement-only sensors, but it includes labels for functional activities such as household chores, sports, and sleep in real-world settings. ExtraSensory [4] monitors a smaller set of 60 participants with up to 20 seconds of movement and location readings per sample, but it provides diverse activity and location labels. The UK Biobank [5] offers 7 days of accelerometry data for more than 100,000 participants, and Intuition [6] longitudinally observes 23,004 participants, but neither provides ground-truth activity labels.
The ArWISE dataset contains 37,578,059 labeled points from 503 participants across 15 cohorts and 469,881,358 total points for 854 participants across 20 cohorts. Each point represents one minute of data. ArWISE offers unique benefits for HAR analysis, including a large set of participants, functional activity labels, longitudinal observations, and consistency in the data collection mechanism.
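Because each point represents one minute, the counts above translate directly into coverage figures. The arithmetic below uses only the numbers stated in this section. About 8% of the points are labeled. Averaged over all 854 participants, there are roughly 382 days of one-minute points per person, an average that likely varies considerably by cohort.

```python
LABELED_POINTS = 37_578_059
TOTAL_POINTS = 469_881_358
PARTICIPANTS = 854
MINUTES_PER_DAY = 24 * 60

labeled_fraction = LABELED_POINTS / TOTAL_POINTS
# Average over all participants; per-participant durations differ across cohorts.
days_per_participant = TOTAL_POINTS / PARTICIPANTS / MINUTES_PER_DAY

print(f"{labeled_fraction:.1%}")           # 8.0%
print(f"{days_per_participant:.0f} days")  # 382 days
```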
Data Preprocessing
Our functional activity recognition models consider both raw time series data and engineered features. Table 2 summarizes the features that are available for both cases.
Table 2. ArWISE raw and engineered data features.
Type                 Category   Feature
Raw (10 Hz)          time       date and time
Raw (10 Hz)          motion     yaw, pitch, roll, rotation rate (x, y, z), acceleration (x, y, z)
Raw (10 Hz)          location   latitude, longitude, altitude, course, speed
Engineered (1 min)   time       time of day (radians, sin, cos), day of week
Engineered (1 min)   motion     mean & stdev (each raw movement variable);
                                mean & stdev (rotation vector magnitude, acceleration vector magnitude)
Engineered (1 min)   location   mean & stdev (course, speed);
                                mean & stdev (distance from home, latitude distance from home, longitude distance from home);
                                mode & stdev (bearing from home)
Class label          activity   eat, errands, exercise, hobby, housework, hygiene, relax, sleep, socialize, travel, work, other
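The engineered motion features in Table 2 summarize one minute of 10 Hz readings, which is 600 samples. The minimal sketch below computes them from a list of per-sample readings. The channel names (`rot_x`, `acc_x`, and so on) are hypothetical. The use of the population standard deviation is also an assumption, since the source does not say which estimator was used.

```python
import math
import statistics

SAMPLE_RATE_HZ = 10
WINDOW_SAMPLES = 60 * SAMPLE_RATE_HZ  # one minute at 10 Hz = 600 readings

# Hypothetical column names for the raw motion channels in Table 2.
RAW_MOTION = ["yaw", "pitch", "roll", "rot_x", "rot_y", "rot_z",
              "acc_x", "acc_y", "acc_z"]


def motion_features(window: list[dict[str, float]]) -> dict[str, float]:
    """Mean and population standard deviation of each raw motion channel and of
    the rotation-rate and acceleration vector magnitudes over one window."""
    if len(window) != WINDOW_SAMPLES:
        raise ValueError("expected a complete one-minute window")
    series = {name: [row[name] for row in window] for name in RAW_MOTION}
    series["rot_mag"] = [math.hypot(r["rot_x"], r["rot_y"], r["rot_z"]) for r in window]
    series["acc_mag"] = [math.hypot(r["acc_x"], r["acc_y"], r["acc_z"]) for r in window]
    features: dict[str, float] = {}
    for name, values in series.items():
        features[f"{name}_mean"] = statistics.fmean(values)
        features[f"{name}_std"] = statistics.pstdev(values)
    return features


still = {name: 0.0 for name in RAW_MOTION}
still["acc_z"] = 1.0  # watch lying flat: gravity along z (in g)
feats = motion_features([dict(still) for _ in range(WINDOW_SAMPLES)])
print(feats["acc_mag_mean"], feats["acc_z_std"])  # 1.0 0.0
```

Rejecting windows that do not contain a full 600 readings mirrors the preprocessing step of dropping points without a complete minute of sensor data.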
We imputed missing values (using the mode for location features and the median for all other features) and dropped data points that lacked a complete minute of sensor readings leading up to the label. We also normalized each feature independently.
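The imputation and normalization steps can be sketched per feature column as follows. The mode/median split comes from the text. The normalization method is not stated, so z-score standardization is assumed here; min-max scaling would be an equally plausible reading. The function names are hypothetical.

```python
import statistics
from typing import Optional


def impute(values: list[Optional[float]], is_location: bool) -> list[float]:
    """Fill missing entries (None) with the column mode for location features
    or the column median for all other features."""
    observed = [v for v in values if v is not None]
    fill = statistics.mode(observed) if is_location else statistics.median(observed)
    return [fill if v is None else v for v in values]


def standardize(values: list[float]) -> list[float]:
    """Normalize one feature on its own: zero mean, unit population std (assumed z-score)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]


course = impute([90.0, 90.0, None, 180.0], is_location=True)  # mode fill
accel = impute([0.2, None, 0.4, 1.0], is_location=False)      # median fill
print(course)                   # [90.0, 90.0, 90.0, 180.0]
print(accel)                    # [0.2, 0.4, 0.4, 1.0]
print(standardize([1.0, 3.0]))  # [-1.0, 1.0]
```

In a train/test setting, the mode, median, mean, and standard deviation would normally be computed on the training split only and then reused for the test split to avoid leakage. The source does not say which data these statistics were fit on.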
For the engineered features, we aggregated values over the one minute leading up to the user (or expert) label. Time of day was represented with sinusoidal features to preserve its periodic nature, so that, for example, 23:59 and 00:01 map to nearby values. We did not use raw location values here, both to preserve user privacy and because such values do not generalize easily between individuals. Instead, we defined a person's home as the location visited most often at the beginning of each day. We then computed the Haversine distance and the bearing of each location from that home location.
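The time and location features described above can be sketched with standard formulas: a 24-hour angle with its sine and cosine, the Haversine great-circle distance, and the initial great-circle bearing. This is an illustration under stated assumptions, not the authors' code. "Beginning of each day" is interpreted as each day's first location fix, and fixes are pooled on a rounded coordinate grid of about 100 m. Both choices are hypothetical, as is the use of the great-circle bearing rather than a planar approximation. The latitude and longitude distances from home in Table 2 are not shown.

```python
import math
from collections import Counter
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def time_features(t: datetime) -> tuple[float, float, float, int]:
    """Time of day as an angle (radians) with its sine and cosine, plus day of week."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    angle = 2 * math.pi * seconds / 86_400
    return angle, math.sin(angle), math.cos(angle), t.weekday()


def home_location(first_fixes: list[tuple[float, float]], ndigits: int = 3) -> tuple[float, float]:
    """Most frequent location among each day's first fix, after rounding
    coordinates to `ndigits` decimals (about 100 m; a hypothetical choice)."""
    cells = [(round(lat, ndigits), round(lon, ndigits)) for lat, lon in first_fixes]
    return Counter(cells).most_common(1)[0][0]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 (home) to point 2, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


home = home_location([(10.0001, 20.0002), (10.0003, 20.0004), (10.5, 20.5)])
print(home)                                    # (10.0, 20.0)
print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))  # 111195
print(bearing_deg(0.0, 0.0, 0.0, 1.0))         # 90.0
```

Because bearing is an angle, it wraps from 359 degrees back to 0. This may be why Table 2 summarizes it with a mode rather than a mean.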
References
[1] O. Napoli et al., “A benchmark for domain adaptation and generalization in smartphone-based human activity recognition,” Scientific Data, vol. 11, p. 1192, 2024.
[2] A. Reiss, D. Stricker, and G. Hendeby, “Towards robust activity recognition for everyday life: Methods and evaluation,” in Pervasive Computing Technologies for Healthcare, 2013, pp. 25–32.
[3] S. Chan et al., “CAPTURE-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition,” Scientific Data, vol. 11, p. 1135, 2024.
[4] Y. Vaizman, K. Ellis, and G. Lanckriet, “Recognizing detailed human context in the wild from smartphones and smartwatches,” IEEE Pervasive Computing, vol. 16, no. 4, pp. 62–74, 2017.
[5] C. Sudlow, J. Gallacher, N. Allen, V. Beral, P. Burton, and J. Danesh, “UK Biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age,” PLoS Medicine, vol. 12, no. 3, p. e1001779, 2015.
[6] P. M. Butler, J. Yang, R. Brown, M. Hobbs, and A. Becker, “Smartwatch- and smartphone-based remote assessment of brain health and detection of mild cognitive impairment,” Nature Medicine, 2025, doi: https://doi.org/10.1038/s41591-024-03475-9.
Created: 2025-03-21



