Benchmark datasets for home and work location detection: stop sequences and annotated labels

Source: DataCite Commons. Updated 2026-01-08; indexed 2026-04-25.
Download link:
https://data.dtu.dk/articles/dataset/Benchmark_datasets_for_home_and_work_location_detection_stop_sequences_and_annotated_labels/28846325/1
Resource description:
This dataset contains privacy-preserving stop sequences and annotated home and work locations, designed to support the benchmarking of location detection algorithms such as HoWDe. It accompanies the paper "HoWDe: a validated algorithm for Home and Work location Detection" (De Sojo et al.) and contributes to the development of standardized and validated approaches in this field.

The dataset includes four files: two with stop sequences and two with annotated home and work labels, drawn from two distinct data sources. Each source is described in detail below.

Data Sources

Dataset 1: DTU stop data and self-reported annotations. This dataset includes stop location data processed using the InfoStop algorithm, collected between 2017 and 2019 in more than 68 countries. This release focuses on 4,402 mobile phone and smart band users aged 25–64 who self-reported their home and work locations, resulting in 2,896 home labels and 2,732 work labels.
Note that the dataset is biased toward urban areas and middle- to high-income countries.

To protect privacy, spatial coordinates were removed, and the following privacy-preserving steps were applied to the stop sequences:
(i) time was discretized into 10-minute intervals;
(ii) each user’s timeline was shifted so that their first recorded activity falls in the first week of 1970;
(iii) entire days were randomly shuffled within each week, while maintaining the distinction between weekdays and weekends;
(iv) the data were filtered to users with at least 10 days of data.

The shared files for this dataset are:
- D1_stops: stop sequences for the 4,402 users.
- D1_true_labels_uwy: self-reported home and work locations for these users, including the week and year when the labels were reported.

Dataset 2: Veraset Movements data and annotations. This dataset consists of anonymized GPS points provided by Veraset for the World Bank’s “Monitoring COVID-19 Policy Response Through Human Mobility Data” project. Unlike most GPS providers, Veraset sources data from thousands of Software Development Kits (SDKs), reducing sample bias and covering ~5% of the global population. For this work, we focus on a 12-month period (January 1–December 31, 2020) across five middle-income countries (Brazil, Colombia, Indonesia, Mexico, and the Philippines), with 100 devices per country. Each country sample draws one-third of its users from each of high-, medium-, and low-wealth neighborhoods. Users were required to have recorded activity on at least 20% of days before the pandemic and on at least 20% of days across the entire period.

Each user's stop location sequence was expert-annotated based on visit patterns, satellite imagery, and local amenities (see De Sojo et al. for details on the annotators’ interface and procedure). Of the 500 individuals, 287 have at least one annotated work location.
Note that since annotators reviewed full longitudinal sequences, changes in home or work over time are not explicitly captured.

To protect privacy, spatial coordinates were removed, and the same privacy-preserving transformations as in Dataset 1 were applied before sharing stop sequences.

The shared files for this dataset are:
- D2_stops: stop sequences for the 500 users in Dataset 2.
- D2_true_labels_u: home and work location labels assigned through expert annotation.

Data Description

D1_stops, D2_stops: Each row represents one stop by a user at a given location; users are identified by a unique user ID. Columns:
- useruuid (string): Anonymized user identifier
- loc (integer): Location ID indexed uniquely per user (note: the same loc value does not refer to the same place across users)
- start (integer): Start time of the visit (Unix timestamp, aggregated into 10-minute bins)
- end (integer): End time of the visit (Unix timestamp, aggregated into 10-minute bins)
- country (string): Country code; GL0B indicates an undisclosed country

D1_true_labels_uwy: Each row represents a user’s self-reported home or work location along with the week it was recorded. Each individual may have multiple home and work locations assigned over different weeks, capturing changes over time. Columns:
- useruuid (string): Anonymized user identifier
- s_yy (integer): Year
- s_ww (integer): Week of the year
- loc (integer): Location ID (matches loc in D1_stops)
- true_location_type (string): Ground truth label: "H" for home, "W" for work

D2_true_labels_u: Each row represents a user’s annotated home or work location, assigned through expert annotation. While users may have multiple home and work locations assigned, the annotation procedure does not capture changes over time.
Columns:
- useruuid (string): Anonymized user identifier
- loc (integer): Location ID (matches loc in D2_stops)
- true_location_type (string): Ground truth label: "H" for home, "W" for work

All datasets are provided in CSV format.

Compliance and Anonymization

This dataset complies with the European Union’s General Data Protection Regulation 2016/679 (GDPR). All personal identifiers were anonymized, and stop sequences were randomized to prevent re-identification while preserving the structure and temporal order of mobility patterns.

How to cite

De Sojo Caso, Silvia; Lucchini, Lorenzo; Alessandretti, Laura (2025). Benchmark datasets for home and work location detection: stop sequences and annotated labels. Technical University of Denmark. Dataset. https://doi.org/10.11583/DTU.28846325
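
As a rough illustration of the CSV layout described under Data Description, the following Python sketch reads a stops table and a labels table, totals the time spent at each (useruuid, loc) pair, and attaches the ground-truth labels. It is not part of the dataset or of HoWDe. The helper names (read_rows, to_utc, time_per_location, label_lookup) and the inline sample rows are made up for this example; only the column names come from the description above. The exact file names on disk (for instance whether they carry a .csv extension) are an assumption.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone


def read_rows(handle):
    """Parse an open CSV file (or file-like object) into a list of dicts."""
    return list(csv.DictReader(handle))


def to_utc(timestamp):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc)


def time_per_location(stop_rows):
    """Sum visit durations in seconds for each (useruuid, loc) pair."""
    totals = defaultdict(int)
    for row in stop_rows:
        key = (row["useruuid"], int(row["loc"]))
        totals[key] += int(row["end"]) - int(row["start"])
    return dict(totals)


def label_lookup(label_rows):
    """Map each (useruuid, loc) pair to its set of labels ("H" and/or "W")."""
    labels = defaultdict(set)
    for row in label_rows:
        key = (row["useruuid"], int(row["loc"]))
        labels[key].add(row["true_location_type"])
    return dict(labels)


# Made-up sample rows in the documented column layout (NOT real data).
# With the real files one would instead pass open(...) handles, e.g. a
# hypothetical "D2_stops.csv" and "D2_true_labels_u.csv".
STOPS_CSV = """useruuid,loc,start,end,country
u1,0,0,28800,GL0B
u1,1,32400,61200,GL0B
u1,0,64800,86400,GL0B
"""
LABELS_CSV = """useruuid,loc,true_location_type
u1,0,H
u1,1,W
"""

stops = read_rows(io.StringIO(STOPS_CSV))
labels = label_lookup(read_rows(io.StringIO(LABELS_CSV)))
durations = time_per_location(stops)

# Print hours per location next to its ground-truth label ("-" if unlabeled).
for key, seconds in sorted(durations.items()):
    tag = ",".join(sorted(labels.get(key, set()))) or "-"
    print(key[0], key[1], seconds / 3600, tag)

# Timestamps are shifted toward 1970, so converted dates are not real dates.
print(to_utc(stops[0]["start"]).isoformat())
```

Because loc is indexed per user, the sketch always keys on the (useruuid, loc) pair rather than on loc alone. The same functions should also accept D1_true_labels_uwy rows, since the extra s_yy and s_ww columns are simply ignored here; a week-aware evaluation would need to take them into account.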
Provider:
Technical University of Denmark
Created:
2026-01-08