
Accurately Inferring Personality Traits from the Use of Mobile Technology

Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-28.
Download link:
https://zenodo.org/record/1316989
Description:
This dataset contains the features extracted from Spatio-Temporal Mobility and Context of Use, together with the Big5 scores from the 50-item IPIP survey, for 55 volunteers from 6 countries on 2 continents. The authors predict the Big5 traits by fitting 5 regularized linear regression models, one per trait. They select the regularization parameter and evaluate the prediction performance through nested leave-one-out cross-validation.

Feature extraction pipeline

For each volunteer, we start the pipeline with 5 time series that encode, over time, her WGS84 coordinates (latitude and longitude), measurements related to her smartphone's battery (charging status and level), the surrounding WiFi APs, the surrounding BT devices, and whether her phone was connected to a WiFi access point. First, we refine the 5 raw time series to accurately describe the spatio-temporal mobility and the context of our volunteers. For example, we create a binary time series that peaks when the user is at home, another that peaks when the user is at work, and so on. Next, we process both the refined and the raw time series to extract the features, as follows:

Statistical Features: We divide the raw time series into intervals of one day. We aggregate the different values within each day into a single numerical measurement (e.g., by computing the average, the count of unique values, the information entropy, or the repetitiveness). Finally, we aggregate the measurements obtained across all days into a single value (the value of that feature for the selected user) by measuring the mean (avg), the standard deviation (std), and the coefficient of variation (cov). Features prefixed with avg, std, or cov have been extracted as described here.

Spectral Analysis Features: We first apply the DFT to the raw time series. Then, we measure:
  The frequency of highest energy (we prefix its name with top_frequency);
  The periodicity of the series in the frequency domain;
  The energy at the daily and weekly frequencies (daily_energy and weekly_energy);
  The frequency, the periodicity, and the daily and weekly energy obtained after processing the time series with Welch's method and a two-week window (w_top_frequency, w_periodicity, w_daily_energy, w_weekly_energy);
  The Euclidean distance between the DFT and a pure sine wave with period equivalent to the top frequency of the series (distance_from_sine).

The string b_day in a feature name specifies that the feature only considers business days (i.e., it excludes holidays and weekends).

The 5 columns named O, C, E, A, and N score the users on the Big5 traits and represent the prediction targets.

Source code

The Python source code developed to engineer and evaluate the embeddings is available here.
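The two-stage aggregation behind the statistical features (per-day measurement, then avg, std, and cov across days) can be illustrated with a minimal sketch. It is not the authors' code: the toy series, the feature names wifi_entropy and wifi_unique, the helper functions, and the choice of the population standard deviation are all assumptions made for illustration.

```python
import math
import statistics
from collections import Counter

def shannon_entropy(values):
    """Information entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def daily_aggregate(samples, per_day, reducer):
    """Split a flat list of samples into days and reduce each day to one number."""
    days = [samples[i:i + per_day] for i in range(0, len(samples), per_day)]
    return [reducer(day) for day in days]

def summarize(daily_values, name):
    """Collapse per-day measurements into avg_, std_ and cov_ features."""
    avg = statistics.mean(daily_values)
    std = statistics.pstdev(daily_values)  # assumption: population std
    cov = std / avg if avg != 0 else float("nan")
    return {f"avg_{name}": avg, f"std_{name}": std, f"cov_{name}": cov}

# Hypothetical toy data: 3 days, 4 samples per day,
# each sample being the ID of a WiFi AP seen at that time.
wifi_ap = ["a", "a", "a", "a",   # day 1: a single AP
           "a", "b", "a", "b",   # day 2: two APs, evenly split
           "a", "b", "c", "d"]   # day 3: four distinct APs

daily_entropy = daily_aggregate(wifi_ap, per_day=4, reducer=shannon_entropy)
daily_unique = daily_aggregate(wifi_ap, per_day=4, reducer=lambda d: len(set(d)))

features = {
    **summarize(daily_entropy, "wifi_entropy"),
    **summarize(daily_unique, "wifi_unique"),
}
print(features)
```

Each user ends up with one row of such features, which is the shape the dataset's columns suggest.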
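As a rough illustration of the spectral analysis features, the sketch below computes a top frequency and the daily and weekly energies from the DFT of an evenly sampled series. It rests on several assumptions not stated in the description: hourly sampling, removal of the mean before the transform, energies normalized by the total spectral energy, and a synthetic input signal. The function name spectral_features is hypothetical.

```python
import numpy as np

def spectral_features(series, samples_per_day):
    """Return a few DFT-based descriptors of an evenly sampled series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # drop the constant (DC) component
    energy = np.abs(np.fft.rfft(x)) ** 2  # energy per frequency bin
    # Bin frequencies expressed in cycles per day
    freqs = np.fft.rfftfreq(x.size, d=1.0 / samples_per_day)

    total = float(energy.sum())
    if total == 0.0:                      # constant series: avoid dividing by zero
        total = 1.0

    def energy_at(target):
        # Normalized energy of the bin closest to the target frequency
        return float(energy[np.argmin(np.abs(freqs - target))]) / total

    return {
        "top_frequency": float(freqs[np.argmax(energy)]),
        "daily_energy": energy_at(1.0),
        "weekly_energy": energy_at(1.0 / 7.0),
    }

# Synthetic example: 4 weeks of hourly samples with a strong daily
# cycle and a weaker weekly cycle.
hours = np.arange(24 * 28)
signal = np.sin(2 * np.pi * hours / 24) + 0.3 * np.sin(2 * np.pi * hours / (24 * 7))

feats = spectral_features(signal, samples_per_day=24)
print(feats)
```

The w_ variants described above rely on Welch's method; one plausible way to obtain them would be to replace the plain DFT with scipy.signal.welch, setting nperseg to the number of samples in two weeks, though the authors' exact settings are not given here.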
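The evaluation scheme mentioned at the start of the description, nested leave-one-out cross-validation, can be sketched as follows. The description only says "regularized linear regression", so the use of ridge (L2) regression, the candidate alpha grid, and the synthetic data are assumptions; in practice one such model would be fitted for each of the O, C, E, A, and N columns.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def loo_error(X, y, alpha):
    """Mean squared leave-one-out error for a given alpha (inner loop)."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, b = ridge_fit(X[keep], y[keep], alpha)
        errors.append((X[i] @ w + b - y[i]) ** 2)
    return float(np.mean(errors))

def nested_loo_predict(X, y, alphas):
    """Outer LOO loop evaluates; inner LOO loop selects alpha."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        X_tr, y_tr = X[keep], y[keep]
        # The held-out sample i never influences the choice of alpha
        best = min(alphas, key=lambda a: loo_error(X_tr, y_tr, a))
        w, b = ridge_fit(X_tr, y_tr, best)
        preds[i] = X[i] @ w + b
    return preds

# Synthetic stand-in for one trait: 20 "users", 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.0, 0.3]) + 0.1 * rng.normal(size=20)

preds = nested_loo_predict(X, y, alphas=[0.01, 0.1, 1.0, 10.0])
mae = float(np.mean(np.abs(preds - y)))
print(mae)
```

Keeping the alpha search inside the outer loop is what prevents the reported error from being optimistically biased, which matters with a sample as small as 55 volunteers.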
Created: 2023-06-28