
NUS Multi-Source Social Dataset (NUS-MSS)

DataCite Commons · Updated 2025-08-07 · Indexed 2024-07-13
Download link:
http://scholarbank.nus.edu.sg/handle/10635/137406
Description:
With the rapid growth of multi-source social media resources, comprehensive user profile learning has become a backbone of many application domains. User profile components such as user mobility and user demography describe social media users from different views. However, little research has been done on multi-source multimodal user profile learning, and no benchmark dataset has been released for user mobility and demographic profiling. Here we introduce a multi-source dataset created by the Lab for Media Search at the National University of Singapore. The dataset includes six types of features extracted from the collected data (location semantics features; location semantics LDA-based features; text LDA-based features; text LIWC features; sentiment and writing style features; and ImageNet image concept features), together with ground-truth data from three geographical regions: Singapore, New York, and London. To cover the most popular data modalities (visual, textual, and location data), we incorporate the following social media sources: Foursquare (the largest location-based social network) as the location data source; Twitter (the microblogging service with the largest English-speaking user base) as the textual data source; Instagram (the most popular photo-sharing service) as the visual data source; and Facebook as the ground-truth source. We also provide baseline results for user demographic profiling, obtained by learning from the text, image, and location data with an ensemble model. The benchmark results show that models can be learned from these data to improve user profile learning. Please see the slides for more details about user profile learning and the feature descriptions. Our dataset can be used for both descriptive and prescriptive research. That is, we do not intend to constrain future research to user profile learning alone, since the available ground truth makes it possible to tackle other contemporary problems.
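The description above mentions an ensemble model over text, image, and location data but does not specify it. As a rough illustration only, the sketch below shows one common way to combine per-source predictions, late fusion by averaging class probabilities. It is not the authors' baseline. The function names, the source keys, and the class labels are hypothetical, and the probabilities are toy values rather than results from the dataset.

```python
from typing import Dict, List, Optional


def late_fusion(per_source_probs: Dict[str, Optional[List[float]]]) -> List[float]:
    """Average class-probability vectors over the sources a user actually has.

    A value of None marks a source with no data for this user (assumption:
    not every user is active on every social network).
    """
    available = [p for p in per_source_probs.values() if p is not None]
    if not available:
        raise ValueError("user has no data from any source")
    n_classes = len(available[0])
    if any(len(p) != n_classes for p in available):
        raise ValueError("all sources must score the same set of classes")
    return [sum(p[i] for p in available) / len(available) for i in range(n_classes)]


def predict(per_source_probs: Dict[str, Optional[List[float]]], labels: List[str]) -> str:
    """Return the label with the highest fused probability."""
    fused = late_fusion(per_source_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Toy usage with hypothetical per-source classifier outputs.
toy_user = {
    "text": [0.75, 0.25],      # e.g. a classifier over text features
    "image": None,             # this user has no image data
    "location": [0.5, 0.5],    # e.g. a classifier over location features
}
fused = late_fusion(toy_user)
label = predict(toy_user, ["class_a", "class_b"])
```

Averaging only over the available sources is one simple way to handle users who are missing a modality; weighted or learned fusion would be equally valid choices.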
We list some potential research topics that can be explored with our released dataset: <strong>Complete demographic profiling.</strong> Researchers are encouraged to learn other demographic attributes, such as occupation, personality, and social status. <strong>Extended mobility profiling.</strong> In the current study, we focused on category-specific user mobility profiling; it would also be useful to incorporate spatio-temporal factors of users' movement. <strong>Causality patterns extraction.</strong> It is important to discover potential causal relationships between events from multiple data sources. For example, the "flower" image concept could be temporally related to flower shop check-ins or tweets about flowers. <strong>Cross-source user identification.</strong> The alignment of user accounts across multiple social resources can benefit from user profile compilation. <strong>Cross-region user profiling and community matching.</strong> This direction may offer insight into differences and similarities between users' preferences. <strong>For more details of this dataset and to reuse this dataset, please visit http://nusmultisource.azurewebsites.net/</strong>
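As a starting point for the causality patterns topic, one could first look for events from two sources that fall close together in time, such as the "flower" image concept and a flower shop check-in. The sketch below is a minimal illustration under stated assumptions: the event format (timestamp, label), the function name, and the toy timestamps are all hypothetical and do not reflect the dataset's actual schema. Temporal co-occurrence is only a candidate signal, not evidence of causality.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # hypothetical format: (timestamp, label)


def co_occurrences(events_a: List[Event], events_b: List[Event],
                   window: timedelta) -> List[Tuple[str, str]]:
    """Return (label_a, label_b) pairs whose timestamps differ by at most `window`."""
    pairs = []
    for ts_a, label_a in events_a:
        for ts_b, label_b in events_b:
            if abs(ts_a - ts_b) <= window:
                pairs.append((label_a, label_b))
    return pairs


# Toy data for a single user (made up for illustration).
image_concepts = [
    (datetime(2015, 2, 14, 10, 0), "flower"),
    (datetime(2015, 3, 1, 9, 0), "beach"),
]
check_ins = [
    (datetime(2015, 2, 14, 11, 30), "Flower Shop"),
    (datetime(2015, 2, 20, 18, 0), "Coffee Shop"),
]
pairs = co_occurrences(image_concepts, check_ins, window=timedelta(hours=2))
```

The nested loop is quadratic in the number of events; for large event histories, sorting both lists by timestamp and sweeping a window would be a natural refinement.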
Provider:
NUS
Created:
2017-11-13