
USER IDENTITY LINKAGE DATASET

Source: larc.smu.edu.sg (indexed 2025-03-25)
Download link:
https://larc.smu.edu.sg/user-identity-linkage-dataset
Description:
This dataset was crawled from three popular online social networks (OSNs): Twitter, Facebook and Foursquare. We collected it as follows. We first gathered a set of Singapore-based Twitter users who declared Singapore as the location in their user profiles. From these Singapore-based Twitter users, we retrieved the subset who declared their Facebook or Foursquare accounts in their short bio descriptions. In total, we collected 1,998 Twitter-Facebook user identity pairs (the TW-FB ground truth matching pairs) and 3,602 Twitter-Foursquare user identity pairs (the TW-FQ ground truth matching pairs).

In a real-world setting, a user identity in the source OSN may have no corresponding matching identity in the target OSN. To simulate this, we expanded the datasets by adding Twitter, Facebook and Foursquare users who are connected to users in the TW-FB and TW-FQ ground truth matching pairs. Isolated users, who have no links to other users, were removed from the datasets.

After collecting the datasets, we extracted the following user features using the OSNs' APIs:
• Username: the username of the account.
• Screen name: the natural name of the user account, usually formed from the user's first and last names.
• Profile image: the thumbnail or image the user provides to visually present herself.
• Network: the relationship links between users.
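The collection procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions. It is not the authors' crawler, and it does not show the dataset's file format. The `TwitterUser` record, the regular expressions `FB_PATTERN` and `FQ_PATTERN`, and the in-memory edge list are all hypothetical, because the description does not say how a "declared" account was detected in a bio or how the data is stored. The sketch does two things:
1. It extracts ground truth pairs from Singapore-located users.
2. It expands a set of seed users with their directly connected neighbours, then drops isolated users.

```python
import re
from dataclasses import dataclass


# Hypothetical record for a crawled Twitter profile (not the dataset's actual schema).
@dataclass
class TwitterUser:
    username: str
    location: str
    bio: str


# Assumed patterns for an account "declared" in a bio; the source does not specify the method.
FB_PATTERN = re.compile(r"facebook\.com/([A-Za-z0-9.]+)", re.IGNORECASE)
FQ_PATTERN = re.compile(r"foursquare\.com/([A-Za-z0-9_.]+)", re.IGNORECASE)


def ground_truth_pairs(users, pattern):
    """Return (twitter_username, target_account) pairs for Singapore-located
    users whose bio links to an account on the target OSN."""
    pairs = []
    for user in users:
        if "singapore" not in user.location.lower():
            continue  # keep only users who declare Singapore as their location
        match = pattern.search(user.bio)
        if match:
            # strip a trailing full stop picked up from the end of a sentence
            pairs.append((user.username, match.group(1).rstrip(".")))
    return pairs


def expand_and_prune(seeds, edges):
    """Add every user directly connected to a seed user, then remove users
    with no links to any other collected user.

    Returns the kept users and the edges among them.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    users = set(seeds)
    for seed in seeds:
        users |= adjacency.get(seed, set())  # one-hop neighbours of each seed

    # A user is isolated if none of its neighbours is in the collected set.
    kept = {u for u in users if adjacency.get(u, set()) & (users - {u})}
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges


if __name__ == "__main__":
    twitter_users = [
        TwitterUser("alice_sg", "Singapore", "Find me at facebook.com/alice.tan"),
        TwitterUser("bob", "Singapore", "coffee lover, foursquare.com/bobeats"),
        TwitterUser("carol", "London", "facebook.com/carol"),
        TwitterUser("dave", "Singapore", "no links here"),
    ]
    print(ground_truth_pairs(twitter_users, FB_PATTERN))  # [('alice_sg', 'alice.tan')]
    print(ground_truth_pairs(twitter_users, FQ_PATTERN))  # [('bob', 'bobeats')]

    twitter_edges = [("alice_sg", "erin"), ("erin", "frank"), ("bob", "alice_sg")]
    kept, kept_edges = expand_and_prune({"alice_sg", "bob", "dave"}, twitter_edges)
    print(sorted(kept), kept_edges)
    # ['alice_sg', 'bob', 'erin'] [('alice_sg', 'erin'), ('bob', 'alice_sg')]
```

The expansion step would run once per OSN graph. For example, the Twitter identities from the TW-FB pairs would serve as seeds on the Twitter graph, and the matched Facebook identities would serve as seeds on the Facebook graph. The sketch adds only directly connected (one-hop) users, matching the wording "users who are connected to users in the ... pairs". It does not claim to reproduce the authors' exact crawling depth.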

Provided by:
Living Analytics Research Centre