
SocialLink: knowledge transfer between social media and linked open data

Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/SocialLink_knowledge_transfer_between_social_media_and_linked_open_data/5235823
Description:
This dataset contains canonical citations (DOIs) for the SocialLink dataset (15 May 2017 release), alignment data, code, and entity data in .csv and .json format.

SocialLink is a publicly available Linked Open Data dataset that matches social media accounts on Twitter to the corresponding entities in multiple language chapters of DBpedia. By bridging the Twitter social media world and the Linked Open Data cloud, SocialLink enables knowledge transfer between the two: on the one hand, it supports Semantic Web practitioners in better harvesting the vast amounts of valuable, up-to-date information available on Twitter; on the other hand, it lets social media researchers leverage DBpedia data when processing the noisy, semi-structured data of Twitter.

The SocialLink dataset is created by the SocialLink Pipeline, which aligns 271,000 DBpedia persons and organisations to their Twitter profiles via data acquisition, candidate acquisition and candidate selection phases.

Data files are stored in compressed .gz format and can be uncompressed using standard compression utilities. Diagrams are provided in .pdf format; .csv, .json and .java files can be opened in a text editor; .tql files can be accessed via MS SQL Server.

Format descriptions:

JSON
The JSON file is a single array containing one object per DBpedia entity, all with a similar structure. The candidates property contains the list of candidate IDs for the entity, while the scores property contains a confidence score for each candidate, as reported by our candidate selection algorithm. twitter_id may be present if a certain threshold is met (thresholds are selected according to the high-F1 setup from our paper).

CSV
Each row of the CSV file contains information about one entity. A row looks like this:

http://dbpedia.org/resource/MoShang,"[6887052,26735153,302784580,1331809652,2275404837,2597365788,1516978014,753046765809508356,1512300530,255873440]","[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",6887052

The columns contain the same data as in the JSON format. If the Twitter ID can't be determined, 0 is used in the last column instead.

approach.pdf and rdf.pdf provide visual representations of the SocialLink pipeline and RDF alignments. For more detailed information on the RDF modeling choices, see the associated publication. Extensive documentation is available via the SocialLink website (URL below), covering: (i) dataset scope, format, statistics, and access mechanisms; (ii) instructions for deploying and running the SocialLink pipeline to recreate the resource; (iii) example applications using the dataset; and (iv) links to external resources like the GitHub repository and issue tracker.

Code: https://github.com/Remper/sociallink
SocialLink Website: http://sociallink.futuro.media/
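
The CSV row format described above can be read with the Python standard library. The following is a minimal sketch, not part of the SocialLink distribution: the function names `parse_row` and `read_alignments`, the output dictionary keys, and the idea of mapping the `0` sentinel to `None` are all illustrative assumptions. It also assumes that the two bracketed columns are valid JSON arrays, which holds for the sample row shown in the description.

```python
import csv
import gzip
import io
import json


def parse_row(row):
    """Turn one 4-column SocialLink CSV row into a dict (hypothetical helper)."""
    entity_uri, candidates_raw, scores_raw, twitter_id_raw = row
    twitter_id = int(twitter_id_raw)
    return {
        "entity": entity_uri,
        # The bracketed columns are parsed as JSON arrays (assumption).
        "candidates": json.loads(candidates_raw),
        "scores": json.loads(scores_raw),
        # The description says 0 means "Twitter ID could not be determined".
        "twitter_id": twitter_id if twitter_id != 0 else None,
    }


def read_alignments(path):
    """Yield parsed rows from a gzip-compressed CSV file (path is hypothetical)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            yield parse_row(row)


# The sample row quoted in the dataset description.
SAMPLE = (
    'http://dbpedia.org/resource/MoShang,'
    '"[6887052,26735153,302784580,1331809652,2275404837,2597365788,'
    '1516978014,753046765809508356,1512300530,255873440]",'
    '"[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",'
    '6887052\n'
)

record = parse_row(next(csv.reader(io.StringIO(SAMPLE))))
print(record["entity"], record["twitter_id"], len(record["candidates"]))
```

Mapping the `0` sentinel to `None` keeps "no alignment" distinct from a real account ID in downstream code; callers that prefer the raw value can drop that step.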

Created:
2018-03-07