
JSON Datasets for Exploratory OLAP

Indexed from doi.org on 2025-03-22
Download link:
http://doi.org/10.17632/ct8f9skv97.1
Description:
These datasets have been used to evaluate the EXODuS approach (EXploratory OLAP over Document Stores).

- The games dataset was collected by Sports Reference LLC. It contains around 32K nested documents representing NBA games from 1985 to 2013. Each document represents a game between two teams with at least 11 players each. It contains 47 attributes, 40 of which are numeric and represent team and player results.
- The DBLP dataset contains 2M documents scraped from DBLP in XML format and converted into JSON. Documents are flat and represent eight kinds of publications, including conference proceedings, journal articles, books, and theses. The third portion of the dataset consists of author pages, which contain half as many fields as the other kinds. As a result, documents share some attributes, such as title, author, type, and year, while others, such as journal and booktitle, are not shared.
- The Twitter dataset contains 2M tweets scraped from the Twitter API. Each document represents a tweet and its metadata, which includes several nested objects: a user object representing the author of the tweet, a place object giving its location, and a retweet object if it is a reply. The dataset is heterogeneous: it mixes tweets with documents returned by the API for tweet deletions.

The sources of the datasets are listed in the Related links section.
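Because the DBLP and Twitter collections are schema-heterogeneous, a natural first step before any OLAP-style analysis is profiling which attributes occur in which documents. The sketch below is a generic illustration, not the EXODuS method itself. It flattens each document into dotted attribute paths (nested objects such as `user` become `user.screen_name`), counts how many documents contain each path, and splits paths into shared (present in every document) and unshared ones. Two points are assumptions. The downloaded files are assumed to hold one JSON document per line, because the file layout is not described here. The function names (`parse_json_lines`, `attribute_paths`, `attribute_support`, `split_shared`) are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> list:
    """Parse one JSON document per non-blank line (assumed file layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def attribute_paths(doc: Any, prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf attribute in a JSON document.

    Nested objects are descended into; array elements share one path
    component marked with "[]".
    """
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from attribute_paths(value, path)
    elif isinstance(doc, list):
        for item in doc:
            yield from attribute_paths(item, prefix + "[]")
    else:
        yield prefix


def attribute_support(docs: Iterable[dict]) -> tuple:
    """Return (Counter of documents containing each path, number of documents)."""
    counts: Counter = Counter()
    total = 0
    for doc in docs:
        total += 1
        # A set ensures each path is counted at most once per document.
        counts.update(set(attribute_paths(doc)))
    return counts, total


def split_shared(counts: Counter, total: int) -> tuple:
    """Split paths into those present in all documents and the rest."""
    shared = sorted(p for p, c in counts.items() if c == total)
    unshared = sorted(p for p, c in counts.items() if c < total)
    return shared, unshared


if __name__ == "__main__":
    # Hypothetical toy records shaped like DBLP entries (not real data).
    docs = [
        {"type": "article", "title": "A", "author": ["X", "Y"], "year": 2010, "journal": "J"},
        {"type": "inproceedings", "title": "B", "author": ["Z"], "year": 2012, "booktitle": "C"},
    ]
    shared, unshared = split_shared(*attribute_support(docs))
    print(shared)    # ['author[]', 'title', 'type', 'year']
    print(unshared)  # ['booktitle', 'journal']
```

On real data, the same counts also give each attribute's coverage ratio (`counts[path] / total`), which is one simple way to judge whether a field such as `journal` is frequent enough to serve as an analysis dimension.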

Provided by: Mendeley Data