
Synthetic and Real Key-Value Data Sets

Mendeley Data | Updated: 2024-01-31 | Indexed: 2024-06-28
Download link:
https://data.mendeley.com/datasets/kxcb3tnr3t
Description:
We present eight key-value data sets, four synthetic and four real, each composed of various data types. They are intended for storage in key-value stores such as Google's LevelDB, Facebook's RocksDB, and Oracle's Berkeley DB. A strength of key-value stores is that they can handle various data types by assigning data of an arbitrary type as the value and a unique ID as the key. In constructing the data sets, we focus on the variety of data types in the real data sets and on the volume (size) in the synthetic data sets. We generate four synthetic data sets of different sizes: (1) KVData1, (2) KVData2, (3) KVData3, and (4) KVData4. The total number of objects ranges from 10K to 10M. For each key-value pair, we generate a unique ID as the key and a random string of variable length as the value. For the real data sets, we crawled user tweets and related information from Twitter using the Tweepy library (https://www.tweepy.org/). Each real data set covers one of the following data types: (1) geo-location, (2) hashtags, (3) tweet text, and (4) the number of followers. That is, the data sets are designed to have different data types such as geo-locations, texts, and integers. Table 2 shows the characteristics of the real data sets. We crawled four kinds of real data sets: (1) ID-Geo, consisting of the tweet ID and the location information of the tweet, (2) ID-Hashtag, consisting of the tweet ID and the hashtags in the tweet, (3) ID-Tweet, consisting of the tweet ID and the tweet text, and (4) User-Followers, consisting of the user ID and the number of followers of the user.
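The synthetic generation step described above (a unique ID as the key, a random variable-length string as the value) could be sketched roughly as follows. This is only an illustration, not the authors' generator: the function name `generate_kv_pairs`, the length bounds `min_len` and `max_len`, the alphanumeric alphabet, and the fixed seed are all assumptions, since the description does not specify them.

```python
import random
import string


def generate_kv_pairs(num_objects, min_len=8, max_len=64, seed=0):
    """Yield (key, value) pairs for a synthetic key-value data set.

    Hypothetical sketch: length bounds, alphabet, and seed are assumed,
    not taken from the data set description.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    alphabet = string.ascii_letters + string.digits
    for object_id in range(num_objects):
        # The key is a sequential integer, so it is unique by construction.
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        value = "".join(rng.choices(alphabet, k=length))
        yield object_id, value


# Build a data set at the smallest size mentioned in the description (10K objects).
pairs = dict(generate_kv_pairs(10_000))
```

Because the function is a generator, the larger sizes (up to 10M objects) could be streamed directly into a key-value store instead of being held in a dictionary first.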
Created:
2024-01-31