Synthetic and Real Key-Value Data Sets

Mendeley Data. Updated 2024-01-31; indexed 2024-06-28.

Download link:

https://data.mendeley.com/datasets/kxcb3tnr3t

Description:

We present eight key-value data sets, four synthetic and four real, intended for storage in key-value stores such as Google's LevelDB, Facebook's RocksDB, and Oracle's Berkeley DB. A strength of key-value stores is that they can handle many data types: data of an arbitrary type is stored as the value, and a unique ID serves as the key. In constructing the data sets, we focus on the variety of data types in the real data sets and on volume (size) in the synthetic data sets.

The four synthetic data sets differ in size: (1) KVData1, (2) KVData2, (3) KVData3, and (4) KVData4. The total number of objects ranges from 10K to 10M. For each key-value pair, we generate a unique ID as the key and a random variable-length string as the value.

For the real data sets, we crawled user tweets and related information from Twitter using the Tweepy library (https://www.tweepy.org/). The real data sets cover four kinds of data: (1) geo-locations, (2) hashtags, (3) tweet text, and (4) the number of followers. That is, the data sets are designed to span different data types such as geo-locations, text, and integers. Table 2 shows the characteristics of the real data sets. We crawled four real data sets: (1) ID-Geo, consisting of the tweet ID and the location information of the tweet; (2) ID-Hashtag, consisting of the tweet ID and the hashtags in the tweet; (3) ID-Tweet, consisting of the tweet ID and the tweet text; and (4) User-Followers, consisting of the user ID and the number of followers of the user.
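
The synthetic generation step described above (a unique ID as the key, a random variable-length string as the value) could be sketched roughly as follows. This is an illustration only, not the authors' generator: the function name `generate_kv_pairs`, the zero-padded sequential key format, the alphanumeric alphabet, and the 8 to 64 character length range are all assumptions, since the description does not specify them. A plain `dict` stands in for a real key-value store.

```python
import random
import string


def generate_kv_pairs(n, min_len=8, max_len=64, seed=0):
    """Return n (key, value) pairs with unique keys and random string values.

    Hypothetical helper: key format, alphabet, and length bounds are
    assumptions, not taken from the data set description.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    alphabet = string.ascii_letters + string.digits
    pairs = []
    for i in range(n):
        key = f"{i:010d}"  # zero-padded sequential ID, unique by construction
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        value = "".join(rng.choices(alphabet, k=length))
        pairs.append((key, value))
    return pairs


# 10K objects, matching the smallest size mentioned in the description.
pairs = generate_kv_pairs(10_000)
store = dict(pairs)  # in-memory stand-in for a key-value store
```

Scaling to the larger sizes would likely call for streaming the pairs to disk or writing them directly into the target store rather than holding them all in a list.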

Created:

2024-01-31
