Data for The Eclectic Reader

Indexed by NIAID Data Ecosystem, 2026-05-10
Download link:
https://doi.org/10.7910/DVN/QHLDXA
Description:
This dataset contains two files related to our article about reader eclecticism. One file contains metadata about books, derived from their landing pages on Goodreads.com. It is formatted as JSON and structured like a Python dictionary, where the keys are the URLs of each book's works page on Goodreads. The values include the book's title (string), the author (string), the average rating (float), the number of ratings (integer), and a set of shelves (dictionary). The shelves come from the shelf data shown on each book's landing page: at the time of the scrape (fall 2021), Goodreads displayed up to 10 shelves per book, along with how many people had tagged the book with each one. Goodreads no longer displays this information, and reconstructing the weights is non-trivial (you can find detailed information about all of a book's shelves, but Goodreads sometimes groups shelves into an overarching category for the landing page). The information collected here does reflect user interaction with the book, but these caveats are worth considering. In any case, the shelf sub-dictionary uses shelf names as keys and their weights as values. The file contains information about 884,722 books.
The second file shows how we sorted all of the shelves in our dataset into just a few clusters. This file is very simple (a two-column CSV with the name of each shelf and its cluster), but producing it was complicated. First, we built a network out of our shelves. Each shelf is a node, and we draw an edge between two shelves if they appear on the same book. Each additional book that combines those two shelves adds to the edge weight. The result is a network that shows how all 1,194 shelves in our dataset are used relative to each other. With the network in hand, we used community detection to see how the shelves cluster together. There are many ways to do this; we used the Louvain method.
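As a rough sketch of the network construction described above, the Python below loads the book file and tallies shelf co-occurrences, adding 1 to an edge for every book that carries both shelves. It assumes each book's value is a dictionary whose shelf sub-dictionary sits under a key named `shelves`; that key name and the function names are guesses for illustration, not documented parts of the dataset, so check them against the actual file.

```python
import json
from collections import Counter
from itertools import combinations


def load_books(path):
    """Load the book metadata: a JSON object keyed by Goodreads works-page URL."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_shelf_network(books, shelves_key="shelves"):
    """Return a Counter mapping (shelf_a, shelf_b) pairs to co-occurrence counts.

    Each shelf is a node. Every book that lists both shelves adds 1 to the
    weight of the edge between them; the per-book shelf weights are ignored.
    ASSUMPTION: the shelf sub-dictionary is stored under `shelves_key`.
    """
    edges = Counter()
    for record in books.values():
        # Sort shelf names so each pair has one canonical (alphabetical) key.
        shelves = sorted(record.get(shelves_key, {}))
        for a, b in combinations(shelves, 2):
            edges[(a, b)] += 1
    return edges
```

Counting books rather than summing shelf weights follows the description of edge weights growing as more books combine two shelves; the resulting pairs can be passed to a graph library (for example, networkx's `Graph.add_edge(a, b, weight=w)`) before running community detection.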
This approach is non-deterministic and sensitive to various decisions, such as the granularity of the community detection. To shore up our sense of the network's community structure (whose strength is often measured by a score called "modularity"), we spent a lot of time on this process. We ran community detection 10,000 times at each of a few different granularities. We examined the resulting communities to see which ones tended to appear often and which emerged rarely, and we also observed which shelves tended to appear together. In the end we settled on the eight communities listed in the second file, and we chose the name of each community ourselves. If you repeat this process, you will probably end up with a somewhat different picture. We request that any outputs resulting from use of this dataset acknowledge the Price Lab / J.D. Porter. We have chosen not to share data about specific Goodreads users in order to protect their privacy. We are, however, open to corresponding with researchers about sharing and collaboration.
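The stability check described above could be approximated along these lines. This sketch assumes networkx 2.8 or later for its `louvain_communities` function (our choice of implementation; the authors do not say which one they used), treats "granularity" as Louvain's `resolution` parameter, and varies the random seed between runs. The function names, seeds, and input format (a mapping from shelf pairs to co-occurrence weights) are hypothetical. The tally reports, for each pair of shelves, the fraction of runs in which they landed in the same community.

```python
from collections import Counter
from itertools import combinations


def co_assignment(partitions):
    """Fraction of runs in which each pair of shelves shares a community.

    `partitions` is a list of runs; each run is a list of sets of shelf names.
    Pairs that never share a community are absent from the result.
    """
    together = Counter()
    for communities in partitions:
        for community in communities:
            for a, b in combinations(sorted(community), 2):
                together[(a, b)] += 1
    n_runs = len(partitions)
    return {pair: count / n_runs for pair, count in together.items()}


def repeated_louvain(edges, resolutions, runs_per_resolution, seed=0):
    """Run Louvain repeatedly at each resolution (requires networkx >= 2.8).

    `edges` maps (shelf_a, shelf_b) pairs to co-occurrence weights. Shelves
    with no co-occurring partner never become nodes and are left out.
    """
    # Third-party import kept local so co_assignment works without networkx.
    import networkx as nx

    graph = nx.Graph()
    for (a, b), weight in edges.items():
        graph.add_edge(a, b, weight=weight)

    results = {}
    for resolution in resolutions:
        # A different seed per run exposes Louvain's non-determinism.
        results[resolution] = [
            nx.community.louvain_communities(
                graph, weight="weight", resolution=resolution, seed=seed + i
            )
            for i in range(runs_per_resolution)
        ]
    return results
```

For example, `co_assignment(repeated_louvain(edges, [0.5, 1.0, 1.5], 10_000)[1.0])` would give co-assignment frequencies at one resolution; those resolution values are placeholders, not the ones the authors used. Shelf pairs with frequencies near 1.0 are consistently grouped, while pairs near 0.5 flag unstable boundaries worth inspecting by hand.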
Created:
2025-09-25