
bstarrs/goodreads-books

Source: Hugging Face · Updated: 2026-03-25 · Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/bstarrs/goodreads-books
Description:
---
license: cc0-1.0
task_categories:
- text-classification
- text-generation
language:
- en
tags:
- books
- goodreads
- web-scraping
- recommendation-systems
- literature
size_categories:
- 1K<n<10K
---

# Goodreads Books Dataset

## Dataset Description

A dataset of books scraped from Goodreads, including ratings, authors, titles, and various book characteristics.

This dataset contains **3,045 books** with **20 features** each. It is suited to:

- 📚 Book recommendation systems
- 📊 Literary data analysis
- 🤖 Machine learning projects
- 📈 Rating prediction models
- 🔍 Book discovery algorithms

## Dataset Structure

### Features

| Column | Type | Description |
|--------|------|-------------|
| rank | int64 | Book rank |
| percentile_rank | float64 | Book percentile rank |
| book_id | int64 | Book ID |
| title | object | Book title |
| author | object | Book author |
| rating | float64 | Book rating |
| rating_category | object | Book rating category |
| rating_tier | object | Book rating tier |
| is_high_rated | bool | Book is high rated |
| title_length | int64 | Book title length |
| title_complexity | object | Book title complexity |
| word_count | int64 | Book word count |
| author_count | int64 | Book author count |
| author_name_length | int64 | Book author name length |
| has_series_info | bool | Book has series info |
| series_number | float64 | Book series number |
| title_type | object | Book title type |
| has_subtitle | bool | Book has subtitle |
| has_middle_name | bool | Book has middle name |
| estimated_popularity | object | Book estimated popularity |

### Statistics

- **Total Records**: 3,045
- **File Size**: 0.43 MB
- **Data Quality**: 97.0% complete
- **Average Rating**: 4.06
- **Rating Range**: 0.00 - 4.93

## Usage

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("codealchemist01/goodreads-books")

# Access the data
df = dataset['train'].to_pandas()
print(df.head())
```

## Data Collection

The data was collected through web scraping of Goodreads.com using ethical scraping practices:

- Respectful rate limiting
- Robots.txt compliance
- No personal user data collected

## Citation

If you use this dataset in your research, please cite:

```
@dataset{goodreads_books_2025,
  title={Goodreads Books Dataset},
  author={Kutay Ahin},
  year={2025},
  url={https://huggingface.co/datasets/codealchemist01/goodreads-books}
}
```

## License

This dataset is released under the CC0 1.0 Universal License.
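
## Example: Recomputing the Summary Statistics

The card reports a completeness percentage, an average rating, and a rating range, but does not say how they were computed. The sketch below shows one plausible way to derive such figures with pandas. It assumes that "complete" means the share of non-missing cells across the whole table, which the card does not confirm. The `summarize` helper is hypothetical, and the sample rows are invented for illustration; they are not taken from the dataset.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Compute card-style summary statistics for a books DataFrame.

    Assumption: completeness is the percentage of non-missing cells
    over all rows and columns (the dataset card does not define it).
    """
    completeness = float(df.notna().to_numpy().mean() * 100)
    return {
        "total_records": len(df),
        "completeness_pct": round(completeness, 1),
        "average_rating": round(float(df["rating"].mean()), 2),
        "rating_range": (float(df["rating"].min()), float(df["rating"].max())),
    }


# Invented sample rows using a few of the documented column names.
sample = pd.DataFrame(
    {
        "title": ["Book A", "Book B", "Book C", "Book D"],
        "rating": [4.5, 3.5, 0.0, 4.0],
        "has_series_info": [True, False, False, True],
        "series_number": [1.0, None, None, 2.0],
    }
)

stats = summarize(sample)
print(stats)
```

With the real data, the same helper could be applied to the `df` produced in the Usage section. Note that a minimum rating of 0.00 pulls the mean down; whether such rows represent unrated books is not stated in the card, so check them before using the average.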
Provider:
bstarrs