mattismegevand/pitchfork
Source: Hugging Face. Updated 2023-08-13; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/mattismegevand/pitchfork
Resource description:
---
license: mit
language:
- en
task_categories:
- summarization
- text-generation
- question-answering
tags:
- music
size_categories:
- 10K<n<100K
---
# Pitchfork Music Reviews Dataset
This repository contains the code used to scrape music reviews from Pitchfork, along with the resulting dataset.
## Dataset Overview
The Pitchfork Music Reviews dataset is a collection of music album reviews from the Pitchfork website. Each entry in the dataset represents a single review and includes the following attributes:
- `artist`: The artist of the album.
- `album`: The name of the album.
- `year_released`: The year the album was released.
- `rating`: The rating given to the album by the reviewer.
- `small_text`: A short snippet from the review.
- `review`: The full text of the review.
- `reviewer`: The name of the reviewer.
- `genre`: The genre(s) of the album.
- `label`: The record label that released the album.
- `release_date`: The publication date of the review.
- `album_art_url`: The URL of the album art.
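
As an illustration of the attribute list above, the following is a minimal sketch of how one entry could be represented in Python. The field names come from the list; the Python types, the `Review` class, and the `review_from_row` helper are assumptions made for this example and are not part of the dataset or its scraping code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """One dataset entry. Field names follow the attribute list;
    the types are assumptions, not taken from the dataset itself."""
    artist: str
    album: str
    year_released: Optional[int]
    rating: Optional[float]
    small_text: str
    review: str
    reviewer: str
    genre: str
    label: str
    release_date: str
    album_art_url: str


def _to_number(value, cast):
    """Cast a raw value, returning None when it is missing or malformed."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def review_from_row(row: dict) -> Review:
    """Build a Review from a mapping of column name to raw value."""
    def text(name):
        # Missing or None values become an empty string.
        return str(row.get(name) or "")

    return Review(
        artist=text("artist"),
        album=text("album"),
        year_released=_to_number(row.get("year_released"), int),
        rating=_to_number(row.get("rating"), float),
        small_text=text("small_text"),
        review=text("review"),
        reviewer=text("reviewer"),
        genre=text("genre"),
        label=text("label"),
        release_date=text("release_date"),
        album_art_url=text("album_art_url"),
    )
```

Keeping `year_released` and `rating` optional lets malformed or missing values surface as `None` instead of raising while loading.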
## Usage
This dataset is publicly available for research. The data is provided 'as is', and you assume full responsibility for any legal or ethical issues that may arise from the use of the data.
## Scraping Process
The dataset was generated by scraping the Pitchfork website. The Python script uses the `requests` and `BeautifulSoup` libraries to send HTTP requests to the website and parse the resulting HTML content.
The script saves the data in an SQLite database and can also export the data to a CSV file. Duplicate entries are avoided by checking for existing entries with the same artist and album name before inserting new ones into the database.
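
The storage step described above can be sketched with the standard library alone. This is not the repository's actual script: the table name `reviews`, the all-`TEXT` schema, and the helper names `open_db`, `insert_if_new`, and `export_csv` are hypothetical, and the fetching and parsing done with `requests` and `BeautifulSoup` is left out.

```python
import csv
import sqlite3

# Column names follow the dataset attributes; storing everything as TEXT
# is an assumption made to keep the sketch short.
COLUMNS = [
    "artist", "album", "year_released", "rating", "small_text", "review",
    "reviewer", "genre", "label", "release_date", "album_art_url",
]
COLUMN_LIST = ", ".join(COLUMNS)


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    schema = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS reviews ({schema})")
    return conn


def insert_if_new(conn, entry):
    """Insert entry unless a row with the same artist and album exists.

    Returns True when a row was inserted, False when it was a duplicate.
    """
    exists = conn.execute(
        "SELECT 1 FROM reviews WHERE artist = ? AND album = ? LIMIT 1",
        (entry["artist"], entry["album"]),
    ).fetchone()
    if exists:
        return False
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO reviews ({COLUMN_LIST}) VALUES ({placeholders})",
        [entry.get(name) for name in COLUMNS],
    )
    conn.commit()
    return True


def export_csv(conn, fileobj):
    """Write a header row followed by every stored review."""
    writer = csv.writer(fileobj)
    writer.writerow(COLUMNS)
    writer.writerows(conn.execute(f"SELECT {COLUMN_LIST} FROM reviews"))
```

Note that an exact match on artist and album treats a reissue reviewed under the same title as a duplicate, which is a consequence of the check as described.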
## Potential Applications
This dataset can be used for a variety of research purposes, such as:
- Music information retrieval
- Text mining and sentiment analysis
- Music recommendation systems
- Music trend analysis
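
As one possible starting point for the trend analysis mentioned above, the sketch below averages ratings per release year. It assumes the reviews are already loaded as dictionaries keyed by the attribute names; the function name `mean_rating_by_year` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_year(reviews):
    """Return {year: mean rating}, ordered by year.

    Entries whose year or rating is missing or malformed are skipped.
    """
    by_year = defaultdict(list)
    for entry in reviews:
        try:
            year = int(entry["year_released"])
            rating = float(entry["rating"])
        except (KeyError, TypeError, ValueError):
            continue
        by_year[year].append(rating)
    return {year: mean(values) for year, values in sorted(by_year.items())}
```

Skipping incomplete entries keeps the averages from being skewed by placeholder values, at the cost of silently shrinking the sample.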
## Acknowledgments
The dataset is sourced from [Pitchfork](https://pitchfork.com/), a website that publishes daily reviews, features, and news stories about music.
## License
Please ensure you comply with Pitchfork's terms of service before using or distributing this data.
Provider:
mattismegevand
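Summary of original information
Pitchfork Music Reviews Dataset overview
Basic dataset information
- License: MIT
- Language: English
- Task categories:
  - Summarization
  - Text generation
  - Question answering
- Tags: music
- Size category: 10K<n<100K
Dataset contents
- Artist: the artist of the album
- Album: the name of the album
- Year released: the year the album was released
- Rating: the rating given to the album
- Small text: a short snippet from the review
- Review: the full text of the review
- Reviewer: the name of the reviewer
- Genre: the genre(s) of the album
- Label: the record label that released the album
- Release date: the publication date of the review
- Album art URL: the URL of the album art
Dataset uses
- Music information retrieval
- Text mining and sentiment analysis
- Music recommendation systems
- Music trend analysis
Data source
- The data comes from the Pitchfork website, which publishes daily music reviews, features, and news.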



