PiriPiri71/spotify-tracks-dataset
Source: Hugging Face. Updated 2026-02-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/PiriPiri71/spotify-tracks-dataset
Resource overview:
---
license: bsd
task_categories:
- feature-extraction
- tabular-classification
- tabular-regression
language:
- en
tags:
- music
- art
pretty_name: Spotify Tracks Dataset
size_categories:
- 100K<n<1M
---
# Content
This dataset contains Spotify tracks spanning **125** different genres. Each track is annotated with a set of audio features. The data is provided in tabular `CSV` format, which can be loaded quickly.
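As a minimal sketch of loading the data, the snippet below parses a small in-memory CSV with Python's standard `csv` module and counts tracks per genre. The sample rows and the reduced column set are made up for illustration, and `load_tracks` is a hypothetical helper. For the real file, pass an opened file handle instead of the `io.StringIO` object (the actual file name is not stated in this card).

```python
import csv
import io
from collections import Counter

# Made-up sample rows using a subset of the columns described in this card.
SAMPLE_CSV = """track_id,artists,track_name,popularity,track_genre
id001,Artist A;Artist B,Song One,73,acoustic
id002,Artist C,Song Two,55,acoustic
id003,Artist D,Song Three,12,techno
"""


def load_tracks(handle):
    """Read CSV rows into a list of dicts, converting popularity to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["popularity"] = int(row["popularity"])
        rows.append(row)
    return rows


tracks = load_tracks(io.StringIO(SAMPLE_CSV))

# Count how many tracks fall under each genre label.
genre_counts = Counter(row["track_genre"] for row in tracks)
print(genre_counts.most_common())
```

`csv.DictReader` returns every field as a string, so numeric columns such as `popularity` need an explicit conversion before they can be compared or aggregated.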
# Usage
The dataset can be used for:
- Building a **Recommendation System** based on some user input or preference
- **Classification** purposes based on audio features and available genres
- Any other application that you can think of. Feel free to discuss!
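The recommendation use case could be sketched as a nearest-neighbour lookup over audio features. This is only one possible approach, not a method prescribed by the dataset: the feature subset, the Euclidean distance, the sample tracks, and the `recommend` helper are all assumptions made for illustration.

```python
import math

# Hypothetical feature subset; all three columns lie in [0.0, 1.0].
FEATURES = ("danceability", "energy", "valence")

# Made-up tracks for illustration only.
TRACKS = [
    {"track_name": "Calm Piano", "danceability": 0.20, "energy": 0.10, "valence": 0.30},
    {"track_name": "Club Anthem", "danceability": 0.90, "energy": 0.85, "valence": 0.80},
    {"track_name": "Upbeat Pop", "danceability": 0.80, "energy": 0.75, "valence": 0.90},
]


def distance(a, b):
    """Euclidean distance between two tracks over the chosen features."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))


def recommend(preference, tracks, k=2):
    """Return the names of the k tracks closest to the preference vector."""
    ranked = sorted(tracks, key=lambda t: distance(preference, t))
    return [t["track_name"] for t in ranked[:k]]


# A user preference expressed in the same feature space as the tracks.
preference = {"danceability": 0.90, "energy": 0.80, "valence": 0.80}
print(recommend(preference, TRACKS))
```

Columns on other scales, such as `tempo` or `loudness`, would need to be normalized before being mixed into the same distance, since they would otherwise dominate the 0.0 to 1.0 features.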
# Column Description
- **track_id**: The Spotify ID for the track
- **artists**: The names of the artists who performed the track. If there is more than one artist, the names are separated by a `;`
- **album_name**: The name of the album in which the track appears
- **track_name**: Name of the track
- **popularity**: **The popularity of a track is a value between 0 and 100, with 100 being the most popular**. The popularity is calculated by an algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.
- **duration_ms**: The track length in milliseconds
- **explicit**: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown)
- **danceability**: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable
- **energy**: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale
- **key**: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. `0 = C`, `1 = C♯/D♭`, `2 = D`, and so on. If no key was detected, the value is -1
- **loudness**: The overall loudness of a track in decibels (dB)
- **mode**: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0
- **speechiness**: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks
- **acousticness**: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic
- **instrumentalness**: Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content
- **liveness**: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 indicates a strong likelihood that the track is live
- **valence**: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)
- **tempo**: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration
- **time_signature**: An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The value ranges from 3 to 7, indicating time signatures of `3/4` to `7/4`
- **track_genre**: The genre to which the track belongs
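Several of the columns above are encoded and need decoding before they are readable. The helpers below are a sketch of how that might look. The function names are hypothetical, the pitch names beyond `0 = C`, `1 = C♯/D♭`, `2 = D` follow standard pitch-class order, and treating the exact boundary values 0.33 and 0.66 as "music and speech" is an assumption, since the description does not say which band they belong to.

```python
# Standard pitch-class names; index 0 is C, as stated in the key description.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def split_artists(artists):
    """Split the ';'-separated artists field into a list of names."""
    return [name.strip() for name in artists.split(";") if name.strip()]


def key_name(key, mode):
    """Combine key (0-11, or -1 if undetected) and mode (1 major, 0 minor)."""
    if not 0 <= key <= 11:
        return "unknown"
    scale = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {scale}"


def speechiness_band(value):
    """Map speechiness to the three bands given in the column description."""
    if value > 0.66:
        return "mostly spoken word"
    if value >= 0.33:
        return "music and speech"
    return "mostly music"


def duration_mmss(duration_ms):
    """Format a duration in milliseconds as minutes:seconds."""
    total_seconds = duration_ms // 1000
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


print(split_artists("Artist A;Artist B"))
print(key_name(1, 0))
print(speechiness_band(0.45))
print(duration_mmss(230666))
```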
# Sources and Methodology
The data was collected and cleaned using Spotify's Web API and Python.
Provider:
PiriPiri71



