honeray/ai-music-comments-1.5M
Source: Hugging Face | Updated: 2026-03-23 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/honeray/ai-music-comments-1.5M
Resource description:
---
license: afl-3.0
language:
- en
tags:
- ai music
- sentiment analysis
- asba
- comments
task_categories:
- text-classification
- feature-extraction
pretty_name: 1.5M Comments on AI Music
size_categories:
- 1M<n<10M
---
## What this is
A corpus of >1.5M comments about AI music gathered from YouTube, Bilibili, Reddit, Twitter/X, and Weibo.
## What the files are
`raw_data`: the full dataset of >1.5M comments
`proc_data`: the processed dataset (around 221K comments), which excludes any comment that either
- has <5 likes, or
- is under a video post (i.e. YouTube or Bilibili) with a views-to-comments ratio less than 0.001
`final_data`: the final dataset (1363 comments), containing only comments whose comment relevance score and post relevance score are both >= 0.6
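
The two filtering stages can be sketched in plain Python. This illustrates the stated thresholds only and is not the authors' pipeline. The function names, the platform strings `"YouTube"` and `"Bilibili"`, and the `comment_relevance` / `post_relevance` keys are assumptions made for the example. The ratio is computed literally as `post_views / post_comments`, which is one possible reading of the wording above.

```python
# Assumed spellings of the `platform` values for video posts.
VIDEO_PLATFORMS = {"YouTube", "Bilibili"}


def keep_for_proc_data(row, min_likes=5, min_ratio=0.001):
    """Return True if a row survives the two proc_data rules as worded in the card."""
    # Rule 1: drop comments with fewer than 5 likes.
    if row["comment_likes"] < min_likes:
        return False
    # Rule 2: applies only to comments under video posts.
    if row["platform"] in VIDEO_PLATFORMS:
        comments = row["post_comments"]
        # Literal reading of "views-to-comments ratio": post_views / post_comments.
        # The rule is skipped when the comment count is zero or missing.
        if comments and row["post_views"] / comments < min_ratio:
            return False
    return True


def keep_for_final_data(row, threshold=0.6):
    """Return True if both relevance scores reach the threshold."""
    # `comment_relevance` and `post_relevance` are hypothetical key names.
    return (
        row["comment_relevance"] >= threshold
        and row["post_relevance"] >= threshold
    )
```

Applied as `[r for r in rows if keep_for_proc_data(r)]`, this keeps rows that pass both rules; rows with missing view counts on video platforms would need extra handling.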
## What the features are
`post_title`: the title of the post
`post_views`: the number of views that the post has
`post_likes`: the number of likes that the post has
`creator_id`: the ID of the creator of the post
`creator_subs`: the number of subscribers the creator of the post has (in the case of YouTube or Bilibili)
`post_comments`: the total number of comments under the post that contains this comment
- this includes 弹幕 (bullet comments) in the case of Bilibili
`comment`: the actual content of the comment itself
`comment_likes`: the number of likes that the individual comment received (may be negative in the case of Reddit)
`platform`: the social media platform where the comment was published
`url`: the url of the post where the comment was published
`comment_id`: the ID of the comment
`retweets`: the number of retweets that the comment received (in the case of Twitter/X or Weibo)
- In the case of Weibo, retweet counts above 1 million are not exact; for posts with more than 1 million retweets, 1 million is used instead.
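
Because of this cap, a Weibo `retweets` value of 1 million is best read as a lower bound. A minimal sketch of flagging such rows, assuming the `platform` value is spelled `"Weibo"` (the helper name is hypothetical):

```python
WEIBO_RETWEET_CAP = 1_000_000


def retweets_is_capped(row):
    """True when a Weibo retweet count sits at the cap and is only a lower bound."""
    # Missing retweet counts are treated as 0 here; that is an assumption.
    retweets = row.get("retweets") or 0
    return row.get("platform") == "Weibo" and retweets >= WEIBO_RETWEET_CAP
```

Flagged rows can then be excluded from, or treated separately in, any statistic that assumes exact counts.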
`replies`: the number of replies that the comment received
`quotes`: the number of times that the comment was quoted
`bookmarks`: the number of times that the comment was bookmarked
`is_retweet`: whether the comment is a retweet of some original tweet
`is_quote`: whether the comment is a quote from some original tweet
`comment_parent_id`: the ID of the comment that this comment replied to
`post_desc`: the description (if any) of the post that this comment was published under
`post_dislikes`: the number of dislikes received by the post that this comment was published under
`post_favs`: the number of times the post that this comment was published under was favorited
- this includes coins in the case of Bilibili
`post_shares`: the number of times the post that this comment was published under was shared
`post_keyword`: the keywords of the post that this comment was published under
`creator_likes`: the number of likes that the comment received from the creator of the post
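
The `comment_id` and `comment_parent_id` features make it possible to reconstruct reply threads. The sketch below builds a parent-to-replies index from rows represented as dictionaries. It assumes that top-level comments carry an empty or missing `comment_parent_id`, which the card does not state; the function name is hypothetical.

```python
from collections import defaultdict


def build_reply_index(rows):
    """Map each parent comment ID to the IDs of its direct replies."""
    children = defaultdict(list)
    for row in rows:
        parent = row.get("comment_parent_id")
        # Assumption: top-level comments have an empty or missing parent ID.
        if parent:
            children[parent].append(row["comment_id"])
    return dict(children)
```

For example, `build_reply_index(rows).get(some_id, [])` returns the direct replies to `some_id` in the order they appear in `rows`.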
## More Information
For more information on how these comments were collected, please see the associated paper at (link coming soon)
---
license: cc-by-nc-4.0
---
Provider:
honeray



