honeray/ai-music-comments-1.5M

Source: Hugging Face; updated 2026-03-23; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/honeray/ai-music-comments-1.5M
Description:
---
license: afl-3.0
language:
- en
tags:
- ai music
- sentiment analysis
- asba
- comments
task_categories:
- text-classification
- feature-extraction
pretty_name: 1.5M Comments on AI Music
size_categories:
- 1M<n<10M
---

## What this is

A corpus of more than 1.5M comments about AI music, gathered from YouTube, Bilibili, Reddit, Twitter/X, and Weibo.

## What the files are

- `raw_data`: the full dataset of more than 1.5M comments.
- `proc_data`: the processed dataset (around 221K comments), which filters out any comment that either
  - has fewer than 5 likes, or
  - was posted under a video post (i.e., on YouTube or Bilibili) with a views-to-comments ratio of less than 0.001.
- `final_data`: the final dataset (1,363 comments), keeping only comments whose comment relevance score and post relevance score are both >= 0.6.

## What the features are

- `post_title`: the title of the post
- `post_views`: the number of views the post has
- `post_likes`: the number of likes the post has
- `creator_id`: the ID of the post's creator
- `creator_subs`: the number of subscribers the post's creator has (for YouTube and Bilibili)
- `post_comments`: the total number of comments under the post the comment was published under
  - On Bilibili, this includes 弹幕 (bullet comments).
- `comment`: the text of the comment itself
- `comment_likes`: the number of likes the comment received (may be negative on Reddit)
- `platform`: the social media platform where the comment was published
- `url`: the URL of the post where the comment was published
- `comment_id`: the ID of the comment
- `retweets`: the number of retweets the comment received (for Twitter/X and Weibo)
  - On Weibo, retweet counts are no longer exact above 1 million; for posts with more than 1 million retweets, 1 million is used instead.
- `replies`: the number of replies the comment received
- `quotes`: the number of times the comment was quoted
- `bookmarks`: the number of times the comment was bookmarked
- `is_retweet`: whether the comment is a retweet of an original tweet
- `is_quote`: whether the comment is a quote of an original tweet
- `comment_parent_id`: the ID of the comment that this comment replied to
- `post_desc`: the description (if any) of the post the comment was published under
- `post_dislikes`: the number of dislikes received by the post the comment was published under
- `post_favs`: the number of times the post the comment was published under was favorited
  - On Bilibili, this includes coins.
- `post_shares`: the number of times the post the comment was published under was shared
- `post_keyword`: the keywords of the post the comment was published under
- `creator_likes`: the number of likes the comment received from the post's creator

## More Information

For more information on how these comments were collected, please see the associated paper at (link coming soon).

---
license: cc-by-nc-4.0
---
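The `proc_data` filtering rules described above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the authors' pipeline: it assumes each `raw_data` record is a dict keyed by the feature names listed in this card, that `platform` holds strings such as `"YouTube"` and `"Bilibili"`, and that `keep_comment` and `filter_comments` are illustrative names only. It also assumes the ratio rule compares comments to views (`post_comments / post_views < 0.001`), since a literal views-to-comments ratio below 0.001 would require a post to have more than 1,000 comments per view.

```python
from typing import Any, Iterable

# Assumption: these platform names identify video posts (compared case-insensitively).
VIDEO_PLATFORMS = {"youtube", "bilibili"}
MIN_LIKES = 5               # comments with fewer likes are dropped
MIN_COMMENT_RATIO = 0.001   # assumed to mean post_comments / post_views


def keep_comment(row: dict[str, Any]) -> bool:
    """Return True if a record would survive the (assumed) proc_data filter."""
    # Rule 1: drop comments with fewer than 5 likes (Reddit scores may be negative).
    if row["comment_likes"] < MIN_LIKES:
        return False
    # Rule 2: on video posts, drop comments whose post has too few comments per view.
    if str(row["platform"]).lower() in VIDEO_PLATFORMS:
        # Multiplying instead of dividing avoids a zero-division error;
        # posts with zero recorded views therefore pass this check.
        if row["post_comments"] < MIN_COMMENT_RATIO * row["post_views"]:
            return False
    return True


def filter_comments(rows: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the records that pass both hypothetical filter rules."""
    return [row for row in rows if keep_comment(row)]


# Usage with made-up records (not real data from the dataset).
rows = [
    {"comment_likes": 12, "platform": "YouTube", "post_views": 100_000, "post_comments": 450},
    {"comment_likes": 12, "platform": "YouTube", "post_views": 100_000, "post_comments": 40},
    {"comment_likes": 3, "platform": "Reddit", "post_views": 0, "post_comments": 10},
    {"comment_likes": 8, "platform": "Weibo", "post_views": 0, "post_comments": 2},
]
kept = filter_comments(rows)
print([r["platform"] for r in kept])  # → ['YouTube', 'Weibo']
```

The second YouTube record is dropped because 40 comments on 100,000 views is a ratio of 0.0004, and the Reddit record is dropped for having fewer than 5 likes. Treating zero-view video posts as passing is an arbitrary choice in this sketch; the card does not say how such posts were handled.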
Provided by:
honeray