Hacker News Posts
Source: www.kaggle.com | Updated 2016-09-27 | Indexed 2025-03-24
Download link:
https://www.kaggle.com/hacker-news/hacker-news-posts
Description:
This dataset contains Hacker News posts from the 12 months up to September 26, 2016.
It includes the following columns:
- title: the title of the post
- url: the url of the item being linked to
- num_points: the number of upvotes the post received
- num_comments: the number of comments the post received
- author: the name of the account that made the post
- created_at: the date and time the post was made, in US Eastern Time
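A minimal sketch of reading these columns with the Python standard library. The sample rows are hypothetical, and the `created_at` format (`M/D/YYYY H:MM`) is an assumption to check against the actual file; `load_posts` and `CREATED_AT_FORMAT` are illustrative names, not part of the dataset.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows. Column names follow the dataset description;
# the created_at format below is an assumption, not confirmed by the source.
SAMPLE_CSV = """title,url,num_points,num_comments,author,created_at
Show HN: A toy project,https://example.com/toy,12,3,alice,9/25/2016 14:05
Ask HN: How do you learn?,,40,27,bob,9/26/2016 9:30
"""

# Assumed timestamp format; adjust it if the real file differs.
CREATED_AT_FORMAT = "%m/%d/%Y %H:%M"


def load_posts(text):
    """Parse CSV text into dicts with typed fields (hypothetical helper)."""
    posts = []
    for row in csv.DictReader(io.StringIO(text)):
        posts.append({
            "title": row["title"],
            # Text posts such as Ask HN may have an empty url field.
            "url": row["url"] or None,
            "num_points": int(row["num_points"]),
            "num_comments": int(row["num_comments"]),
            "author": row["author"],
            # Naive datetime: per the description, values are US Eastern Time.
            "created_at": datetime.strptime(row["created_at"], CREATED_AT_FORMAT),
        })
    return posts


posts = load_posts(SAMPLE_CSV)
for p in posts:
    print(p["author"], p["num_points"], p["created_at"].isoformat())
```

The timestamps are kept naive rather than converted, since the description only states that they are Eastern Time; attach a time zone explicitly before comparing them with UTC data.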
One fun project idea is a model that predicts the number of upvotes (num_points) a post will receive.
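Any such model needs a simple baseline to beat. A minimal sketch, assuming hypothetical `(hour_posted, num_points)` pairs derived from `created_at` and `num_points`: predict the median score of posts submitted in the same hour, fall back to the overall median for unseen hours, and report mean absolute error. `num_comments` is deliberately left out because it is not known when a post is submitted. The function names and data are illustrative only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (hour_posted, num_points) pairs; real values would come
# from the created_at and num_points columns.
TRAIN = [(9, 4), (9, 10), (9, 1), (14, 50), (14, 2), (14, 8), (22, 3)]
TEST = [(9, 6), (14, 5), (3, 2)]


def fit_hourly_median(pairs):
    """Return a predictor mapping hour -> median points, with a global fallback."""
    by_hour = defaultdict(list)
    for hour, points in pairs:
        by_hour[hour].append(points)
    table = {hour: median(vals) for hour, vals in by_hour.items()}
    fallback = median(points for _, points in pairs)
    return lambda hour: table.get(hour, fallback)


def mean_absolute_error(predict, pairs):
    """Average absolute difference between predicted and actual points."""
    return sum(abs(predict(h) - p) for h, p in pairs) / len(pairs)


predict = fit_hourly_median(TRAIN)
print(predict(9), predict(14), predict(3))  # 4 8 4
print(mean_absolute_error(predict, TEST))
```

The median is used instead of the mean because vote counts are likely heavily skewed, with a few posts collecting far more points than the rest.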
The scraper is already written, so I can keep this dataset up to date and add more historical data. I can also scrape the comments; just make a request in this dataset's forum.
The scraper is a fork of minimaxir's HN scraper (thanks, minimaxir):
[https://github.com/minimaxir/get-all-hacker-news-submissions-comments][1]
[1]: https://github.com/minimaxir/get-all-hacker-news-submissions-comments
Provided by:
Kaggle



