hacker-news
Source: Hugging Face | Updated: 2026-03-15 | Indexed: 2026-03-20
Download link:
https://huggingface.co/datasets/open-index/hacker-news
Description:
The Hacker News complete archive dataset contains all content posted on Hacker News since October 2006: stories, comments, Ask HN and Show HN posts, job postings, and polls. Hacker News, operated by Y Combinator, is one of the most influential technical communities on the Internet. The data is organized as one Parquet file per month and is refreshed every 5 minutes to stay in step with the live site. As of March 15, 2026, the dataset contained 47,317,928 records. It is suited to natural language processing (NLP) tasks such as text generation, feature extraction, text classification, and question answering. The dataset also provides detailed statistics, including the distribution of content types, story scores, and the most active submitters.
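Because the data is split into one Parquet file per month, it can help to enumerate the months a download would need to cover. The following is a minimal sketch that uses only the dates stated in the description; the `monthly_partitions` helper and the `YYYY-MM.parquet` file naming are assumptions made for illustration, and the actual repository layout may differ.

```python
from datetime import date


def monthly_partitions(start: date, end: date) -> list[str]:
    """Return one 'YYYY-MM' label per calendar month from start to end, inclusive."""
    labels = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        labels.append(f"{year:04d}-{month:02d}")
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


# Dates taken from the description: the archive starts in October 2006,
# and the record count is reported as of 15 March 2026.
months = monthly_partitions(date(2006, 10, 1), date(2026, 3, 15))

# Hypothetical file naming; check the repository for the real layout.
paths = [f"{label}.parquet" for label in months]

print(len(months), months[0], months[-1])
```

Under these assumptions the list spans 234 monthly files, from 2006-10 through 2026-03. Once the real file names are confirmed, a list like `paths` could be passed to a Parquet reader to load only the months of interest rather than the full archive.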
Created:
2026-03-14



