Pile-HackerNews
Source: ModelScope community. Updated 2025-10-15; indexed 2024-08-31.
Download link:
https://modelscope.cn/datasets/OmniData/Pile-HackerNews
Resource overview:
displayName: Pile-HackerNews
license:
- MIT
taskTypes:
- Language Modelling
- Natural Language Generation
mediaTypes:
- Text
labelTypes:
- English Corpus
tags: []
publisher:
- EleutherAI
publishDate: '2023-07-18'
publishUrl: https://pile.eleuther.ai/
paperUrl: ''
---
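# Data Introduction
## Overview
The Pile-HackerNews dataset is built from data collected from the Hacker News website.
Hacker News is a link aggregation site operated by Y Combinator, a startup incubator and investment fund.
The dataset can be used for a range of natural language processing (NLP) tasks and research.
## Data Content
### Data Description
The Pile-HackerNews dataset contains 4.1 GB of data.
### Data Example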
```
{
"id": "176599180",
"source_id": "",
"doc_id": "64429528",
"data_type": "text",
"data_source": "pile",
"data_url": "enwiki-c4-pile-ccnews",
"content": "\n\nTagged pointers and fast-pathed CFNumber integers in Lion - DHowett\nhttp://objectivistc.tumblr.com/post/7872364181/tagged-pointers-and-fast-pathed-cfnumber-integers-in\n\n======\nDHowett\nI don't usually think runtime hacks are the way to go (who am I kidding? I\nlove runtime hacks), but this is damn clever on Apple's part.\n\n",
"remark": {
"pile_set_name": "HackerNews"
},
"sub_path": "hackernews/train"
}
```
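The `content` field in the example above appears to pack a story header and its comment thread into one string. The following is a minimal sketch of how such a record might be split, assuming the layout seen in this single example holds generally: a `title - submitter` line, a URL line, a `======` separator, then the comments. The function name `split_story` and the output keys are hypothetical and not part of the dataset.

```python
# Record copied from the data example above (only the fields used here).
record = {
    "id": "176599180",
    "content": (
        "\n\nTagged pointers and fast-pathed CFNumber integers in Lion - DHowett\n"
        "http://objectivistc.tumblr.com/post/7872364181/"
        "tagged-pointers-and-fast-pathed-cfnumber-integers-in\n\n"
        "======\n"
        "DHowett\n"
        "I don't usually think runtime hacks are the way to go (who am I kidding? I\n"
        "love runtime hacks), but this is damn clever on Apple's part.\n\n"
    ),
    "remark": {"pile_set_name": "HackerNews"},
}


def split_story(content: str) -> dict:
    """Split one `content` string into header fields and the raw thread text.

    Assumption: the header is "title - submitter" followed by a URL line,
    and a line of "======" separates it from the comments.
    """
    header, _, thread = content.partition("\n======\n")
    header_lines = [line for line in header.strip().splitlines() if line.strip()]
    title_line = header_lines[0] if header_lines else ""
    url = header_lines[1] if len(header_lines) > 1 else ""

    # The submitter follows the last " - " on the title line, if present.
    title, sep, submitter = title_line.rpartition(" - ")
    if not sep:
        title, submitter = title_line, ""

    return {
        "title": title,
        "submitter": submitter,
        "url": url,
        "thread": thread.strip(),
    }


parsed = split_story(record["content"])
print(parsed["title"])
print(parsed["submitter"])
```

Splitting on the last ` - ` rather than the first keeps titles that themselves contain a dash intact. Records without a URL line or without comments would need extra handling that this sketch does not attempt.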
## Citation
```
@misc{conghui2022opendatalab,
title={OpenDataLab: Empowering General Artificial Intelligence with Open Datasets},
author={Conghui He and Wei Li and Zhenjiang Jin and Bin Wang and Chao Xu and Dahua Lin},
journal={https://opendatalab.com/},
year={2022}
}
```
## Download dataset
The dataset repository is hosted on ModelScope at https://modelscope.cn/datasets/OmniData/Pile-HackerNews.
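One way to fetch the repository is a plain `git clone`. The sketch below only builds the command and leaves running it to the caller. The `.git` URL pattern on `www.modelscope.cn` is an assumption based on how ModelScope usually exposes dataset repositories, not something stated on this page, and the helper names are hypothetical. Large files may also require Git LFS to be installed.

```python
import subprocess
from typing import List, Optional


def build_clone_command(namespace: str, name: str, dest: Optional[str] = None) -> List[str]:
    """Build a `git clone` command for a ModelScope dataset repository.

    Assumption: dataset repos are reachable at
    https://www.modelscope.cn/datasets/<namespace>/<name>.git
    """
    url = f"https://www.modelscope.cn/datasets/{namespace}/{name}.git"
    cmd = ["git", "clone", url]
    if dest:
        cmd.append(dest)
    return cmd


def clone(namespace: str, name: str, dest: Optional[str] = None) -> None:
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(build_clone_command(namespace, name, dest), check=True)


command = build_clone_command("OmniData", "Pile-HackerNews")
print(" ".join(command))
# To download for real (about 4.1 GB according to this page):
# clone("OmniData", "Pile-HackerNews")
```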
Provided by: maas
Created: 2024-07-11



