simonko912/chan-shitpost-1.5
Source: Hugging Face | Updated: 2026-03-14 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/simonko912/chan-shitpost-1.5
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- chan
- 4chan
- shitpost
pretty_name: Chan shitpost 1.5
size_categories:
- 100K<n<1M
---
This dataset was built by scraping IRC logs and chan-like sites.
Use it to train an AI to talk the way humans do.
List of scraped sites for this version of Chan Shitpost:
| Site | Rows |
|-----------------------------|----------------------------|
| lolcow.farm | 122,927 rows |
| crystal.cafe | 16,891 rows |
| lainchan.org | 15,664 rows |
| a.4cdn.org | 11,374 rows |
| 8kun.top | 4,554 rows |
| wizchan.org | 2,561 rows |
| beehaw.org | 1,354 rows |
| whitequark-irc | 810 rows |
| t.me | 59 rows |
| 112chan.ro | 54 rows |
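The per-site counts above can be tallied to check them against the declared `100K<n<1M` size category and to see how heavily the data leans on one source. The sketch below is illustrative only: the numbers are copied from the table, while the dictionary name and the `summarize` helper are hypothetical and not part of the dataset.

```python
# Row counts copied from the table above (not read from the dataset itself).
ROWS_PER_SITE = {
    "lolcow.farm": 122_927,
    "crystal.cafe": 16_891,
    "lainchan.org": 15_664,
    "a.4cdn.org": 11_374,
    "8kun.top": 4_554,
    "wizchan.org": 2_561,
    "beehaw.org": 1_354,
    "whitequark-irc": 810,
    "t.me": 59,
    "112chan.ro": 54,
}


def summarize(rows):
    """Return the total row count and each site's share of that total."""
    total = sum(rows.values())
    shares = {site: count / total for site, count in rows.items()}
    return total, shares


total, shares = summarize(ROWS_PER_SITE)
print(f"total rows: {total:,}")

# Print sites from largest to smallest share.
for site, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site:<16} {share:6.1%}")
```

The listed counts sum to 176,248 rows, which falls inside the declared size category, and lolcow.farm alone accounts for roughly 70% of them. A model trained on the full set will mostly reflect that one site unless the sources are rebalanced.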
Provider:
simonko912



