Project-Gutenberg
Source: ModelScope community. Updated 2025-11-27; indexed 2025-11-03.
Download link:
https://modelscope.cn/datasets/laion/Project-Gutenberg
Description:
<div style="text-align: center;">
<h2>Project Gutenberg</h2>
<img src="gutenberg.jpg" alt="Project Gutenberg" width="250" height="250" style="display: block; margin: 0 auto;">
</div>
Introducing Project Gutenberg, a dataset that provides bulk access to the books available in that project. The dataset offers a bulk download of Gutenberg books in ten languages: English, German, French, Polish, Portuguese, Dutch, Spanish, Hebrew, Russian and Chinese.
English has by far the largest collection; by the counts listed below, French and German follow. We are releasing this dataset so that researchers and engineers can integrate these books into their artificial intelligence projects, for example embeddings, text generation and fine-tuning. It is released under our Open-sci project at LAION AI.
### Dataset information
**Index date:** October 2024
**Number of books:**
1. English - 56984
2. German - 2110
3. Polish - 30
4. Portuguese - 633
5. Spanish - 803
6. Hebrew - 6
7. Russian - 5
8. Chinese - 435
9. French - 3583
10. Dutch - 970
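
As a quick sanity check on the figures above, the following minimal sketch totals the per-language counts and ranks the languages by size. The dictionary simply copies the numbers from the list; the names `BOOK_COUNTS` and `summarize` are illustrative and not part of the dataset.

```python
# Book counts per language, copied from the list above (index date: October 2024).
BOOK_COUNTS = {
    "English": 56984,
    "German": 2110,
    "Polish": 30,
    "Portuguese": 633,
    "Spanish": 803,
    "Hebrew": 6,
    "Russian": 5,
    "Chinese": 435,
    "French": 3583,
    "Dutch": 970,
}


def summarize(counts):
    """Return (total, ranked), where ranked lists (language, count, share)
    sorted by count in descending order."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return total, [(lang, n, n / total) for lang, n in ranked]


if __name__ == "__main__":
    total, ranked = summarize(BOOK_COUNTS)
    print(f"Total books: {total}")
    for lang, n, share in ranked:
        print(f"{lang:<12}{n:>7}{share:>9.2%}")
```

With these figures the total comes to 65,559 books, and English accounts for roughly 87% of them, which is worth keeping in mind when sampling for multilingual training.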
**Format of the books:** EPUB
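
Because an EPUB file is a ZIP archive containing (X)HTML documents, plain text can be pulled out with the Python standard library alone. The sketch below is one possible approach, not the method used to build this dataset: `epub_to_text` is a hypothetical helper, it reads members in filename order rather than the reading order declared in the EPUB package manifest, and it assumes UTF-8 content.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script>, <style> and <head> content."""

    SKIP = {"script", "style", "head"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def epub_to_text(path_or_file):
    """Return the concatenated text of all (X)HTML members of an EPUB.

    Assumption: members are read in sorted filename order, which may differ
    from the reading order defined in the EPUB's package manifest.
    """
    chunks = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append(" ".join(parser.parts))
    return "\n\n".join(c for c in chunks if c)
```

Calling `epub_to_text("book.epub")` (the filename is a placeholder) returns one string with a blank line between content documents. For production use, a dedicated EPUB library that honors the manifest's reading order is likely a better fit.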
**Where did we source the bulk files?**
https://download.kiwix.org/zim/gutenberg/
Provider:
maas
Created:
2025-10-03



