mrfakename/fiction-instruct
Source: Hugging Face. Updated 2024-05-07; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/mrfakename/fiction-instruct
Resource description:
---
license: cc-by-sa-4.0
tags:
- book
- novel
- fiction
- writing
task_categories:
- text-generation
language:
- en
- fr
- de
- es
- it
- pt
- ca
- nl
- fi
- da
- tl
- sv
- hu
- af
- et
- no
- sl
- cy
- lt
- el
- ro
- sw
- hr
---
# Fiction Instruct
This is a processed version of the amazing [Brahe dataset](https://huggingface.co/datasets/Pclanglais/Brahe-Novels), which consists of short excerpts from public domain novels.
The original Brahe dataset included synthetically generated descriptions of each book (such as a short summary, tone, time, etc.). This dataset transforms those book descriptions into instructions (e.g. "Write me a story about ...").
The dataset is _highly_ multilingual. However, the instructions for non-English passages normally name the language of the passage.
All credit goes to the original [Brahe dataset](https://huggingface.co/datasets/Pclanglais/Brahe-Novels).
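The description-to-instruction transformation described above could be sketched roughly as follows. This is only an illustration, not the script used to build the dataset: the function name `build_instruction` and its parameters `summary`, `tone`, and `language` are hypothetical, and the exact wording of the real instructions may differ.

```python
from typing import Optional


def build_instruction(summary: str, tone: Optional[str] = None,
                      language: str = "English") -> str:
    """Turn a book description into a writing instruction (illustrative only).

    All parameter names are assumptions made for this sketch; they are not
    taken from the dataset's actual schema.
    """
    # Drop surrounding whitespace and a trailing period, then lowercase the
    # first letter so the summary reads as a clause inside the instruction.
    topic = summary.strip().rstrip(".")
    if topic:
        topic = topic[0].lower() + topic[1:]

    instruction = f"Write me a story about {topic}"
    if tone:
        instruction += f", in a {tone} tone"
    # Non-English passages normally name the language in the instruction.
    if language != "English":
        instruction += f", written in {language}"
    return instruction + "."


print(build_instruction("A young sailor's first voyage.",
                        tone="melancholic", language="French"))
```

For an English passage the language clause is simply omitted, which mirrors the note that only non-English passages normally carry a language in the instruction.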
## Languages
Languages (as detected by `langdetect`), by number of instances:
| Language | Instances |
| --- | --- |
| English | 4304 |
| French | 974 |
| German | 690 |
| Spanish | 474 |
| Italian | 414 |
| Portuguese | 343 |
| Catalan | 191 |
| Dutch | 184 |
| Finnish | 154 |
| Danish | 127 |
| Filipino | 81 |
| Swedish | 77 |
| Hungarian | 54 |
| Afrikaans | 42 |
| Estonian | 32 |
| Norwegian | 19 |
| Slovenian | 16 |
| Welsh | 16 |
| Lithuanian | 12 |
| Greek | 11 |
| Romanian | 6 |
| Swahili | 3 |
| Croatian | 2 |
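To put the table in proportion, the counts can be summed and converted to shares. The short sketch below copies the numbers from the table above; the variable names are arbitrary and nothing here is read from the dataset itself.

```python
# Instance counts per detected language, copied from the table above.
counts = {
    "English": 4304, "French": 974, "German": 690, "Spanish": 474,
    "Italian": 414, "Portuguese": 343, "Catalan": 191, "Dutch": 184,
    "Finnish": 154, "Danish": 127, "Filipino": 81, "Swedish": 77,
    "Hungarian": 54, "Afrikaans": 42, "Estonian": 32, "Norwegian": 19,
    "Slovenian": 16, "Welsh": 16, "Lithuanian": 12, "Greek": 11,
    "Romanian": 6, "Swahili": 3, "Croatian": 2,
}

total = sum(counts.values())
english_share = 100 * counts["English"] / total
non_english = total - counts["English"]

print(f"Total instances: {total}")                 # 8226
print(f"English share: {english_share:.1f}%")      # 52.3%
print(f"Non-English instances: {non_english}")     # 3922
```

By these counts, a little over half of the instances are English, and the five largest languages (English, French, German, Spanish, Italian) account for most of the rest.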
## License
The novel excerpts are licensed under CC0, the license of the original Brahe dataset. The instructions are licensed under CC-BY-SA 4.0. You can use the entire combined dataset under CC-BY-SA 4.0. No restrictions are placed on models trained on this dataset.
Provider:
mrfakename
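Summary of original information
Fiction Instruct dataset overview
Dataset description
- Source: Fiction Instruct is a processed version of the Brahe dataset and consists of short excerpts from public domain novels.
- Features: the book descriptions in the original Brahe dataset have been converted into instructions (e.g. "Write me a story about ...").
- Multilinguality: the dataset covers many languages; non-English entries usually include a language indication.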
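Supported languages
- Languages: English, French, German, Spanish, Italian, Portuguese, Catalan, Dutch, Finnish, Danish, Filipino, Swedish, Hungarian, Afrikaans, Estonian, Norwegian, Slovenian, Welsh, Lithuanian, Greek, Romanian, Swahili, Croatian.
- Number of instances:
  - English: 4304
  - French: 974
  - German: 690
  - Spanish: 474
  - Italian: 414
  - Portuguese: 343
  - Catalan: 191
  - Dutch: 184
  - Finnish: 154
  - Danish: 127
  - Filipino: 81
  - Swedish: 77
  - Hungarian: 54
  - Afrikaans: 42
  - Estonian: 32
  - Norwegian: 19
  - Slovenian: 16
  - Welsh: 16
  - Lithuanian: 12
  - Greek: 11
  - Romanian: 6
  - Swahili: 3
  - Croatian: 2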
License
- Novels: CC0.
- Instructions: CC-BY-SA 4.0.
- Combined use: the entire dataset can be used under CC-BY-SA 4.0.
- Model training: no restrictions.



