gpt4all-j-prompt-generations
Source: ModelScope community. Updated 2025-12-03; indexed 2025-01-04.
Download link:
https://modelscope.cn/datasets/nomic-ai/gpt4all-j-prompt-generations
Resource description:
# Dataset Card for GPT4All-J Prompt Generations
## Dataset Description
This dataset was used to train [GPT4All-J](https://huggingface.co/nomic-ai/gpt4all-j) and [GPT4All-J-LoRA](https://huggingface.co/nomic-ai/gpt4all-j-lora).

We release several versions of the dataset:
- **v1.0:** The original dataset we used to finetune GPT-J on
- **v1.1-breezy**: A filtered dataset where we removed all instances of `AI language model`
- **v1.2-jazzy**: A filtered dataset where we also removed instances like `I'm sorry, I can't answer...` and `AI language model`
- **v1.3-groovy**: The v1.2 dataset with ShareGPT and Dolly added, and with semantic duplicates (~8% of the dataset) removed using [Atlas](https://atlas.nomic.ai/)
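
The phrase-based filtering described for `v1.1-breezy` and `v1.2-jazzy` can be sketched roughly as follows. This is only an illustration, not the procedure actually used to build those versions: the `response` field name, the case-insensitive substring match, and the helper names `keep_row` and `filter_rows` are all assumptions.

```python
# Phrases quoted in the version notes above. How they were matched in
# practice is not documented here, so substring matching is an assumption.
BREEZY_PHRASES = ("AI language model",)
JAZZY_PHRASES = BREEZY_PHRASES + ("I'm sorry, I can't answer",)


def keep_row(row, phrases, field="response"):
    """Return True if the chosen field contains none of the phrases.

    `field` defaults to "response", which is an assumed column name.
    Matching is case-insensitive.
    """
    text = (row.get(field) or "").lower()
    return not any(phrase.lower() in text for phrase in phrases)


def filter_rows(rows, phrases, field="response"):
    """Keep only the rows that pass `keep_row`, preserving order."""
    return [row for row in rows if keep_row(row, phrases, field)]


if __name__ == "__main__":
    # Hypothetical rows, made up purely for illustration.
    sample = [
        {"prompt": "a", "response": "As an AI language model, I cannot..."},
        {"prompt": "b", "response": "Paris is the capital of France."},
        {"prompt": "c", "response": "I'm sorry, I can't answer that."},
    ]
    print(len(filter_rows(sample, BREEZY_PHRASES)))
    print(len(filter_rows(sample, JAZZY_PHRASES)))
```

The same predicate could be passed to the `filter` method of a loaded `datasets.Dataset`, provided the column name matches.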
The dataset defaults to `main`, which is `v1.0`. To download a specific version, pass it as the `revision` keyword argument to `load_dataset`:
```python
from datasets import load_dataset
jazzy = load_dataset("nomic-ai/gpt4all-j-prompt-generations", revision='v1.2-jazzy')
```
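
When switching between versions often, a small wrapper that checks the revision name before downloading may be convenient. The sketch below is a hedged example: `load_revision` and `KNOWN_REVISIONS` are hypothetical names, and it assumes every version listed above is published under a revision named exactly like its label, which the card confirms only for `v1.2-jazzy`.

```python
# Revision names assumed to match the version labels in this card.
KNOWN_REVISIONS = ("v1.0", "v1.1-breezy", "v1.2-jazzy", "v1.3-groovy")

DATASET_ID = "nomic-ai/gpt4all-j-prompt-generations"


def load_revision(revision="v1.2-jazzy"):
    """Load one dataset version, rejecting unknown revision names early."""
    if revision not in KNOWN_REVISIONS:
        raise ValueError(
            f"unknown revision {revision!r}; expected one of {KNOWN_REVISIONS}"
        )
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, revision=revision)
```

Calling `load_revision("v1.3-groovy")` would then download that version, while a misspelled name fails immediately with a `ValueError` instead of a network error.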
- **Homepage:** [gpt4all.io](https://gpt4all.io/)
- **Repository:** [gpt4all](https://github.com/nomic-ai/gpt4all)
- **Paper:** [Technical Report](https://static.nomic.ai/gpt4all/2023_GPT4All-J_Technical_Report_2.pdf)
- **Atlas Map:** [Map of Prompts](https://atlas.nomic.ai/map/gpt4all-j-prompts-curated) and [Responses](https://atlas.nomic.ai/map/gpt4all-j-response-curated)
Provider: maas
Created: 2024-12-31



