MapleSage/msxgpt-dataset
收藏Hugging Face2023-07-15 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/MapleSage/msxgpt-dataset
下载链接
链接失效反馈官方服务:
资源简介:
# msxgpt
## Description
This dataset, "msxgpt," is designed for training the GPT-3.5-turbo/ GPT-4 based language model for a task. The data consists of JSON lines, each representing an individual example for the model.
The dataset has been created with an emphasis on encoding, which is pivotal to the functionality of Memory Features, Security, and API Endpoints. It is designed to process and store documents from various data sources continuously, using incoming webhooks to the upsert and delete endpoints.
Potential applications of this dataset could range from natural language understanding tasks to more specialized uses. For instance, tools like Zapier or Make can help configure the webhooks based on events or schedule, enabling sophisticated automation and workflow configuration capabilities.
## Data Structure
Each line in the `train.jsonl` file is a JSON object with the following structure:
```json
{
"input": "string",
"target": "string"
}
Usage
import json
with open('train.jsonl', 'r') as f:
for line in f:
obj = json.loads(line)
print(obj['input'], obj['target'])
License
This dataset is made available under the Creative Commons CC0 4.0 Universal (CC0 4.0) Public Domain Dedication. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
The above example states that anyone can use the dataset for any purpose without needing to ask for permission, which maximizes the dataset's usability.
提供机构:
MapleSage
原始信息汇总
msxgpt 数据集概述
描述
"msxgpt" 数据集旨在用于训练基于 GPT-3.5-turbo/GPT-4 的语言模型。数据集由 JSON 行组成,每行代表模型的一个独立示例。该数据集强调编码,这对内存功能、安全性和 API 端点的功能至关重要。它设计用于持续处理和存储来自各种数据源的文档,使用传入的 webhook 到 upsert 和删除端点。
数据结构
train.jsonl 文件中的每一行都是一个 JSON 对象,具有以下结构:
json { "input": "string", "target": "string" }
使用示例
python import json
with open(train.jsonl, r) as f: for line in f: obj = json.loads(line) print(obj[input], obj[target])
许可证
该数据集在 Creative Commons CC0 4.0 Universal (CC0 4.0) Public Domain Dedication 下发布。您可以复制、修改、分发和执行作品,即使是商业目的,也无需请求许可。



