teknium/dataforge-economics
收藏Hugging Face2023-11-12 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/teknium/dataforge-economics
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- eng
pretty_name: "DataForge-Economics"
tags:
- economics
license: mit
---

# Dataset Card for dataforge-economics
## Table of Contents
- [Overview](#overview)
- [Dataset Description](#dataset-description)
- [Data Collection and Synthesis](#data-collection-and-synthesis)
- [Data Structure](#data-structure)
- [Licensing, Privacy, and Ethics](#licensing-privacy-and-ethics)
- [Access](#access)
- [Usage](#usage)
- [Citation](#citation)
- [Contributions](#contributions)
## Overview
This dataset, `teknium/dataforge-economics`, is a specialized collection of 1,000 synthetic examples in the field of economics. It has been generated using OpenAI's GPT-4 and a custom data synthesis pipeline named DataForge, developed by me.
## Dataset Description
### Data Collection and Synthesis
The data in `teknium/dataforge-economics` has been synthetically generated using OpenAI's GPT-4 language model. The synthesis process was enhanced and structured using the DataForge pipeline, which incorporates domain-specific knowledge and ensures relevance in economics topics.
### Data Structure
- **Size of dataset:** 1000 examples
- **Type of data:** Textual (Economics domain-specific)
- **Data format:** JSON
- **Fields:**
- - id: a randomly generated uuid
- conversations: single turn human & gpt turns in sharegpt format
- source: the dataset name itself, for metadata purposes when merging with others
- topic: the sub-topic for the domain
- system_prompt: type of system prompt used for generating the response.
## Licensing, Privacy, and Ethics
- **License:** MIT License
- **Special Considerations:** This datasest is purely generated from GPT-4 data, some information may be incorrect or invalid.
- **Privacy:** As the dataset is synthetically generated, it does not contain any real individual's data.
## Access
- **Availability:** General Access
## Usage
This dataset is a domain specialist dataset, the first to use my new pipeline called Data Forge, which can create domain expert knowledge (and tasks, as seen in the Trismegistus occult dataset)
This dataset was a proof of concept to improve upon Orca model's economics expertise, which surpassed my custom benchmark for economics when finetuned over stable beluga.
提供机构:
teknium
原始信息汇总
数据集卡片 for dataforge-economics
概述
该数据集 teknium/dataforge-economics 是一个专门收集的1,000个合成示例,涵盖经济学领域。它由OpenAI的GPT-4生成,并使用名为DataForge的自定义数据合成管道进行增强和结构化。
数据集描述
数据收集和合成
teknium/dataforge-economics 中的数据是通过OpenAI的GPT-4语言模型合成生成的。合成过程通过DataForge管道进行增强和结构化,该管道结合了领域特定知识,确保了经济学主题的相关性。
数据结构
- 数据集大小: 1000个示例
- 数据类型: 文本(经济学领域特定)
- 数据格式: JSON
- 字段:
- id: 随机生成的uuid
- conversations: 单轮人类与GPT对话,采用sharegpt格式
- source: 数据集名称本身,用于与其他数据集合并时的元数据
- topic: 领域子主题
- system_prompt: 用于生成响应的系统提示类型
许可、隐私和伦理
- 许可: MIT许可证
- 特别注意事项: 该数据集完全由GPT-4数据生成,某些信息可能不正确或无效。
- 隐私: 由于数据集是合成生成的,不包含任何真实个体的数据。
访问
- 可用性: 公开访问
使用
该数据集是一个领域专家数据集,首次使用名为Data Forge的新管道,该管道可以创建领域专家知识(如在Trismegistus神秘学数据集中所见)。该数据集是一个概念验证,旨在改进Orca模型的经济学专业知识,在stable beluga上微调后超过了自定义的经济学基准。



