GenRM/OpenOrca-Open-Orca
收藏Hugging Face2025-05-11 更新2025-10-25 收录
下载链接:
https://hf-mirror.com/datasets/GenRM/OpenOrca-Open-Orca
下载链接
链接失效反馈官方服务:
资源简介:
OpenOrca数据集是FLAN Collection数据的增强版本,主要用于自然语言处理领域的训练和评估。该数据集包含对FLAN Collection中问题的GPT-4和GPT-3.5的响应。数据结构包括唯一标识符、系统提示、问题和响应字段。数据主要使用英语,支持包括语言建模、文本生成和文本增强在内的各种任务。README还介绍了数据集的创建过程、使用案例和一些使用数据集的注意事项。它提供了关于支持的语言、数据集结构以及数据集创建理由的详细信息。数据集未进行分割,目前仍在进行中,正在不断生成以扩大其范围。README还详细介绍了在该数据集上训练的官方模型及其在各种基准测试上的性能。它还包括相关研究论文和数据集的引用。
The OpenOrca dataset is an augmented version of the FLAN Collection data, primarily used for training and evaluation in natural language processing. It includes responses from GPT-4 and GPT-3.5 for questions in the FLAN Collection. The dataset is structured with fields for unique identifiers, system prompts, questions, and responses, primarily in English. It supports various tasks such as language modeling, text generation, and text augmentation. The README provides details on the creation process, use cases, and caveats for using the dataset. It also includes information on supported languages, dataset structure, and creation rationale. The dataset is unsplit and is a work in progress, with ongoing generation to expand its scope. The README also mentions the official models trained on this dataset and their performance on various benchmarks. It includes citations for related research papers and datasets.
提供机构:
GenRM



