
dataformer/dolly-llama-qa

Hugging Face · Updated 2024-08-07 · Indexed 2025-04-26
Download link:
https://hf-mirror.com/datasets/dataformer/dolly-llama-qa
Resource description:
---
language:
- en
size_categories:
- 1K<n<10K
language_bcp47:
- en-US
configs:
- config_name: llama_3.1_8B
  data_files:
  - split: train
    path: llama_3-1_8B_Instruct.jsonl
- config_name: llama_3_8B
  data_files:
  - split: train
    path: llama_3_8B_Instruct.jsonl
task_categories:
- text-generation
- question-answering
tags:
- synthetic
---

# Dataset Card for dolly-llama-qa

This dataset has been created with [dataformer](https://github.com/DataformerAI/dataformer).

## Dataset Details

### Dataset Description

The dolly-llama-qa dataset is a synthetic question-answer (QA) pair dataset built from the contexts in [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k). We used the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) models for the generation and evolution steps. OpenAI's gpt-4o was used to evaluate the refined questions and refined answers.

## Dataset Columns

* `context`: the context taken from [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k).
* `seed_question`: the seed question generated from the context by the instruct model.
* `refined_question`: the question obtained by evolving the seed question with the instruct model.
* `initial_answer`: the answer to `refined_question` generated by the instruct model.
* `refined_answer`: the answer obtained by evolving the initial answer with the instruct model.
* `question_quality`: the quality of the question on a scale of 1-10, as evaluated by gpt-4o.
* `answer_quality`: the quality of the answer on a scale of 1-10, as evaluated by gpt-4o.
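Because every row carries two gpt-4o scores, a natural first step is to read one of the JSONL files declared in the configs and keep only highly rated pairs. The sketch below assumes that each line of `llama_3-1_8B_Instruct.jsonl` (or `llama_3_8B_Instruct.jsonl`) is a JSON object keyed by the column names listed above, and that the two scores are stored as integers. `QARecord`, `read_records`, `filter_high_quality` and the default threshold of 8 are hypothetical names and values chosen for illustration; they are not part of dataformer.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator, List


@dataclass
class QARecord:
    """One dataset row; field names follow the documented columns."""
    context: str
    seed_question: str
    refined_question: str
    initial_answer: str
    refined_answer: str
    question_quality: int  # 1-10 score from gpt-4o
    answer_quality: int    # 1-10 score from gpt-4o


# Text columns copied through unchanged from each JSON object.
TEXT_FIELDS = ("context", "seed_question", "refined_question",
               "initial_answer", "refined_answer")


def read_records(path: Path) -> Iterator[QARecord]:
    """Yield one QARecord per non-blank line of a JSONL file.

    Assumes each line is a JSON object containing at least the seven
    documented columns; any extra keys are ignored.
    """
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            row = json.loads(line)
            yield QARecord(
                **{name: row[name] for name in TEXT_FIELDS},
                # Assumption: scores are ints (or int-like strings such as "7").
                question_quality=int(row["question_quality"]),
                answer_quality=int(row["answer_quality"]),
            )


def filter_high_quality(records: Iterable[QARecord],
                        min_question: int = 8,
                        min_answer: int = 8) -> List[QARecord]:
    """Keep rows whose question and answer scores both meet the thresholds."""
    return [r for r in records
            if r.question_quality >= min_question
            and r.answer_quality >= min_answer]
```

For example, `filter_high_quality(read_records(Path("llama_3-1_8B_Instruct.jsonl")), min_question=7)` would keep rows whose question scores at least 7 and whose answer scores at least 8. If the Hugging Face `datasets` library is installed, the same rows should also be reachable with `load_dataset("dataformer/dolly-llama-qa", "llama_3.1_8B", split="train")`, using one of the config names declared in the card.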
Provided by:
dataformer