pinecone/refinedweb-generated-questions
Source: Hugging Face. Updated 2024-01-18; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pinecone/refinedweb-generated-questions
Resource description:
---
license: mit
task_categories:
- question-answering
language:
- en
size_categories:
- 1K<n<10K
---
# Generated Questions and Answers from the Falcon RefinedWeb Dataset
This dataset contains 1k open-domain questions and answers that were generated with GPT-4 from documents in Falcon's [refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset. More details about this work are in the following [blog post](https://www.pinecone.io/blog/rag-study/).
Each row consists of:
- **document_id** - the ID of the text chunk from the refinedweb dataset from which the question was generated. Each ID combines the original document index in the refinedweb dataset and the chunk index, in the following format: "${REFINEDWEB_ID}_${CHUNK_INDEX}"
- **document_text** - the text of the chunk from which the question was generated.
- **generated_question** - the generated question.
- **generated_answer** - the corresponding generated answer.
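
The `document_id` format described above can be split back into its two parts. The following is a minimal sketch, not part of the dataset's own tooling: the names `ChunkRef` and `parse_document_id` are hypothetical, and it assumes that the chunk index is a non-negative integer and that any underscore inside the refinedweb ID should be preserved, which is why it splits on the last underscore only.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    """Hypothetical container for the two parts of a document_id."""
    refinedweb_id: str
    chunk_index: int


def parse_document_id(document_id: str) -> ChunkRef:
    """Split "${REFINEDWEB_ID}_${CHUNK_INDEX}" into its components.

    Assumption: the chunk index is a non-negative integer. The split is
    done on the LAST underscore so that a refinedweb ID containing
    underscores (if any exist) is kept intact.
    """
    refinedweb_id, sep, chunk_index = document_id.rpartition("_")
    if not sep or not refinedweb_id or not chunk_index.isdecimal():
        raise ValueError(f"unexpected document_id format: {document_id!r}")
    return ChunkRef(refinedweb_id, int(chunk_index))


# Illustrative row with made-up values; real rows come from the dataset.
row = {
    "document_id": "12345_7",
    "document_text": "example chunk text",
    "generated_question": "example question?",
    "generated_answer": "example answer",
}
ref = parse_document_id(row["document_id"])
print(ref.refinedweb_id, ref.chunk_index)
```

Real rows could be obtained with the `load_dataset` function from the Hugging Face `datasets` library, passing the repository name `pinecone/refinedweb-generated-questions`; the available split names are not stated in this card, so check them before indexing into the result.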
Provider:
pinecone