
NomaDamas/split_search_qa

Source: Hugging Face | Updated: 2024-01-04 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/NomaDamas/split_search_qa
Resource summary:
---
license: unknown
dataset_info:
- config_name: corpus
  features:
  - name: query_id
    dtype: string
  - name: snippets
    dtype: string
  - name: air_date
    dtype: string
  - name: category
    dtype: string
  - name: value
    dtype: string
  - name: round
    dtype: string
  - name: show_number
    dtype: int32
  - name: doc_id
    dtype: string
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 6252715344
    num_examples: 14120776
  download_size: 3271155810
  dataset_size: 6252715344
- config_name: qa_data
  features:
  - name: query_id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: search_results
    struct:
    - name: related_links
      sequence: string
    - name: snippets
      sequence: string
    - name: titles
      sequence: string
    - name: urls
      sequence: string
  - name: doc_id
    sequence: string
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 6503932619
    num_examples: 173397
  - name: test
    num_bytes: 1830028629
    num_examples: 43350
  download_size: 5008413626
  dataset_size: 8333961248
configs:
- config_name: corpus
  data_files:
  - split: train
    path: corpus/train-*
- config_name: qa_data
  data_files:
  - split: train
    path: qa_data/train-*
  - split: test
    path: qa_data/test-*
---

# preprocessed_SearchQA

The SearchQA question-answer pairs originate from J! Archive, a comprehensive archive of the question-answer pairs from the television show Jeopardy!. The passages are snippets taken from Google search result pages. We also provide passage metadata ('air_date', 'category', 'value', 'round', and 'show_number'), which you can use to improve retrieval performance as you see fit. For further details about SearchQA, see the links below.

[Github](https://github.com/nyu-dl/dl4ir-searchQA)
[Paper](https://arxiv.org/abs/1704.05179)

The dataset is derived from [SearchQA](https://huggingface.co/datasets/search_qa). This preprocessed version is intended for RAG (retrieval-augmented generation).
For more information about our task, visit our [repository](https://github.com/NomaDamas/RAGchain)! This dataset was produced by preprocessing SearchQA for a RAG benchmark. For more information, see [huggingface](https://huggingface.co/datasets/NomaDamas/search_qa_split).
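
A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. The repository ID, the config names `corpus` and `qa_data`, and their splits come from the card; the helpers `check_config` and `load_split_search_qa` are hypothetical names introduced here. Streaming is used because the corpus alone is about 6.3 GB.

```python
# Configs and splits as declared in the dataset card's YAML header.
AVAILABLE_SPLITS = {
    "corpus": ("train",),
    "qa_data": ("train", "test"),
}

REPO_ID = "NomaDamas/split_search_qa"


def check_config(config: str, split: str) -> None:
    """Raise ValueError if the config/split pair is not declared in the card."""
    if config not in AVAILABLE_SPLITS:
        raise ValueError(
            f"unknown config {config!r}; expected one of {sorted(AVAILABLE_SPLITS)}"
        )
    if split not in AVAILABLE_SPLITS[config]:
        raise ValueError(f"config {config!r} has no split {split!r}")


def load_split_search_qa(config: str, split: str, streaming: bool = True):
    """Hypothetical helper: load one config/split of the dataset from the Hub.

    With streaming=True the rows are read lazily from the remote shards
    instead of downloading the whole multi-gigabyte split first.
    """
    check_config(config, split)
    # Imported lazily so check_config can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split, streaming=streaming)
```

For example, `next(iter(load_split_search_qa("qa_data", "test")))` should return a single question record with the `question`, `answer`, `search_results`, and `doc_id` fields listed in the card. Validating the pair up front gives a clearer error than a failed Hub request, since `corpus` has only a `train` split.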
Provider:
NomaDamas
Summary of original information

Dataset overview

Dataset configurations

  • corpus

    • Features
      • query_id: string
      • snippets: string
      • air_date: string
      • category: string
      • value: string
      • round: string
      • show_number: integer (int32)
      • doc_id: string
      • __index_level_0__: integer (int64)
    • Splits
      • train: 6252715344 bytes, 14120776 examples
    • Download size: 3271155810 bytes
    • Dataset size: 6252715344 bytes
  • qa_data

    • Features
      • query_id: string
      • question: string
      • answer: string
      • search_results: struct
        • related_links: sequence of strings
        • snippets: sequence of strings
        • titles: sequence of strings
        • urls: sequence of strings
      • doc_id: sequence of strings
      • __index_level_0__: integer (int64)
    • Splits
      • train: 6503932619 bytes, 173397 examples
      • test: 1830028629 bytes, 43350 examples
    • Download size: 5008413626 bytes
    • Dataset size: 8333961248 bytes
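
The card does not state how the two configs link up. A plausible reading, which is an assumption, is that each `qa_data` row's `doc_id` list names `corpus` passages (matched on the corpus `doc_id` field), so a question's passages can be recovered with a lookup keyed on `doc_id`. The sketch below builds that index over toy in-memory rows; the helper names, ids, and text are made up for illustration. The real corpus holds about 6.3 GB of text, so a plain Python dict may not fit in memory on smaller machines.

```python
from typing import Dict, Iterable, List, Mapping


def build_snippet_index(corpus_rows: Iterable[Mapping]) -> Dict[str, str]:
    """Map each corpus doc_id to its passage text (the `snippets` field)."""
    index: Dict[str, str] = {}
    for row in corpus_rows:
        index[row["doc_id"]] = row["snippets"]
    return index


def passages_for_question(qa_row: Mapping, index: Mapping[str, str]) -> List[str]:
    """Resolve a qa_data row's doc_id list to passage texts.

    Assumes (not confirmed by the card) that qa_data doc_id values are
    corpus doc_id values; ids missing from the index are skipped.
    """
    return [index[d] for d in qa_row["doc_id"] if d in index]


# Toy rows shaped like the two configs; ids and text are hypothetical.
corpus = [
    {"query_id": "q1", "doc_id": "q1_0", "snippets": "Passage A"},
    {"query_id": "q1", "doc_id": "q1_1", "snippets": "Passage B"},
]
qa = {"query_id": "q1", "question": "...", "answer": "...", "doc_id": ["q1_1", "q1_9"]}

index = build_snippet_index(corpus)
print(passages_for_question(qa, index))  # ['Passage B']
```

Skipping unknown ids rather than raising keeps the lookup usable if the corpus was filtered, but it also hides mismatches; counting skipped ids is a cheap way to confirm the assumed relationship on real rows.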

Data files

  • corpus

    • train: corpus/train-*
  • qa_data

    • train: qa_data/train-*
    • test: qa_data/test-*
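
The data file entries are glob patterns relative to the repository root. A small sketch with the standard-library `fnmatch` module shows which files each pattern would select; the shard file names below are hypothetical, since the card gives only the patterns, and `assign_shards` is an illustrative helper.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# (config, split) -> glob pattern, copied from the card's data files section.
DATA_FILES: Dict[Tuple[str, str], str] = {
    ("corpus", "train"): "corpus/train-*",
    ("qa_data", "train"): "qa_data/train-*",
    ("qa_data", "test"): "qa_data/test-*",
}


def assign_shards(paths: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group repository file paths by the (config, split) whose pattern they match."""
    groups: Dict[Tuple[str, str], List[str]] = {key: [] for key in DATA_FILES}
    for path in paths:
        for key, pattern in DATA_FILES.items():
            # fnmatchcase is case-sensitive on every platform.
            if fnmatchcase(path, pattern):
                groups[key].append(path)
    return groups


# Hypothetical shard names, for illustration only.
files = [
    "corpus/train-00000-of-00002.parquet",
    "corpus/train-00001-of-00002.parquet",
    "qa_data/train-00000-of-00001.parquet",
    "qa_data/test-00000-of-00001.parquet",
    "README.md",
]
groups = assign_shards(files)
print(groups[("qa_data", "test")])  # ['qa_data/test-00000-of-00001.parquet']
```

When loading through the `datasets` library these patterns are resolved automatically; the sketch only makes the config/split-to-file mapping explicit.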