
xPXXX/tevatron_wikipedia-nq_sample100

Source: Hugging Face. Updated: 2024-04-17. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/xPXXX/tevatron_wikipedia-nq_sample100
Resource description:
---
dataset_info:
- config_name: default
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 6516114.769199276
    num_examples: 100
  download_size: 3750358
  dataset_size: 6516114.769199276
- config_name: finetune_llama2_no_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: finetune_llama2_no_rag_response
    dtype: string
  splits:
  - name: train
    num_bytes: 6539762
    num_examples: 100
  download_size: 3774951
  dataset_size: 6539762
- config_name: finetune_llama2_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: finetune_llama2_rag_response
    dtype: string
  - name: retrieval
    sequence: string
  splits:
  - name: train
    num_bytes: 6781481
    num_examples: 100
  download_size: 3872487
  dataset_size: 6781481
- config_name: gpt3_no_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gpt3_no_rag_response
    dtype: string
  splits:
  - name: train
    num_bytes: 6530548
    num_examples: 100
  download_size: 3769177
  dataset_size: 6530548
- config_name: gpt3_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gpt3_rag_response
    dtype: string
  - name: retrieval
    sequence: string
  splits:
  - name: train
    num_bytes: 6794643
    num_examples: 100
  download_size: 3882537
  dataset_size: 6794643
- config_name: gpt4_no_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gpt4_no_rag_response
    dtype: string
  splits:
  - name: train
    num_bytes: 6565730
    num_examples: 100
  download_size: 3789108
  dataset_size: 6565730
- config_name: gpt4_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gpt4_rag_response
    dtype: string
  - name: retrieval
    sequence: string
  splits:
  - name: train
    num_bytes: 6801342
    num_examples: 100
  download_size: 3885780
  dataset_size: 6801342
- config_name: llama2_no_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: llama2_no_rag_response
    dtype: string
  splits:
  - name: train
    num_bytes: 6550530
    num_examples: 100
  download_size: 3781797
  dataset_size: 6550530
- config_name: llama2_rag
  features:
  - name: query_id
    dtype: string
  - name: query
    dtype: string
  - name: answers
    list: string
  - name: positive_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: negative_passages
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: llama2_rag_response
    dtype: string
  - name: retrieval
    sequence: string
  splits:
  - name: train
    num_bytes: 6789056
    num_examples: 100
  download_size: 3877897
  dataset_size: 6789056
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: finetune_llama2_no_rag
  data_files:
  - split: train
    path: finetune_llama2_no_rag/train-*
- config_name: finetune_llama2_rag
  data_files:
  - split: train
    path: finetune_llama2_rag/train-*
- config_name: gpt3_no_rag
  data_files:
  - split: train
    path: gpt3_no_rag/train-*
- config_name: gpt3_rag
  data_files:
  - split: train
    path: gpt3_rag/train-*
- config_name: gpt4_no_rag
  data_files:
  - split: train
    path: gpt4_no_rag/train-*
- config_name: gpt4_rag
  data_files:
  - split: train
    path: gpt4_rag/train-*
- config_name: llama2_no_rag
  data_files:
  - split: train
    path: llama2_no_rag/train-*
- config_name: llama2_rag
  data_files:
  - split: train
    path: llama2_rag/train-*
---
Provider:
xPXXX
Summary of original information

Dataset overview

Dataset configurations

Default configuration (default)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
  • Splits:
    • train:
      • Size (num_bytes): 6516114.769199276
      • Examples: 100
  • Download size: 3750358 bytes
  • Dataset size: 6516114.769199276 bytes
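
The default schema above can be checked against a single record in a few lines. The following is a minimal sketch, not part of the dataset's tooling: the helper names `validate_record` and `_is_passage_list` and the sample record are hypothetical, and only the field names and types come from the feature list.

```python
from typing import Any, Dict, List

# Field names taken from the feature list above; helper names are hypothetical.
PASSAGE_FIELDS = ("docid", "text", "title")


def _is_passage_list(value: Any) -> bool:
    """True if value is a list of dicts whose docid/text/title are all strings."""
    return isinstance(value, list) and all(
        isinstance(p, dict) and all(isinstance(p.get(f), str) for f in PASSAGE_FIELDS)
        for p in value
    )


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record conforms."""
    problems = []
    for field in ("query_id", "query"):
        if not isinstance(record.get(field), str):
            problems.append(f"{field} must be a string")
    answers = record.get("answers")
    if not (isinstance(answers, list) and all(isinstance(a, str) for a in answers)):
        problems.append("answers must be a list of strings")
    for field in ("positive_passages", "negative_passages"):
        if not _is_passage_list(record.get(field)):
            problems.append(f"{field} must be a list of docid/text/title dicts")
    return problems


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "query_id": "0",
    "query": "example question",
    "answers": ["example answer"],
    "positive_passages": [{"docid": "1", "text": "passage text", "title": "Title"}],
    "negative_passages": [],
}
print(validate_record(sample))
```

An empty result means the record has the expected shape; each string in a non-empty result names one field that does not.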

Fine-tuned Llama2 without RAG (finetune_llama2_no_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • finetune_llama2_no_rag_response: string
  • Splits:
    • train:
      • Size (num_bytes): 6539762
      • Examples: 100
  • Download size: 3774951 bytes
  • Dataset size: 6539762 bytes

Fine-tuned Llama2 with RAG (finetune_llama2_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • finetune_llama2_rag_response: string
    • retrieval: sequence of strings
  • Splits:
    • train:
      • Size (num_bytes): 6781481
      • Examples: 100
  • Download size: 3872487 bytes
  • Dataset size: 6781481 bytes
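
Compared with the no-RAG configuration before it, this configuration shows the naming rule the listing follows: the response column is the configuration name plus `_response`, and `retrieval` is present only in the RAG variant. A small sketch of that rule, read off this listing rather than documented anywhere (the function name `extra_columns` is hypothetical):

```python
from typing import List


def extra_columns(config_name: str) -> List[str]:
    """Columns a configuration adds on top of the default schema.

    Rule inferred from the listing, not from official documentation:
    every non-default configuration adds "<config>_response", and the
    configurations that do not end in "_no_rag" also add "retrieval".
    """
    if config_name == "default":
        return []
    columns = [f"{config_name}_response"]
    # Test for "_no_rag" explicitly: "gpt3_no_rag" also ends with "_rag".
    if not config_name.endswith("_no_rag"):
        columns.append("retrieval")
    return columns


print(extra_columns("finetune_llama2_rag"))
print(extra_columns("finetune_llama2_no_rag"))
```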

GPT-3 without RAG (gpt3_no_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • gpt3_no_rag_response: string
  • Splits:
    • train:
      • Size (num_bytes): 6530548
      • Examples: 100
  • Download size: 3769177 bytes
  • Dataset size: 6530548 bytes

GPT-3 with RAG (gpt3_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • gpt3_rag_response: string
    • retrieval: sequence of strings
  • Splits:
    • train:
      • Size (num_bytes): 6794643
      • Examples: 100
  • Download size: 3882537 bytes
  • Dataset size: 6794643 bytes

GPT-4 without RAG (gpt4_no_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • gpt4_no_rag_response: string
  • Splits:
    • train:
      • Size (num_bytes): 6565730
      • Examples: 100
  • Download size: 3789108 bytes
  • Dataset size: 6565730 bytes

GPT-4 with RAG (gpt4_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • gpt4_rag_response: string
    • retrieval: sequence of strings
  • Splits:
    • train:
      • Size (num_bytes): 6801342
      • Examples: 100
  • Download size: 3885780 bytes
  • Dataset size: 6801342 bytes

Llama2 without RAG (llama2_no_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • llama2_no_rag_response: string
  • Splits:
    • train:
      • Size (num_bytes): 6550530
      • Examples: 100
  • Download size: 3781797 bytes
  • Dataset size: 6550530 bytes

Llama2 with RAG (llama2_rag)

  • Features:
    • query_id: string
    • query: string
    • answers: list of strings
    • positive_passages: list
      • docid: string
      • text: string
      • title: string
    • negative_passages: list
      • docid: string
      • text: string
      • title: string
    • llama2_rag_response: string
    • retrieval: sequence of strings
  • Splits:
    • train:
      • Size (num_bytes): 6789056
      • Examples: 100
  • Download size: 3877897 bytes
  • Dataset size: 6789056 bytes
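
The nine configurations listed on this page are the default plus every combination of four model labels and two modes. The sketch below enumerates them and shows one way a configuration might be loaded with the Hugging Face `datasets` library. The repository ID is copied from this page, and whether it is still reachable has not been checked. The helper names `config_names` and `load_config` are hypothetical, and loading needs `datasets` installed plus network access.

```python
from itertools import product
from typing import List

# Repository ID as shown on this page; its current availability is an assumption.
REPO_ID = "xPXXX/tevatron_wikipedia-nq_sample100"
MODELS = ("finetune_llama2", "gpt3", "gpt4", "llama2")
MODES = ("no_rag", "rag")


def config_names() -> List[str]:
    """All configuration names listed on this page, default first."""
    return ["default"] + [f"{model}_{mode}" for model, mode in product(MODELS, MODES)]


def load_config(name: str):
    """Load the train split of one configuration (the only split listed)."""
    if name not in config_names():
        raise ValueError(f"unknown configuration: {name!r}")
    # Imported lazily so that enumerating names works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train")


print(config_names())
```

Calling `load_config("gpt4_rag")` would then return the 100-example train split of that configuration, assuming the repository is accessible.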