
hgissbkh/Syntec-reranking

Source: Hugging Face. Updated: 2024-05-23. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/hgissbkh/Syntec-reranking
Resource description:
---
dataset_info:
- config_name: bge-m3-custom-fr
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 5880805
    num_examples: 100
  download_size: 2603698
  dataset_size: 5880805
- config_name: multilingual-e5-base
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 4584421
    num_examples: 100
  download_size: 1921909
  dataset_size: 4584421
- config_name: multilingual-e5-large
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 5880805
    num_examples: 100
  download_size: 2599703
  dataset_size: 5880805
- config_name: multilingual-e5-small
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 2639845
    num_examples: 100
  download_size: 944927
  dataset_size: 2639845
- config_name: sentence-t5-large
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 4584421
    num_examples: 100
  download_size: 1921682
  dataset_size: 4584421
- config_name: sentence_croissant_alpha_v0.4
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 11066341
    num_examples: 100
  download_size: 7715536
  dataset_size: 11066341
configs:
- config_name: bge-m3-custom-fr
  data_files:
  - split: train
    path: bge-m3-custom-fr/train-*
- config_name: multilingual-e5-base
  data_files:
  - split: train
    path: multilingual-e5-base/train-*
- config_name: multilingual-e5-large
  data_files:
  - split: train
    path: multilingual-e5-large/train-*
- config_name: multilingual-e5-small
  data_files:
  - split: train
    path: multilingual-e5-small/train-*
- config_name: sentence-t5-large
  data_files:
  - split: train
    path: sentence-t5-large/train-*
- config_name: sentence_croissant_alpha_v0.4
  data_files:
  - split: train
    path: sentence_croissant_alpha_v0.4/train-*
---
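Each `config_name` in the metadata can be passed as the configuration argument of `load_dataset` from the Hugging Face `datasets` library. The following is a minimal sketch, assuming the `datasets` package is installed and the repository is reachable over the network; the helper name `load_config` is hypothetical and not part of the dataset.

```python
# Config names copied from the dataset metadata.
CONFIG_NAMES = [
    "bge-m3-custom-fr",
    "multilingual-e5-base",
    "multilingual-e5-large",
    "multilingual-e5-small",
    "sentence-t5-large",
    "sentence_croissant_alpha_v0.4",
]


def load_config(name):
    """Load the train split of one config (hypothetical helper, needs network access)."""
    if name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {name!r}")
    # Imported lazily so the name check works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("hgissbkh/Syntec-reranking", name, split="train")


# Example call (requires network access):
#   ds = load_config("multilingual-e5-small")
#   print(ds.column_names)
```

The only split listed for every configuration is `train`, so the sketch hard-codes it.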
Provider:
hgissbkh
Summary of original information

Dataset overview

1. bge-m3-custom-fr

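  • Features:
    • query: string
    • docs: sequence of strings
    • query_enc: sequence of floats (float64)
    • docs_enc: sequence of float sequences (float64), one per document
    • cos_scores: sequence of floats (float64)
    • target: sequence of integers (int64)
  • Splits:
    • train: 100 examples, 5880805 bytes
  • Download size: 2603698 bytes
  • Dataset size: 5880805 bytes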
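
The field names suggest that `cos_scores` holds the cosine similarity between `query_enc` and each vector in `docs_enc`; the source does not state this, so treat it as an assumption. Under that assumption, the scores for one row could be recomputed as in the sketch below. The helper names and the toy 2-dimensional vectors are illustrative and not taken from the dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def recompute_cos_scores(query_enc, docs_enc):
    """One score per document embedding (assumed meaning of `cos_scores`)."""
    return [cosine(query_enc, doc_enc) for doc_enc in docs_enc]


# Toy row with made-up embeddings, shaped like the dataset's fields.
row = {
    "query_enc": [1.0, 0.0],
    "docs_enc": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
}
scores = recompute_cos_scores(row["query_enc"], row["docs_enc"])
```

Comparing such recomputed values with the stored `cos_scores` of a real row would be one way to check the assumption.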

2. multilingual-e5-base

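  • Features:
    • query: string
    • docs: sequence of strings
    • query_enc: sequence of floats (float64)
    • docs_enc: sequence of float sequences (float64), one per document
    • cos_scores: sequence of floats (float64)
    • target: sequence of integers (int64)
  • Splits:
    • train: 100 examples, 4584421 bytes
  • Download size: 1921909 bytes
  • Dataset size: 4584421 bytes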

3. multilingual-e5-large

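  • Features:
    • query: string
    • docs: sequence of strings
    • query_enc: sequence of floats (float64)
    • docs_enc: sequence of float sequences (float64), one per document
    • cos_scores: sequence of floats (float64)
    • target: sequence of integers (int64)
  • Splits:
    • train: 100 examples, 5880805 bytes
  • Download size: 2599703 bytes
  • Dataset size: 5880805 bytes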

4. multilingual-e5-small

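  • Features:
    • query: string
    • docs: sequence of strings
    • query_enc: sequence of floats (float64)
    • docs_enc: sequence of float sequences (float64), one per document
    • cos_scores: sequence of floats (float64)
    • target: sequence of integers (int64)
  • Splits:
    • train: 100 examples, 2639845 bytes
  • Download size: 944927 bytes
  • Dataset size: 2639845 bytes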

5. sentence-t5-large

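  • Features:
    • query: string
    • docs: sequence of strings
    • query_enc: sequence of floats (float64)
    • docs_enc: sequence of float sequences (float64), one per document
    • cos_scores: sequence of floats (float64)
    • target: sequence of integers (int64)
  • Splits:
    • train: 100 examples, 4584421 bytes
  • Download size: 1921682 bytes
  • Dataset size: 4584421 bytes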

6. sentence_croissant_alpha_v0.4

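  • Features:
    • query: string
    • docs: sequence of strings
    • query_enc: sequence of floats (float64)
    • docs_enc: sequence of float sequences (float64), one per document
    • cos_scores: sequence of floats (float64)
    • target: sequence of integers (int64)
  • Splits:
    • train: 100 examples, 11066341 bytes
  • Download size: 7715536 bytes
  • Dataset size: 11066341 bytes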
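
All six configurations share the same fields, so a single evaluation routine could be applied to each. The sketch below ranks the documents of one row by score and computes the reciprocal rank of the first relevant document. It assumes that `target` is a list of 0/1 relevance labels aligned with `docs`, which the source does not confirm; the function names and the toy row are hypothetical.

```python
def rerank(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank(order, target):
    """1 / rank of the first relevant document, or 0.0 if none is relevant.

    Assumes target[i] == 1 marks docs[i] as relevant (not confirmed by the source).
    """
    for rank, idx in enumerate(order, start=1):
        if target[idx] == 1:
            return 1.0 / rank
    return 0.0


# Toy row with made-up values, shaped like the dataset's fields.
row = {
    "docs": ["doc a", "doc b", "doc c"],
    "cos_scores": [0.2, 0.9, 0.5],
    "target": [0, 0, 1],
}
order = rerank(row["cos_scores"])
rr = reciprocal_rank(order, row["target"])
print([row["docs"][i] for i in order], rr)
```

Averaging `rr` over the 100 rows of a configuration would give a mean reciprocal rank for that embedding model, provided the assumption about `target` holds.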