hgissbkh/SciDocs-reranking
Source: Hugging Face. Updated 2024-05-22. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/hgissbkh/SciDocs-reranking
Description:
---
dataset_info:
- config_name: GIST-large-Embedding-v0
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 1015868866
    num_examples: 3978
  download_size: 781180384
  dataset_size: 1015868866
- config_name: ember-v1
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 1015868866
    num_examples: 3978
  download_size: 780823540
  dataset_size: 1015868866
- config_name: multilingual-e5-base
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 764829122
    num_examples: 3978
  download_size: 585932187
  dataset_size: 764829122
- config_name: multilingual-e5-large
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 1015868866
    num_examples: 3978
  download_size: 778483474
  dataset_size: 1015868866
- config_name: multilingual-e5-small
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 388269506
    num_examples: 3978
  download_size: 295290293
  dataset_size: 388269506
- config_name: mxbai-embed-large-v1
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 1015868866
    num_examples: 3978
  download_size: 780858538
  dataset_size: 1015868866
configs:
- config_name: GIST-large-Embedding-v0
  data_files:
  - split: train
    path: GIST-large-Embedding-v0/train-*
- config_name: ember-v1
  data_files:
  - split: train
    path: ember-v1/train-*
- config_name: multilingual-e5-base
  data_files:
  - split: train
    path: multilingual-e5-base/train-*
- config_name: multilingual-e5-large
  data_files:
  - split: train
    path: multilingual-e5-large/train-*
- config_name: multilingual-e5-small
  data_files:
  - split: train
    path: multilingual-e5-small/train-*
- config_name: mxbai-embed-large-v1
  data_files:
  - split: train
    path: mxbai-embed-large-v1/train-*
---
The dataset provides six configurations, each named after an embedding model and all sharing the same features: query, docs, query_enc, docs_enc, cos_scores, and target. Each configuration has a single train split of 3,978 examples; the byte sizes are listed in the metadata above. The dataset is primarily intended for text embedding and similarity calculation tasks.
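To make the relationship between the fields concrete, here is a minimal sketch of how scores like `cos_scores` could be recomputed from `query_enc` and `docs_enc` and then checked against `target`. It rests on assumptions the card does not state: that `cos_scores` holds the cosine similarity between the query vector and each document vector, and that `target` is a binary relevance label aligned with `docs`. The helper names and the synthetic row are hypothetical.

```python
import numpy as np

def cosine_scores(query_enc, docs_enc):
    """Cosine similarity between one query vector and each document vector."""
    q = np.asarray(query_enc, dtype=np.float64)
    d = np.asarray(docs_enc, dtype=np.float64)
    return (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))

def reciprocal_rank(scores, target):
    """1 / rank of the best-scored relevant document (0.0 if none is relevant)."""
    order = np.argsort(-np.asarray(scores))   # document indices, best score first
    hits = np.asarray(target)[order]          # relevance labels in ranked order
    ranks = np.flatnonzero(hits)              # 0-based positions of relevant docs
    return 1.0 / (ranks[0] + 1) if ranks.size else 0.0

# Synthetic row using the dataset's field names; the values are made up.
row = {
    "query_enc": [1.0, 0.0],
    "docs_enc": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "target": [0, 0, 1],   # assumption: 1 marks a relevant document
}

scores = cosine_scores(row["query_enc"], row["docs_enc"])
ranking = np.argsort(-scores)
rr = reciprocal_rank(scores, row["target"])

print(ranking.tolist())  # [2, 1, 0]
print(rr)                # 1.0
```

On a real row, comparing the recomputed values with the stored `cos_scores` (for example with `np.allclose`) would be a quick way to test the first assumption before relying on it.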
Provider:
hgissbkh
Original information summary
Dataset overview
Configuration names and features
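- GIST-large-Embedding-v0
  - Features:
    - query: string
    - docs: sequence of strings
    - query_enc: sequence of floats
    - docs_enc: sequence of float sequences
    - cos_scores: sequence of floats
    - target: sequence of integers
  - Train split:
    - Bytes: 1015868866
    - Examples: 3978
  - Download size (bytes): 781180384
  - Dataset size (bytes): 1015868866
- ember-v1
  - Features:
    - query: string
    - docs: sequence of strings
    - query_enc: sequence of floats
    - docs_enc: sequence of float sequences
    - cos_scores: sequence of floats
    - target: sequence of integers
  - Train split:
    - Bytes: 1015868866
    - Examples: 3978
  - Download size (bytes): 780823540
  - Dataset size (bytes): 1015868866
- multilingual-e5-base
  - Features:
    - query: string
    - docs: sequence of strings
    - query_enc: sequence of floats
    - docs_enc: sequence of float sequences
    - cos_scores: sequence of floats
    - target: sequence of integers
  - Train split:
    - Bytes: 764829122
    - Examples: 3978
  - Download size (bytes): 585932187
  - Dataset size (bytes): 764829122
- multilingual-e5-large
  - Features:
    - query: string
    - docs: sequence of strings
    - query_enc: sequence of floats
    - docs_enc: sequence of float sequences
    - cos_scores: sequence of floats
    - target: sequence of integers
  - Train split:
    - Bytes: 1015868866
    - Examples: 3978
  - Download size (bytes): 778483474
  - Dataset size (bytes): 1015868866
- multilingual-e5-small
  - Features:
    - query: string
    - docs: sequence of strings
    - query_enc: sequence of floats
    - docs_enc: sequence of float sequences
    - cos_scores: sequence of floats
    - target: sequence of integers
  - Train split:
    - Bytes: 388269506
    - Examples: 3978
  - Download size (bytes): 295290293
  - Dataset size (bytes): 388269506
- mxbai-embed-large-v1
  - Features:
    - query: string
    - docs: sequence of strings
    - query_enc: sequence of floats
    - docs_enc: sequence of float sequences
    - cos_scores: sequence of floats
    - target: sequence of integers
  - Train split:
    - Bytes: 1015868866
    - Examples: 3978
  - Download size (bytes): 780858538
  - Dataset size (bytes): 1015868866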
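The byte counts above differ only between configurations whose embedding widths differ, which allows a rough consistency check. The sketch below is an estimate under stated assumptions, not a documented property of the dataset: it assumes the three multilingual-e5 models produce 1024-, 768-, and 384-dimensional vectors (these widths are not given in the card), that each value is stored as an 8-byte float64, and that `num_bytes` grows linearly with the number of stored floats.

```python
# num_bytes per configuration, copied from the metadata above.
NUM_BYTES = {
    "multilingual-e5-large": 1015868866,
    "multilingual-e5-base": 764829122,
    "multilingual-e5-small": 388269506,
}
# Assumed embedding widths; these are NOT stated in the dataset card.
ASSUMED_DIMS = {
    "multilingual-e5-large": 1024,
    "multilingual-e5-base": 768,
    "multilingual-e5-small": 384,
}
NUM_EXAMPLES = 3978
FLOAT64_BYTES = 8

def bytes_per_dimension(cfg_a, cfg_b):
    """Bytes that one extra embedding dimension adds across the whole split."""
    byte_diff = NUM_BYTES[cfg_a] - NUM_BYTES[cfg_b]
    dim_diff = ASSUMED_DIMS[cfg_a] - ASSUMED_DIMS[cfg_b]
    return byte_diff / dim_diff

large_vs_base = bytes_per_dimension("multilingual-e5-large", "multilingual-e5-base")
base_vs_small = bytes_per_dimension("multilingual-e5-base", "multilingual-e5-small")

# Each stored vector (one query or one document) contributes 8 bytes per dimension.
total_vectors = large_vs_base / FLOAT64_BYTES
docs_per_example = (total_vectors - NUM_EXAMPLES) / NUM_EXAMPLES

print(large_vs_base, base_vs_small)  # 980624.0 980624.0
print(total_vectors)                 # 122578.0
print(round(docs_per_example, 2))    # 29.81
```

Both pairs give the same per-dimension cost, which is consistent with the assumed widths. Under these assumptions each example would carry roughly 30 candidate documents on average. The three remaining configurations report the same `num_bytes` as multilingual-e5-large, which would fit 1024-dimensional embeddings as well.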
Dataset file paths
- GIST-large-Embedding-v0: GIST-large-Embedding-v0/train-*
- ember-v1: ember-v1/train-*
- multilingual-e5-base: multilingual-e5-base/train-*
- multilingual-e5-large: multilingual-e5-large/train-*
- multilingual-e5-small: multilingual-e5-small/train-*
- mxbai-embed-large-v1: mxbai-embed-large-v1/train-*
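Each configuration maps to its own `train-*` file pattern, so a single configuration can be loaded by name. The following is a minimal sketch using the Hugging Face `datasets` library; it assumes the repository is reachable under the ID shown at the top of this page and that `datasets` is installed. The helper names `data_files_pattern` and `load_config` are hypothetical, and streaming is chosen here only to avoid downloading several hundred megabytes up front.

```python
REPO_ID = "hgissbkh/SciDocs-reranking"
CONFIGS = [
    "GIST-large-Embedding-v0",
    "ember-v1",
    "multilingual-e5-base",
    "multilingual-e5-large",
    "multilingual-e5-small",
    "mxbai-embed-large-v1",
]

def data_files_pattern(config):
    """Return the file pattern listed above for a known configuration."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    return f"{config}/train-*"

def load_config(config, streaming=True):
    """Load the train split of one configuration (requires network access)."""
    from datasets import load_dataset  # imported lazily: pip install datasets
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    return load_dataset(REPO_ID, config, split="train", streaming=streaming)

print(data_files_pattern("ember-v1"))  # ember-v1/train-*
```

A typical use would be `row = next(iter(load_config("multilingual-e5-small")))`, after which `row["query"]`, `row["docs"]`, and the other fields can be inspected.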



