hgissbkh/StackOverflow-reranking
Source: Hugging Face. Updated 2024-05-22. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/hgissbkh/StackOverflow-reranking
Resource description:
---
dataset_info:
- config_name: GIST-large-Embedding-v0
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 763807167
    num_examples: 2992
  download_size: 586480566
  dataset_size: 763807167
- config_name: ember-v1
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 763807167
    num_examples: 2992
  download_size: 586248937
  dataset_size: 763807167
- config_name: multilingual-e5-base
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 574444991
    num_examples: 2992
  download_size: 440200590
  dataset_size: 574444991
- config_name: multilingual-e5-large
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 763807167
    num_examples: 2992
  download_size: 584158979
  dataset_size: 763807167
- config_name: multilingual-e5-small
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 290401727
    num_examples: 2992
  download_size: 220412320
  dataset_size: 290401727
- config_name: mxbai-embed-large-v1
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 763807167
    num_examples: 2992
  download_size: 586271164
  dataset_size: 763807167
configs:
- config_name: GIST-large-Embedding-v0
  data_files:
  - split: train
    path: GIST-large-Embedding-v0/train-*
- config_name: ember-v1
  data_files:
  - split: train
    path: ember-v1/train-*
- config_name: multilingual-e5-base
  data_files:
  - split: train
    path: multilingual-e5-base/train-*
- config_name: multilingual-e5-large
  data_files:
  - split: train
    path: multilingual-e5-large/train-*
- config_name: multilingual-e5-small
  data_files:
  - split: train
    path: multilingual-e5-small/train-*
- config_name: mxbai-embed-large-v1
  data_files:
  - split: train
    path: mxbai-embed-large-v1/train-*
---
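The metadata above names six configurations, each with a single `train` split. A minimal sketch of loading one of them with the Hugging Face `datasets` library follows; the helper name `load_config` and the `CONFIG_NAMES` list are illustrative and not part of the dataset card.

```python
# Configuration names copied from the dataset card metadata.
CONFIG_NAMES = [
    "GIST-large-Embedding-v0",
    "ember-v1",
    "multilingual-e5-base",
    "multilingual-e5-large",
    "multilingual-e5-small",
    "mxbai-embed-large-v1",
]


def load_config(config_name: str):
    """Load the `train` split of one configuration (hypothetical helper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "hgissbkh/StackOverflow-reranking", config_name, split="train"
    )
```

Calling `load_config("multilingual-e5-small")` fetches the smallest configuration, which the card lists at 220412320 bytes to download.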
Provider:
hgissbkh
Summary of original information
Dataset overview
Dataset configuration information
GIST-large-Embedding-v0
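- Features:
  - query: string
  - docs: sequence of strings
  - query_enc: sequence of floats (float64)
  - docs_enc: sequence of float sequences (float64), one per document
  - cos_scores: sequence of floats (float64)
  - target: sequence of integers (int64)
- Splits:
  - train: 2992 examples, 763807167 bytes
- Download size: 586480566 bytes
- Dataset size: 763807167 bytes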
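The card does not say how `cos_scores` was produced or what `target` encodes. The sketch below assumes that `cos_scores[i]` is the cosine similarity between `query_enc` and `docs_enc[i]`, and shows how such scores could be recomputed and turned into a ranking. The toy record and the helper names are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_scores(query_enc, docs_enc):
    """One score per document; assumed to mirror the `cos_scores` column."""
    return [cosine(query_enc, doc_enc) for doc_enc in docs_enc]


def rank_docs(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy record with 2-dimensional embeddings (not real data).
record = {
    "query_enc": [1.0, 0.0],
    "docs_enc": [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
}
scores = cosine_scores(record["query_enc"], record["docs_enc"])
order = rank_docs(scores)  # [1, 2, 0]
```

On a real record, comparing the recomputed scores against the stored `cos_scores` is a quick way to check whether the assumption holds.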
ember-v1
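- Features:
  - query: string
  - docs: sequence of strings
  - query_enc: sequence of floats (float64)
  - docs_enc: sequence of float sequences (float64), one per document
  - cos_scores: sequence of floats (float64)
  - target: sequence of integers (int64)
- Splits:
  - train: 2992 examples, 763807167 bytes
- Download size: 586248937 bytes
- Dataset size: 763807167 bytes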
multilingual-e5-base
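- Features:
  - query: string
  - docs: sequence of strings
  - query_enc: sequence of floats (float64)
  - docs_enc: sequence of float sequences (float64), one per document
  - cos_scores: sequence of floats (float64)
  - target: sequence of integers (int64)
- Splits:
  - train: 2992 examples, 574444991 bytes
- Download size: 440200590 bytes
- Dataset size: 574444991 bytes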
multilingual-e5-large
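- Features:
  - query: string
  - docs: sequence of strings
  - query_enc: sequence of floats (float64)
  - docs_enc: sequence of float sequences (float64), one per document
  - cos_scores: sequence of floats (float64)
  - target: sequence of integers (int64)
- Splits:
  - train: 2992 examples, 763807167 bytes
- Download size: 584158979 bytes
- Dataset size: 763807167 bytes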
multilingual-e5-small
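- Features:
  - query: string
  - docs: sequence of strings
  - query_enc: sequence of floats (float64)
  - docs_enc: sequence of float sequences (float64), one per document
  - cos_scores: sequence of floats (float64)
  - target: sequence of integers (int64)
- Splits:
  - train: 2992 examples, 290401727 bytes
- Download size: 220412320 bytes
- Dataset size: 290401727 bytes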
mxbai-embed-large-v1
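- Features:
  - query: string
  - docs: sequence of strings
  - query_enc: sequence of floats (float64)
  - docs_enc: sequence of float sequences (float64), one per document
  - cos_scores: sequence of floats (float64)
  - target: sequence of integers (int64)
- Splits:
  - train: 2992 examples, 763807167 bytes
- Download size: 586271164 bytes
- Dataset size: 763807167 bytes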
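The size figures differ only between configurations whose embedding models have different output widths, which allows a rough consistency check. The sketch below assumes embedding dimensions of 384, 768 and 1024 for the small, base and large multilingual-e5 models (taken from those models' documentation, not from this card) and assumes that every non-embedding column is identical across configurations, so that it cancels in the difference.

```python
# num_bytes values copied from the listing above.
NUM_BYTES = {
    "multilingual-e5-small": 290401727,
    "multilingual-e5-base": 574444991,
    "multilingual-e5-large": 763807167,
}
# Assumption: embedding widths of the three models; not stated in the card.
DIMS = {
    "multilingual-e5-small": 384,
    "multilingual-e5-base": 768,
    "multilingual-e5-large": 1024,
}
BYTES_PER_FLOAT64 = 8
NUM_EXAMPLES = 2992


def implied_vector_count(smaller: str, larger: str) -> float:
    """Number of stored vectors implied by the size gap between two configs."""
    extra_bytes = NUM_BYTES[larger] - NUM_BYTES[smaller]
    extra_dims = DIMS[larger] - DIMS[smaller]
    return extra_bytes / (extra_dims * BYTES_PER_FLOAT64)


small_to_base = implied_vector_count("multilingual-e5-small", "multilingual-e5-base")
base_to_large = implied_vector_count("multilingual-e5-base", "multilingual-e5-large")
vectors_per_example = small_to_base / NUM_EXAMPLES

print(small_to_base, base_to_large)   # 92462.0 92462.0
print(round(vectors_per_example, 1))  # 30.9
```

Both differences give the same count of 92462 vectors, which is consistent with the assumed dimensions. Under the same assumptions, each example holds about 30.9 vectors on average, that is, one query vector plus roughly 30 document vectors. The other three configurations report the same `num_bytes` as multilingual-e5-large.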



