hgissbkh/Alloprof-reranking
Source: Hugging Face. Updated: 2024-05-23. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/hgissbkh/Alloprof-reranking
Resource description:
---
dataset_info:
- config_name: bge-m3-custom-fr
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 318998948
    num_examples: 2316
  download_size: 219114214
  dataset_size: 318998948
- config_name: multilingual-e5-base
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 263469476
    num_examples: 2316
  download_size: 175986096
  dataset_size: 263469476
- config_name: multilingual-e5-large
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 318998948
    num_examples: 2316
  download_size: 218608593
  dataset_size: 318998948
- config_name: multilingual-e5-small
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 180175268
    num_examples: 2316
  download_size: 111515323
  dataset_size: 180175268
- config_name: sentence-t5-large
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 263469476
    num_examples: 2316
  download_size: 176118797
  dataset_size: 263469476
- config_name: sentence_croissant_alpha_v0.4
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 541116836
    num_examples: 2316
  download_size: 390071420
  dataset_size: 541116836
configs:
- config_name: bge-m3-custom-fr
  data_files:
  - split: train
    path: bge-m3-custom-fr/train-*
- config_name: multilingual-e5-base
  data_files:
  - split: train
    path: multilingual-e5-base/train-*
- config_name: multilingual-e5-large
  data_files:
  - split: train
    path: multilingual-e5-large/train-*
- config_name: multilingual-e5-small
  data_files:
  - split: train
    path: multilingual-e5-small/train-*
- config_name: sentence-t5-large
  data_files:
  - split: train
    path: sentence-t5-large/train-*
- config_name: sentence_croissant_alpha_v0.4
  data_files:
  - split: train
    path: sentence_croissant_alpha_v0.4/train-*
---
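The `cos_scores` and `target` fields suggest the data is meant for evaluating reranking. The following is a minimal sketch, assuming (the card does not say this) that `cos_scores[i]` and `target[i]` refer to `docs[i]`, and that `target[i] == 1` marks a relevant document. It reorders each row's documents by descending cosine score and computes mean reciprocal rank (MRR), one common reranking metric; it is not necessarily the metric the dataset author uses. The toy rows are hypothetical, not taken from the dataset.

```python
from typing import Sequence


def rerank(docs: Sequence[str], cos_scores: Sequence[float]) -> list[int]:
    """Return document indices sorted by descending cosine score."""
    # Assumption: cos_scores[i] is the score of docs[i].
    return sorted(range(len(docs)), key=lambda i: cos_scores[i], reverse=True)


def reciprocal_rank(order: Sequence[int], target: Sequence[int]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is relevant."""
    # Assumption: target[i] == 1 marks docs[i] as relevant, 0 otherwise.
    for rank, idx in enumerate(order, start=1):
        if target[idx] == 1:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rows: Sequence[dict]) -> float:
    """Average reciprocal rank over rows shaped like the dataset's features."""
    if not rows:
        return 0.0
    total = 0.0
    for row in rows:
        order = rerank(row["docs"], row["cos_scores"])
        total += reciprocal_rank(order, row["target"])
    return total / len(rows)


# Hypothetical toy rows (not real data from the dataset).
rows = [
    {"docs": ["a", "b", "c"], "cos_scores": [0.2, 0.9, 0.5], "target": [0, 1, 0]},
    {"docs": ["d", "e"], "cos_scores": [0.8, 0.3], "target": [0, 1]},
]
print(mean_reciprocal_rank(rows))
```

In the first toy row the relevant document is ranked first (reciprocal rank 1.0); in the second it is ranked second (0.5), so the mean is 0.75.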
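Dataset information:
- Config name: bge-m3-custom-fr
  Features:
  - Field: query, type: string
  - Field: docs (documents), type: sequence of strings
  - Field: query_enc (query encoding), type: sequence of float64
  - Field: docs_enc (document encodings), type: nested sequence of float64
  - Field: cos_scores (cosine similarity scores), type: sequence of float64
  - Field: target (target labels), type: sequence of int64
  Splits:
  - Split: train, bytes: 318998948, examples: 2316
  Download size: 219114214
  Total dataset size: 318998948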
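- Config name: multilingual-e5-base
  Features:
  - Field: query, type: string
  - Field: docs, type: sequence of strings
  - Field: query_enc, type: sequence of float64
  - Field: docs_enc, type: nested sequence of float64
  - Field: cos_scores, type: sequence of float64
  - Field: target, type: sequence of int64
  Splits:
  - Split: train, bytes: 263469476, examples: 2316
  Download size: 175986096
  Total dataset size: 263469476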
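- Config name: multilingual-e5-large
  Features:
  - Field: query, type: string
  - Field: docs, type: sequence of strings
  - Field: query_enc, type: sequence of float64
  - Field: docs_enc, type: nested sequence of float64
  - Field: cos_scores, type: sequence of float64
  - Field: target, type: sequence of int64
  Splits:
  - Split: train, bytes: 318998948, examples: 2316
  Download size: 218608593
  Total dataset size: 318998948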
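- Config name: multilingual-e5-small
  Features:
  - Field: query, type: string
  - Field: docs, type: sequence of strings
  - Field: query_enc, type: sequence of float64
  - Field: docs_enc, type: nested sequence of float64
  - Field: cos_scores, type: sequence of float64
  - Field: target, type: sequence of int64
  Splits:
  - Split: train, bytes: 180175268, examples: 2316
  Download size: 111515323
  Total dataset size: 180175268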
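- Config name: sentence-t5-large
  Features:
  - Field: query, type: string
  - Field: docs, type: sequence of strings
  - Field: query_enc, type: sequence of float64
  - Field: docs_enc, type: nested sequence of float64
  - Field: cos_scores, type: sequence of float64
  - Field: target, type: sequence of int64
  Splits:
  - Split: train, bytes: 263469476, examples: 2316
  Download size: 176118797
  Total dataset size: 263469476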
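- Config name: sentence_croissant_alpha_v0.4
  Features:
  - Field: query, type: string
  - Field: docs, type: sequence of strings
  - Field: query_enc, type: sequence of float64
  - Field: docs_enc, type: nested sequence of float64
  - Field: cos_scores, type: sequence of float64
  - Field: target, type: sequence of int64
  Splits:
  - Split: train, bytes: 541116836, examples: 2316
  Download size: 390071420
  Total dataset size: 541116836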
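Every configuration stores a query encoding (`query_enc`), one encoding per document (`docs_enc`), and `cos_scores`. A minimal sketch of how such scores can be recomputed and checked, assuming (not stated in the card) that `cos_scores[i]` is the cosine similarity between `query_enc` and `docs_enc[i]`. The helper names and the 2-dimensional toy row are hypothetical; the real encodings have many more dimensions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def recompute_cos_scores(query_enc, docs_enc) -> list[float]:
    """One score per document encoding, in the same order as docs_enc."""
    return [cosine(query_enc, d) for d in docs_enc]


def scores_match(row: dict, tol: float = 1e-6) -> bool:
    """Compare stored cos_scores with recomputed ones (assumed index alignment)."""
    recomputed = recompute_cos_scores(row["query_enc"], row["docs_enc"])
    return len(recomputed) == len(row["cos_scores"]) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(recomputed, row["cos_scores"])
    )


# Hypothetical 2-dimensional toy row.
row = {
    "query_enc": [1.0, 0.0],
    "docs_enc": [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]],
    "cos_scores": [1.0, 0.0, math.sqrt(0.5)],
}
print(recompute_cos_scores(row["query_enc"], row["docs_enc"]))
print(scores_match(row))
```

If `scores_match` fails on real rows, the assumed alignment or scoring function is probably wrong rather than the data.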
Configurations:
- Config name: bge-m3-custom-fr
  Data files:
  - Split: train, path: bge-m3-custom-fr/train-*
- Config name: multilingual-e5-base
  Data files:
  - Split: train, path: multilingual-e5-base/train-*
- Config name: multilingual-e5-large
  Data files:
  - Split: train, path: multilingual-e5-large/train-*
- Config name: multilingual-e5-small
  Data files:
  - Split: train, path: multilingual-e5-small/train-*
- Config name: sentence-t5-large
  Data files:
  - Split: train, path: sentence-t5-large/train-*
- Config name: sentence_croissant_alpha_v0.4
  Data files:
  - Split: train, path: sentence_croissant_alpha_v0.4/train-*
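Each configuration selects its train files with a glob pattern of the form `<config>/train-*`. The sketch below approximates that selection with Python's `fnmatch.fnmatchcase`; the shard file names are hypothetical, and the Hugging Face tooling uses its own glob resolution, which may differ from `fnmatch` in edge cases (for example, whether `*` crosses `/`).

```python
from fnmatch import fnmatchcase

# Config name -> glob pattern for the train split, copied from the configuration list.
DATA_FILES = {
    "bge-m3-custom-fr": "bge-m3-custom-fr/train-*",
    "multilingual-e5-base": "multilingual-e5-base/train-*",
    "multilingual-e5-large": "multilingual-e5-large/train-*",
    "multilingual-e5-small": "multilingual-e5-small/train-*",
    "sentence-t5-large": "sentence-t5-large/train-*",
    "sentence_croissant_alpha_v0.4": "sentence_croissant_alpha_v0.4/train-*",
}


def files_for_config(config: str, repo_files: list[str]) -> list[str]:
    """Return the repository files that the config's train pattern selects."""
    pattern = DATA_FILES[config]  # raises KeyError for unknown configs
    return sorted(f for f in repo_files if fnmatchcase(f, pattern))


# Hypothetical repository listing (file names are illustrative only).
repo_files = [
    "README.md",
    "bge-m3-custom-fr/train-00000-of-00001.parquet",
    "multilingual-e5-small/train-00000-of-00001.parquet",
]
print(files_for_config("bge-m3-custom-fr", repo_files))
```

In practice the `datasets` library resolves these patterns itself, so a configuration would normally be loaded with `datasets.load_dataset("hgissbkh/Alloprof-reranking", "multilingual-e5-base", split="train")`, which needs network access.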
Provider:
hgissbkh
Summary of the original information
Dataset overview
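1. bge-m3-custom-fr
   - Features:
     - query: string
     - docs: sequence of strings
     - query_enc: sequence of floats
     - docs_enc: nested sequence of floats (a sequence of float sequences)
     - cos_scores: sequence of floats
     - target: sequence of integers
   - Splits:
     - train: 2316 examples, 318998948 bytes
   - Download size: 219114214 bytes
   - Dataset size: 318998948 bytes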
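2. multilingual-e5-base
   - Features:
     - query: string
     - docs: sequence of strings
     - query_enc: sequence of floats
     - docs_enc: nested sequence of floats (a sequence of float sequences)
     - cos_scores: sequence of floats
     - target: sequence of integers
   - Splits:
     - train: 2316 examples, 263469476 bytes
   - Download size: 175986096 bytes
   - Dataset size: 263469476 bytes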
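3. multilingual-e5-large
   - Features:
     - query: string
     - docs: sequence of strings
     - query_enc: sequence of floats
     - docs_enc: nested sequence of floats (a sequence of float sequences)
     - cos_scores: sequence of floats
     - target: sequence of integers
   - Splits:
     - train: 2316 examples, 318998948 bytes
   - Download size: 218608593 bytes
   - Dataset size: 318998948 bytes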
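4. multilingual-e5-small
   - Features:
     - query: string
     - docs: sequence of strings
     - query_enc: sequence of floats
     - docs_enc: nested sequence of floats (a sequence of float sequences)
     - cos_scores: sequence of floats
     - target: sequence of integers
   - Splits:
     - train: 2316 examples, 180175268 bytes
   - Download size: 111515323 bytes
   - Dataset size: 180175268 bytes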
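5. sentence-t5-large
   - Features:
     - query: string
     - docs: sequence of strings
     - query_enc: sequence of floats
     - docs_enc: nested sequence of floats (a sequence of float sequences)
     - cos_scores: sequence of floats
     - target: sequence of integers
   - Splits:
     - train: 2316 examples, 263469476 bytes
   - Download size: 176118797 bytes
   - Dataset size: 263469476 bytes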
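6. sentence_croissant_alpha_v0.4
   - Features:
     - query: string
     - docs: sequence of strings
     - query_enc: sequence of floats
     - docs_enc: nested sequence of floats (a sequence of float sequences)
     - cos_scores: sequence of floats
     - target: sequence of integers
   - Splits:
     - train: 2316 examples, 541116836 bytes
   - Download size: 390071420 bytes
   - Dataset size: 541116836 bytes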
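All six configurations share the same 2316 training examples but differ in size. The arithmetic below, using only the numbers listed above, converts each `dataset_size` to megabytes (10^6 bytes), computes the average bytes per example, and gives the ratio of `download_size` to `dataset_size`. It assumes the usual Hugging Face meaning of these fields (files to download versus the prepared dataset); the helper names are illustrative.

```python
# (download_size, dataset_size, num_examples) per config, copied from the overview.
SIZES = {
    "bge-m3-custom-fr": (219114214, 318998948, 2316),
    "multilingual-e5-base": (175986096, 263469476, 2316),
    "multilingual-e5-large": (218608593, 318998948, 2316),
    "multilingual-e5-small": (111515323, 180175268, 2316),
    "sentence-t5-large": (176118797, 263469476, 2316),
    "sentence_croissant_alpha_v0.4": (390071420, 541116836, 2316),
}


def size_report(download: int, dataset: int, examples: int) -> tuple[float, float, float]:
    """Return (dataset size in MB, average KB per example, download/dataset ratio)."""
    mb = dataset / 1_000_000
    kb_per_example = dataset / examples / 1_000
    ratio = download / dataset
    return mb, kb_per_example, ratio


for name, (download, dataset, examples) in SIZES.items():
    mb, kb, ratio = size_report(download, dataset, examples)
    print(f"{name:32s} {mb:7.1f} MB  {kb:6.1f} KB/example  download/dataset {ratio:.2f}")
```

Because every configuration stores the same texts, the size differences come from the stored encodings, so the choice of configuration mostly affects download and memory cost.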

The above content was collected and summarized by 遇见数据集.



