hgissbkh/CMedQAv1-reranking
Source: Hugging Face. Updated 2024-05-22; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/hgissbkh/CMedQAv1-reranking
Resource description:
---
dataset_info:
- config_name: acge_text_embedding
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 1481827155
    num_examples: 1000
  download_size: 1130586044
  dataset_size: 1481827155
- config_name: gte-large-zh
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 861283155
    num_examples: 1000
  download_size: 657766982
  dataset_size: 861283155
- config_name: multilingual-e5-base
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 654435155
    num_examples: 1000
  download_size: 498425634
  dataset_size: 654435155
- config_name: multilingual-e5-large
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 861283155
    num_examples: 1000
  download_size: 656363527
  dataset_size: 861283155
- config_name: multilingual-e5-small
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 344163155
    num_examples: 1000
  download_size: 258565410
  dataset_size: 344163155
- config_name: stella-mrl-large-zh-v3.5-1792d
  features:
  - name: query
    dtype: string
  - name: docs
    sequence: string
  - name: query_enc
    sequence: float64
  - name: docs_enc
    sequence:
      sequence: float64
  - name: cos_scores
    sequence: float64
  - name: target
    sequence: int64
  splits:
  - name: train
    num_bytes: 1481827155
    num_examples: 1000
  download_size: 1130825185
  dataset_size: 1481827155
configs:
- config_name: acge_text_embedding
  data_files:
  - split: train
    path: acge_text_embedding/train-*
- config_name: gte-large-zh
  data_files:
  - split: train
    path: gte-large-zh/train-*
- config_name: multilingual-e5-base
  data_files:
  - split: train
    path: multilingual-e5-base/train-*
- config_name: multilingual-e5-large
  data_files:
  - split: train
    path: multilingual-e5-large/train-*
- config_name: multilingual-e5-small
  data_files:
  - split: train
    path: multilingual-e5-small/train-*
- config_name: stella-mrl-large-zh-v3.5-1792d
  data_files:
  - split: train
    path: stella-mrl-large-zh-v3.5-1792d/train-*
---
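Each config in the front matter is a separate subset with a single `train` split, so a loader has to pick one by name. The following is a minimal sketch of how that might look with the Hugging Face `datasets` library; the helper name `load_config` and the choice to stream by default are assumptions made for illustration, not something the card prescribes.

```python
# Config names copied from the dataset card's YAML front matter.
CONFIGS = [
    "acge_text_embedding",
    "gte-large-zh",
    "multilingual-e5-base",
    "multilingual-e5-large",
    "multilingual-e5-small",
    "stella-mrl-large-zh-v3.5-1792d",
]

REPO_ID = "hgissbkh/CMedQAv1-reranking"


def load_config(config_name, streaming=True):
    """Load the `train` split of one config (hypothetical helper).

    Streaming is the default here as an assumption: the card lists
    download sizes from about 0.26 GB to about 1.13 GB per config.
    """
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the constants above are usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)
```

With network access and `datasets` installed, `next(iter(load_config("multilingual-e5-small")))` should yield one row as a dict with the keys `query`, `docs`, `query_enc`, `docs_enc`, `cos_scores`, and `target`.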
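Dataset information:
- Config name: acge_text_embedding
Features:
- Field: query, type: string
- Field: docs (documents), type: sequence of strings
- Field: query_enc (query encoding), type: sequence of float64
- Field: docs_enc (document encodings), type: sequence of float64 sequences
- Field: cos_scores (cosine similarity scores), type: sequence of float64
- Field: target (target labels), type: sequence of int64
Splits:
- Split: train, size: 1481827155 bytes, examples: 1000
Download size: 1130586044 bytes, dataset size: 1481827155 bytes
- Config name: gte-large-zh
Features:
- Field: query, type: string
- Field: docs (documents), type: sequence of strings
- Field: query_enc (query encoding), type: sequence of float64
- Field: docs_enc (document encodings), type: sequence of float64 sequences
- Field: cos_scores (cosine similarity scores), type: sequence of float64
- Field: target (target labels), type: sequence of int64
Splits:
- Split: train, size: 861283155 bytes, examples: 1000
Download size: 657766982 bytes, dataset size: 861283155 bytes
- Config name: multilingual-e5-base
Features:
- Field: query, type: string
- Field: docs (documents), type: sequence of strings
- Field: query_enc (query encoding), type: sequence of float64
- Field: docs_enc (document encodings), type: sequence of float64 sequences
- Field: cos_scores (cosine similarity scores), type: sequence of float64
- Field: target (target labels), type: sequence of int64
Splits:
- Split: train, size: 654435155 bytes, examples: 1000
Download size: 498425634 bytes, dataset size: 654435155 bytes
- Config name: multilingual-e5-large
Features:
- Field: query, type: string
- Field: docs (documents), type: sequence of strings
- Field: query_enc (query encoding), type: sequence of float64
- Field: docs_enc (document encodings), type: sequence of float64 sequences
- Field: cos_scores (cosine similarity scores), type: sequence of float64
- Field: target (target labels), type: sequence of int64
Splits:
- Split: train, size: 861283155 bytes, examples: 1000
Download size: 656363527 bytes, dataset size: 861283155 bytes
- Config name: multilingual-e5-small
Features:
- Field: query, type: string
- Field: docs (documents), type: sequence of strings
- Field: query_enc (query encoding), type: sequence of float64
- Field: docs_enc (document encodings), type: sequence of float64 sequences
- Field: cos_scores (cosine similarity scores), type: sequence of float64
- Field: target (target labels), type: sequence of int64
Splits:
- Split: train, size: 344163155 bytes, examples: 1000
Download size: 258565410 bytes, dataset size: 344163155 bytes
- Config name: stella-mrl-large-zh-v3.5-1792d
Features:
- Field: query, type: string
- Field: docs (documents), type: sequence of strings
- Field: query_enc (query encoding), type: sequence of float64
- Field: docs_enc (document encodings), type: sequence of float64 sequences
- Field: cos_scores (cosine similarity scores), type: sequence of float64
- Field: target (target labels), type: sequence of int64
Splits:
- Split: train, size: 1481827155 bytes, examples: 1000
Download size: 1130825185 bytes, dataset size: 1481827155 bytes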
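The card lists the fields but does not say how they relate. A plausible reading, treated here strictly as an assumption, is that `cos_scores[i]` is the cosine similarity between `query_enc` and `docs_enc[i]`, and that `target[i] == 1` marks `docs[i]` as a relevant answer. Under that reading, the sketch below recomputes the scores for one row, orders the documents best-first, and reports the reciprocal rank of the first relevant one. The toy row and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_docs(query_enc, docs_enc):
    """Return (scores, order): one score per doc, and doc indices sorted best-first."""
    scores = [cosine(query_enc, d) for d in docs_enc]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return scores, order


def reciprocal_rank(order, target):
    """1 / rank of the first relevant doc. ASSUMES target[i] == 1 means doc i is relevant."""
    for rank, doc_idx in enumerate(order, start=1):
        if target[doc_idx] == 1:
            return 1.0 / rank
    return 0.0


# Toy row with 2-dimensional vectors; real rows use the model's embedding width.
row = {
    "query_enc": [1.0, 0.0],
    "docs_enc": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "target": [0, 1, 0],
}
scores, order = rank_docs(row["query_enc"], row["docs_enc"])
rr = reciprocal_rank(order, row["target"])
```

On a real row, comparing the recomputed `scores` with the stored `cos_scores` is a quick way to check whether the first assumption holds before relying on it.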
Configs:
- Config name: acge_text_embedding, data files:
- Split: train, path: acge_text_embedding/train-*
- Config name: gte-large-zh, data files:
- Split: train, path: gte-large-zh/train-*
- Config name: multilingual-e5-base, data files:
- Split: train, path: multilingual-e5-base/train-*
- Config name: multilingual-e5-large, data files:
- Split: train, path: multilingual-e5-large/train-*
- Config name: multilingual-e5-small, data files:
- Split: train, path: multilingual-e5-small/train-*
- Config name: stella-mrl-large-zh-v3.5-1792d, data files:
- Split: train, path: stella-mrl-large-zh-v3.5-1792d/train-*
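Each `path` entry is a glob pattern rather than a single file, so one config may be backed by several shards. The sketch below illustrates the matching with Python's `fnmatch` as a stand-in; the Hub's own pattern resolution may differ in detail, and the shard file names in `repo_files` are hypothetical, invented only to exercise the patterns.

```python
from fnmatch import fnmatch

# Patterns copied from the `configs` section of the card.
DATA_FILES = {
    "acge_text_embedding": "acge_text_embedding/train-*",
    "gte-large-zh": "gte-large-zh/train-*",
    "multilingual-e5-base": "multilingual-e5-base/train-*",
    "multilingual-e5-large": "multilingual-e5-large/train-*",
    "multilingual-e5-small": "multilingual-e5-small/train-*",
    "stella-mrl-large-zh-v3.5-1792d": "stella-mrl-large-zh-v3.5-1792d/train-*",
}


def files_for(config_name, repo_files):
    """Return the repository files whose path matches the config's train pattern."""
    pattern = DATA_FILES[config_name]
    return sorted(f for f in repo_files if fnmatch(f, pattern))


# HYPOTHETICAL listing: these shard names are illustrative, not read from the repository.
repo_files = [
    "README.md",
    "gte-large-zh/train-00000-of-00002.parquet",
    "gte-large-zh/train-00001-of-00002.parquet",
    "multilingual-e5-small/train-00000-of-00001.parquet",
]
matched = files_for("gte-large-zh", repo_files)
```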
Provider:
hgissbkh
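Summary of original information
Dataset overview
Dataset configurations
1. acge_text_embedding
- Features:
query: string; docs: sequence of strings; query_enc: sequence of floats; docs_enc: sequence of float sequences; cos_scores: sequence of floats; target: sequence of integers
- Splits:
train:
- Bytes: 1481827155
- Examples: 1000
- Download size: 1130586044
- Dataset size: 1481827155
2. gte-large-zh
- Features:
query: string; docs: sequence of strings; query_enc: sequence of floats; docs_enc: sequence of float sequences; cos_scores: sequence of floats; target: sequence of integers
- Splits:
train:
- Bytes: 861283155
- Examples: 1000
- Download size: 657766982
- Dataset size: 861283155
3. multilingual-e5-base
- Features:
query: string; docs: sequence of strings; query_enc: sequence of floats; docs_enc: sequence of float sequences; cos_scores: sequence of floats; target: sequence of integers
- Splits:
train:
- Bytes: 654435155
- Examples: 1000
- Download size: 498425634
- Dataset size: 654435155
4. multilingual-e5-large
- Features:
query: string; docs: sequence of strings; query_enc: sequence of floats; docs_enc: sequence of float sequences; cos_scores: sequence of floats; target: sequence of integers
- Splits:
train:
- Bytes: 861283155
- Examples: 1000
- Download size: 656363527
- Dataset size: 861283155
5. multilingual-e5-small
- Features:
query: string; docs: sequence of strings; query_enc: sequence of floats; docs_enc: sequence of float sequences; cos_scores: sequence of floats; target: sequence of integers
- Splits:
train:
- Bytes: 344163155
- Examples: 1000
- Download size: 258565410
- Dataset size: 344163155
6. stella-mrl-large-zh-v3.5-1792d
- Features:
query: string; docs: sequence of strings; query_enc: sequence of floats; docs_enc: sequence of float sequences; cos_scores: sequence of floats; target: sequence of integers
- Splits:
train:
- Bytes: 1481827155
- Examples: 1000
- Download size: 1130825185
- Dataset size: 1481827155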
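The byte counts above differ only because the embedding width differs between models, which allows a rough consistency check on how many vectors each example stores. The sketch below assumes the usual output widths of these models (384, 768, 1024 and 1792 dimensions); those widths come from the models' public documentation, not from this card, and only the `1792d` suffix is hinted at by a config name. Under that assumption, every pairwise comparison works out to 101 float64 vectors per example, which would be consistent with one query vector plus 100 document vectors. This is an inference from sizes, not a documented property of the dataset.

```python
# Byte counts copied from the card (train split, 1000 examples per config).
NUM_BYTES = {
    "multilingual-e5-small": 344163155,
    "multilingual-e5-base": 654435155,
    "multilingual-e5-large": 861283155,
    "stella-mrl-large-zh-v3.5-1792d": 1481827155,
}

# ASSUMPTION: embedding widths taken from the models' public docs, not from the card.
DIMS = {
    "multilingual-e5-small": 384,
    "multilingual-e5-base": 768,
    "multilingual-e5-large": 1024,
    "stella-mrl-large-zh-v3.5-1792d": 1792,
}

NUM_EXAMPLES = 1000
FLOAT64_BYTES = 8


def vectors_per_example(smaller, larger):
    """Estimate stored vectors per example from the size gap between two configs.

    Text fields, cos_scores and target are the same size in both configs,
    so they cancel out and only the extra embedding dimensions remain.
    """
    extra_bytes = NUM_BYTES[larger] - NUM_BYTES[smaller]
    extra_dims = DIMS[larger] - DIMS[smaller]
    return extra_bytes / (extra_dims * NUM_EXAMPLES * FLOAT64_BYTES)


pairs = [
    ("multilingual-e5-small", "multilingual-e5-base"),
    ("multilingual-e5-base", "multilingual-e5-large"),
    ("multilingual-e5-large", "stella-mrl-large-zh-v3.5-1792d"),
]
estimates = [vectors_per_example(a, b) for a, b in pairs]
```

The same reasoning fits the two pairs of configs with identical byte counts (`gte-large-zh` with `multilingual-e5-large`, and `acge_text_embedding` with `stella-mrl-large-zh-v3.5-1792d`), which would be expected if each pair shares an embedding width.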
Data file paths
- acge_text_embedding:
train: acge_text_embedding/train-*
- gte-large-zh:
train: gte-large-zh/train-*
- multilingual-e5-base:
train: multilingual-e5-base/train-*
- multilingual-e5-large:
train: multilingual-e5-large/train-*
- multilingual-e5-small:
train: multilingual-e5-small/train-*
- stella-mrl-large-zh-v3.5-1792d:
train: stella-mrl-large-zh-v3.5-1792d/train-*

The content above was collected and summarized by 遇见数据集.



