lyon-nlp/mteb-fr-reranking-alloprof-s2p
Source: Hugging Face. Updated 2024-06-04. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lyon-nlp/mteb-fr-reranking-alloprof-s2p
Resource description:
---
dataset_info:
- config_name: documents
  features:
  - name: text
    dtype: string
  - name: doc_id
    dtype: string
  splits:
  - name: test
    num_bytes: 9476376
    num_examples: 2556
  download_size: 4995857
  dataset_size: 9476376
- config_name: queries
  features:
  - name: query
    dtype: string
  - name: positive
    sequence: string
  - name: negative
    sequence: string
  splits:
  - name: test
    num_bytes: 1435051
    num_examples: 2316
  - name: train
    num_bytes: 5773011
    num_examples: 9264
  download_size: 2617958
  dataset_size: 7208062
configs:
- config_name: documents
  data_files:
  - split: test
    path: documents/test-*
- config_name: queries
  data_files:
  - split: test
    path: queries/test-*
  - split: train
    path: queries/train-*
license: mit
language:
- fr
pretty_name: Alloprof
---
### Description
This dataset is built on the [Alloprof](https://arxiv.org/abs/2302.07738) Q&A dataset. Negative samples were created using BM25. Please refer to our paper for more details.
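
To make the BM25 step more concrete, here is a minimal sketch of how lexically similar negatives could be mined for a query. It is not the authors' pipeline: the whitespace tokenizer, the `k1` and `b` values, the helper names `bm25_scores` and `mine_negatives`, and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); a real pipeline
    # would likely handle French punctuation and elisions more carefully.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def mine_negatives(query, docs, positive_ids, n_negatives=2):
    """Return ids of the top-scoring documents that are not labeled positive."""
    doc_ids = list(docs)
    scores = bm25_scores(query, [docs[i] for i in doc_ids])
    ranked = sorted(zip(doc_ids, scores), key=lambda p: p[1], reverse=True)
    return [i for i, _ in ranked if i not in positive_ids][:n_negatives]


# Toy corpus (hypothetical, not taken from Alloprof).
docs = {
    "d1": "le théorème de pythagore relie les côtés d'un triangle rectangle",
    "d2": "le triangle isocèle a deux côtés égaux",
    "d3": "la photosynthèse transforme la lumière en énergie",
}
negatives = mine_negatives(
    "côtés d'un triangle rectangle", docs, positive_ids={"d1"}, n_negatives=1
)
print(negatives)  # ['d2']
```

The document that shares the most query terms without being a labeled positive is picked first, which is what makes BM25 negatives harder than random ones.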
### Citation
If you use this dataset in your work, please consider citing:
```
@misc{ciancone2024extending,
title={Extending the Massive Text Embedding Benchmark to French},
author={Mathieu Ciancone and Imene Kerboua and Marion Schaeffer and Wissam Siblini},
year={2024},
eprint={2405.20468},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Provider:
lyon-nlp
Summary of original information
Dataset overview
Configuration
- documents
  - Features:
    - text: string
    - doc_id: string
  - Splits:
    - test: 9476376 bytes, 2556 examples
  - Download size: 4995857 bytes
  - Dataset size: 9476376 bytes
- queries
  - Features:
    - query: string
    - positive: sequence of strings
    - negative: sequence of strings
  - Splits:
    - test: 1435051 bytes, 2316 examples
    - train: 5773011 bytes, 9264 examples
  - Download size: 2617958 bytes
  - Dataset size: 7208062 bytes
File configuration
- documents:
  - test: documents/test-*
- queries:
  - test: queries/test-*
  - train: queries/train-*
Other information
- License: MIT
- Language: French
- Name: Alloprof
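
The `documents` and `queries` configurations described above can be loaded separately with the Hugging Face `datasets` library. The sketch below shows one possible way to do that and to join the two. It assumes that the `positive` and `negative` fields hold `doc_id` values pointing into the `documents` configuration; verify this against the actual rows before relying on it. The helper names `load_alloprof` and `resolve_ids` are hypothetical.

```python
def load_alloprof(split="test"):
    """Load the queries and documents configurations from the Hugging Face Hub."""
    # Imported lazily so resolve_ids() can be used without the `datasets` package.
    from datasets import load_dataset

    repo = "lyon-nlp/mteb-fr-reranking-alloprof-s2p"
    queries = load_dataset(repo, "queries", split=split)  # "test" or "train"
    # The documents configuration only defines a test split.
    documents = load_dataset(repo, "documents", split="test")
    return queries, documents


def resolve_ids(query_row, id_to_text):
    """Replace doc_id references in a query row with the document text.

    Assumption: `positive` and `negative` contain `doc_id` values.
    """
    return {
        "query": query_row["query"],
        "positive": [id_to_text[i] for i in query_row["positive"]],
        "negative": [id_to_text[i] for i in query_row["negative"]],
    }


# Toy rows mirroring the schema (hypothetical content, not from the dataset).
toy_documents = [
    {"doc_id": "d1", "text": "Le théorème de Pythagore."},
    {"doc_id": "d2", "text": "Le triangle isocèle."},
]
toy_query = {"query": "Comment calculer l'hypoténuse ?", "positive": ["d1"], "negative": ["d2"]}

id_to_text = {d["doc_id"]: d["text"] for d in toy_documents}
resolved = resolve_ids(toy_query, id_to_text)
print(resolved["positive"])  # ['Le théorème de Pythagore.']
```

With network access, `queries, documents = load_alloprof("test")` followed by the same dictionary comprehension over `documents` would build the lookup for the real data.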



