sentence-transformers/paq
Hugging Face · Updated 2024-05-01 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/sentence-transformers/paq
Resource description:
---
language:
- en
multilinguality:
- monolingual
size_categories:
- 10M<n<100M
task_categories:
- feature-extraction
- sentence-similarity
pretty_name: PAQ
tags:
- sentence-transformers
dataset_info:
  config_name: pair
  features:
  - name: query
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 43922325977
    num_examples: 64371441
  download_size: 29712181667
  dataset_size: 43922325977
configs:
- config_name: pair
  data_files:
  - split: train
    path: pair/train-*
---
# Dataset Card for PAQ
This dataset contains query-answer pairs from the [PAQ dataset](https://github.com/facebookresearch/PAQ), formatted for easy use with Sentence Transformers when training embedding models.
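Query-answer pairs like these are often trained with an in-batch negatives objective. Each query should score its own answer higher than every other answer in the same batch. Sentence Transformers offers this objective as `MultipleNegativesRankingLoss`. The sketch below is a plain-Python illustration of the idea and is not the library's implementation. The function names, the toy 2-D embeddings, and the default `scale=20.0` are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(
    query_embs: List[Sequence[float]],
    answer_embs: List[Sequence[float]],
    scale: float = 20.0,  # assumed similarity scale, for illustration only
) -> float:
    """Mean cross-entropy where answer i is the positive for query i
    and all other answers in the batch serve as negatives (hypothetical helper)."""
    if len(query_embs) != len(answer_embs) or not query_embs:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [scale * cosine(q, a) for a in answer_embs]
        # Numerically stable log-sum-exp over every answer in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log softmax probability of the true answer
    return total / len(query_embs)


# Toy usage: matching pairs give a near-zero loss, and swapped pairs are penalised.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_negatives_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_negatives_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

Larger batches supply more negatives per query, so the training signal gets stronger as the batch grows.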
## Dataset Subsets
### `pair` subset
* Columns: "query", "answer"
* Column types: `str`, `str`
* Examples:
```python
{
'query': 'in which year was footballer paul ince born',
'answer': 'Paul Ince Paul Emerson Carlyle Ince (; born 21 October 1967) is an English football manager and a former professional footballer who played as a midfielder from 1982 to 2007. Born in Ilford, London, Ince spent the majority of his playing career at the highest level; after leaving West Ham United he joined Manchester United where he played in the Premier League. After two years in Serie A with Internazionale he returned to England to play in the top flight for Liverpool, Middlesbrough and Wolverhampton Wanderers. After a spell as player-coach of Swindon Town, he retired from playing while player-manager',
}
```
* Collection strategy: Reading the PAQ dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No
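The `pair` subset is not deduplicated, so repeated (query, answer) rows may be present. If you need unique pairs, the following minimal sketch shows one way to drop them. It assumes rows arrive as dicts with the `query` and `answer` keys shown in the example above, for instance from a streaming iterator, and it treats exact string equality as the duplicate criterion. `dedupe_pairs` is a hypothetical helper and is not part of Sentence Transformers or `datasets`.

```python
import hashlib
from typing import Dict, Iterable, Iterator


def dedupe_pairs(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each row whose exact (query, answer) pair has not been seen yet.

    Hypothetical helper: it keeps a 16-byte digest per pair in memory
    rather than the full text, and it preserves the original row order.
    """
    seen = set()
    for row in rows:
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        key = hashlib.blake2b(
            (row["query"] + "\x00" + row["answer"]).encode("utf-8"),
            digest_size=16,
        ).digest()
        if key not in seen:
            seen.add(key)
            yield row


rows = [
    {"query": "q1", "answer": "a1"},
    {"query": "q1", "answer": "a1"},  # exact duplicate, dropped
    {"query": "q1", "answer": "a2"},  # same query but a different answer, kept
]
unique_rows = list(dedupe_pairs(rows))
```

Because the helper is a generator, it can consume a stream lazily. Memory still grows with the number of distinct pairs.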
Provided by: sentence-transformers
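Summary of Original Information
Dataset Overview
Basic Information
- Name: PAQ
- Language: English
- Multilinguality: monolingual
- Size: 10M<n<100M
- Task categories:
  - Feature extraction
  - Sentence similarity
- Tags: sentence-transformers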
Dataset Configuration
- Config name: pair
- Features:
  - query: string
  - answer: string
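The configuration above declares two string columns. As an illustration, you can check a row against that schema with a small helper. `PAIR_SCHEMA` and `is_valid_pair` are hypothetical names, and rejecting rows that carry extra columns is an assumption of this sketch.

```python
from typing import Any, Dict

# Column names and types declared for the `pair` config (hypothetical constant).
PAIR_SCHEMA = {"query": str, "answer": str}


def is_valid_pair(row: Dict[str, Any]) -> bool:
    """Return True if `row` has exactly the `query` and `answer` columns, both strings."""
    return set(row) == set(PAIR_SCHEMA) and all(
        isinstance(row[name], expected) for name, expected in PAIR_SCHEMA.items()
    )


ok = is_valid_pair({"query": "in which year was footballer paul ince born", "answer": "Paul Ince ..."})
missing = is_valid_pair({"query": "only a query"})
wrong_type = is_valid_pair({"query": "q", "answer": 1967})
```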
Dataset Splits
- Train:
  - Size: 43922325977 bytes
  - Number of examples: 64371441
- Download size: 29712181667 bytes
- Dataset size: 43922325977 bytes
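A little arithmetic relates the split statistics to one another. The sketch below reuses only the figures listed above. It derives the average number of bytes per example and the ratio of download size to dataset size, and it checks the example count against the declared `10M<n<100M` size category. Reading `num_bytes` as the stored size of the split is an assumption.

```python
# Figures copied from the `pair` config metadata.
NUM_BYTES = 43_922_325_977      # train split num_bytes (equal to dataset_size)
NUM_EXAMPLES = 64_371_441       # train split num_examples
DOWNLOAD_SIZE = 29_712_181_667  # download_size

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

# The example count falls inside the declared 10M<n<100M size category.
assert 10_000_000 < NUM_EXAMPLES < 100_000_000

print(f"Dataset size:  {NUM_BYTES / 1e9:.1f} GB")       # 43.9 GB
print(f"Download size: {DOWNLOAD_SIZE / 1e9:.1f} GB")   # 29.7 GB
print(f"Average bytes per example: {avg_bytes_per_example:.0f}")  # 682
print(f"Download / dataset size: {download_ratio:.2f}")           # 0.68
```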
Dataset Files
- Config name: pair
- Data files:
  - Split: train
  - Path: pair/train-*
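The `data_files` entry assigns the `train` split every file that matches the glob `pair/train-*`. The sketch below illustrates that kind of matching with the standard library, treating `*` as never crossing a `/` separator. The shard file names are hypothetical. The helpers show glob semantics only and do not reproduce how the Hugging Face `datasets` library resolves patterns.

```python
from fnmatch import fnmatchcase
from typing import List


def matches_data_files(path: str, pattern: str = "pair/train-*") -> bool:
    """Glob-style match, component by component, so `*` never spans a `/`."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def select_train_files(repo_files: List[str]) -> List[str]:
    """Keep only the repository files that belong to the `pair` train split."""
    return sorted(f for f in repo_files if matches_data_files(f))


# Hypothetical repository listing, for illustration only.
repo_files = [
    "README.md",
    "pair/train-00001-of-00002.parquet",
    "pair/train-00000-of-00002.parquet",
    "pair/test-00000-of-00001.parquet",
    "other/train-00000-of-00001.parquet",
]
train_files = select_train_files(repo_files)
```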



