---
language:
- en
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
task_categories:
- feature-extraction
- sentence-similarity
pretty_name: Natural Questions
tags:
- sentence-transformers
dataset_info:
  config_name: pair
  features:
  - name: query
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 67154228
    num_examples: 100231
  download_size: 43995757
  dataset_size: 67154228
configs:
- config_name: pair
  data_files:
  - split: train
    path: pair/train-*
---
# Dataset Card for Natural Questions
This dataset is a collection of question-answer pairs from the Natural Questions dataset. See [Natural Questions](https://ai.google.com/research/NaturalQuestions) for additional information.
This dataset can be used directly with Sentence Transformers to train embedding models.
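For (query, answer) pair data like this, Sentence Transformers users commonly train with `MultipleNegativesRankingLoss`. That loss scores each query against every answer in the batch and treats the non-matching answers as negatives. The sketch below illustrates this in-batch-negatives objective in plain Python on toy 2-d vectors. It is not the library's implementation. The embeddings, the `scale` value of 20, and the helper names are illustrative assumptions.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_negatives_loss(query_embs, answer_embs, scale=20.0):
    """Mean cross-entropy where query i should rank answer i above all other answers in the batch.

    Sketch only; `scale` = 20.0 is an assumed temperature-like multiplier.
    """
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [scale * cosine(q, a) for a in answer_embs]
        # Numerically stable log-sum-exp over all answers in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matching (positive) answer.
        total += log_sum_exp - logits[i]
    return total / len(query_embs)


# Hypothetical toy embeddings: each query points roughly at its own answer.
queries = [[1.0, 0.0], [0.0, 1.0]]
answers = [[0.9, 0.1], [0.1, 0.9]]
aligned = in_batch_negatives_loss(queries, answers)
swapped = in_batch_negatives_loss(queries, list(reversed(answers)))
print(f"aligned={aligned:.4f} swapped={swapped:.4f}")
```

Training pushes toward the aligned case. Each query's own answer should receive the highest similarity in its batch, so the loss is much lower than when the answers are mismatched.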
## Dataset Subsets
### `pair` subset
* Columns: "query", "answer"
* Column types: `str`, `str`
* Examples:
```python
{
'query': 'the si unit of the electric field is',
'answer': 'Electric field An electric field is a field that surrounds electric charges. It represents charges attracting or repelling other electric charges by exerting force.[1] [2] Mathematically the electric field is a vector field that associates to each point in space the force, called the Coulomb force, that would be experienced per unit of charge, by an infinitesimal test charge at that point.[3] The units of the electric field in the SI system are newtons per coulomb (N/C), or volts per meter (V/m). Electric fields are created by electric charges, and by time-varying magnetic fields. Electric fields are important in many areas of physics, and are exploited practically in electrical technology. On an atomic scale, the electric field is responsible for the attractive force between the atomic nucleus and electrons that holds atoms together, and the forces between atoms that cause chemical bonding. The electric field and the magnetic field together form the electromagnetic force, one of the four fundamental forces of nature.',
}
```
* Collection strategy: Reading the NQ train dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No
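Because the `pair` subset is not deduplicated, it may contain exact repeats of a (query, answer) pair. The following minimal sketch keeps only the first copy of each pair. It assumes the rows are available as Python dicts with the `query` and `answer` keys shown above. The `dedupe_pairs` helper and the toy rows are hypothetical.

```python
def dedupe_pairs(rows):
    """Keep the first occurrence of each exact (query, answer) pair, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["query"], row["answer"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows; the first two are an exact duplicate pair.
rows = [
    {"query": "the si unit of the electric field is",
     "answer": "Electric field An electric field is a field that surrounds electric charges."},
    {"query": "the si unit of the electric field is",
     "answer": "Electric field An electric field is a field that surrounds electric charges."},
    {"query": "hypothetical other question", "answer": "hypothetical other answer"},
]
unique_rows = dedupe_pairs(rows)
print(len(unique_rows))  # → 2
```

This only removes exact matches. Near-duplicate queries with slightly different wording are kept.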