yahoo-answers
ModelScope Community · Updated 2025-11-07 · Indexed 2025-01-11
Download link:
https://modelscope.cn/datasets/sentence-transformers/yahoo-answers
Resource description:
# Dataset Card for Yahoo Answers
This dataset is a collection of pairs containing titles, questions, and answers collected from Yahoo Answers. See the [Yahoo Answers](https://www.kaggle.com/datasets/soumikrakshit/yahoo-answers-dataset) dataset for additional information. This dataset can be used directly with Sentence Transformers to train embedding models.
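The card says the data can be used directly with Sentence Transformers. The following is a minimal sketch of what that could look like. It assumes the sentence-transformers v3+ trainer API, the `datasets` library, and that the Hugging Face mirror `sentence-transformers/yahoo-answers` exposes each subset below as a config with a `train` split. The function `train_on_subset`, the `all-MiniLM-L6-v2` base model, and `MultipleNegativesRankingLoss` are illustrative choices, not something this card prescribes.

```python
# Column layout of each subset, as listed in this card. Sentence Transformers
# losses read columns by position: first column = anchor, second = positive.
SUBSET_COLUMNS = {
    "title-question-answer-pair": ("question", "answer"),
    "title-answer-pair": ("title", "answer"),
    "title-question-pair": ("title", "question"),
    "question-answer-pair": ("question", "answer"),
}


def train_on_subset(subset="title-answer-pair",
                    base_model="sentence-transformers/all-MiniLM-L6-v2",
                    max_rows=None):
    """Hypothetical helper: fine-tune base_model on one subset (downloads data and weights)."""
    if subset not in SUBSET_COLUMNS:
        raise ValueError(f"unknown subset: {subset!r}")

    # Lazy imports: the mapping above stays usable without these packages installed.
    from datasets import load_dataset
    from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

    # Assumption: each subset is a config of the Hugging Face mirror with a "train" split.
    dataset = load_dataset("sentence-transformers/yahoo-answers", subset, split="train")
    if tuple(dataset.column_names) != SUBSET_COLUMNS[subset]:
        raise ValueError(f"unexpected columns: {dataset.column_names}")
    if max_rows is not None:
        dataset = dataset.select(range(min(max_rows, len(dataset))))

    model = SentenceTransformer(base_model)
    # Each row is a positive pair; the other rows in the same batch act as negatives.
    loss = losses.MultipleNegativesRankingLoss(model)
    trainer = SentenceTransformerTrainer(model=model, train_dataset=dataset, loss=loss)
    trainer.train()
    return model
```

For example, `train_on_subset("title-answer-pair", max_rows=10_000)` would fine-tune on a small slice. Without `max_rows`, it trains on the full subset with the trainer's default arguments.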
## Dataset Subsets
### `title-question-answer-pair` subset
* Columns: "question", "answer"
* Column types: `str`, `str`
* Examples:
```python
{
'question': "why doesn't an optical mouse work on a glass table? or even on some surfaces?",
'answer': "why doesn't an optical mouse work on a glass table? Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.",
}
```
* Collection strategy: Reading the `title-answer-pair` and `title-question-pair` subsets, matching rows on their titles, keeping only titles with exactly one question and one answer, and then concatenating the title and the question to form the question. As the example shows, the title also appears at the start of the answer.
* Deduplicated: No
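The collection strategy above can be sketched in plain Python. This is an illustration, not the original build script. The function name is hypothetical and the inputs are assumed to be `(title, text)` tuples. The single space used as a separator, and prepending the title to the answer, are inferred from the example row rather than stated in the card.

```python
from collections import defaultdict


def build_title_question_answer_pairs(title_answer_rows, title_question_rows):
    """Join (title, answer) and (title, question) rows on the title.

    Returns a list of {"question": ..., "answer": ...} dicts.
    """
    answers = defaultdict(list)
    for title, answer in title_answer_rows:
        answers[title].append(answer)
    questions = defaultdict(list)
    for title, question in title_question_rows:
        questions[title].append(question)

    rows = []
    for title, qs in questions.items():
        ans = answers.get(title, [])
        # Keep only titles that match exactly one question and exactly one answer.
        if len(qs) == 1 and len(ans) == 1:
            rows.append({
                "question": f"{title} {qs[0]}",
                # Assumption inferred from the example: the title also prefixes the answer.
                "answer": f"{title} {ans[0]}",
            })
    return rows
```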
### `title-answer-pair` subset
* Columns: "title", "answer"
* Column types: `str`, `str`
* Examples:
```python
{
'title': "why doesn't an optical mouse work on a glass table?",
'answer': 'Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.',
}
```
* Collection strategy: Reading the Yahoo Answers (title, answer) dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No
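This subset, and the two pair subsets after it, are read from embedding-training-data. The following minimal reader sketch assumes those files are gzipped JSON Lines in which each line is a JSON array of two strings. A file name such as `yahoo_answers_title_answer.jsonl.gz` would be an assumption. `parse_pair_lines` and `read_pair_file` are illustrative helpers.

```python
import gzip
import json


def parse_pair_lines(lines, columns=("title", "answer")):
    """Turn JSON Lines text (one two-element JSON array per line) into dict rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = json.loads(line)
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
        rows.append(dict(zip(columns, values)))
    return rows


def read_pair_file(path, columns=("title", "answer")):
    """Read a gzipped JSON Lines pair file (assumed layout) into dict rows."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_pair_lines(fh, columns)
```

Passing `columns=("title", "question")` or `columns=("question", "answer")` covers the other two pair subsets.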
### `title-question-pair` subset
* Columns: "title", "question"
* Column types: `str`, `str`
* Examples:
```python
{
'title': "why doesn't an optical mouse work on a glass table?",
'question': 'or even on some surfaces?',
}
```
* Collection strategy: Reading the Yahoo Answers (title, question) dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No
### `question-answer-pair` subset
* Columns: "question", "answer"
* Column types: `str`, `str`
* Examples:
```python
{
'question': 'or even on some surfaces?',
'answer': 'Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.',
}
```
* Collection strategy: Reading the Yahoo Answers (question, answer) dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No
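None of the subsets is deduplicated. Exact duplicates can matter for training. For example, with in-batch-negative losses, a duplicate pair in the same batch acts as a false negative. If so, a simple order-preserving pass can drop them. This is a hypothetical helper that matches rows exactly on the given columns. It does not do any normalization or near-duplicate detection.

```python
def dedupe_pairs(rows, columns):
    """Drop rows whose values in `columns` exactly repeat an earlier row; keep first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```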
Provided by:
maas
Created:
2025-01-06



