
sentence-transformers/yahoo-answers

Source: Hugging Face · Updated: 2024-04-30 · Indexed: 2024-06-15
Download link:
https://hf-mirror.com/datasets/sentence-transformers/yahoo-answers
Resource overview:
---
language:
- en
multilinguality:
- monolingual
size_categories:
- 1M<n<10M
task_categories:
- feature-extraction
- sentence-similarity
pretty_name: Yahoo Answers
tags:
- sentence-transformers
dataset_info:
- config_name: question-answer-pair
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 441860501
    num_examples: 681164
  download_size: 296974225
  dataset_size: 441860501
- config_name: title-answer-pair
  features:
  - name: title
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 532353635
    num_examples: 1198260
  download_size: 359777740
  dataset_size: 532353635
- config_name: title-question-answer-pair
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 462195629
    num_examples: 599417
  download_size: 308542541
  dataset_size: 462195629
- config_name: title-question-pair
  features:
  - name: title
    dtype: string
  - name: questions
    dtype: string
  splits:
  - name: train
    num_bytes: 190935497
    num_examples: 659896
  download_size: 132675030
  dataset_size: 190935497
configs:
- config_name: question-answer-pair
  data_files:
  - split: train
    path: question-answer-pair/train-*
- config_name: title-answer-pair
  data_files:
  - split: train
    path: title-answer-pair/train-*
- config_name: title-question-answer-pair
  data_files:
  - split: train
    path: title-question-answer-pair/train-*
- config_name: title-question-pair
  data_files:
  - split: train
    path: title-question-pair/train-*
---

# Dataset Card for Yahoo Answers

This dataset is a collection of pairs containing titles, questions, and answers collected from Yahoo Answers. See the [Yahoo Answers](https://www.kaggle.com/datasets/soumikrakshit/yahoo-answers-dataset) dataset for additional information.
This dataset can be used directly with Sentence Transformers to train embedding models.

## Dataset Subsets

### `title-question-answer-pair` subset

* Columns: "question", "answer"
* Column types: `str`, `str`
* Examples:
    ```python
    {
      'question': "why doesn't an optical mouse work on a glass table? or even on some surfaces?",
      'answer': "why doesn't an optical mouse work on a glass table? Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.",
    }
    ```
* Collection strategy: Reading the `title-answer-pair` and `title-question-pair` datasets, matching up the titles, keeping only titles with exactly 1 question and 1 answer, and then concatenating the title + the question as the question.
* Deduplicated: No

### `title-answer-pair` subset

* Columns: "title", "answer"
* Column types: `str`, `str`
* Examples:
    ```python
    {
      'title': "why doesn't an optical mouse work on a glass table?",
      'answer': 'Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.',
    }
    ```
* Collection strategy: Reading the Yahoo Answers (title, answer) dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No

### `title-question-pair` subset

* Columns: "title", "questions"
* Column types: `str`, `str`
* Examples:
    ```python
    {
      'title': "why doesn't an optical mouse work on a glass table?",
      'questions': 'or even on some surfaces?',
    }
    ```
* Collection strategy: Reading the Yahoo Answers (title, question) dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No

### `question-answer-pair` subset

* Columns: "question", "answer"
* Column types: `str`, `str`
* Examples:
    ```python
    {
      'question': 'or even on some surfaces?',
      'answer': 'Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.',
    }
    ```
* Collection strategy: Reading the Yahoo Answers (question, answer) dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No
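
Each of the four subsets is exposed as a separate configuration, so one plausible way to load one is through the Hugging Face `datasets` library. The sketch below is a minimal illustration, not part of the original card: the helper name `load_subset` and the `SUBSETS` tuple are hypothetical, and the call assumes `datasets` is installed and that the hub is reachable.

```python
# Configuration names as listed in the card's metadata.
SUBSETS = (
    "question-answer-pair",
    "title-answer-pair",
    "title-question-answer-pair",
    "title-question-pair",
)


def load_subset(name="title-answer-pair"):
    """Load the train split of one subset (hypothetical helper).

    Every subset in the card defines only a `train` split.
    """
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("sentence-transformers/yahoo-answers", name, split="train")
```

Calling `load_subset("title-question-pair")` downloads that subset on first use. Validating the name up front gives a clearer error than a failed hub lookup.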
Provider:
sentence-transformers
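Summary of original information

Dataset overview

Basic information

  • Language: English
  • Multilinguality: monolingual
  • Size category: 1M<n<10M
  • Task categories: feature extraction, sentence similarity
  • Tags: sentence-transformers
  • Dataset name: Yahoo Answers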

Dataset configurations

question-answer-pair

  • Features:
    • question: string
    • answer: string
  • Splits:
    • train:
      • Size (bytes): 441860501
      • Examples: 681164
  • Download size (bytes): 296974225
  • Dataset size (bytes): 441860501

title-answer-pair

  • Features:
    • title: string
    • answer: string
  • Splits:
    • train:
      • Size (bytes): 532353635
      • Examples: 1198260
  • Download size (bytes): 359777740
  • Dataset size (bytes): 532353635

title-question-answer-pair

  • Features:
    • question: string
    • answer: string
  • Splits:
    • train:
      • Size (bytes): 462195629
      • Examples: 599417
  • Download size (bytes): 308542541
  • Dataset size (bytes): 462195629

title-question-pair

  • Features:
    • title: string
    • questions: string
  • Splits:
    • train:
      • Size (bytes): 190935497
      • Examples: 659896
  • Download size (bytes): 132675030
  • Dataset size (bytes): 190935497
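
As a quick sanity check on the figures above, the per-configuration numbers can be summed and compared with the stated size category. The snippet is only an illustrative sketch: the `CONFIGS` dictionary and the `summarize` helper are hypothetical names, and the values are copied from the listing above. Because `title-question-answer-pair` is derived from two other subsets, the totals count rows across configurations, not unique source questions.

```python
# Figures copied from the configuration listing; sizes are in bytes.
CONFIGS = {
    "question-answer-pair": {"examples": 681164, "dataset": 441860501, "download": 296974225},
    "title-answer-pair": {"examples": 1198260, "dataset": 532353635, "download": 359777740},
    "title-question-answer-pair": {"examples": 599417, "dataset": 462195629, "download": 308542541},
    "title-question-pair": {"examples": 659896, "dataset": 190935497, "download": 132675030},
}


def summarize(configs):
    """Sum rows and byte counts across all configurations."""
    examples = sum(c["examples"] for c in configs.values())
    dataset_bytes = sum(c["dataset"] for c in configs.values())
    download_bytes = sum(c["download"] for c in configs.values())
    return {
        "examples": examples,
        "dataset_bytes": dataset_bytes,
        "download_bytes": download_bytes,
        # Fraction of the in-memory size that is actually downloaded.
        "download_ratio": download_bytes / dataset_bytes,
    }


summary = summarize(CONFIGS)
```

The total comes to 3,138,737 rows, which falls inside the declared `1M<n<10M` bucket.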

Dataset subsets

title-question-answer-pair subset

  • Columns: "question", "answer"

  • Column types: str, str

  • Example: {'question': "why doesn't an optical mouse work on a glass table? or even on some surfaces?", 'answer': "why doesn't an optical mouse work on a glass table? Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly."}

  • Collection strategy: Read the title-answer-pair and title-question-pair datasets, match up the titles, keep only titles with exactly one question and one answer, then concatenate the title and the question to form the question.

  • Deduplicated: No
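
The collection strategy for this subset can be sketched in a few lines of plain Python. This is a minimal reconstruction from the textual description only, not the maintainers' actual script: the function name `build_title_question_answer_pairs` is hypothetical, and joining title and question with a single space is an assumption based on the example row.

```python
from collections import defaultdict


def build_title_question_answer_pairs(title_answer_rows, title_question_rows):
    """Join the two source subsets on title (hypothetical helper).

    Keeps only titles that have exactly one question and exactly one answer.
    """
    answers = defaultdict(list)
    for row in title_answer_rows:
        answers[row["title"]].append(row["answer"])

    questions = defaultdict(list)
    for row in title_question_rows:
        # The title-question-pair subset stores the question under "questions".
        questions[row["title"]].append(row["questions"])

    pairs = []
    for title, title_questions in questions.items():
        title_answers = answers.get(title, [])
        if len(title_questions) == 1 and len(title_answers) == 1:
            # Assumption: title and question are joined with one space.
            question = f"{title} {title_questions[0]}"
            pairs.append({"question": question, "answer": title_answers[0]})
    return pairs


# Toy rows for illustration only; they are not taken from the dataset.
toy_title_answer = [
    {"title": "t1?", "answer": "a1"},
    {"title": "t2?", "answer": "a2"},
    {"title": "t2?", "answer": "a2b"},
]
toy_title_question = [
    {"title": "t1?", "questions": "q1"},
    {"title": "t2?", "questions": "q2"},
    {"title": "t3?", "questions": "q3"},
]
toy_pairs = build_title_question_answer_pairs(toy_title_answer, toy_title_question)
```

In the toy data only `t1?` survives: `t2?` has two answers and `t3?` has none. The same filter would explain why this subset has fewer rows than either of its sources.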

title-answer-pair subset

  • Columns: "title", "answer"

  • Column types: str, str

  • Example: {'title': "why doesn't an optical mouse work on a glass table?", 'answer': 'Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.'}

  • Collection strategy: Read the Yahoo Answers (title, answer) dataset.

  • Deduplicated: No

title-question-pair subset

  • Columns: "title", "questions"

  • Column types: str, str

  • Example: {'title': "why doesn't an optical mouse work on a glass table?", 'questions': 'or even on some surfaces?'}

  • Collection strategy: Read the Yahoo Answers (title, question) dataset.

  • Deduplicated: No

question-answer-pair subset

  • Columns: "question", "answer"

  • Column types: str, str

  • Example: {'question': 'or even on some surfaces?', 'answer': 'Optical mice use an LED and a camera to rapidly capture images of the surface beneath the mouse. The infomation from the camera is analyzed by a DSP (Digital Signal Processor) and used to detect imperfections in the underlying surface and determine motion. Some materials, such as glass, mirrors or other very shiny, uniform surfaces interfere with the ability of the DSP to accurately analyze the surface beneath the mouse. \\nSince glass is transparent and very uniform, the mouse is unable to pick up enough imperfections in the underlying surface to determine motion. Mirrored surfaces are also a problem, since they constantly reflect back the same image, causing the DSP not to recognize motion properly. When the system is unable to see surface changes associated with movement, the mouse will not work properly.'}

  • Collection strategy: Read the Yahoo Answers (question, answer) dataset.

  • Deduplicated: No
