
sentence-transformers/paq

Source: Hugging Face. Updated 2024-05-01; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/sentence-transformers/paq
Resource description:
---
language:
- en
multilinguality:
- monolingual
size_categories:
- 10M<n<100M
task_categories:
- feature-extraction
- sentence-similarity
pretty_name: PAQ
tags:
- sentence-transformers
dataset_info:
  config_name: pair
  features:
  - name: query
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 43922325977
    num_examples: 64371441
  download_size: 29712181667
  dataset_size: 43922325977
configs:
- config_name: pair
  data_files:
  - split: train
    path: pair/train-*
---

# Dataset Card for PAQ

This dataset contains query-answer pairs from the [PAQ dataset](https://github.com/facebookresearch/PAQ), formatted to be easily used with Sentence Transformers to train embedding models.

## Dataset Subsets

### `pair` subset

* Columns: "query", "answer"
* Column types: `str`, `str`
* Examples:
    ```python
    {
      'query': 'in which year was footballer paul ince born',
      'answer': 'Paul Ince Paul Emerson Carlyle Ince (; born 21 October 1967) is an English football manager and a former professional footballer who played as a midfielder from 1982 to 2007. Born in Ilford, London, Ince spent the majority of his playing career at the highest level; after leaving West Ham United he joined Manchester United where he played in the Premier League. After two years in Serie A with Internazionale he returned to England to play in the top flight for Liverpool, Middlesbrough and Wolverhampton Wanderers. After a spell as player-coach of Swindon Town, he retired from playing while player-manager',
    }
    ```
* Collection strategy: Reading the PAQ dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplified: No
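As a minimal sketch of how the `pair` subset might be read, the snippet below streams rows with the Hugging Face `datasets` library rather than downloading the full train split. The helper names `to_pair` and `stream_pairs` are hypothetical and not part of the dataset card. The repository id, config name, split, and column names are taken from the card above.

```python
from typing import Iterator, Tuple


def to_pair(example: dict) -> Tuple[str, str]:
    """Turn one row of the `pair` subset into a (query, answer) tuple."""
    return example["query"], example["answer"]


def stream_pairs(limit: int = 3) -> Iterator[Tuple[str, str]]:
    """Yield the first `limit` (query, answer) pairs without a full download.

    Hypothetical helper: needs the `datasets` package and network access.
    """
    # Imported lazily so that defining this function does not require `datasets`.
    from datasets import load_dataset

    # streaming=True returns an IterableDataset that fetches rows on demand.
    ds = load_dataset(
        "sentence-transformers/paq", "pair", split="train", streaming=True
    )
    for example in ds.take(limit):
        yield to_pair(example)
```

Calling `list(stream_pairs(3))` would fetch the first three rows over the network. Streaming is used here because the train split is tens of gigabytes, so a plain `load_dataset` call without `streaming=True` would download all of it first.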
Provider:
sentence-transformers
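Summary of original information

Dataset overview

Basic information

  • Name: PAQ
  • Language: English
  • Multilinguality: monolingual
  • Size: 10M<n<100M
  • Task categories:
    • Feature extraction
    • Sentence similarity
  • Tags: sentence-transformers

Dataset configuration

  • Config name: pair
  • Features:
    • query: string
    • answer: string

Dataset splits

  • Train:
    • Size in bytes: 43922325977
    • Number of examples: 64371441
    • Download size: 29712181667 bytes
    • Dataset size: 43922325977 bytes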
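
To put the raw byte counts above into more readable terms, the following sketch converts them to GiB and derives an average row size. Only the three constants come from the listing; the variable names and the choice of binary units (1 GiB = 2^30 bytes) are assumptions made for illustration.

```python
# Figures copied from the train split listing above.
NUM_BYTES = 43_922_325_977      # dataset size in bytes
DOWNLOAD_BYTES = 29_712_181_667 # download size in bytes
NUM_EXAMPLES = 64_371_441       # number of (query, answer) rows

GIB = 2 ** 30  # assumption: report sizes in binary gibibytes

dataset_gib = NUM_BYTES / GIB
download_gib = DOWNLOAD_BYTES / GIB
bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_BYTES / NUM_BYTES

print(f"dataset size:      {dataset_gib:.2f} GiB")
print(f"download size:     {download_gib:.2f} GiB")
print(f"avg bytes per row: {bytes_per_example:.1f}")
print(f"download/dataset:  {download_ratio:.3f}")
```

The row count falls inside the declared `10M<n<100M` size category, and the download is roughly two thirds of the in-memory dataset size.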

Dataset files

  • Config name: pair
  • Data files:
    • Split: train
    • Path: pair/train-*