
embedding-data/Amazon-QA

Source: Hugging Face. Updated 2022-08-02. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/embedding-data/Amazon-QA
Resource description:
---
license: mit
language:
- en
paperswithcode_id: embedding-data/Amazon-QA
pretty_name: Amazon-QA
task_categories:
- sentence-similarity
- paraphrase-mining
task_ids:
- semantic-similarity-classification
---

# Dataset Card for "Amazon-QA"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [http://jmcauley.ucsd.edu/data/amazon/qa/](http://jmcauley.ucsd.edu/data/amazon/qa/)
- **Repository:** [More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [Julian McAuley](https://cseweb.ucsd.edu//~jmcauley/#)
- **Size of downloaded dataset files:**
- **Size of the generated dataset:**
- **Total amount of disk used:** 247 MB

### Dataset Summary

This dataset contains Question and Answer data from Amazon.

Disclaimer: The team releasing Amazon-QA did not upload the dataset to the Hub and did not write a dataset card. These steps were done by the Hugging Face team.

### Supported Tasks

- [Sentence Transformers](https://huggingface.co/sentence-transformers) training; useful for semantic search and sentence similarity.

### Languages

- English.

## Dataset Structure

Each example in the dataset contains a pair of query and answer sentences and is formatted as a dictionary:

```
{"query": [sentence_1], "pos": [sentence_2]}
{"query": [sentence_1], "pos": [sentence_2]}
...
{"query": [sentence_1], "pos": [sentence_2]}
```

This dataset is useful for training Sentence Transformers models. Refer to the following post on how to train models using similar sentences.

### Usage Example

Install the 🤗 Datasets library with `pip install datasets` and load the dataset from the Hub with:

```python
from datasets import load_dataset

dataset = load_dataset("embedding-data/Amazon-QA")
```

The dataset is loaded as a `DatasetDict` and has the format:

```python
DatasetDict({
    train: Dataset({
        features: ['query', 'pos'],
        num_rows: 1095290
    })
})
```

Review the example at index `i` (here, `i = 0`) with:

```python
dataset["train"][0]
```

### Data Instances

### Data Fields

### Data Splits

## Dataset Creation

### Curation Rationale

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

#### Who are the source language producers?

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

### Annotations

#### Annotation process

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

#### Who are the annotators?

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

### Personal and Sensitive Information

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

### Discussion of Biases

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

### Other Known Limitations

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/s)

## Additional Information

### Dataset Curators

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

### Licensing Information

[More Information Needed](http://jmcauley.ucsd.edu/data/amazon/qa/)

### Citation Information

### Contributions
Provider:
embedding-data
Summary of Original Information

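Dataset Overview: Amazon-QA

Dataset Description

Dataset Summary

  • Content: Question and answer data from Amazon.
  • Purpose: Training Sentence Transformers models for semantic search and sentence similarity.

Supported Tasks

  • Task: Sentence Transformers training.
  • Applications: Semantic search and sentence similarity.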
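
To make the semantic-search application concrete, the sketch below ranks candidate answers against a query by cosine similarity. The three-dimensional vectors are hand-written stand-ins for the embeddings a trained Sentence Transformers model would produce, and the helper names `cosine_similarity` and `rank_answers` are hypothetical, not part of the dataset or of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_answers(query_vec, answer_vecs):
    """Return answer indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in answer_vecs]
    return sorted(range(len(answer_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors (assumed for illustration; real embeddings have hundreds of dimensions).
query_vec = [1.0, 0.0, 0.0]
answer_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]

ranking = rank_answers(query_vec, answer_vecs)
print(ranking)
```

In a real pipeline the vectors would come from encoding the `query` and `pos` sentences with a trained model; only the ranking step is shown here.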

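Languages

  • Language: English.

Dataset Structure

Data Instances

  • Format: Each instance is a dictionary that pairs a query sentence with an answer sentence.

  • Example:

    {"query": [sentence_1], "pos": [sentence_2]}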
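
A minimal sketch of turning records in this shape into flat (query, answer) training pairs follows. It assumes that `query` holds a single string and `pos` holds a list of one or more answer strings; the card only shows placeholders, so check the actual records before relying on this. The helper `iter_pairs` and the two sample records are invented for illustration.

```python
import json


def iter_pairs(records):
    """Yield one (query, answer) tuple per positive answer in each record."""
    for record in records:
        query = record["query"]
        positives = record["pos"]
        # Accept a bare string as well as a list, since the exact field type is an assumption.
        if isinstance(positives, str):
            positives = [positives]
        for answer in positives:
            yield query, answer


# Invented sample records in the {"query": ..., "pos": ...} shape, one JSON object per line.
raw_lines = [
    '{"query": "Does this case fit the 2019 model?", "pos": ["Yes, it fits.", "It fit mine."]}',
    '{"query": "Is a charger included?", "pos": ["No, sold separately."]}',
]

records = [json.loads(line) for line in raw_lines]
pairs = list(iter_pairs(records))

for query, answer in pairs:
    print(f"{query!r} -> {answer!r}")
```

Emitting one pair per positive answer keeps every answer usable as a training example when a question has several.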

Data Fields

  • Fields: query and pos.

Data Splits

  • Splits: The dataset loads as a DatasetDict containing a single train split.
  • Example: DatasetDict({ train: Dataset({ features: ['query', 'pos'], num_rows: 1095290 }) })
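
Because only a train split is provided, anyone who wants evaluation data has to carve it out themselves. The sketch below shows one way to do that with a seeded shuffle over row indices. The helper `holdout_split`, the 25% fraction, and the 1,000-row example size are assumptions chosen for illustration, not something the card prescribes.

```python
import random


def holdout_split(num_rows, eval_fraction=0.01, seed=42):
    """Return (train_indices, eval_indices) as two disjoint, sorted index lists."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    indices = list(range(num_rows))
    # A dedicated Random instance keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)
    eval_size = max(1, int(num_rows * eval_fraction))
    eval_indices = sorted(indices[:eval_size])
    train_indices = sorted(indices[eval_size:])
    return train_indices, eval_indices


# Small stand-in for the 1,095,290 rows of the real train split.
train_idx, eval_idx = holdout_split(num_rows=1000, eval_fraction=0.25)
print(len(train_idx), len(eval_idx))
```

When working with the 🤗 Datasets library directly, `dataset["train"].train_test_split(test_size=..., seed=...)` gives a comparable seeded split without manual index handling.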

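Dataset Creation

Source Data

  • Initial data collection and normalization: More information needed.
  • Source language producers: More information needed.

Annotations

  • Annotation process: More information needed.
  • Annotators: More information needed.

Personal and Sensitive Information

  • Handling: More information needed.

Considerations for Using the Data

Social Impact of Dataset

  • Impact: More information needed.

Discussion of Biases

  • Biases: More information needed.

Other Known Limitations

  • Limitations: More information needed.

Additional Information

Dataset Curators

  • Curators: More information needed.

Licensing Information

  • License: MIT.

Citation Information

  • Citation: More information needed.

Contributions

  • Contributions: More information needed.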