
nyuuzyou/wb-questions

Hugging Face · Updated 2024-01-15 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/nyuuzyou/wb-questions
Resource description:
---
annotations_creators:
- crowdsourced
language:
- ru
language_creators:
- crowdsourced
license:
- cc0-1.0
multilinguality:
- monolingual
pretty_name: Wildberries Q&A
size_categories:
- 1M<n<10M
source_datasets:
- original
task_categories:
- text-generation
- question-answering
task_ids:
- language-modeling
- open-domain-qa
---

# Dataset Card for Wildberries questions

### Dataset Summary

This is a dataset of questions and answers scraped from product pages on the Russian marketplace [Wildberries](https://www.wildberries.ru). The dataset contains all questions and answers, as well as all metadata from the API. However, the "productName" field may be empty in some cases because the API does not return the name for old products.

### Languages

The dataset is mostly in Russian, but there may be other languages present.

## Dataset Structure

### Data Fields

This dataset consists of the following fields:

- `imtId` - An identifier for the item (integer)
- `nmId` - A numeric identifier associated with the item (integer)
- `productName` - The name of the product (string, can be empty)
- `supplierArticle` - The article number provided by the supplier (string)
- `supplierId` - The identifier for the supplier (integer)
- `supplierName` - The name of the supplier (string)
- `brandName` - The name of the brand (string)
- `question` - The customer's question regarding the product (string)
- `answer` - The provided answer to the question (string)

### Data Splits

All 7410007 examples are in the train split; there is no validation split.

## Additional Information

### License

This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:

* Use it for any purpose, including commercial projects.
* Modify it however you like.
* Distribute it without asking permission.

No attribution is required, but it's always appreciated!

CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en

To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/

### Dataset Curators

- [nyuuzyou](https://ducks.party)
Provider:
nyuuzyou
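Summary of original information

Dataset Card for Wildberries Q&A

Dataset overview

This is a dataset of questions and answers scraped from product pages of the Russian marketplace Wildberries. The dataset contains all questions and answers, as well as all metadata from the API. However, the "productName" field may be empty in some cases because the API does not return the name for old products.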
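Because `productName` may be empty, downstream code may want to separate named records from unnamed ones. The following is a minimal sketch, not part of the dataset's tooling: it assumes each record is a plain Python dict with the fields of this dataset, and it treats `None` and whitespace-only strings as empty, which is an assumption beyond what the card states. The helper names are hypothetical.

```python
from typing import Any, Iterable, Iterator


def has_product_name(record: dict[str, Any]) -> bool:
    """True when productName is a non-blank string.

    Assumption: None and whitespace-only values count as empty too.
    """
    name = record.get("productName")
    return isinstance(name, str) and name.strip() != ""


def with_product_name(records: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Lazily keep only the records that carry a product name."""
    return (r for r in records if has_product_name(r))
```

The generator form keeps memory use flat, which matters if the records are streamed rather than loaded at once.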

Languages

The dataset is mostly in Russian, but other languages may be present.

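Dataset structure

Data fields

The dataset consists of the following fields:

  • imtId - An identifier for the item (integer)
  • nmId - A numeric identifier associated with the item (integer)
  • productName - The name of the product (string, can be empty)
  • supplierArticle - The article number provided by the supplier (string)
  • supplierId - The identifier for the supplier (integer)
  • supplierName - The name of the supplier (string)
  • brandName - The name of the brand (string)
  • question - The customer's question regarding the product (string)
  • answer - The provided answer to the question (string)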
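The field list above can be written down as a small schema check. This is a sketch under stated assumptions rather than an official loader: the `FIELD_TYPES` mapping and `schema_errors` helper are hypothetical names, and the records are assumed to arrive as plain dicts, for example from the Hugging Face `datasets` library.

```python
from typing import Any

# Field names and types as listed in the dataset card.
FIELD_TYPES: dict[str, type] = {
    "imtId": int,
    "nmId": int,
    "productName": str,  # may be an empty string for old products
    "supplierArticle": str,
    "supplierId": int,
    "supplierName": str,
    "brandName": str,
    "question": str,
    "answer": str,
}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the record fits."""
    errors = []
    for field, expected in FIELD_TYPES.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


# Assumption: with the `datasets` library installed, records could be streamed via
#   datasets.load_dataset("nyuuzyou/wb-questions", split="train", streaming=True)
# and each yielded dict passed to schema_errors().
```

Returning a list of messages instead of raising lets a caller log and skip malformed rows without stopping a long pass over the data.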

Data splits

All 7410007 examples are in the train split; there is no validation split.
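Since only a train split is provided, anyone who needs held-out data has to carve it out themselves. One possible approach, sketched below, is a deterministic hash-based assignment. This is not something the dataset card prescribes: the function name, the choice of `imtId` as the grouping key (so that questions about the same item land in the same split), and the default 1% validation share are all assumptions for illustration.

```python
import hashlib
from typing import Any


def assign_split(record: dict[str, Any], validation_percent: int = 1) -> str:
    """Deterministically map a record to "train" or "validation".

    Hashing imtId (an assumed grouping key) keeps all questions about
    one item in the same split and gives the same answer on every run.
    """
    key = str(record["imtId"]).encode("utf-8")
    # Use the first 8 bytes of SHA-256 as a stable integer, then bucket 0..99.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 100
    return "validation" if bucket < validation_percent else "train"
```

Unlike a random shuffle, this needs no stored seed or index file, and a record keeps its split even if the dataset is later extended.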

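Additional information

License

This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:

  • Use it for any purpose, including commercial projects.
  • Modify it however you like.
  • Distribute it without asking permission.

No attribution is required, but it is appreciated.

CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en

To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/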
