
nyuuzyou/PM-products

Hugging Face, updated 2024-02-04, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/nyuuzyou/PM-products
Resource description:
---
annotations_creators:
- crowdsourced
language:
- ru
language_creators:
- crowdsourced
license:
- cc0-1.0
multilinguality:
- monolingual
pretty_name: PochtaMarket products
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
---

# Dataset Card for PochtaMarket products

### Dataset Summary

This dataset was scraped from product pages on the Russian marketplace [PochtaMarket](https://market.pochta.ru). It includes all information from the product card. The dataset was collected by processing around 500 thousand product pages, starting from the first one. At the time of collection, these were assumed to be all the products available on the marketplace. Some fields may be empty, but each row is expected to contain some data; empty responses have been filtered out.

### Languages

The dataset is mostly in Russian, but other languages may be present.

## Dataset Structure

### Data Fields

This dataset includes the following fields:

- `id`: Identifier for the product (integer)
- `name`: Name of the product (string)
- `description`: Short description of the product (string)
- `longDescription`: Detailed description of the product (string)
- `seoKeywords`: Search engine optimization keywords for the product (string)
- `brand`: Brand name associated with the product (string)
- `providerName`: Name of the provider or seller (string)

### Data Splits

All examples are in the train split; there is no validation split.

## Additional Information

### License

This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:

* Use it for any purpose, including commercial projects.
* Modify it however you like.
* Distribute it without asking permission.

No attribution is required, but it's always appreciated!

CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en

To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/

### Dataset Curators

- [nyuuzyou](https://ducks.party)
Provider:
nyuuzyou
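Original information summary

Dataset Card for PochtaMarket products

Dataset overview

This dataset was scraped from product pages on the Russian marketplace PochtaMarket. It contains all information from the product card. The dataset was collected by processing around 500 thousand product pages, starting from the first product. At the time of collection, these were assumed to be all the products available on the marketplace. Some fields may be empty, but each row is expected to contain some data; empty responses have been filtered out.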
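
The note about empty fields can be checked on the consumer side. The following is a minimal sketch, assuming each row is a plain mapping keyed by the field names listed under the data fields; the helper names `has_content` and `missing_fields` are hypothetical and not part of the dataset or any library.

```python
from typing import Any, List, Mapping

# Text fields named in the dataset card's field list.
TEXT_FIELDS = ("name", "description", "longDescription",
               "seoKeywords", "brand", "providerName")


def _is_filled(value: Any) -> bool:
    """True for a string that contains something other than whitespace."""
    return isinstance(value, str) and bool(value.strip())


def has_content(row: Mapping[str, Any]) -> bool:
    """Return True if at least one text field of the row is filled."""
    return any(_is_filled(row.get(field)) for field in TEXT_FIELDS)


def missing_fields(row: Mapping[str, Any]) -> List[str]:
    """List the text fields that are absent, None, or blank in the row."""
    return [field for field in TEXT_FIELDS if not _is_filled(row.get(field))]
```

A row that fails `has_content` would contradict the card's statement that empty responses were removed, so counting such rows is a cheap sanity check after download.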

Languages

The dataset is mostly in Russian, but other languages may be present.

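Dataset structure

Data fields

This dataset includes the following fields:

  • id: Identifier for the product (integer)
  • name: Name of the product (string)
  • description: Short description of the product (string)
  • longDescription: Detailed description of the product (string)
  • seoKeywords: Search engine optimization keywords for the product (string)
  • brand: Brand name associated with the product (string)
  • providerName: Name of the provider or seller (string)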
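
The field list above maps naturally onto a small typed record. This is a sketch under stated assumptions: the `Product` class, its `from_row` constructor, and the `to_text` layout are illustrative choices for preparing language-modeling text, not something the dataset card prescribes. Missing or `None` text fields are normalized to empty strings.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Product:
    """One row of the dataset, using the field names from the card."""
    id: int
    name: str = ""
    description: str = ""
    longDescription: str = ""
    seoKeywords: str = ""
    brand: str = ""
    providerName: str = ""

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "Product":
        """Build a Product from a raw row; absent or None text becomes ''."""
        def text(key: str) -> str:
            value = row.get(key)
            return value.strip() if isinstance(value, str) else ""

        return cls(
            id=int(row["id"]),
            name=text("name"),
            description=text("description"),
            longDescription=text("longDescription"),
            seoKeywords=text("seoKeywords"),
            brand=text("brand"),
            providerName=text("providerName"),
        )

    def to_text(self) -> str:
        """Join the non-empty descriptive fields into one training string."""
        parts = [self.name, self.brand, self.description, self.longDescription]
        return "\n".join(part for part in parts if part)
```

Which fields go into `to_text`, and in what order, is a modeling decision; `seoKeywords` and `providerName` are left out here only to keep the example short.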

Data splits

All examples are in the train split; there is no validation split.
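
Because only a train split is published, anyone who needs held-out data has to carve it out themselves. One possible approach, sketched here as an assumption rather than the curator's method, is to hash the product `id` so that each row lands in the same split on every run. The function names and the default 10% fraction are hypothetical.

```python
import hashlib
from typing import Any, Dict, Iterable, List, Mapping


def assign_split(product_id: int, val_fraction: float = 0.1) -> str:
    """Map an id to 'train' or 'validation' in a stable, seed-free way."""
    digest = hashlib.sha256(str(product_id).encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to a bucket in 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    threshold = round(val_fraction * 10_000)
    return "validation" if bucket < threshold else "train"


def split_rows(rows: Iterable[Mapping[str, Any]],
               val_fraction: float = 0.1) -> Dict[str, List[Mapping[str, Any]]]:
    """Partition rows into train and validation lists by hashed id."""
    splits: Dict[str, List[Mapping[str, Any]]] = {"train": [], "validation": []}
    for row in rows:
        splits[assign_split(row["id"], val_fraction)].append(row)
    return splits
```

The rows themselves would typically come from the Hugging Face `datasets` library, for example `load_dataset("nyuuzyou/PM-products", split="train")`; that call is not exercised here. The library's own `Dataset.train_test_split` is an alternative when a seeded random split is acceptable.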

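Additional information

License

This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:

  • Use it for any purpose, including commercial projects.
  • Modify it however you like.
  • Distribute it without asking permission.

No attribution is required, but it is always appreciated!

CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en

To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/

Dataset curators

nyuuzyou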
