nyuuzyou/PM-products
Source: Hugging Face. Updated 2024-02-04; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nyuuzyou/PM-products
Resource overview:
---
annotations_creators:
- crowdsourced
language:
- ru
language_creators:
- crowdsourced
license:
- cc0-1.0
multilinguality:
- monolingual
pretty_name: PochtaMarket products
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
---
# Dataset Card for PochtaMarket products
### Dataset Summary
This dataset was scraped from product pages on the Russian marketplace [PochtaMarket](https://market.pochta.ru). It includes all information from the product card. The dataset was collected by processing around 500 thousand product pages, starting from the first one. These are assumed to be all the products available on the marketplace at the time of collection. Some fields may be empty, but each row is expected to contain some data; empty responses have been filtered out.
### Languages
The dataset is mostly in Russian, but there may be other languages present.
## Dataset Structure
### Data Fields
This dataset includes the following fields:
- `id`: Identifier for the product (integer)
- `name`: Name of the product (string)
- `description`: Short description of the product (string)
- `longDescription`: Detailed description of the product (string)
- `seoKeywords`: Search engine optimization keywords for the product (string)
- `brand`: Brand name associated with the product (string)
- `providerName`: Name of the provider or seller (string)
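
As a rough illustration of the field list above, a row could be modeled as follows. This is a minimal sketch, not part of the dataset: the `ProductRecord` type, the `TEXT_FIELDS` tuple, and the `to_text` helper are hypothetical names, the sample values are made up, and representing an empty field as `None` or an empty string is an assumption based on the note that some fields may be empty.

```python
from typing import Optional, TypedDict


class ProductRecord(TypedDict):
    """One row, following the field list in this card (hypothetical type name)."""
    id: int
    name: Optional[str]
    description: Optional[str]
    longDescription: Optional[str]
    seoKeywords: Optional[str]
    brand: Optional[str]
    providerName: Optional[str]


# Text fields in the order they are listed in the card.
TEXT_FIELDS = ("name", "description", "longDescription",
               "seoKeywords", "brand", "providerName")


def to_text(record: ProductRecord) -> str:
    """Join the non-empty text fields into one string, e.g. for language modeling.

    Assumption: an empty field is None, "" or whitespace only.
    """
    parts = []
    for field in TEXT_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return "\n".join(parts)


# Made-up sample values, not taken from the dataset.
example: ProductRecord = {
    "id": 1,
    "name": "Чайник",
    "description": "",
    "longDescription": None,
    "seoKeywords": "чайник, кухня",
    "brand": "ExampleBrand",
    "providerName": None,
}
print(to_text(example))
```

Skipping empty fields rather than emitting placeholders keeps the joined text free of filler, which is usually preferable for the language-modeling task named in the card metadata.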
### Data Splits
All examples are in the train split; there is no validation split.
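
Since there is only a train split, loading might look like the sketch below. It assumes the Hugging Face `datasets` library is installed and that the repository `nyuuzyou/PM-products` can be loaded directly by `load_dataset` without extra configuration; neither is confirmed by this card. The `load_products` helper is a hypothetical name.

```python
DATASET_ID = "nyuuzyou/PM-products"  # repository name as given at the top of this page


def load_products(split: str = "train"):
    """Load one split of the dataset (the card lists only a train split).

    Assumption: the repository is loadable with the default builder.
    The import is local so this module can be imported without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


if __name__ == "__main__":
    products = load_products()  # downloads the data on first use
    print(products)             # shows the row count and column names
```

If a held-out set is needed, it has to be carved out of the train split by the user, for example with the `train_test_split` method of the loaded `Dataset` object.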
## Additional Information
### License
This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:
* Use it for any purpose, including commercial projects.
* Modify it however you like.
* Distribute it without asking permission.
No attribution is required, but it's always appreciated!
CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en
To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/
### Dataset Curators
- [nyuuzyou](https://ducks.party)
Provider:
nyuuzyou