
sayurio/rokomari-bd-product-data

Source: Hugging Face. Updated 2026-04-05. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/sayurio/rokomari-bd-product-data
Description:
---
license: other
task_categories:
- text-generation
- table-question-answering
language:
- bn
- en
tags:
- e-commerce
- books
- literature
- bangladesh
- web-scraped
- uncleaned
pretty_name: Rokomari Book & Product Archive
size_categories:
- 100K<n<1M
---

# Rokomari Book & Product Archive

## Overview

This repository contains a large dataset scraped from [rokomari.com](https://www.rokomari.com/), the largest online bookstore and e-commerce platform for literature, electronics, and stationery in Bangladesh. The dataset serves as an archive of book metadata, author details, pricing, and product descriptions.

⚠️ **Note on Data Quality:** This dataset currently contains raw, uncleaned scrape data: over 300,000 entries taken straight from the web. Expect residual HTML artifacts, missing values, and minor formatting inconsistencies. A cleanup, deduplication, and standardization pass is planned for a future update.

## Purpose and Usage

This dataset is published publicly and strictly for **educational, research, and analytical purposes**. Given its scale, it can be a useful resource for data scientists, developers, and NLP researchers looking to:

* Build and train book recommendation engines or Retrieval-Augmented Generation (RAG) systems tailored to Bengali literature.
* Perform market analysis, track historical pricing trends, and study the publishing landscape in Bangladesh.
* Train or evaluate models for e-commerce parsing, tabular data extraction, and structured JSON generation.
* Analyze cross-lingual (Bengali/English) metadata and product descriptions.

## Dataset Details

* **Source:** rokomari.com
* **Collection Method:** Web scraping
* **Content Type:** E-commerce and literature data (including book titles, author names, publishers, prices, descriptions, and categories).
* **Scale:** 300,000+ entries
* **Repository:** `sayurio/rokomari-bd-product-data`

## Copyright and Fair Use Disclaimer

This archive is created under the principles of **Fair Use** (Section 107 of the U.S. Copyright Act) for purposes such as criticism, comment, research, and scholarship.

* **No Ownership Claimed:** The creator of this repository does not claim any ownership, authorship, or copyright over the original book descriptions, cover images, summaries, or branding. All rights, title, and interest in the original content, logos, and trademarks remain entirely with their respective authors, publishers, and Onnorokom Web Services Ltd. (Rokomari).
* **Non-Commercial:** This dataset is provided free of charge and is not intended for commercial gain, competitive market manipulation, or profit.
* **Transformative Use:** The data has been aggregated, extracted from its original web formatting, and compiled specifically for computational analysis and educational study. This represents a transformative use of the original publicly available material.

**Takedown Requests:** If you are a copyright holder, publisher, author, or representative of the source website and wish for specific data to be removed from this archive, please open an issue or contact the repository owner directly. Specify the exact URLs, ISBNs, or product identifiers you want taken down so they can be located within the 300k+ entries and removed.
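
The data-quality note says to expect residual HTML artifacts, missing values, and duplicates until the planned cleanup lands. As a rough illustration of what an interim pass might look like, here is a minimal sketch using only the Python standard library. The field names (`title`, `author`, `price`) and the sample rows are invented for this example; the actual column names in the archive are not documented here and should be checked before adapting it.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # naive tag matcher, good enough for simple residue
WS_RE = re.compile(r"\s+")


def clean_text(value):
    """Strip HTML tags, decode entities, collapse whitespace; return None if empty."""
    if value is None:
        return None
    text = TAG_RE.sub(" ", str(value))
    text = html.unescape(text)
    text = WS_RE.sub(" ", text).strip()
    return text or None


def clean_records(records, key_fields=("title", "author")):
    """Clean every string field, then drop rows whose key fields repeat.

    key_fields is an assumption: pick whatever identifies a product in the real data.
    """
    seen = set()
    cleaned = []
    for record in records:
        row = {k: clean_text(v) if isinstance(v, str) else v for k, v in record.items()}
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Invented sample rows, not taken from the dataset.
sample = [
    {"title": "<b>Example Book</b>", "author": "A. Writer", "price": "250"},
    {"title": "Example  Book ", "author": "A. Writer", "price": "250"},
    {"title": "Sample &amp; Title", "author": "", "price": None},
]
result = clean_records(sample)
```

The regex-based tag stripping is deliberately simple; for heavily nested or malformed markup a real HTML parser would be a safer choice.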
Provider:
sayurio