sayurio/cookpad-scrape-recipes

Name: sayurio/cookpad-scrape-recipes
Creator: sayurio
Published: 2026-03-27 20:09:13
License: No description available

Hugging Face, updated 2026-03-27, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/sayurio/cookpad-scrape-recipes

Resource description:

---
license: mit
task_categories:
- text-classification
- token-classification
- question-answering
language:
- en
- hi
- bn
tags:
- food
- recipes
- cooking
- india
- web-scraped
- non-ai
pretty_name: Cookpad India Recipe Archive
size_categories:
- 100K<n<1M
---

# Cookpad India Recipe Archive

[Request More Scrapes](https://docs.google.com/forms/d/e/1FAIpQLSdQhqM-YE-1KvLh8E2CKknVBySZh6c58p5SfgfjdSpDUnDdtg/viewform?usp=publish-editor) [Order Private Scrapes](https://discord.gg/eZ92ZVcDyC)

## Overview

This repository contains a dataset scraped from [cookpad.com/in](https://cookpad.com/in), a popular community-driven recipe-sharing platform. The dataset is an extensive archive of diverse, human-created culinary data. It captures home-cooked recipes, ingredient lists, step-by-step instructions, and related web metadata.

## Purpose and Usage

This dataset is published publicly and strictly for **educational, research, and archival purposes**. It is a useful resource for Natural Language Processing (NLP) researchers, data scientists, and developers who want to:

* Train or fine-tune models for procedural text generation (e.g., generating recipe steps from a list of ingredients).
* Build conversational AI or Retrieval-Augmented Generation (RAG) systems for cooking and culinary assistance.
* Perform Named Entity Recognition (NER) to extract measurements, cooking techniques, and ingredients.
* Analyze regional culinary trends, dietary preferences, and flavor profiles across India.

## Dataset Details

* **Source:** cookpad.com/in
* **Collection Method:** Web scraping
* **Content Type:** Culinary data, including recipe titles, ingredient lists, procedural instructions, and generic web metadata.
* **Repository:** `sayurio/cookpad-scrape-recipes`

## Copyright and Fair Use Disclaimer

This archive is created under the principles of **Fair Use** (Section 107 of the U.S. Copyright Act). It serves purposes such as criticism, comment, teaching, scholarship, and research.

* **No Ownership Claimed:** The creator of this repository claims no ownership, authorship, or copyright over the original recipes, images, or user-submitted content. All rights, title, and interest in the original text remain with the respective home-chef authors and Cookpad Inc.
* **Non-Commercial:** This dataset is provided free of charge. It is not intended for commercial gain, monetization, or profit.
* **Transformative Use:** The data has been aggregated, extracted from its original web formatting, and compiled specifically for computational analysis, archiving, and educational study. This is a transformative use of the original publicly available material.

**Takedown Requests:** If you are a copyright holder, a recipe author, or a representative of the source website, you can ask for specific content to be removed from this archive. To do so, open an issue or contact the repository owner directly. In your request, list the exact recipe URLs or titles to be taken down. This lets the maintainer locate them accurately within the dataset and remove them.

## How to Use

You can load this dataset directly into your Python environment with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("sayurio/cookpad-scrape-recipes")

# View the structure of the first recipe entry
print(dataset['train'][0])
```
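The procedural text generation use case listed under Purpose and Usage is generating recipe steps from an ingredient list. It can be sketched as a small preprocessing step that turns each record into a prompt/target pair.

This is a minimal sketch, not part of the dataset or its tooling. The card does not document the schema, so the field names `title`, `ingredients` and `steps` are hypothetical placeholders. Inspect `dataset["train"].features` (or `column_names`) first and adjust them. The example record is made up for illustration and is not taken from the dataset.

```python
from typing import Any, Mapping


def as_lines(value: Any) -> str:
    """Join a list of strings line by line; pass plain strings through stripped."""
    if isinstance(value, (list, tuple)):
        return "\n".join(str(v).strip() for v in value if str(v).strip())
    return str(value or "").strip()


def to_prompt_target(record: Mapping[str, Any]) -> tuple[str, str]:
    """Turn one recipe record into an (ingredients prompt, steps target) pair.

    ASSUMPTION: "title", "ingredients" and "steps" are hypothetical field
    names; the real column names may differ.
    """
    title = as_lines(record.get("title", ""))
    ingredients = as_lines(record.get("ingredients", []))
    steps = as_lines(record.get("steps", []))
    # The model sees the title and ingredients and learns to produce the steps.
    prompt = f"Recipe: {title}\nIngredients:\n{ingredients}\nSteps:"
    return prompt, steps


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = {
        "title": "Masala Chai",
        "ingredients": ["2 cups water", "1 cup milk", "2 tsp tea leaves"],
        "steps": ["Boil the water.", "Add tea leaves and milk.", "Strain and serve."],
    }
    prompt, target = to_prompt_target(example)
    print(prompt)
    print(target)
```

Once the real column names are known, a helper like this can be applied to a whole split with `Dataset.map`. For example, `dataset["train"].map(lambda r: dict(zip(("prompt", "target"), to_prompt_target(r))))` adds `prompt` and `target` columns. Keeping the helper independent of `datasets` makes it easy to test on plain dictionaries.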

license: MIT许可证 task_categories: - 文本分类 - Token分类（Token Classification） - 问答任务 language: - 英语 - 印地语 - 孟加拉语 tags: - 食品 - 食谱 - 烹饪 - 印度 - 网络爬取 - 非人工智能 pretty_name: Cookpad印度食谱档案库 size_categories: - 10万<n<100万 --- # Cookpad印度食谱档案库 [申请更多爬取任务](https://docs.google.com/forms/d/e/1FAIpQLSdQhqM-YE-1KvLh8E2CKknVBySZh6c58p5SfgfjdSpDUnDdtg/viewform?usp=publish-editor) [订购专属爬取服务](https://discord.gg/eZ92ZVcDyC) ## 数据集概览本仓库包含从热门社区型食谱分享平台[cookpad.com/in](https://cookpad.com/in)爬取得到的数据集。该数据集是涵盖多元人类创作烹饪内容的大型档案库，收录家庭自制食谱、配料清单、分步操作指南及相关网页元数据。 ## 用途与使用规范本数据集公开发布，**仅可用于教育、研究与档案保存用途**。对于自然语言处理（Natural Language Processing，NLP）研究者、数据科学家与开发者而言，本数据集是优质的资源，可用于实现以下场景： * 训练或微调流程文本生成模型（例如根据配料列表生成食谱步骤）。 * 构建面向烹饪辅助的对话式AI或检索增强生成（Retrieval-Augmented Generation，RAG）系统。 * 开展命名实体识别（Named Entity Recognition，NER）任务，提取配料用量、烹饪技法与食材名称。 * 分析印度国内各地区的烹饪趋势、饮食偏好与风味特征。 ## 数据集详情 * **数据来源**：cookpad.com/in * **采集方式**：网络爬虫爬取 * **内容类型**：烹饪类数据（包含食谱标题、配料清单、操作步骤说明及通用网页元数据） * **仓库地址**：`sayurio/cookpad-scrape-recipes` ## 版权与合理使用声明本档案库依据**版权法第107条规定的合理使用原则**创建，仅用于评论、点评、教学、学术研究等合法用途。 * **不主张任何所有权**：本仓库创建者不对原始食谱、图片或用户提交内容主张任何所有权、著作权或版权。原始文本的全部权利、所有权与权益均归属于对应的家庭厨师作者与Cookpad公司。 * **非商业用途**：本数据集完全免费提供，严禁用于商业牟利、变现或获取收益。 * **改造性使用**：本数据集的数据均经过聚合、从原始网页格式中提取，并专门为计算分析、档案保存与教育研究而整理，属于对原始公开内容的改造性使用。 **下架请求**：若您为版权持有者、食谱作者或来源网站代表，希望从本档案库中移除特定内容，请提交Issue或直接联系仓库所有者。提交下架请求时，请注明需移除的食谱的具体URL或标题，以便我们在数据集中准确定位并完成移除。 ## 使用方法您可通过Hugging Face的`datasets`库直接将本数据集加载至Python环境中： python from datasets import load_dataset # 加载数据集 dataset = load_dataset("sayurio/cookpad-scrape-recipes") # 查看第一条食谱条目的结构 print(dataset["train"][0])

Provider:

sayurio

5,000+

优质数据集

54 个

任务类型

进入经典数据集