sayurio/cookpad-scrape-recipes
Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/sayurio/cookpad-scrape-recipes
Description:
---
license: mit
task_categories:
- text-classification
- token-classification
- question-answering
language:
- en
- hi
- bn
tags:
- food
- recipes
- cooking
- india
- web-scraped
- non-ai
pretty_name: Cookpad India Recipe Archive
size_categories:
- 100K<n<1M
---
# Cookpad India Recipe Archive
[Request More Scrapes](https://docs.google.com/forms/d/e/1FAIpQLSdQhqM-YE-1KvLh8E2CKknVBySZh6c58p5SfgfjdSpDUnDdtg/viewform?usp=publish-editor)
[Order Private Scrapes](https://discord.gg/eZ92ZVcDyC)
## Overview
This repository contains a dataset scraped from [cookpad.com/in](https://cookpad.com/in), a community-driven recipe-sharing platform. It archives human-written home-cooking content: recipe titles, ingredient lists, step-by-step instructions, and related web metadata. The card's size category places it between 100,000 and 1 million entries.
## Purpose and Usage
This dataset is published for **educational, research, and archival purposes** only. It may be useful to Natural Language Processing (NLP) researchers, data scientists, and developers who want to:
* Train or fine-tune models for procedural text generation (e.g., generating recipe steps from a list of ingredients).
* Build conversational AI or Retrieval-Augmented Generation (RAG) systems for cooking and culinary assistance.
* Perform Named Entity Recognition (NER) to extract measurements, cooking techniques, and ingredients.
* Analyze regional culinary trends, dietary preferences, and flavor profiles across India.
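As a starting point for the measurement-extraction use case above, the following is a minimal rule-based sketch, not a trained NER model. The unit vocabulary, the `Measurement` type, and the example ingredient strings are hypothetical; the card does not document how ingredient lines are formatted, so expect to adapt the pattern to the real data.

```python
import re
from typing import List, NamedTuple


class Measurement(NamedTuple):
    quantity: str
    unit: str


# Hypothetical, deliberately small unit vocabulary; extend it for real data.
UNITS = ["cups", "cup", "tbsp", "tsp", "grams", "gram", "kg", "g", "ml", "litre", "l"]

# A quantity is an integer, a decimal, or a simple fraction such as 1/2,
# followed by one of the known units as a whole word.
_PATTERN = re.compile(
    r"(?P<qty>\d+(?:\.\d+)?(?:\s*/\s*\d+)?)\s*(?P<unit>" + "|".join(UNITS) + r")\b",
    flags=re.IGNORECASE,
)


def extract_measurements(text: str) -> List[Measurement]:
    """Return every (quantity, unit) pair found in an ingredient string."""
    return [
        Measurement(m.group("qty").replace(" ", ""), m.group("unit").lower())
        for m in _PATTERN.finditer(text)
    ]
```

Output from a baseline like this could serve as weak labels for a token-classification model, but it will miss spelled-out quantities ("half a cup") and units written in Hindi or Bengali.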
## Dataset Details
* **Source:** cookpad.com/in
* **Collection Method:** Web scraping
* **Content Type:** Culinary data (including recipe titles, ingredient lists, procedural instructions, and generic web metadata).
* **Repository:** `sayurio/cookpad-scrape-recipes`
## Copyright and Fair Use Disclaimer
This archive is created under the principles of **Fair Use** (Section 107 of the U.S. Copyright Act, 17 U.S.C. § 107) for purposes such as criticism, comment, teaching, scholarship, and research.
* **No Ownership Claimed:** The creator of this repository does not claim any ownership, authorship, or copyright over the original recipes, images, or user-submitted content. All rights, title, and interest in the original text remain entirely with their respective home-chef authors and Cookpad Inc.
* **Non-Commercial:** This dataset is provided completely free of charge and is strictly not intended for commercial gain, monetization, or profit.
* **Transformative Use:** The data has been aggregated, extracted from its original web formatting, and compiled specifically for computational analysis, archiving, and educational study. This represents a transformative use of the original publicly available material.
**Takedown Requests:** If you are a copyright holder, a recipe author, or a representative of the source website and want specific content removed from this archive, please open an issue or contact the repository owner directly. Include the exact recipe URLs or titles so the entries can be located in the dataset and removed.
## How to Use
You can load this dataset directly into your Python environment using the Hugging Face `datasets` library:
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("sayurio/cookpad-scrape-recipes")
# View the structure of the first recipe entry
print(dataset['train'][0])
```
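The card does not list the dataset's column names or splits, so it may help to inspect them before writing code that depends on a particular field. The sketch below assumes only that each record behaves like a dictionary. `summarize_fields` and `load_and_summarize` are hypothetical helper names, and the second function needs network access to download the data.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(
    records: Iterable[Mapping[str, Any]], limit: int = 100
) -> Dict[str, Counter]:
    """Count the Python types seen for each field over the first `limit` records."""
    summary: Dict[str, Counter] = {}
    for i, record in enumerate(records):
        if i >= limit:
            break
        for key, value in record.items():
            summary.setdefault(key, Counter())[type(value).__name__] += 1
    return summary


def load_and_summarize(repo_id: str = "sayurio/cookpad-scrape-recipes") -> None:
    """Download the dataset and print its splits, columns, and field types."""
    # Imported lazily so summarize_fields can be used without the library installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)
    for split_name, split in dataset.items():
        print(split_name, split.num_rows, split.column_names)
        print(summarize_fields(split, limit=100))
```

Calling `load_and_summarize()` once shows which splits exist (the example above assumes a `train` split) and whether any fields contain missing values.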
Provider: sayurio



