
sayurio/cookpad-scrape-recipes

Source: Hugging Face. Updated 2026-03-27. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sayurio/cookpad-scrape-recipes
Description:
---
license: mit
task_categories:
- text-classification
- token-classification
- question-answering
language:
- en
- hi
- bn
tags:
- food
- recipes
- cooking
- india
- web-scraped
- non-ai
pretty_name: Cookpad India Recipe Archive
size_categories:
- 100K<n<1M
---

# Cookpad India Recipe Archive

[Request More Scrapes](https://docs.google.com/forms/d/e/1FAIpQLSdQhqM-YE-1KvLh8E2CKknVBySZh6c58p5SfgfjdSpDUnDdtg/viewform?usp=publish-editor)
[Order Private Scrapes](https://discord.gg/eZ92ZVcDyC)

## Overview

This repository contains a dataset scraped from [cookpad.com/in](https://cookpad.com/in), a popular community-driven recipe sharing platform. The dataset serves as an extensive archive of diverse, human-created culinary data, capturing home-cooked recipes, ingredient lists, step-by-step instructions, and related web metadata.

## Purpose and Usage

This dataset is published publicly and strictly for **educational, research, and archival purposes**. It is an excellent resource for Natural Language Processing (NLP) researchers, data scientists, and developers looking to:

* Train or fine-tune models for procedural text generation (e.g., generating recipe steps from a list of ingredients).
* Build conversational AI or Retrieval-Augmented Generation (RAG) systems for cooking and culinary assistance.
* Perform Named Entity Recognition (NER) to extract measurements, cooking techniques, and ingredients.
* Analyze regional culinary trends, dietary preferences, and flavor profiles across India.

## Dataset Details

* **Source:** cookpad.com/in
* **Collection Method:** Web scraping
* **Content Type:** Culinary data (including recipe titles, ingredient lists, procedural instructions, and generic web metadata).
* **Repository:** `sayurio/cookpad-scrape-recipes`

## Copyright and Fair Use Disclaimer

This archive is created under the principles of **Fair Use** (under Section 107 of the Copyright Act) for purposes such as criticism, comment, teaching, scholarship, and research.

* **No Ownership Claimed:** The creator of this repository does not claim any ownership, authorship, or copyright over the original recipes, images, or user-submitted content. All rights, title, and interest in the original text remain entirely with their respective home-chef authors and Cookpad Inc.
* **Non-Commercial:** This dataset is provided completely free of charge and is strictly not intended for commercial gain, monetization, or profit.
* **Transformative Use:** The data has been aggregated, extracted from its original web formatting, and compiled specifically for computational analysis, archiving, and educational study. This represents a transformative use of the original publicly available material.

**Takedown Requests:** If you are a copyright holder, a recipe author, or a representative of the source website and wish for specific content to be removed from this archive, please open an issue or contact the repository owner directly. Please submit a removal request specifying the exact recipe URLs or titles you wish to have taken down so they can be accurately located within the dataset and removed.

## How to Use

You can load this dataset directly into your Python environment using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("sayurio/cookpad-scrape-recipes")

# View the structure of the first recipe entry
print(dataset['train'][0])
```
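The card does not list the dataset's column names, so it can help to look at the fields before writing any processing code. The following is a minimal sketch, not part of the original card: `summarize_fields` and `stream_train_split` are hypothetical helper names, and the sketch assumes only that a `train` split exists (as the loading example implies). It uses the `streaming=True` option of `load_dataset` so that a few records can be inspected without downloading the full archive.

```python
from collections import Counter
from itertools import islice
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]], limit: int = 100) -> Dict[str, Counter]:
    """Count the Python types seen for each field in the first `limit` records."""
    summary: Dict[str, Counter] = {}
    for record in islice(records, limit):
        for key, value in record.items():
            # Track how often each value type appears under this field name.
            summary.setdefault(key, Counter())[type(value).__name__] += 1
    return summary


def stream_train_split(repo_id: str = "sayurio/cookpad-scrape-recipes"):
    """Return a streaming iterator over the train split (needs network access)."""
    # Imported lazily so summarize_fields works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the split is named "train", as in the card's own example.
    return load_dataset(repo_id, split="train", streaming=True)
```

Calling `summarize_fields(stream_train_split())` returns a mapping from each field name to the value types observed in the first 100 records, which should show which columns are present and whether any of them contain missing values.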

Provider:
sayurio