mbien/recipe_nlg

Hugging Face | Updated 2024-01-18 | Indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/mbien/recipe_nlg
Resource overview:
---
annotations_creators:
- found
language_creators:
- found
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1M<n<10M
source_datasets:
- original
task_categories:
- text2text-generation
- text-generation
- fill-mask
- text-retrieval
- summarization
task_ids:
- document-retrieval
- entity-linking-retrieval
- explanation-generation
- language-modeling
- masked-language-modeling
paperswithcode_id: recipenlg
pretty_name: RecipeNLG
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: title
    dtype: string
  - name: ingredients
    sequence: string
  - name: directions
    sequence: string
  - name: link
    dtype: string
  - name: source
    dtype:
      class_label:
        names:
          '0': Gathered
          '1': Recipes1M
  - name: ner
    sequence: string
  splits:
  - name: train
    num_bytes: 2194783815
    num_examples: 2231142
  download_size: 0
  dataset_size: 2194783815
---

# Dataset Card for RecipeNLG

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://recipenlg.cs.put.poznan.pl/
- **Repository:** https://github.com/Glorf/recipenlg
- **Paper:** https://www.aclweb.org/anthology/volumes/2020.inlg-1/
- **Leaderboard:** [More Information Needed]
- **Point of Contact:** [More Information Needed]

### Dataset Summary

RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation.

While the RecipeNLG dataset is based on the Recipe1M+ dataset, it greatly expands the number of recipes available. The new dataset provides over 1 million new, preprocessed and deduplicated recipes on top of the Recipe1M+ dataset.

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

The dataset is in English.

## Dataset Structure

### Data Instances

```
{'id': 0,
 'title': 'No-Bake Nut Cookies',
 'ingredients': ['1 c. firmly packed brown sugar',
                 '1/2 c. evaporated milk',
                 '1/2 tsp. vanilla',
                 '1/2 c. broken nuts (pecans)',
                 '2 Tbsp. butter or margarine',
                 '3 1/2 c. bite size shredded rice biscuits'],
 'directions': ['In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.',
                'Stir over medium heat until mixture bubbles all over top.',
                'Boil and stir 5 minutes more. Take off heat.',
                'Stir in vanilla and cereal; mix well.',
                'Using 2 teaspoons, drop and shape into 30 clusters on wax paper.',
                'Let stand until firm, about 30 minutes.'],
 'link': 'www.cookbooks.com/Recipe-Details.aspx?id=44874',
 'source': 0,
 'ner': ['brown sugar',
         'milk',
         'vanilla',
         'nuts',
         'butter',
         'bite size shredded rice biscuits']}
```

### Data Fields

- `id` (`int`): ID.
- `title` (`str`): Title of the recipe.
- `ingredients` (`list` of `str`): Ingredients.
- `directions` (`list` of `str`): Instruction steps.
- `link` (`str`): URL link.
- `source` (`ClassLabel`): Origin of each recipe record, with possible value {"Gathered", "Recipes1M"}:
  - "Gathered" (0): Additional recipes gathered from multiple cooking web pages, using automated scripts in a web scraping process.
  - "Recipes1M" (1): Recipes from "Recipe1M+" dataset.
- `ner` (`list` of `str`): NER food entities.

### Data Splits

The dataset contains a single `train` split.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

[More Information Needed]

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

I (the "Researcher") have requested permission to use the RecipeNLG dataset (the "Dataset") at Poznań University of Technology (PUT). In exchange for such permission, Researcher hereby agrees to the following terms and conditions:

1. Researcher shall use the Dataset only for non-commercial research and educational purposes.
2. PUT makes no representations or warranties regarding the Dataset, including but not limited to warranties of non-infringement or fitness for a particular purpose.
3. Researcher accepts full responsibility for his or her use of the Dataset and shall defend and indemnify PUT, including its employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Dataset including but not limited to Researcher's use of any copies of copyrighted images or text that he or she may create from the Dataset.
4. Researcher may provide research associates and colleagues with access to the Dataset provided that they first agree to be bound by these terms and conditions.
5. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.

### Citation Information

```bibtex
@inproceedings{bien-etal-2020-recipenlg,
    title = "{R}ecipe{NLG}: A Cooking Recipes Dataset for Semi-Structured Text Generation",
    author = "Bie{\'n}, Micha{\l} and
      Gilski, Micha{\l} and
      Maciejewska, Martyna and
      Taisner, Wojciech and
      Wisniewski, Dawid and
      Lawrynowicz, Agnieszka",
    booktitle = "Proceedings of the 13th International Conference on Natural Language Generation",
    month = dec,
    year = "2020",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.inlg-1.4",
    pages = "22--28",
}
```

### Contributions

Thanks to [@abhishekkrthakur](https://github.com/abhishekkrthakur) for adding this dataset.
Provider:
mbien
Summary of original information

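Dataset overview

Dataset description

Dataset summary

RecipeNLG is a cooking recipes dataset for semi-structured text generation. It builds on the Recipe1M+ dataset and provides over 1 million new, preprocessed, and deduplicated recipes.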
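To illustrate what "semi-structured text generation" can look like with these records, here is a minimal sketch that flattens a recipe into one training string. The section markers (`<TITLE>`, `<INGREDIENTS>`, `<DIRECTIONS>`), the item separator, and the function name `linearize_recipe` are hypothetical choices made for this example; the dataset card does not prescribe any particular serialization.

```python
from typing import Dict, List

# Hypothetical section markers; illustrative only, not taken from the dataset card.
TITLE_TAG = "<TITLE>"
INGR_TAG = "<INGREDIENTS>"
DIR_TAG = "<DIRECTIONS>"
ITEM_SEP = " | "


def linearize_recipe(record: Dict[str, object]) -> str:
    """Flatten a record's title, ingredients and directions into one string."""
    ingredients: List[str] = list(record["ingredients"])
    directions: List[str] = list(record["directions"])
    parts = [
        f"{TITLE_TAG} {record['title']}",
        f"{INGR_TAG} {ITEM_SEP.join(ingredients)}",
        f"{DIR_TAG} {ITEM_SEP.join(directions)}",
    ]
    return " ".join(parts)


# Abbreviated version of the sample record shown in the card.
example = {
    "title": "No-Bake Nut Cookies",
    "ingredients": ["1 c. firmly packed brown sugar", "1/2 c. evaporated milk"],
    "directions": ["Stir over medium heat until mixture bubbles all over top."],
}
text = linearize_recipe(example)
print(text)
```

Keeping the markers in named constants makes it easy to swap in whatever special tokens a given tokenizer or model expects.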

Supported tasks and leaderboards

[More Information Needed]

Languages

The dataset is in English.

Dataset structure

Data instances

    {
      "id": 0,
      "title": "No-Bake Nut Cookies",
      "ingredients": [
        "1 c. firmly packed brown sugar",
        "1/2 c. evaporated milk",
        "1/2 tsp. vanilla",
        "1/2 c. broken nuts (pecans)",
        "2 Tbsp. butter or margarine",
        "3 1/2 c. bite size shredded rice biscuits"
      ],
      "directions": [
        "In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.",
        "Stir over medium heat until mixture bubbles all over top.",
        "Boil and stir 5 minutes more. Take off heat.",
        "Stir in vanilla and cereal; mix well.",
        "Using 2 teaspoons, drop and shape into 30 clusters on wax paper.",
        "Let stand until firm, about 30 minutes."
      ],
      "link": "www.cookbooks.com/Recipe-Details.aspx?id=44874",
      "source": 0,
      "ner": [
        "brown sugar",
        "milk",
        "vanilla",
        "nuts",
        "butter",
        "bite size shredded rice biscuits"
      ]
    }
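A record with this shape can be sanity-checked before use. The sketch below is one possible check, assuming records arrive as plain Python dicts with `source` stored as its integer class index; the helper name `validate_record` and the wording of the messages are made up for this example.

```python
from typing import Any, Dict, List

# Expected Python type per scalar field, following the documented feature list.
SCALAR_FIELDS = {"id": int, "title": str, "link": str, "source": int}
# Fields documented as sequences of strings.
LIST_FIELDS = ("ingredients", "directions", "ner")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems: List[str] = []
    for name, expected in SCALAR_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}")
    for name in LIST_FIELDS:
        value = record.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{name}: expected a list of strings")
    # The card documents exactly two source classes: 0 and 1.
    if isinstance(record.get("source"), int) and record["source"] not in (0, 1):
        problems.append("source: expected 0 or 1")
    return problems


# The sample record from the card.
sample = {
    "id": 0,
    "title": "No-Bake Nut Cookies",
    "ingredients": [
        "1 c. firmly packed brown sugar",
        "1/2 c. evaporated milk",
        "1/2 tsp. vanilla",
        "1/2 c. broken nuts (pecans)",
        "2 Tbsp. butter or margarine",
        "3 1/2 c. bite size shredded rice biscuits",
    ],
    "directions": [
        "In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.",
        "Stir over medium heat until mixture bubbles all over top.",
        "Boil and stir 5 minutes more. Take off heat.",
        "Stir in vanilla and cereal; mix well.",
        "Using 2 teaspoons, drop and shape into 30 clusters on wax paper.",
        "Let stand until firm, about 30 minutes.",
    ],
    "link": "www.cookbooks.com/Recipe-Details.aspx?id=44874",
    "source": 0,
    "ner": ["brown sugar", "milk", "vanilla", "nuts", "butter",
            "bite size shredded rice biscuits"],
}
print(validate_record(sample))
```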

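Data fields

  • id (int): Unique identifier.
  • title (str): Title of the recipe.
  • ingredients (list of str): List of the recipe's ingredients.
  • directions (list of str): The recipe's instruction steps.
  • link (str): URL of the recipe.
  • source (ClassLabel): Origin of the recipe record, with possible values {"Gathered", "Recipes1M"}:
    • "Gathered" (0): Additional recipes collected from multiple cooking web pages with web-scraping scripts.
    • "Recipes1M" (1): Recipes from the "Recipe1M+" dataset.
  • ner (list of str): Food entities identified by named entity recognition (NER).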
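The `source` field is stored as an integer class index, so it usually needs decoding, and the `ner` list can be cross-checked against the ingredient lines. The sketch below does both in plain Python. The helper names are hypothetical, and the substring check rests on an assumption: in the sample record every `ner` entry appears inside some ingredient line, but the card does not state that this holds for the whole dataset.

```python
from typing import Dict, List, Sequence

# Class names in index order, as listed in the card's feature definition.
SOURCE_NAMES: Sequence[str] = ("Gathered", "Recipes1M")


def source_name(label: int) -> str:
    """Map the integer `source` label to its documented class name."""
    if not 0 <= label < len(SOURCE_NAMES):
        raise ValueError(f"unknown source label: {label}")
    return SOURCE_NAMES[label]


def unmatched_entities(record: Dict[str, List[str]]) -> List[str]:
    """Return `ner` entries that occur in no ingredient line (case-insensitive)."""
    lowered = [line.lower() for line in record["ingredients"]]
    return [
        entity
        for entity in record["ner"]
        if not any(entity.lower() in line for line in lowered)
    ]


# Ingredients and entities from the sample record in the card.
record = {
    "ingredients": [
        "1 c. firmly packed brown sugar",
        "1/2 c. evaporated milk",
        "1/2 tsp. vanilla",
        "1/2 c. broken nuts (pecans)",
        "2 Tbsp. butter or margarine",
        "3 1/2 c. bite size shredded rice biscuits",
    ],
    "ner": ["brown sugar", "milk", "vanilla", "nuts", "butter",
            "bite size shredded rice biscuits"],
}
print(source_name(0), unmatched_entities(record))
```

A non-empty result from `unmatched_entities` is only a hint that a record deserves a closer look, not proof of an annotation error.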

Data splits

The dataset contains a single train split.
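Because only a `train` split is provided, anyone who needs held-out data has to carve it out themselves. One possible approach, sketched below, assigns each record to a split by hashing its `id`, which keeps the assignment stable across runs and independent of record order. The function name `assign_split` and the 5% default are arbitrary choices for this example, not something the dataset card recommends.

```python
import hashlib


def assign_split(record_id: int, validation_percent: int = 5) -> str:
    """Deterministically assign a record to 'train' or 'validation' by hashing its id."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "validation" if bucket < validation_percent else "train"


# Count how 10,000 consecutive ids are distributed between the two splits.
counts = {"train": 0, "validation": 0}
for record_id in range(10_000):
    counts[assign_split(record_id)] += 1
print(counts)
```

Hashing the `id` rather than sampling randomly means a record keeps its split even if the data is reloaded, reordered, or filtered.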

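Dataset creation

Curation rationale

[More Information Needed]

Source data

[More Information Needed]

Annotations

[More Information Needed]

Personal and sensitive information

[More Information Needed]

Considerations for using the data

Social impact of the dataset

[More Information Needed]

Discussion of biases

[More Information Needed]

Other known limitations

[More Information Needed]

Additional information

Dataset curators

[More Information Needed]

Licensing information

Use of the RecipeNLG dataset is subject to the following terms and conditions:

  1. The dataset may be used only for non-commercial research and educational purposes.
  2. No warranties of any kind are made regarding the dataset.
  3. The user accepts full responsibility for any liability arising from use of the dataset.
  4. The dataset may be shared with research associates and colleagues who agree to these terms and conditions.
  5. If the user is employed by a for-profit, commercial entity, that employer is also bound by these terms and conditions.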

Citation information

    @inproceedings{bien-etal-2020-recipenlg,
      title = "{R}ecipe{NLG}: A Cooking Recipes Dataset for Semi-Structured Text Generation",
      author = "Bie{\'n}, Micha{\l} and Gilski, Micha{\l} and Maciejewska, Martyna and Taisner, Wojciech and Wisniewski, Dawid and Lawrynowicz, Agnieszka",
      booktitle = "Proceedings of the 13th International Conference on Natural Language Generation",
      month = dec,
      year = "2020",
      address = "Dublin, Ireland",
      publisher = "Association for Computational Linguistics",
      url = "https://www.aclweb.org/anthology/2020.inlg-1.4",
      pages = "22--28",
    }

Contributions

Thanks to @abhishekkrthakur for adding this dataset.

The content above was collected and summarized by 遇见数据集.