nuhuibrahim/recifine
Source: Hugging Face. Updated 2026-03-05; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/nuhuibrahim/recifine
Resource description:
---
language:
- en
license: cc-by-nc-4.0
tags:
- token-classification
- named-entity-recognition
- bert
- roberta
- recipes
- knowledge-augmented
pretty_name: ReciFine
task_categories:
- token-classification
---
# ReciFine: Large Silver-standard Dataset
**ReciFine** is a large, silver-standard dataset derived from the [RecipeNLG](https://aclanthology.org/2020.inlg-1.4/) corpus and enriched with fine-grained semantic annotations.
## Scale
- **2.2+ million recipes**
- **97+ million entity mentions**
- Token-level annotations across all recipe instructions
## Entity Distribution
| **Entity Type** | **Frequency** | **Top Entities** |
|-------------------|--------------:|:---|
| Food (F) | 30,199,222 | *salt, water, sugar, butter, flour* |
| Tool (T) | 10,384,889 | *bowl, oven, pan, saucepan* |
| Duration (D) | 3,700,982 | *10 min, 5 min, 30 min* |
| Quantity (Q) | 4,476,287 | *remaining, all, 2, half, 1* |
| Chef Action (Ac) | 30,854,542 | *add, bake, mix, stir, cook* |
| Discont. Ac (Ac2) | 2,199,530 | *together, to taste, to a boil* |
| Food Action (Af) | 3,023,358 | *cool, stand, set, combined* |
| Tool Action (At) | 705,504 | *comes out, stand, set* |
| Food State (Sf) | 7,331,595 | *hot, tender, smooth, browned* |
| Tool State (St) | 5,022,448 | *large, medium, small, 350°* |
**Table 1:** Frequency of each ReciFine entity type, with its most frequent entities.
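As a quick sanity check on the scale figures, the counts in Table 1 can be summed and converted to shares. The sketch below hard-codes the Table 1 frequencies; the variable names are illustrative and not part of the dataset.

```python
# Frequencies copied from Table 1 (tag -> number of mentions).
ENTITY_COUNTS = {
    "F": 30_199_222,
    "T": 10_384_889,
    "D": 3_700_982,
    "Q": 4_476_287,
    "Ac": 30_854_542,
    "Ac2": 2_199_530,
    "Af": 3_023_358,
    "At": 705_504,
    "Sf": 7_331_595,
    "St": 5_022_448,
}

# Total mentions across all ten entity types.
total = sum(ENTITY_COUNTS.values())
# Fraction of all mentions contributed by each type.
shares = {tag: count / total for tag, count in ENTITY_COUNTS.items()}

print(f"total mentions: {total:,}")
for tag, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag:>3}: {share:6.1%}")
```

The total comes to 97,898,357, consistent with the "97+ million" figure above, and Chef Action (Ac) and Food (F) together account for a little over 62% of all mentions.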
## ReciFine Annotation Schema
**ReciFine** follows the annotation scheme of the [case study paper](https://www.dl.soc.i.kyoto-u.ac.jp/~tajima/papers/hmdata18yamakatawww.pdf) and includes **10 fine-grained entity types** annotated at the token level within recipe instructions.
| Tag | Name | Definition |
|-----|------|------------|
| **F** | Food | Edible items; includes both raw ingredients and intermediate products |
| **T** | Tool | Cooking tools such as *knives, bowls and pans* |
| **D** | Duration | Time durations used in cooking (e.g., *20 minutes*) |
| **Q** | Quantity | Quantities associated with ingredients |
| **Ac** | Action by chef | Verbs denoting deliberate actions by the cook (e.g., *bring* in “Bring the mixture to a boil.”) |
| **Ac2** | Discontinuous Ac | Non-contiguous parts of compound chef actions (e.g., *to a boil* in “Bring the mixture to a boil.”) |
| **Af** | Action by food | Verbs where food is the agent (e.g., *melt, boil*) |
| **At** | Action by tool | Verbs pertaining to a tool's action (e.g., *grind, beat*) |
| **Sf** | Food state | Descriptions of food's physical state (e.g., *chopped, soft*) |
| **St** | Tool state | Descriptions of tool state or readiness (e.g., *preheated*, *greased*, *covered*) |
**Table 2:** Entity types in the English Recipe Flow Graph (ERFG) corpus.
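To make the schema concrete, the sketch below tags the example sentence from Table 2 and groups the token-level tags into spans. Two things here are assumptions rather than facts about the released files: the plain one-tag-per-token encoding with `O` for untagged tokens, and the labelling of *mixture* as Food (inferred from the "intermediate products" part of the F definition). The helper `group_spans` is hypothetical.

```python
from itertools import groupby

# Example sentence from Table 2, split into tokens.
tokens = ["Bring", "the", "mixture", "to", "a", "boil", "."]
# Assumed encoding: one tag per token, "O" for tokens outside any entity.
tags = ["Ac", "O", "F", "Ac2", "Ac2", "Ac2", "O"]


def group_spans(tokens, tags):
    """Merge runs of adjacent tokens sharing a tag into (tag, text) spans."""
    spans = []
    index = 0
    for tag, run in groupby(tags):
        length = len(list(run))
        if tag != "O":
            spans.append((tag, " ".join(tokens[index:index + length])))
        index += length
    return spans


print(group_spans(tokens, tags))
# [('Ac', 'Bring'), ('F', 'mixture'), ('Ac2', 'to a boil')]
```

Note that this simple grouping would merge two adjacent entities of the same type into one span; a BIO-style encoding avoids that, and the actual files may well use one.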
## Evaluating Automatic Annotations
To assess the reliability and validity of the **ReciFine** silver-standard annotations, we compared labels generated by RecipeBERT and RecipeRoBERTa against human annotations on 500 randomly selected recipes. The models agreed strongly with the human labels, which indicates that the automatically annotated ReciFine corpus closely tracks expert annotation quality. This mirrors the results observed on the ERFG corpus and further supports the consistency and robustness of ReciFine's fine-grained annotations.
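The card does not state which agreement metric was used. As one plausible way to run such a comparison, the sketch below computes token-level accuracy and per-tag precision, recall, and F1 between a model's labels and human labels. The toy label sequences and the function name `per_tag_scores` are made up for illustration and are not ReciFine data or results.

```python
from collections import Counter


def per_tag_scores(human, model):
    """Token-level precision, recall and F1 per tag, treating human labels as gold."""
    if len(human) != len(model):
        raise ValueError("label sequences must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(human, model):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # model predicted a tag the human did not assign
            fn[gold] += 1  # model missed the tag the human assigned
    scores = {}
    for tag in sorted(set(human) | set(model)):
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[tag] = (precision, recall, f1)
    return scores


# Toy labels (hypothetical) for: "Add the butter to the hot pan ."
human = ["Ac", "O", "F", "O", "O", "St", "T", "O"]
model = ["Ac", "O", "F", "O", "O", "Sf", "T", "O"]

accuracy = sum(g == p for g, p in zip(human, model)) / len(human)
print(f"token accuracy: {accuracy:.3f}")
for tag, (p, r, f) in per_tag_scores(human, model).items():
    print(f"{tag:>3}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Here the `O` tag is scored like any other tag; entity-level evaluation that requires exact span matches would be stricter than this token-level view.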




Provider: nuhuibrahim



