DanielDDDS/recipe-modifications

Source: Hugging Face | Updated: 2026-03-21 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/DanielDDDS/recipe-modifications
Resource description:
---
license: mit
---

# Hebrew Recipe Modification Dataset

## Overview

10,058 Hebrew comment threads from YouTube cooking channels, annotated for recipe modification extraction using a three-pass Teacher-Student distillation approach.

## Task

Token-level BIO tagging to extract recipe modifications from Hebrew user comments. Four modification aspects: SUBSTITUTION, QUANTITY, TECHNIQUE, ADDITION.

## Dataset Structure

### Raw Data

- **threads.jsonl**: 10,058 comment threads (top comment + replies) from 17 Hebrew cooking channels
- **teacher_output.jsonl**: Silver labels from three-pass annotation with majority vote
- **final_stats.json**: Labeling statistics

### Processed (BIO-tagged, ready for training)

- **processed/train.jsonl**: 8,129 examples (1,066 positive)
- **processed/val.jsonl**: 1,016 examples (134 positive)
- **processed/test.jsonl**: 1,012 examples (128 positive)
- **processed/label2id.json**: 9 BIO labels
- **processed/id2label.json**: Reverse mapping
- **processed/stats.json**: Preprocessing statistics

## Labeling Pipeline

| Pass | Model | Temperature | Role |
|------|-------|-------------|------|
| 1 | Gemini 3.1 Flash Lite | 0.1 | Primary annotator |
| 2 | Gemini 3.1 Flash Lite | 0.3 | Intra-annotator consistency |
| 3 | Qwen 3 235B (Cerebras) | 0.1 | Inter-annotator validation |

### Agreement

| Vote Method | Count | % |
|-------------|-------|---|
| Unanimous (3/3) | 8,907 | 88.6% |
| Majority (2/3) | 1,142 | 11.4% |
| Manual review | 9 | 0.1% |

### Final Labels

| Status | Count | % |
|--------|-------|---|
| With modification | 1,230 | 12.2% |
| No modification | 8,828 | 87.8% |

## Label Schema (BIO, 9 labels)

```
O, B-SUBSTITUTION, I-SUBSTITUTION, B-QUANTITY, I-QUANTITY, B-TECHNIQUE, I-TECHNIQUE, B-ADDITION, I-ADDITION
```

## Intended Use

Training Hebrew token classification models (e.g., AlephBERT) for recipe modification extraction via knowledge distillation.

## Citation

If you use this dataset, please cite our project report.
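
## Example: Decoding BIO Tags

The label schema above can be turned into modification spans with a few lines of Python. This is a minimal sketch, not the authors' preprocessing code. The function names `build_label_maps` and `bio_to_spans`, the id ordering, and the English example tokens are all assumptions made for illustration. For real training, the mappings in `processed/label2id.json` and `processed/id2label.json` should be loaded instead of rebuilt.

```python
from typing import Dict, List, Tuple

# The four modification aspects named in the dataset card.
ASPECTS = ["SUBSTITUTION", "QUANTITY", "TECHNIQUE", "ADDITION"]


def build_label_maps(aspects: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build the 9-label BIO inventory: "O" plus a B-/I- pair per aspect.

    The id order follows the order listed in the card; the actual ids in
    processed/label2id.json may differ (assumption).
    """
    labels = ["O"]
    for aspect in aspects:
        labels.extend([f"B-{aspect}", f"I-{aspect}"])
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into (aspect, start, end) spans, end exclusive.

    A stray I- tag that does not continue an open span of the same aspect
    is treated leniently as the start of a new span (assumption).
    """
    spans: List[Tuple[str, int, int]] = []
    open_aspect = None
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, aspect = tag.partition("-")
        if prefix == "I" and open_aspect == aspect:
            continue  # the current span keeps growing
        if open_aspect is not None:
            spans.append((open_aspect, start, i))  # close the open span
            open_aspect = None
        if prefix in ("B", "I"):
            open_aspect, start = aspect, i  # open a new span
    if open_aspect is not None:
        spans.append((open_aspect, start, len(tags)))
    return spans


# Hypothetical example; the real data is Hebrew and its field names are not shown here.
tokens = ["I", "used", "honey", "instead", "of", "sugar"]
tags = ["O", "O", "B-SUBSTITUTION", "I-SUBSTITUTION", "I-SUBSTITUTION", "I-SUBSTITUTION"]

label2id, id2label = build_label_maps(ASPECTS)
for aspect, begin, end in bio_to_spans(tags):
    print(aspect, " ".join(tokens[begin:end]))
```

Spans use token indices with an exclusive end, so `tokens[begin:end]` recovers the tagged text directly. A stricter decoder could instead discard stray `I-` tags; which behavior suits evaluation depends on the scoring scheme used.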
Provider:
DanielDDDS