Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset

Source: arXiv. Updated 2023-03-27; indexed 2024-06-21.
Download link:
https://tinyurl.com/3zu4778y
Resource overview:
The 3A2M Cooking Recipes Dataset is a large-scale collection of roughly two million recipes, created by a research team from the Islamic University of Technology and the Bangladesh University of Engineering and Technology. The recipes are drawn from the RecipeNLG dataset and sorted into nine categories using domain-expert knowledge and active learning: baking, beverages, non-vegetarian, vegetables, fast food, cereals, meals, side dishes, and fusion. Domain experts manually classified 300,000 recipes; the remaining 1.9 million were labeled automatically using active learning with a query-by-committee approach. The dataset supports machine learning tasks such as recipe category classification, category-specific recipe generation, and new recipe creation, as well as natural language processing tasks such as named entity recognition, part-of-speech tagging, and semantic role labeling.
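As a rough illustration of the query-by-committee idea mentioned above, the sketch below scores committee disagreement with vote entropy, auto-labels items on which every member agrees, and queues the rest for an expert. The listing does not say which committee models or disagreement measure the authors used, so vote entropy, the function names, the threshold, and the sample votes are all assumptions made for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement among committee members for one item (0 means unanimous)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def split_by_disagreement(committee_votes, threshold=0.0):
    """Auto-label items whose disagreement is at or below the threshold;
    return the rest as a queue for expert labeling, most contested first."""
    auto_labeled, to_query = {}, []
    for item, votes in committee_votes.items():
        if vote_entropy(votes) <= threshold:
            # Unanimous (or near-unanimous) vote: accept the majority label.
            auto_labeled[item] = Counter(votes).most_common(1)[0][0]
        else:
            to_query.append(item)
    to_query.sort(key=lambda item: vote_entropy(committee_votes[item]), reverse=True)
    return auto_labeled, to_query


# Hypothetical predictions from a three-member committee (not real dataset rows).
committee_votes = {
    "Chocolate chip cookies": ["baking", "baking", "baking"],
    "Mango lassi": ["beverages", "beverages", "beverages"],
    "Chicken fried rice": ["non-vegetarian", "cereals", "meals"],
    "Garlic bread": ["baking", "side dishes", "baking"],
}

auto_labeled, to_query = split_by_disagreement(committee_votes)
print(auto_labeled)
print(to_query)
```

In a full active learning loop, the expert's answers for the queued items would be added to the labeled pool and the committee retrained before the next round.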
Provided by:
Islamic University of Technology (IUT), Gazipur, Bangladesh; Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
Created:
2023-03-27