DORE
Source: arXiv | Updated: 2024-03-28 | Indexed: 2024-06-21
Download link:
https://huggingface.co/datasets/multidefmod/dore
Overview:
DORE is the first definition modeling dataset for Portuguese. It was created by the University of California and contains more than 103,000 definitions (103,019 as listed). The data was extracted by web crawling from Dicio and the Portuguese Wiktionary, with the aim of filling the gap in definition generation for Portuguese. The dataset covers multiple variants of the language and is suited to training and evaluating machine learning models, particularly in natural language processing and in the development of language learning resources.
Provider:
University of California
Created:
2024-03-27
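
Loading example:
A minimal sketch of how the dataset at the download link above might be loaded, assuming the Hugging Face `datasets` library is installed and the repository is publicly accessible. The helper names `repo_id_from_url` and `load_dore` are hypothetical and introduced only for illustration. The listing does not state the split names, configuration names, or column names, so the sketch makes no assumption about them; if the repository defines several configurations, `load_dataset` may need an explicit configuration name.

```python
from urllib.parse import urlparse

# URL taken from the download link in this listing.
DORE_URL = "https://huggingface.co/datasets/multidefmod/dore"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repository id from a Hugging Face dataset URL.

    Hypothetical helper, not part of any library.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def load_dore():
    """Download DORE and return whatever object `load_dataset` produces.

    Assumption: the repository loads without an explicit configuration name.
    """
    # Imported lazily so repo_id_from_url works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(DORE_URL))


# Usage (requires network access):
#   dataset = load_dore()
#   print(dataset)  # shows the available splits and columns
```

Printing the returned object is the quickest way to see which splits and fields the repository actually provides before writing any training or evaluation code against them.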



