
Entity6K

Source: arXiv | Updated: 2024-03-19 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2403.12339v1
Resource description:
Entity6K is a large open-domain entity recognition evaluation dataset developed jointly by Microsoft Azure AI and Carnegie Mellon University. It contains 5,700 unique entities across 26 categories. Each entity comes with 5 human-verified images and their annotations, for a total of 28,500 images. The images were collected from Flickr using entity names as search queries, then passed through strict manual review and annotation to ensure data quality and diversity. Entity6K is designed to address the shortcomings of existing datasets for entity recognition, particularly the precision of entity names and the complexity of the images. It supports several tasks, including image captioning, object detection, zero-shot classification, and dense captioning, and serves as a resource for evaluating models' open-domain entity recognition ability.
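The counts in the description can be checked with a few lines of Python. This is only an illustrative sketch: the `EntityRecord` layout and the `exact_match_accuracy` helper are hypothetical and are not taken from the dataset release or the paper. Only the figures 5,700, 5, and 26 come from the description.

```python
from dataclasses import dataclass, field

# Figures stated in the dataset description.
NUM_ENTITIES = 5_700
IMAGES_PER_ENTITY = 5
NUM_CATEGORIES = 26


@dataclass
class EntityRecord:
    """Hypothetical record layout; the actual release may use different fields."""
    name: str                      # entity name, also used as the Flickr search query
    category: str                  # one of the 26 categories
    image_paths: list[str] = field(default_factory=list)  # 5 verified images


def total_images(num_entities: int = NUM_ENTITIES,
                 images_per_entity: int = IMAGES_PER_ENTITY) -> int:
    """Total image count implied by a fixed number of images per entity."""
    return num_entities * images_per_entity


def exact_match_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Assumed metric for illustration: case-insensitive exact match on entity names."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    print(total_images())
```

Exact string matching is a strict choice for open-domain entity names; a real evaluation may need alias handling or a softer matching rule.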
Provider:
Microsoft Azure AI
Created:
2024-03-19