CMDBench
Source: arXiv · Updated: 2024-06-02 · Indexed: 2024-06-21
Download link:
https://github.com/megagonlabs/CMDBench
Overview:
CMDBench is a multimodal data discovery benchmark created by Megagon Labs, focused on coarse-to-fine data discovery in compound AI systems. The dataset combines text documents and tables from Wikipedia and adds knowledge graphs extracted from Wikidata as a further data modality. By simulating the complexity of an enterprise data platform, CMDBench aims to evaluate how multimodal data retrievers perform in realistic settings. It targets knowledge-intensive tasks such as search, question answering, chat, and fact-checking, and is designed to address the challenge of discovering multimodal data sources in enterprise data platforms.
Provided by:
Megagon Labs
Created:
2024-06-02



