Orca

Name: Orca
Creator: The Hong Kong University of Science and Technology (Guangzhou), The Hong Kong University of Science and Technology
Published: 2023-10-13 20:13:21
License: No description available

arXiv | Updated: 2023-10-13 | Indexed: 2024-06-21

Download link:

https://github.com/nuochenpku/Orca

Overview:

Orca is the first benchmark dataset for Chinese Conversational Machine Reading Comprehension (CMRC), developed by The Hong Kong University of Science and Technology. It contains 831 hot-topic-driven dialogues totaling 4,742 turns. Each dialogue turn is paired with a response-related passage, which is intended to evaluate a model's comprehension ability more reasonably. Topics are drawn from social media platforms and span 33 domains, to stay consistent with real-world scenarios. Answers in Orca are carefully annotated natural responses rather than extracted spans or short phrases, so a model needs generation ability as well as comprehension ability. The dataset is suited to evaluating how well a model adapts to new knowledge and unseen questions, and how well it generalizes across domains.
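To make the structure described above concrete, here is a minimal Python sketch of how one dialogue and its turns might be represented. The class names, field names, and placeholder values are assumptions made for illustration; they are not the schema of the released files, so check the repository for the actual format. Only the two counts (831 dialogues, 4742 turns) come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One dialogue turn (hypothetical field names, not the released schema)."""
    question: str   # user question for this turn
    passage: str    # response-related passage paired with this turn
    response: str   # annotated free-form natural answer


@dataclass
class Dialogue:
    """One topic-driven dialogue (hypothetical layout)."""
    topic: str
    domain: str
    turns: List[Turn] = field(default_factory=list)


def average_turns(num_dialogues: int, num_turns: int) -> float:
    """Mean number of turns per dialogue."""
    if num_dialogues <= 0:
        raise ValueError("num_dialogues must be positive")
    return num_turns / num_dialogues


# Counts stated in the dataset description: 831 dialogues, 4742 turns.
ORCA_DIALOGUES = 831
ORCA_TURNS = 4742

# Placeholder record showing the per-turn pairing of question, passage, response.
example = Dialogue(
    topic="placeholder topic",
    domain="placeholder domain",
    turns=[Turn(question="placeholder question",
                passage="placeholder passage",
                response="placeholder response")],
)

if __name__ == "__main__":
    avg = average_turns(ORCA_DIALOGUES, ORCA_TURNS)
    print(f"{avg:.2f} turns per dialogue on average")
```

With the stated counts, the average works out to roughly 5.7 turns per dialogue. Keeping the passage on each turn, rather than on the dialogue, mirrors the per-turn pairing the description emphasizes.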

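Provider:

The Hong Kong University of Science and Technology (Guangzhou), The Hong Kong University of Science and Technology

Created:

2023-02-27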
