ibm-research/nc-bench
Source: Hugging Face | Updated: 2026-01-15 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/ibm-research/nc-bench
Description:
The Natural Conversation Benchmarks (NC-Bench) aim to answer the question: how well can generative AI converse like humans do? In other words, the benchmarks begin to measure the general conversational competence of large language models (LLMs). They do this by testing a model's ability to generate an appropriate type of conversational action, or dialogue act, in response to a particular sequence of preceding actions. These sequences of conversational actions, or patterns, are adapted from conversation science, specifically the model of sequence organization in the field of conversation analysis (Schegloff, 2007) and the pattern library of the IBM Natural Conversation Framework. The dataset consists of 720 samples in three subsets: Basic (180), RAG (180), and Complex Request (360). The Basic set captures basic practices of sequence management; the RAG set tests whether a model can maintain conversation patterns when an informational passage is supplied as document context; and the Complex Request set involves complex requests that require eliciting details from a fictional user. The dataset is intended for benchmarking the conversational competence of LLMs and, because of its small size, is not suited for large-scale training.
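The evaluation idea described above (does the model's next turn have the right dialogue-act type for the pattern so far?) can be sketched as per-subset accuracy. This is a minimal, hypothetical sketch: it assumes each model response has already been mapped to a dialogue-act type, and that a sample can be reduced to a subset name, the preceding acts, and an expected next act. The `Sample` layout, the act labels, and `score_by_subset` are illustrative assumptions, not the dataset's actual schema or IBM's evaluation code; only the subset names and sizes come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass

# Subset names and sizes as stated in the NC-Bench description (720 in total).
SUBSET_SIZES = {"Basic": 180, "RAG": 180, "Complex Request": 360}


@dataclass
class Sample:
    # Hypothetical record layout; the real dataset schema may differ.
    subset: str          # one of the keys of SUBSET_SIZES
    context: list[str]   # preceding dialogue-act types (the pattern so far)
    expected_act: str    # dialogue-act type an appropriate next turn should have


def score_by_subset(samples, predicted_acts):
    """Return per-subset accuracy: the fraction of samples whose predicted
    dialogue-act type equals the expected one (hypothetical exact-match scoring)."""
    if len(samples) != len(predicted_acts):
        raise ValueError("need exactly one predicted act per sample")
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample, predicted in zip(samples, predicted_acts):
        if sample.subset not in SUBSET_SIZES:
            raise ValueError(f"unknown subset: {sample.subset!r}")
        total[sample.subset] += 1
        if predicted == sample.expected_act:
            correct[sample.subset] += 1
    # Only subsets that actually appear in `samples` are reported.
    return {name: correct[name] / total[name] for name in total}


if __name__ == "__main__":
    # Toy samples with made-up act labels, for illustration only.
    samples = [
        Sample("Basic", ["request"], "grant"),
        Sample("Basic", ["request", "repair_initiator"], "repeat"),
        Sample("RAG", ["inquiry"], "answer"),
    ]
    print(score_by_subset(samples, ["grant", "answer", "answer"]))
    # {'Basic': 0.5, 'RAG': 1.0}
```

Exact label matching is only the simplest possible design choice; scoring free-text responses would first require some way of classifying each response into a dialogue-act type, which this sketch does not attempt.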
Provided by:
ibm-research



