vitercik-lab/DSR-Bench-natural

Source: Hugging Face. Updated 2025-05-16; added to catalog 2025-11-01.
Download link:
https://hf-mirror.com/datasets/vitercik-lab/DSR-Bench-natural
Description:
DSR-Bench-natural converts data-structure tasks (queues, binary search trees, and graphs) into narrative-based natural-language questions. Instead of evaluating LLMs on textbook- or interview-style formal prompts such as "Given an empty queue. Do the following operations: (enqueue 2), (dequeue), (enqueue 3)... What is the current queue?", it poses real-world scenarios that implicitly require using and maintaining a data structure, such as "On a sunny afternoon, an ice cream truck rolled into the park. Each child takes their place at the end of the line while the vendor serves from the front. Leila Choi ran over and joined the line. The next kid is being served... What is the order of the remaining kids in line?"

This extension lets us examine whether LLMs can generalize structural reasoning beyond formal task descriptions to real-world scenarios. That ability is critical for deploying LLMs as assistants in practical applications, and it highlights an important direction for future research. The dataset also tests LLMs' capacity to reason with ambiguity and confounding information, which we deliberately designed into the questions (e.g., "A and B both saw the ice cream truck. Only A joined the line because B has no money", where B is a confounding name).
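To make the contrast concrete, the formal queue prompt quoted above reduces to a few lines of FIFO bookkeeping. The sketch below is only an illustration and is not part of the dataset or its evaluation code: `run_queue_ops`, `events_to_ops`, and the `(name, action)` event encoding are hypothetical, and the narrative is assumed to have already been parsed into events by hand. It shows how a confounding name (B, who only sees the truck) should produce no queue operation at all.

```python
from collections import deque

def run_queue_ops(ops):
    """Apply (op, arg) pairs to an empty FIFO queue; return its contents, front first."""
    line = deque()
    for op, arg in ops:
        if op == "enqueue":
            line.append(arg)      # new arrivals go to the back
        elif op == "dequeue":
            line.popleft()        # serve the front; raises IndexError if empty
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return list(line)

def events_to_ops(events):
    """Hypothetical mapping from hand-parsed (name, action) story events to queue ops.

    Only "joins" and "served" touch the line; every other action (e.g. a
    character who merely sees the truck) is a confounder and is skipped.
    """
    ops = []
    for name, action in events:
        if action == "joins":
            ops.append(("enqueue", name))
        elif action == "served":
            ops.append(("dequeue", None))
    return ops

# Formal prompt: (enqueue 2), (dequeue), (enqueue 3)
formal = [("enqueue", 2), ("dequeue", None), ("enqueue", 3)]
print(run_queue_ops(formal))  # [3]

# Hypothetical narrative events with a confounding name: A and B both see
# the truck, only A joins; then Leila Choi joins and the front child is served.
story = [
    ("A", "sees truck"),
    ("B", "sees truck"),
    ("A", "joins"),
    ("Leila Choi", "joins"),
    (None, "served"),
]
print(run_queue_ops(events_to_ops(story)))  # ['Leila Choi']
```

The queue logic itself is trivial; what the benchmark targets is the step this sketch skips, namely turning free-form narrative into the right operations while ignoring distractors such as B.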
Provided by:
vitercik-lab