ExploreToM

Name: ExploreToM
Creator: maas
Published: 2025-12-04 16:19:18
License: No description available

ModelScope community: updated 2025-12-04, indexed 2024-12-21

Download link:

https://modelscope.cn/datasets/AI-ModelScope/ExploreToM

Resource description:

# Data sample for *ExploreToM: Program-guided adversarial data generation for theory of mind reasoning*

ExploreToM is the first framework to allow **large-scale generation of diverse and challenging theory of mind data for robust training and evaluation**. Our approach leverages an A* search over a custom domain-specific language to produce complex story structures and novel, diverse, yet plausible scenarios to stress test the limits of LLMs.

Our A* search procedure aims to find particularly difficult stories for a given model. Here we present a data sample generated adversarially for [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct). We generated 10 story structures across the 18 settings presented in the paper using a budget of 50 nodes per story structure. We then infill the story structures as described in the paper. A large sample of the resulting data is presented here.

**If your goal is to test a model, we highly recommend running the algorithm using your specific model as ExploreToM works by finding stories adversarially towards a given model.** If this were unfeasible, our experiments show that ExploreToM-generated data using Llama-3.1-70B-Instruct is still challenging for testing other frontier models but please **DO NOT USE THIS DATA AS THE CANONICAL TEST SET FOR EXPLORETOM**.

**If your goal is to use ExploreToM as training data, feel free to generate even more data!** You can adjust the A* search function and action sets allowed depending on your needs, or even completely disable the A* search and overgenerate.

## Clarifications on data fields

- qprop -> question-related property
- sprop -> story-related property
- param -> search parameter (e.g. number of people involved)

`qprop=non_unique_mental_state` is a synonym for checking if a question is interesting. If the question is not theory of mind-related (that is, if `nth_order=-1`, which corresponds to memory or factual questions) then `qprop=non_unique_mental_state=True` by default.

## Code

Code to generate data and analyses is available at: https://github.com/facebookresearch/ExploreToM

## Citation

If you found [the paper](https://openreview.net/forum?id=246rHKUnnf) or data helpful, consider citing it:

```
@inproceedings{sclarexplore,
  title={Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning},
  author={Sclar, Melanie and Yu, Jane and Fazel-Zarandi, Maryam and Tsvetkov, Yulia and Bisk, Yonatan and Choi, Yejin and Celikyilmaz, Asli},
  booktitle={The Thirteenth International Conference on Learning Representations}
}
```

For questions, please reach out to the first author of the paper.
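The field conventions under "Clarifications on data fields" can be illustrated with a small filtering sketch. It is not taken from the ExploreToM repository: the exact column names (`qprop=nth_order` in particular) are assumptions modelled on the `qprop=` prefix convention, and the rows are toy placeholders, not real dataset entries. It shows why filtering on `qprop=non_unique_mental_state` alone is not enough to isolate theory of mind questions, since memory and factual questions carry `True` by default.

```python
from typing import Any

# Assumed column names (hypothetical): check the actual dataset schema before use.
NTH_ORDER = "qprop=nth_order"
INTERESTING = "qprop=non_unique_mental_state"

Row = dict[str, Any]


def is_tom_question(row: Row) -> bool:
    """A question is theory-of-mind related unless nth_order is -1."""
    return row[NTH_ORDER] != -1


def select_interesting_tom(rows: list[Row]) -> list[Row]:
    """Keep theory-of-mind questions that are flagged as interesting.

    Memory/factual questions (nth_order == -1) have the flag set to True
    by default, so they must be excluded explicitly.
    """
    return [r for r in rows if is_tom_question(r) and r[INTERESTING]]


# Toy placeholder rows for illustration only.
rows: list[Row] = [
    {"question": "memory question", NTH_ORDER: -1, INTERESTING: True},
    {"question": "first-order belief question", NTH_ORDER: 1, INTERESTING: True},
    {"question": "second-order belief question", NTH_ORDER: 2, INTERESTING: False},
]

selected = select_interesting_tom(rows)
print([r["question"] for r in selected])
```

If the real schema names these columns differently, only the two constants at the top need to change.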


Provided by:

maas

Created:

2024-12-19
