ExploreToM
ModelScope Community | Updated 2025-12-04 | Indexed 2024-12-21
Download link: https://modelscope.cn/datasets/AI-ModelScope/ExploreToM
Description:
# Data sample for *ExploreToM: Program-guided adversarial data generation for theory of mind reasoning*
ExploreToM is the first framework to allow **large-scale generation of diverse and challenging theory of mind data for robust training and evaluation**.
Our approach leverages an A* search over a custom domain-specific language to produce complex story structures and novel, diverse, yet plausible scenarios that stress-test the limits of LLMs.
Our A* search procedure aims to find particularly difficult stories for a given model. Here we present a data sample generated adversarially for [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct). We generated 10 story structures across the 18 settings presented in the paper, using a budget of 50 nodes per story structure. We then infilled the story structures as described in the paper. A large sample of the resulting data is presented here.
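For readers unfamiliar with budgeted A*, the search loop can be pictured roughly as follows. This is a generic, hypothetical sketch only: the toy action names, step costs, goal test, and heuristic are invented for illustration and do not reproduce the ExploreToM DSL or the model-based scoring the paper uses (see the linked code for that). The default `node_budget=50` simply mirrors the per-structure budget mentioned above.

```python
import heapq
import itertools
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical encoding: a story structure is a tuple of action names.
State = Tuple[str, ...]


def budgeted_a_star(
    start: State,
    expand: Callable[[State], Iterable[Tuple[State, float]]],
    heuristic: Callable[[State], float],
    is_goal: Callable[[State], bool],
    node_budget: int = 50,
) -> Optional[State]:
    """Generic A*: always pop the frontier node with the lowest f = g + h.

    Returns the first goal state popped, or None once `node_budget`
    nodes have been expanded (or the frontier is exhausted).
    """
    tie = itertools.count()  # tie-breaker so heapq never compares states directly
    frontier: List[Tuple[float, int, float, State]] = [(heuristic(start), next(tie), 0.0, start)]
    best_g: Dict[State, float] = {start: 0.0}
    expanded = 0
    while frontier and expanded < node_budget:
        _f, _, g, state = heapq.heappop(frontier)
        if g > best_g[state]:
            continue  # stale entry: a cheaper path to this state was pushed later
        if is_goal(state):
            return state
        expanded += 1
        for child, step_cost in expand(state):
            new_g = g + step_cost
            if new_g < best_g.get(child, float("inf")):
                best_g[child] = new_g
                heapq.heappush(frontier, (new_g + heuristic(child), next(tie), new_g, child))
    return None


# --- Toy problem: all action names and costs below are invented ---
COST = {"enter": 1.0, "leave": 1.0, "move_object": 2.0}
MAX_LEN = 4


def expand(state: State) -> Iterable[Tuple[State, float]]:
    # Append one more action, up to a maximum story length.
    if len(state) < MAX_LEN:
        for action, cost in COST.items():
            yield state + (action,), cost


def is_goal(state: State) -> bool:
    # Toy false-belief setup: someone leaves, and an object is moved afterwards.
    return "leave" in state and "move_object" in state[state.index("leave") + 1:]


def heuristic(state: State) -> float:
    # Admissible: a lower bound on the cost of the actions still missing.
    if is_goal(state):
        return 0.0
    if "leave" in state:
        return COST["move_object"]
    return COST["leave"] + COST["move_object"]


print(budgeted_a_star((), expand, heuristic, is_goal))  # → ('leave', 'move_object')
```

Because the heuristic never overestimates the remaining cost, the first goal popped is a cheapest one; in the real framework the cost would instead reflect how hard a story is for the target model.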
**If your goal is to test a model, we highly recommend running the algorithm with your specific model, since ExploreToM works by finding stories that are adversarial for a given model.** If this is infeasible, our experiments show that ExploreToM data generated with Llama-3.1-70B-Instruct is still challenging for other frontier models, but please **DO NOT USE THIS DATA AS THE CANONICAL TEST SET FOR EXPLORETOM**.
**If your goal is to use ExploreToM as training data, feel free to generate even more data!** You can adjust the A* search function and the allowed action sets to suit your needs, or even disable the A* search entirely and overgenerate.
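Disabling the search and overgenerating can be pictured as plain random sampling over the allowed actions, optionally with a restricted action set. A minimal sketch, assuming a made-up action set (`ACTIONS` and `overgenerate` are hypothetical names) and ignoring the validity checks and LLM infilling the real pipeline performs:

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical action set; the real ExploreToM DSL defines its own actions.
ACTIONS: Tuple[str, ...] = ("enter", "leave", "move_object", "tell_privately")


def overgenerate(n_stories: int, story_len: int,
                 actions: Sequence[str] = ACTIONS,
                 seed: int = 0) -> List[Tuple[str, ...]]:
    """Sample story structures uniformly at random, with no adversarial search."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [tuple(rng.choice(actions) for _ in range(story_len))
            for _ in range(n_stories)]


# Full action set, then a restricted one (adjusting the allowed actions).
for story in overgenerate(n_stories=3, story_len=4):
    print(story)
print(overgenerate(n_stories=2, story_len=3, actions=("enter", "leave")))
```

Without the search, difficulty is no longer targeted at a specific model, which is why this mode suits bulk training data rather than adversarial evaluation.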
## Clarifications on data fields
- qprop -> question-related property
- sprop -> story-related property
- param -> search parameter (e.g., the number of people involved)
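Since the prefixes encode which kind of property a column holds, a record can be split into these groups mechanically. A minimal sketch, assuming columns are named `<prefix>=<name>` (as `qprop=non_unique_mental_state` suggests); the example record and its `sprop=num_actions` / `param=num_people` columns are hypothetical:

```python
from collections import defaultdict
from typing import Any, Dict

PREFIXES = ("qprop", "sprop", "param")


def group_fields(row: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split a flat record into qprop / sprop / param groups by column prefix.

    Assumes columns are named "<prefix>=<name>"; anything else goes to "other".
    """
    groups: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for key, value in row.items():
        prefix, sep, name = key.partition("=")
        if sep and prefix in PREFIXES:
            groups[prefix][name] = value
        else:
            groups["other"][key] = value
    return dict(groups)


# Hypothetical record; real column names and values may differ.
row = {"question": "Where does Anna think the apple is?",
       "qprop=nth_order": 1,
       "sprop=num_actions": 5,
       "param=num_people": 3}
print(group_fields(row))
```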
`qprop=non_unique_mental_state` serves as the flag for whether a question is interesting. If the question is not related to theory of mind (that is, if `nth_order=-1`, which corresponds to memory or factual questions), then `qprop=non_unique_mental_state=True` by default.
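Because the flag defaults to `True` for non-theory-of-mind questions, filtering on it alone would also keep memory and factual questions. A minimal sketch of selecting only interesting theory-of-mind questions, assuming the order is stored in a column named `qprop=nth_order` (an assumption; the card only writes `nth_order`) and that values may arrive as strings:

```python
from typing import Any, Dict, List


def is_interesting_tom_question(row: Dict[str, Any]) -> bool:
    """Keep theory-of-mind questions flagged as interesting.

    Assumes columns named "qprop=nth_order" and "qprop=non_unique_mental_state".
    """
    if int(row["qprop=nth_order"]) == -1:
        # Memory/factual question: its flag is True only by default, so skip it.
        return False
    flag = row["qprop=non_unique_mental_state"]
    if isinstance(flag, str):
        # Precaution in case booleans were serialized as "True"/"False".
        flag = flag.strip().lower() == "true"
    return bool(flag)


# Hypothetical records for illustration.
rows: List[Dict[str, Any]] = [
    {"id": 0, "qprop=nth_order": -1, "qprop=non_unique_mental_state": True},
    {"id": 1, "qprop=nth_order": 1, "qprop=non_unique_mental_state": True},
    {"id": 2, "qprop=nth_order": 2, "qprop=non_unique_mental_state": False},
]
kept = [r["id"] for r in rows if is_interesting_tom_question(r)]
print(kept)  # → [1]
```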
## Code
Code to generate data and analyses is available at: https://github.com/facebookresearch/ExploreToM
## Citation
If you find [the paper](https://openreview.net/forum?id=246rHKUnnf) or the data helpful, please consider citing it:
```
@inproceedings{sclarexplore,
title={Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning},
author={Sclar, Melanie and Yu, Jane and Fazel-Zarandi, Maryam and Tsvetkov, Yulia and Bisk, Yonatan and Choi, Yejin and Celikyilmaz, Asli},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
```
For questions, please reach out to the first author of the paper.
Provider: maas
Created: 2024-12-19



