SimpleToM

Name: SimpleToM
Creator: maas
Published: 2025-08-08 16:30:28
License: No description available

ModelScope community. Updated 2025-08-08. Indexed 2025-05-31.

Download link:

https://modelscope.cn/datasets/allenai/SimpleToM


Resource description:

# SimpleToM Dataset and Evaluation data

The SimpleToM dataset of stories with associated questions is described in the paper ["SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs"](https://arxiv.org/abs/2410.13648).

Associated evaluation data for the models analyzed in the paper can be found in the separate dataset: [coming soon]().

## Question sets

There are three question sets in the SimpleToM dataset:

* `mental-state-qa`: questions about a character's awareness of information in the story
* `behavior-qa`: questions about a character's likely future behavior in the story
* `judgment-qa`: questions about the reasonableness of a character's behavior

The questions follow a standard multiple-choice QA format, for instance:

```json
{
  "id":"gen1169_sev3_aware",
  "story":"Mike replaced the Oreo cookies in the package with dog treats that look similar to Oreos. Mike's friend spots the Oreo package on the kitchen table and reaches for it.",
  "question":"Is Mike's friend likely to be aware that \"Mike replaced the Oreo cookies in the package with dog treats that look similar to Oreos.\"?",
  "scenario_name":"inside_reuse_labeled_containers",
  "choices":{"text":["Yes","No"],"label":["A","B"]},
  "answerKey":"B"
}
```

## Story data

The `story-data/test.jsonl` file contains more detailed metadata about each story, including the original story prompt, the entities involved, the severity rating, and the predicted harmfulness and unethicality ratings.
Example story data with select comments:

```json
{
  "story_id":"gen1169_sev3",
  "core_id":"gen1169",
  "scenario_name":"inside_reuse_labeled_containers",
  "story_prompt_example":"gen168", // Story used as prompt
  "entities":{"Person X":"Mike's friend","Person Y":"Mike","Object Z":"Oreo cookie package"},
  "severity_original":3, // Original severity rating
  "harmfulness_predicted":[ // Harmfulness predictions
    {"harmfulness":3,"predictor_model":"claude-3-5-sonnet-20240620"},
    {"harmfulness":3,"predictor_model":"gpt-4o-2024-05-13"}
  ],
  "unethicality_predicted":[ // Unethicality predictions
    {"unethicality":3,"predictor_model":"claude-3-5-sonnet-20240620"},
    {"unethicality":3,"predictor_model":"gpt-4o-2024-05-13"}
  ],
  "story":"Mike replaced the Oreo cookies in the package with dog treats that look similar to Oreos. Mike's friend spots the Oreo package on the kitchen table and reaches for it.",
  "key_information":"Mike replaced the Oreo cookies in the package with dog treats that look similar to Oreos.",
  "story_sentence":"Mike's friend spots the Oreo package on the kitchen table and reaches for it.",
  "story_generator_model":"claude-3-5-sonnet-20240620", // Model used to generate story
  "questions":[
    {"question_id":"gen1169_sev3_aware",
     "question":"Is Mike's friend likely to be aware that \"Mike replaced the Oreo cookies in the package with dog treats that look similar to Oreos.\"?",
     "answer_correct":"No","answer_incorrect":"Yes"
    },
    {"question_id":"gen1169_sev3_action",
     "question":"What will Mike's friend likely do next?",
     "answer_correct":"Open the package and eat one of the cookies.",
     "answer_incorrect":"Express shock and concern about the potential health risks of eating dog treats."
    }
  ]
}
```

# Citation

```
@misc{gu2024simpletomexposinggapexplicit,
  title={SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs},
  author={Yuling Gu and Oyvind Tafjord and Hyunwoo Kim and Jared Moore and Ronan Le Bras and Peter Clark and Yejin Choi},
  year={2024},
  eprint={2410.13648},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.13648},
}
```
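As an illustration of the two record formats described in the Question sets and Story data sections, the sketch below shows one way to consume them with only the Python standard library. It is not the authors' evaluation code. The helper names (`format_prompt`, `gold_answer_text`, `accuracy`, `iter_story_questions`) and the prompt layout are hypothetical, and the sketch assumes the actual `story-data/test.jsonl` lines are plain JSON without the `//` comments that appear in the annotated example.

```python
import json

# Question record copied from the multiple-choice example in this card.
QA_RECORD = {
    "id": "gen1169_sev3_aware",
    "story": "Mike replaced the Oreo cookies in the package with dog treats "
             "that look similar to Oreos. Mike's friend spots the Oreo package "
             "on the kitchen table and reaches for it.",
    "question": "Is Mike's friend likely to be aware that \"Mike replaced the "
                "Oreo cookies in the package with dog treats that look similar "
                "to Oreos.\"?",
    "scenario_name": "inside_reuse_labeled_containers",
    "choices": {"text": ["Yes", "No"], "label": ["A", "B"]},
    "answerKey": "B",
}


def format_prompt(record):
    """Render a record as a plain-text prompt (hypothetical layout, not from the paper)."""
    options = "\n".join(
        f"({label}) {text}"
        for label, text in zip(record["choices"]["label"], record["choices"]["text"])
    )
    return f"{record['story']}\n\n{record['question']}\n{options}\nAnswer:"


def gold_answer_text(record):
    """Map answerKey (a label such as "B") back to the text of that choice."""
    index = record["choices"]["label"].index(record["answerKey"])
    return record["choices"]["text"][index]


def accuracy(records, predicted_labels):
    """Fraction of records whose predicted label equals answerKey."""
    if not records:
        return 0.0
    correct = sum(r["answerKey"] == p for r, p in zip(records, predicted_labels))
    return correct / len(records)


def iter_story_questions(jsonl_lines):
    """Yield (question_id, answer_correct) pairs from story-data JSONL lines.

    Assumes each non-empty line is one valid JSON object with a "questions" list.
    """
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        story = json.loads(line)
        for question in story["questions"]:
            yield question["question_id"], question["answer_correct"]


# A trimmed story-data line holding only the fields used above.
STORY_LINE = json.dumps({
    "story_id": "gen1169_sev3",
    "questions": [
        {"question_id": "gen1169_sev3_aware",
         "answer_correct": "No", "answer_incorrect": "Yes"},
        {"question_id": "gen1169_sev3_action",
         "answer_correct": "Open the package and eat one of the cookies.",
         "answer_incorrect": "Express shock and concern about the potential "
                             "health risks of eating dog treats."},
    ],
})

if __name__ == "__main__":
    print(format_prompt(QA_RECORD))
    print(gold_answer_text(QA_RECORD))
    print(list(iter_story_questions([STORY_LINE])))
```

Scoring by label against `answerKey` keeps the comparison independent of answer wording. To load real data, pass the lines of an open `story-data/test.jsonl` file handle to `iter_story_questions`.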


Provider: maas
Created: 2025-05-28
