TwoHopFact, SOCRATES
Dataset Overview
Dataset Source
- The datasets were introduced in the following two papers:
- Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva*, Sebastian Riedel*. Do Large Language Models Latently Perform Multi-Hop Reasoning?. In ACL 2024.
- Sohee Yang, Nora Kassner, Elena Gribovskaya, Sebastian Riedel*, Mor Geva*. Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?. arXiv, 2024.
Dataset Directory
- The datasets are located in the datasets directory.
Dataset Details
TwoHopFact
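- Introduced in: Do Large Language Models Latently Perform Multi-Hop Reasoning?
- Description: Contains 45,595 pairs of one-hop and two-hop factual prompts spanning 52 fact composition types with a balanced distribution, designed to probe the internal mechanisms of latent multi-hop reasoning.
- File path: datasets/TwoHopFact.csv (91MB)
- HuggingFace dataset: soheeyang/TwoHopFact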
SOCRATES (ShOrtCut-fRee lATent rEaSoning)
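- Introduced in: Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
- Description: Contains 7,232 pairs of one-hop and two-hop factual prompts spanning 17 fact composition types, designed to evaluate the latent multi-hop reasoning ability of large language models while minimizing the risk of shortcuts.
- File paths:
  - datasets/SOCRATES_v1.csv (14MB): the cleaned version, free of grammatical errors.
  - datasets/SOCRATES_v0.csv (14MB): the version used in the paper, which contains a small number of grammatical errors.
- HuggingFace dataset: soheeyang/SOCRATES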
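A quick sanity check after downloading the files is to count the rows and list the column names. The sketch below uses only the Python standard library. The helper name `summarize_csv` is hypothetical and not part of the repository, and the code assumes the first row of each CSV is a header. If each row holds one prompt pair (an assumption about the file layout), the row counts should correspond to the pair counts quoted in the descriptions.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (number of data rows, list of column names) for a CSV file.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Cells may hold long prompt text, so raise the per-field size limit.
    csv.field_size_limit(10**9)
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # the first row is assumed to be the header
        num_rows = sum(1 for _ in reader)
    return num_rows, header


if __name__ == "__main__":
    for name in ("datasets/TwoHopFact.csv", "datasets/SOCRATES_v1.csv"):
        if Path(name).exists():
            rows, columns = summarize_csv(name)
            print(f"{name}: {rows} rows, {len(columns)} columns")
            print(columns)
        else:
            print(f"{name} not found; run this from the repository root")
```

Reading the header first avoids hard-coding column names, which are not listed here.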
Code Usage
Inspecting Latent Multi-Hop Reasoning Pathways
python inspect_latent_reasoning.py --model_name_or_path $MODEL_NAME_OR_PATH --input_csv_path datasets/TwoHopFact.csv --rq1_batch_size 256 --rq2_batch_size 8 --completion_batch_size 64 --hf_token $HF_TOKEN --run_rq1 --run_rq2 --run_appositive --run_cot --run_completion
Shortcut-Free Evaluation
python evaluate_latent_reasoning.py --model_name_or_path $MODEL_NAME_OR_PATH --input_csv_path datasets/SOCRATES.csv --tensor_parallel_size 2 --batch_size 256 --hf_token $HF_TOKEN
Patchscopes Analysis
python run_patchscopes.py --model_name_or_path $MODEL_NAME_OR_PATH --input_csv_path datasets/SOCRATES.csv --batch_size 64 --source_layer_idxs 1,2 --target_layer_idxs 30,31 --hf_token $HF_TOKEN --run_evaluation --run_patchscopes_evaluation
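When sweeping over several layer pairs, it can help to assemble the Patchscopes command programmatically. The sketch below builds the same argument list as the command above, using only the flags shown there. The function name `build_patchscopes_command` is hypothetical, and the assumption that the layer flags take comma-separated integers is inferred from the `1,2` and `30,31` values in the example.

```python
import os
import shlex


def build_patchscopes_command(model, input_csv, source_layers, target_layers,
                              hf_token, batch_size=64):
    """Assemble the argument list for run_patchscopes.py.

    Hypothetical helper for illustration; only flags that appear in the
    example command are used.
    """
    return [
        "python", "run_patchscopes.py",
        "--model_name_or_path", model,
        "--input_csv_path", input_csv,
        "--batch_size", str(batch_size),
        # Layer indices are assumed to be comma-separated, e.g. "1,2".
        "--source_layer_idxs", ",".join(str(i) for i in source_layers),
        "--target_layer_idxs", ",".join(str(i) for i in target_layers),
        "--hf_token", hf_token,
        "--run_evaluation",
        "--run_patchscopes_evaluation",
    ]


if __name__ == "__main__":
    cmd = build_patchscopes_command(
        model=os.environ.get("MODEL_NAME_OR_PATH", "MODEL_NAME_OR_PATH"),
        input_csv="datasets/SOCRATES.csv",
        source_layers=[1, 2],
        target_layers=[30, 31],
        hf_token=os.environ.get("HF_TOKEN", "HF_TOKEN"),
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.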
Code Structure
- datasets: contains the datasets introduced in the two papers.
  - TwoHopFact.csv
  - SOCRATES.csv
- src: contains the core functionality.
  - data_utils.py, model_utils.py, tokenization_utils.py: common code used in both papers.
  - inspection_utils.py: code used in Do Large Language Models Latently Perform Multi-Hop Reasoning?
  - evaluation_utils.py: code used in Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
  - patchscopes_utils.py: code used for the Patchscopes analysis in Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
- results: directory where experiment result files are stored; it can be changed with the --output_dir argument.
Citation
Do Large Language Models Latently Perform Multi-Hop Reasoning?
@inproceedings{yang2024latentreasoning,
  title={Do Large Language Models Latently Perform Multi-Hop Reasoning?},
  author={Sohee Yang and Elena Gribovskaya and Nora Kassner and Mor Geva and Sebastian Riedel},
  booktitle={Association for Computational Linguistics},
  year={2024},
  url={https://aclanthology.org/2024.acl-long.550}
}
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
@article{yang2024shortcutfree,
  title={Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?},
  author={Sohee Yang and Nora Kassner and Elena Gribovskaya and Sebastian Riedel and Mor Geva},
  journal={arXiv},
  year={2024},
  url={https://arxiv.org/abs/2411.16679}
}
License
- All software is licensed under the Apache License, Version 2.0 (Apache 2.0);
- All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY).




