Social Tasks in Sandbox Simulation (STSS)
Source: arXiv. Updated: 2024-04-08. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2404.05337v1
Description:
The STSS dataset was developed by the Department of Computer Science and Technology at Tsinghua University to evaluate the social intelligence of language agents through sandbox simulation. It contains 30 social task templates across 5 categories, used to objectively assess whether language agents achieve their goals in multi-agent simulations. During dataset construction, task success is quantified from simulation trajectories through automated post-analysis. STSS is intended for evaluating language models and agent architectures, and it aims to address two limitations of existing social intelligence evaluations: their subjectivity and their restriction to language-level assessment.
Provider:
Department of Computer Science and Technology, Tsinghua University
Created:
2024-04-08



