sara-episodes
Source: Hugging Face | Updated: 2026-03-15 | Indexed: 2026-03-20
Download link:
https://huggingface.co/datasets/farffadet/sara-episodes
Description:
sara-episodes is a multi-turn dialogue dataset for legal reasoning, focused on 26 U.S.C. § 7703 (Internal Revenue Code), which determines whether a taxpayer is treated as married or unmarried for federal income tax purposes. It is part of the SylloGym reinforcement learning environment. The dataset contains 400 episodes of 1 to 3 turns each. In every turn the agent is shown facts about a taxpayer and must answer 'Yes' (treated as unmarried) or 'No' (treated as married). New facts are revealed turn by turn, and about 50% of multi-turn episodes contain a "plot twist", meaning the correct answer is the opposite of the previous turn's answer. By length, roughly 36% of episodes have 1 turn, 32% have 2 turns, and 32% have 3 turns. About 260 turns (31% of all turns) contain a twist, and the answers split 58% 'Yes' to 42% 'No'. Episodes are generated offline by a pure-Python § 7703 validator and cover six scenario chains: single, married with spouse present at home, married with spouse absent, legally separated, reconciled, and separated + reversal.
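The "plot twist" notion in the description can be illustrated with a short sketch. The listing does not document the dataset's actual schema, so the `Turn` class, its field names, and the sample facts below are hypothetical and written only for illustration. The sketch assumes a twist is any turn after the first whose correct answer differs from the previous turn's answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One turn of an episode. Hypothetical structure, not the dataset's real schema."""
    facts: str   # newly revealed facts about the taxpayer
    answer: str  # "Yes" (treated as unmarried) or "No" (treated as married)


def twist_flags(turns: List[Turn]) -> List[bool]:
    """Flag each turn whose answer is the opposite of the previous turn's answer.

    The first turn has no predecessor, so it is never a twist.
    """
    flags = []
    for i, turn in enumerate(turns):
        is_twist = i > 0 and turn.answer != turns[i - 1].answer
        flags.append(is_twist)
    return flags


# Invented example episode (not taken from the dataset).
episode = [
    Turn("Alice and Bob married in 2015 and lived together all of 2019.", "No"),
    Turn("A court issued a decree of separate maintenance in late 2019.", "Yes"),
    Turn("Alice filed her 2019 return in April 2020.", "Yes"),
]

print(twist_flags(episode))  # [False, True, False]
```

Under this definition only the second turn of the sample episode counts as a twist, because the third turn keeps the same answer as the second.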
Created:
2026-03-15



