jhu-clsp/SARA

Name: jhu-clsp/SARA
Creator: jhu-clsp
Published: 2023-06-24 14:13:13
License: No description available

Source: Hugging Face. Updated 2023-06-24. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/jhu-clsp/SARA

Overview:

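SARA is a dataset for statutory reasoning in tax law that supports question answering and natural language inference tasks. Each record contains background text, a question, an answer, the relevant facts, and test code, and all content is in English. The dataset is divided into training and test sets and can be loaded with the Hugging Face `datasets` library.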

---
language:
- English
tags:
- law
- tax
- Natural Language Inference (NLI)
- Question Answering (QA)
pretty_name: SARA
size_categories:
- n<1K
---

# Dataset Card for Statutory Reasoning in Tax Law Entailment and Question Answering (SARA)

Note: this is SARA v1. For SARA v2, see https://nlp.jhu.edu/law/ (coming to Hugging Face soon!)

## Dataset Description

- **Repository**: [https://nlp.jhu.edu/law/]
- **Paper**: [https://ceur-ws.org/Vol-2645/paper5.pdf]
- **Point of Contact**: nils.holzenberger@telecom-paris.fr

## Dataset Summary

If you use this dataset, please cite our work:

    @inproceedings{Holzenberger2020ADF,
      title={A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering},
      author={Nils Holzenberger and Andrew Blair-Stanek and Benjamin Van Durme},
      booktitle={NLLP@KDD},
      year={2020}
    }

### Supported Tasks and Leaderboards

The dataset covers two tasks, Question Answering (QA) and Natural Language Inference (NLI). Each task has a training set and a test set. There is no official leaderboard.

### Languages

English

## Dataset Structure

### Data Instances

The following is an example of a data instance:

    {
      "id": "s151_a_neg",
      "text": "Alice's income in 2015 is $100000. She gets one exemption of $2000 for the year 2015 under section 151(c). Alice is not married.",
      "question": "Alice's total exemption for 2015 under section 151(a) is equal to $6000",
      "answer": "Contradiction",
      "facts": ":- discontiguous s151_c/4. :- [statutes/prolog/init]. income_(alice_makes_money). agent_(alice_makes_money,alice). start_(alice_makes_money,\"2015-01-01\"). end_(alice_makes_money,\"2015-12-31\"). amount_(alice_makes_money,100000). s151_c(alice,_,2000,2015).",
      "test": ":- \\+ s151_a(alice,6000,2015)."
    }

### Data Fields

* `id`: unique identifier of the data instance, indicating the case number and the relevant statute (where applicable).
* `text`: background details of the legal case.
* `question`: the question (or hypothesis) for this instance.
* `answer`: the answer to the question, or the NLI judgment (Entailment/Contradiction).
* `facts`: the facts relevant to the case, written in Prolog.
* `test`: the code to execute for this case, written in Prolog.

### Data Splits

The splits can be loaded as follows:

    from datasets import load_dataset

    qa_test = load_dataset("jhu-clsp/SARA", "qa", split="test")
    qa_train = load_dataset("jhu-clsp/SARA", "qa", split="train")
    nli_test = load_dataset("jhu-clsp/SARA", "nli", split="test")
    nli_train = load_dataset("jhu-clsp/SARA", "nli", split="train")

## Dataset Creation

For full details on how the dataset was constructed, see the paper: https://ceur-ws.org/Vol-2645/paper5.pdf

Provider:

jhu-clsp

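Summary of Original Information

Dataset Overview

Dataset name: SARA

Version: v1

Language: English

Tags: law, tax, natural language inference, question answering

Size: fewer than 1,000 records

Dataset Description

Purpose: statutory reasoning in tax law, covering entailment and question answering

Contact: nils.holzenberger@telecom-paris.fr

Dataset Summary

Citation: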

@inproceedings{Holzenberger2020ADF,
  title={A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering},
  author={Nils Holzenberger and Andrew Blair-Stanek and Benjamin Van Durme},
  booktitle={NLLP@KDD},
  year={2020}
}

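Supported Tasks and Leaderboards

Tasks:

Question answering
Natural language inference

Splits: each task has a training set and a test set. There is no official leaderboard.

Dataset Structure

Data Instances

Example: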

{
  "id": "s151_a_neg",
  "text": "Alice's income in 2015 is $100000. She gets one exemption of $2000 for the year 2015 under section 151(c). Alice is not married.",
  "question": "Alice's total exemption for 2015 under section 151(a) is equal to $6000",
  "answer": "Contradiction",
  "facts": ":- discontiguous s151_c/4. :- [statutes/prolog/init]. income_(alice_makes_money). agent_(alice_makes_money,alice). start_(alice_makes_money,\"2015-01-01\"). end_(alice_makes_money,\"2015-12-31\"). amount_(alice_makes_money,100000). s151_c(alice,_,2000,2015).",
  "test": ":- \\+ s151_a(alice,6000,2015)."
}
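The `facts` string in the example packs several Prolog clauses into a single line. As a rough sketch of how one might inspect it from Python, the helper below splits such a string into individual clauses. The names `FACTS` and `split_clauses` are hypothetical and are not part of the dataset or any library, and the splitting rule (a period outside double quotes, followed by whitespace or the end of the string) is a simplifying assumption rather than a real Prolog parser.

```python
from typing import List

# The `facts` value from the example instance, copied as a Python string.
FACTS = (
    ':- discontiguous s151_c/4. :- [statutes/prolog/init]. '
    'income_(alice_makes_money). agent_(alice_makes_money,alice). '
    'start_(alice_makes_money,"2015-01-01"). '
    'end_(alice_makes_money,"2015-12-31"). '
    'amount_(alice_makes_money,100000). s151_c(alice,_,2000,2015).'
)


def split_clauses(program: str) -> List[str]:
    """Naively split a one-line Prolog program into clauses.

    Assumption: a clause ends at a period that is outside double quotes
    and is followed by whitespace or the end of the string.
    """
    clauses: List[str] = []
    current: List[str] = []
    in_string = False
    for i, ch in enumerate(program):
        if ch == '"':
            in_string = not in_string
        current.append(ch)
        at_boundary = i + 1 == len(program) or program[i + 1].isspace()
        if ch == "." and not in_string and at_boundary:
            clauses.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:
        clauses.append(tail)
    return clauses


clauses = split_clauses(FACTS)
directives = [c for c in clauses if c.startswith(":-")]
facts = [c for c in clauses if not c.startswith(":-")]
```

Separating directives (clauses starting with `:-`) from plain facts this way is only meant for quick inspection. Actually evaluating the `test` goal would require a Prolog system and the statute definitions referenced by `statutes/prolog/init`.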

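Data Fields

id: unique identifier, indicating the case number and the relevant statute.
text: background details of the legal case.
question: the question or hypothesis.
answer: the answer to the question, or the NLI judgment (Entailment/Contradiction).
facts: the facts relevant to the case, written in Prolog.
test: the code to execute for the case, written in Prolog.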
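The six fields listed above can be mirrored in a small typed record, which makes missing keys fail early. The following is a minimal sketch under stated assumptions: `SaraRecord`, `from_row`, `is_nli`, and `NLI_LABELS` are hypothetical names introduced for illustration, and the label spelling "Entailment" is assumed by analogy with the "Contradiction" value seen in the example instance.

```python
from dataclasses import dataclass, fields
from typing import Dict

# Assumed label spellings; only "Contradiction" appears in the example instance.
NLI_LABELS = {"Entailment", "Contradiction"}


@dataclass(frozen=True)
class SaraRecord:
    """One dataset row, with the six fields described in the field list."""
    id: str
    text: str
    question: str
    answer: str
    facts: str
    test: str

    @classmethod
    def from_row(cls, row: Dict[str, str]) -> "SaraRecord":
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in row]
        if missing:
            raise KeyError(f"missing fields: {missing}")
        return cls(**{name: str(row[name]) for name in names})

    def is_nli(self) -> bool:
        # True when the answer is one of the assumed NLI labels.
        return self.answer in NLI_LABELS


# The example instance, written as a Python dict.
SAMPLE_ROW = {
    "id": "s151_a_neg",
    "text": "Alice's income in 2015 is $100000. She gets one exemption of "
            "$2000 for the year 2015 under section 151(c). Alice is not married.",
    "question": "Alice's total exemption for 2015 under section 151(a) "
                "is equal to $6000",
    "answer": "Contradiction",
    "facts": ':- discontiguous s151_c/4. :- [statutes/prolog/init]. '
             'income_(alice_makes_money). agent_(alice_makes_money,alice). '
             'start_(alice_makes_money,"2015-01-01"). '
             'end_(alice_makes_money,"2015-12-31"). '
             'amount_(alice_makes_money,100000). s151_c(alice,_,2000,2015).',
    "test": r":- \+ s151_a(alice,6000,2015).",
}

record = SaraRecord.from_row(SAMPLE_ROW)
```

A frozen dataclass is used here so that a loaded record cannot be modified by accident. Any row lacking one of the six keys raises `KeyError` instead of silently producing a partial record.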

Data Splits

The splits can be loaded with the following code:

    from datasets import load_dataset

    qa_test = load_dataset("jhu-clsp/SARA", "qa", split="test")
    qa_train = load_dataset("jhu-clsp/SARA", "qa", split="train")
    nli_test = load_dataset("jhu-clsp/SARA", "nli", split="test")
    nli_train = load_dataset("jhu-clsp/SARA", "nli", split="train")
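The four loading calls differ only in configuration name and split name, so they can be collected by one helper. This is a sketch rather than an official loader: `load_all_splits`, `CONFIGS`, and `SPLITS` are hypothetical names, and the loader is passed in as an argument so that the helper can be exercised without network access.

```python
from typing import Callable, Dict, Tuple

# Configuration and split names taken from the loading code above.
CONFIGS = ("qa", "nli")
SPLITS = ("train", "test")


def load_all_splits(
    loader: Callable[..., object],
    repo: str = "jhu-clsp/SARA",
) -> Dict[Tuple[str, str], object]:
    """Call `loader(repo, config, split=split)` for every config/split pair.

    `loader` is expected to behave like `datasets.load_dataset`.
    """
    return {
        (config, split): loader(repo, config, split=split)
        for config in CONFIGS
        for split in SPLITS
    }
```

With the Hugging Face library installed, `load_all_splits(load_dataset)` (after `from datasets import load_dataset`) would return a dictionary keyed by pairs such as `("nli", "test")`.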
