JINIAC/JFLD
Hugging Face · Updated 2024-05-27 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/JINIAC/JFLD
Resource description:
---
dataset_info:
- config_name: D1
features:
- name: version
dtype: string
- name: hypothesis
dtype: string
- name: hypothesis_formula
dtype: string
- name: facts
dtype: string
- name: facts_formula
dtype: string
- name: proofs
sequence: string
- name: proofs_formula
sequence: string
- name: negative_hypothesis
dtype: string
- name: negative_hypothesis_formula
dtype: string
- name: negative_proofs
sequence: string
- name: negative_original_tree_depth
dtype: int64
- name: original_tree_depth
dtype: int64
- name: depth
dtype: int64
- name: num_formula_distractors
dtype: int64
- name: num_translation_distractors
dtype: int64
- name: num_all_distractors
dtype: int64
- name: proof_label
dtype: string
- name: negative_proof_label
dtype: string
- name: world_assump_label
dtype: string
- name: negative_world_assump_label
dtype: string
- name: prompt_serial
dtype: string
- name: proof_serial
dtype: string
- name: instruction
dtype: string
- name: input
dtype: string
- name: response
dtype: string
splits:
- name: train
num_bytes: 108112082
num_examples: 20143
- name: validation
num_bytes: 18040312
num_examples: 3374
- name: test
num_bytes: 17777700
num_examples: 3323
download_size: 50969460
dataset_size: 143930094
- config_name: D1_minus
features:
- name: version
dtype: string
- name: hypothesis
dtype: string
- name: hypothesis_formula
dtype: string
- name: facts
dtype: string
- name: facts_formula
dtype: string
- name: proofs
sequence: string
- name: proofs_formula
sequence: string
- name: negative_hypothesis
dtype: 'null'
- name: negative_hypothesis_formula
dtype: 'null'
- name: negative_proofs
sequence: 'null'
- name: negative_original_tree_depth
dtype: 'null'
- name: original_tree_depth
dtype: int64
- name: depth
dtype: int64
- name: num_formula_distractors
dtype: int64
- name: num_translation_distractors
dtype: int64
- name: num_all_distractors
dtype: int64
- name: proof_label
dtype: string
- name: negative_proof_label
dtype: 'null'
- name: world_assump_label
dtype: string
- name: negative_world_assump_label
dtype: 'null'
- name: prompt_serial
dtype: string
- name: proof_serial
dtype: string
- name: instruction
dtype: string
- name: input
dtype: string
- name: response
dtype: string
splits:
- name: train
num_bytes: 23888799
num_examples: 20082
- name: validation
num_bytes: 4007250
num_examples: 3349
- name: test
num_bytes: 4032734
num_examples: 3374
download_size: 8783646
dataset_size: 31928783
- config_name: D3
features:
- name: version
dtype: string
- name: hypothesis
dtype: string
- name: hypothesis_formula
dtype: string
- name: facts
dtype: string
- name: facts_formula
dtype: string
- name: proofs
sequence: string
- name: proofs_formula
sequence: string
- name: negative_hypothesis
dtype: string
- name: negative_hypothesis_formula
dtype: string
- name: negative_proofs
sequence: string
- name: negative_original_tree_depth
dtype: int64
- name: original_tree_depth
dtype: int64
- name: depth
dtype: int64
- name: num_formula_distractors
dtype: int64
- name: num_translation_distractors
dtype: int64
- name: num_all_distractors
dtype: int64
- name: proof_label
dtype: string
- name: negative_proof_label
dtype: string
- name: world_assump_label
dtype: string
- name: negative_world_assump_label
dtype: string
- name: prompt_serial
dtype: string
- name: proof_serial
dtype: string
- name: instruction
dtype: string
- name: input
dtype: string
- name: response
dtype: string
splits:
- name: train
num_bytes: 126666437
num_examples: 20166
- name: validation
num_bytes: 20727548
num_examples: 3340
- name: test
num_bytes: 20914914
num_examples: 3310
download_size: 59393588
dataset_size: 168308899
- config_name: D8
features:
- name: version
dtype: string
- name: hypothesis
dtype: string
- name: hypothesis_formula
dtype: string
- name: facts
dtype: string
- name: facts_formula
dtype: string
- name: proofs
sequence: string
- name: proofs_formula
sequence: string
- name: negative_hypothesis
dtype: string
- name: negative_hypothesis_formula
dtype: string
- name: negative_proofs
sequence: string
- name: negative_original_tree_depth
dtype: int64
- name: original_tree_depth
dtype: int64
- name: depth
dtype: int64
- name: num_formula_distractors
dtype: int64
- name: num_translation_distractors
dtype: int64
- name: num_all_distractors
dtype: int64
- name: proof_label
dtype: string
- name: negative_proof_label
dtype: string
- name: world_assump_label
dtype: string
- name: negative_world_assump_label
dtype: string
- name: prompt_serial
dtype: string
- name: proof_serial
dtype: string
- name: instruction
dtype: string
- name: input
dtype: string
- name: response
dtype: string
splits:
- name: train
num_bytes: 167341010
num_examples: 20190
- name: validation
num_bytes: 28183984
num_examples: 3409
- name: test
num_bytes: 27682859
num_examples: 3352
download_size: 76744182
dataset_size: 223207853
configs:
- config_name: D1
data_files:
- split: train
path: D1/train-*
- split: validation
path: D1/validation-*
- split: test
path: D1/test-*
- config_name: D1_minus
data_files:
- split: train
path: D1_minus/train-*
- split: validation
path: D1_minus/validation-*
- split: test
path: D1_minus/test-*
- config_name: D3
data_files:
- split: train
path: D3/train-*
- split: validation
path: D3/validation-*
- split: test
path: D3/test-*
- config_name: D8
data_files:
- split: train
path: D8/train-*
- split: validation
path: D8/validation-*
- split: test
path: D8/test-*
license: apache-2.0
---
This dataset was created by adding training columns (instruction, input, response) to the following datasets:
https://github.com/hitachi-nlp/FLD
https://huggingface.co/datasets/hitachi-nlp/JFLD
Examples whose proof_serial is __UNKNOWN__ are not included.
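
As a rough illustration of how the added columns might be used, the sketch below joins instruction, input, and response into one training string and counts any rows whose proof_serial is `__UNKNOWN__` (the card states there should be none). The helper names and the blank-line template are assumptions made for this example, not part of the dataset; `load_jfld` wraps the `datasets.load_dataset` call and needs the `datasets` package plus network access.

```python
UNKNOWN = "__UNKNOWN__"


def build_training_text(example: dict) -> str:
    """Join the three training columns with blank lines (hypothetical template)."""
    parts = [example["instruction"]]
    if example.get("input"):  # skip an empty or missing input
        parts.append(example["input"])
    parts.append(example["response"])
    return "\n\n".join(parts)


def count_unknown(examples) -> int:
    """Count rows whose proof_serial is the __UNKNOWN__ marker."""
    return sum(1 for ex in examples if ex.get("proof_serial") == UNKNOWN)


def load_jfld(config: str = "D1", split: str = "train"):
    """Load one configuration and split from the Hugging Face Hub."""
    from datasets import load_dataset  # imported lazily; requires `pip install datasets`

    return load_dataset("JINIAC/JFLD", config, split=split)


if __name__ == "__main__":
    # Toy row standing in for a real example; the values are made up.
    toy = {"instruction": "Prove it.", "input": "facts...", "response": "proof...",
           "proof_serial": "fact1 -> hypothesis"}
    print(build_training_text(toy))
    print(count_unknown([toy]))
```

With network access, `count_unknown(load_jfld("D1", "validation"))` would be one way to confirm the statement above for a given split.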
## Citation
```
@inproceedings{morishita2024jfld,
title = {JFLD: A Japanese Benchmark for Deductive Reasoning based on Formal Logic},
author = {Morishita, Terufumi and Yamaguchi, Atsuki and Morio, Gaku and Hikaru, Tomonari and Osamu Imaichi and Sogawa, Yasuhiro},
booktitle = {Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
year = {2024}
}
@inproceedings{morishita2023fld,
title = {Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic},
author = {Morishita, Terufumi and Morio, Gaku and Yamaguchi, Atsuki and Sogawa, Yasuhiro},
booktitle = {Proceedings of the 40th International Conference on Machine Learning},
year = {2023}
}
```
Provider:
JINIAC
Summary of original information
Dataset overview
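Dataset configuration D1
- Features:
  - Contains fields such as version, hypothesis, facts, and proofs; the dtypes are mainly string and int64.
  - Notably, proofs, proofs_formula, and negative_proofs are sequence types.
- Splits:
  - Train: 20143 examples, 108112082 bytes.
  - Validation: 3374 examples, 18040312 bytes.
  - Test: 3323 examples, 17777700 bytes.
- Download size: 50969460 bytes.
- Dataset size: 143930094 bytes.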
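Dataset configuration D1_minus
- Features:
  - Same fields as D1, but the negative_* fields (for example negative_hypothesis and negative_proof_label) have dtype null.
- Splits:
  - Train: 20082 examples, 23888799 bytes.
  - Validation: 3349 examples, 4007250 bytes.
  - Test: 3374 examples, 4032734 bytes.
- Download size: 8783646 bytes.
- Dataset size: 31928783 bytes.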
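
Because the negative_* columns in D1_minus carry no data, a consumer may want to drop them before further processing. The following is a minimal sketch under that assumption; the field names come from the schema above, while the helper name `drop_negative_fields` is made up for this example.

```python
# Columns declared with dtype null (or sequence of null) in the D1_minus schema.
NEGATIVE_FIELDS = (
    "negative_hypothesis",
    "negative_hypothesis_formula",
    "negative_proofs",
    "negative_original_tree_depth",
    "negative_proof_label",
    "negative_world_assump_label",
)


def drop_negative_fields(example: dict) -> dict:
    """Return a copy of the row without the always-null negative_* columns."""
    return {key: value for key, value in example.items() if key not in NEGATIVE_FIELDS}


if __name__ == "__main__":
    # Toy row; real rows have many more columns.
    row = {"hypothesis": "h", "negative_hypothesis": None, "depth": 1}
    print(drop_negative_fields(row))
```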
Dataset configuration D3
- Features:
  - Same fields as D1; no field has dtype null.
- Splits:
  - Train: 20166 examples, 126666437 bytes.
  - Validation: 3340 examples, 20727548 bytes.
  - Test: 3310 examples, 20914914 bytes.
- Download size: 59393588 bytes.
- Dataset size: 168308899 bytes.
Dataset configuration D8
- Features:
  - Same fields as D1; no field has dtype null.
- Splits:
  - Train: 20190 examples, 167341010 bytes.
  - Validation: 3409 examples, 28183984 bytes.
  - Test: 3352 examples, 27682859 bytes.
- Download size: 76744182 bytes.
- Dataset size: 223207853 bytes.
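
The reported dataset sizes can be cross-checked against the per-split byte counts. The short sketch below copies the figures from the metadata above and sums them per configuration; the dictionary layout and function names are choices made for this example only.

```python
# (num_examples, num_bytes) per split, copied from the dataset_info metadata.
SPLITS = {
    "D1": {"train": (20143, 108112082), "validation": (3374, 18040312), "test": (3323, 17777700)},
    "D1_minus": {"train": (20082, 23888799), "validation": (3349, 4007250), "test": (3374, 4032734)},
    "D3": {"train": (20166, 126666437), "validation": (3340, 20727548), "test": (3310, 20914914)},
    "D8": {"train": (20190, 167341010), "validation": (3409, 28183984), "test": (3352, 27682859)},
}

# dataset_size values reported for each configuration.
DATASET_SIZE = {"D1": 143930094, "D1_minus": 31928783, "D3": 168308899, "D8": 223207853}


def total_examples(config: str) -> int:
    """Sum num_examples over the train, validation, and test splits."""
    return sum(examples for examples, _ in SPLITS[config].values())


def total_bytes(config: str) -> int:
    """Sum num_bytes over the train, validation, and test splits."""
    return sum(size for _, size in SPLITS[config].values())


if __name__ == "__main__":
    for name in SPLITS:
        matches = total_bytes(name) == DATASET_SIZE[name]
        print(name, total_examples(name), total_bytes(name), matches)
```

For every configuration the split byte counts add up to the reported dataset_size.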
Dataset file paths
Each configuration has train, validation, and test splits. In the patterns below, split stands for train, validation, or test.
- D1: D1/split-*
- D1_minus: D1_minus/split-*
- D3: D3/split-*
- D8: D8/split-*
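
To make the path patterns concrete, the sketch below builds the glob for a configuration and split and tests a file name against it with the standard-library `fnmatch` module. The shard file name used here is hypothetical; the actual file names in the repository are not listed in this card.

```python
from fnmatch import fnmatch

CONFIGS = ("D1", "D1_minus", "D3", "D8")
SPLIT_NAMES = ("train", "validation", "test")


def split_pattern(config: str, split: str) -> str:
    """Build the data_files glob for one configuration and split, e.g. 'D1/train-*'."""
    if config not in CONFIGS or split not in SPLIT_NAMES:
        raise ValueError(f"unknown config or split: {config!r}, {split!r}")
    return f"{config}/{split}-*"


def belongs_to(path: str, config: str, split: str) -> bool:
    """Check whether a repository-relative path matches the glob for that split."""
    return fnmatch(path, split_pattern(config, split))


if __name__ == "__main__":
    # Hypothetical shard name, used only to exercise the pattern.
    shard = "D1_minus/validation-00000-of-00001.parquet"
    print(split_pattern("D1_minus", "validation"))
    print(belongs_to(shard, "D1_minus", "validation"))
    print(belongs_to(shard, "D1", "validation"))
```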
License
- License: Apache-2.0



