JINIAC/JFLD

Hugging Face · updated 2024-05-27 · indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/JINIAC/JFLD
Description:
---
dataset_info:
- config_name: D1
  features:
  - {name: version, dtype: string}
  - {name: hypothesis, dtype: string}
  - {name: hypothesis_formula, dtype: string}
  - {name: facts, dtype: string}
  - {name: facts_formula, dtype: string}
  - {name: proofs, sequence: string}
  - {name: proofs_formula, sequence: string}
  - {name: negative_hypothesis, dtype: string}
  - {name: negative_hypothesis_formula, dtype: string}
  - {name: negative_proofs, sequence: string}
  - {name: negative_original_tree_depth, dtype: int64}
  - {name: original_tree_depth, dtype: int64}
  - {name: depth, dtype: int64}
  - {name: num_formula_distractors, dtype: int64}
  - {name: num_translation_distractors, dtype: int64}
  - {name: num_all_distractors, dtype: int64}
  - {name: proof_label, dtype: string}
  - {name: negative_proof_label, dtype: string}
  - {name: world_assump_label, dtype: string}
  - {name: negative_world_assump_label, dtype: string}
  - {name: prompt_serial, dtype: string}
  - {name: proof_serial, dtype: string}
  - {name: instruction, dtype: string}
  - {name: input, dtype: string}
  - {name: response, dtype: string}
  splits:
  - {name: train, num_bytes: 108112082, num_examples: 20143}
  - {name: validation, num_bytes: 18040312, num_examples: 3374}
  - {name: test, num_bytes: 17777700, num_examples: 3323}
  download_size: 50969460
  dataset_size: 143930094
- config_name: D1_minus
  features:
  - {name: version, dtype: string}
  - {name: hypothesis, dtype: string}
  - {name: hypothesis_formula, dtype: string}
  - {name: facts, dtype: string}
  - {name: facts_formula, dtype: string}
  - {name: proofs, sequence: string}
  - {name: proofs_formula, sequence: string}
  - {name: negative_hypothesis, dtype: 'null'}
  - {name: negative_hypothesis_formula, dtype: 'null'}
  - {name: negative_proofs, sequence: 'null'}
  - {name: negative_original_tree_depth, dtype: 'null'}
  - {name: original_tree_depth, dtype: int64}
  - {name: depth, dtype: int64}
  - {name: num_formula_distractors, dtype: int64}
  - {name: num_translation_distractors, dtype: int64}
  - {name: num_all_distractors, dtype: int64}
  - {name: proof_label, dtype: string}
  - {name: negative_proof_label, dtype: 'null'}
  - {name: world_assump_label, dtype: string}
  - {name: negative_world_assump_label, dtype: 'null'}
  - {name: prompt_serial, dtype: string}
  - {name: proof_serial, dtype: string}
  - {name: instruction, dtype: string}
  - {name: input, dtype: string}
  - {name: response, dtype: string}
  splits:
  - {name: train, num_bytes: 23888799, num_examples: 20082}
  - {name: validation, num_bytes: 4007250, num_examples: 3349}
  - {name: test, num_bytes: 4032734, num_examples: 3374}
  download_size: 8783646
  dataset_size: 31928783
- config_name: D3
  features:
  - {name: version, dtype: string}
  - {name: hypothesis, dtype: string}
  - {name: hypothesis_formula, dtype: string}
  - {name: facts, dtype: string}
  - {name: facts_formula, dtype: string}
  - {name: proofs, sequence: string}
  - {name: proofs_formula, sequence: string}
  - {name: negative_hypothesis, dtype: string}
  - {name: negative_hypothesis_formula, dtype: string}
  - {name: negative_proofs, sequence: string}
  - {name: negative_original_tree_depth, dtype: int64}
  - {name: original_tree_depth, dtype: int64}
  - {name: depth, dtype: int64}
  - {name: num_formula_distractors, dtype: int64}
  - {name: num_translation_distractors, dtype: int64}
  - {name: num_all_distractors, dtype: int64}
  - {name: proof_label, dtype: string}
  - {name: negative_proof_label, dtype: string}
  - {name: world_assump_label, dtype: string}
  - {name: negative_world_assump_label, dtype: string}
  - {name: prompt_serial, dtype: string}
  - {name: proof_serial, dtype: string}
  - {name: instruction, dtype: string}
  - {name: input, dtype: string}
  - {name: response, dtype: string}
  splits:
  - {name: train, num_bytes: 126666437, num_examples: 20166}
  - {name: validation, num_bytes: 20727548, num_examples: 3340}
  - {name: test, num_bytes: 20914914, num_examples: 3310}
  download_size: 59393588
  dataset_size: 168308899
- config_name: D8
  features:
  - {name: version, dtype: string}
  - {name: hypothesis, dtype: string}
  - {name: hypothesis_formula, dtype: string}
  - {name: facts, dtype: string}
  - {name: facts_formula, dtype: string}
  - {name: proofs, sequence: string}
  - {name: proofs_formula, sequence: string}
  - {name: negative_hypothesis, dtype: string}
  - {name: negative_hypothesis_formula, dtype: string}
  - {name: negative_proofs, sequence: string}
  - {name: negative_original_tree_depth, dtype: int64}
  - {name: original_tree_depth, dtype: int64}
  - {name: depth, dtype: int64}
  - {name: num_formula_distractors, dtype: int64}
  - {name: num_translation_distractors, dtype: int64}
  - {name: num_all_distractors, dtype: int64}
  - {name: proof_label, dtype: string}
  - {name: negative_proof_label, dtype: string}
  - {name: world_assump_label, dtype: string}
  - {name: negative_world_assump_label, dtype: string}
  - {name: prompt_serial, dtype: string}
  - {name: proof_serial, dtype: string}
  - {name: instruction, dtype: string}
  - {name: input, dtype: string}
  - {name: response, dtype: string}
  splits:
  - {name: train, num_bytes: 167341010, num_examples: 20190}
  - {name: validation, num_bytes: 28183984, num_examples: 3409}
  - {name: test, num_bytes: 27682859, num_examples: 3352}
  download_size: 76744182
  dataset_size: 223207853
configs:
- config_name: D1
  data_files:
  - split: train
    path: D1/train-*
  - split: validation
    path: D1/validation-*
  - split: test
    path: D1/test-*
- config_name: D1_minus
  data_files:
  - split: train
    path: D1_minus/train-*
  - split: validation
    path: D1_minus/validation-*
  - split: test
    path: D1_minus/test-*
- config_name: D3
  data_files:
  - split: train
    path: D3/train-*
  - split: validation
    path: D3/validation-*
  - split: test
    path: D3/test-*
- config_name: D8
  data_files:
  - split: train
    path: D8/train-*
  - split: validation
    path: D8/validation-*
  - split: test
    path: D8/test-*
license: apache-2.0
---

This dataset was created from the following datasets by adding training columns (`instruction`, `input`, `response`):

https://github.com/hitachi-nlp/FLD
https://huggingface.co/datasets/hitachi-nlp/JFLD

Examples whose `proof_serial` is `__UNKNOWN__` are not included.

## Citation

```
@inproceedings{morishita2024jfld,
  title = {JFLD: A Japanese Benchmark for Deductive Reasoning based on Formal Logic},
  author = {Morishita, Terufumi and Yamaguchi, Atsuki and Morio, Gaku and Tomonari, Hikaru and Imaichi, Osamu and Sogawa, Yasuhiro},
  booktitle = {Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
  year = {2024}
}

@inproceedings{morishita2023fld,
  title = {Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic},
  author = {Morishita, Terufumi and Morio, Gaku and Yamaguchi, Atsuki and Sogawa, Yasuhiro},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  year = {2023}
}
```
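The card states that examples whose `proof_serial` is `__UNKNOWN__` were left out. The sketch below shows that exclusion rule as a simple predicate over plain dicts. It is an illustration only, not the script JINIAC actually used. The names `UNKNOWN_MARKER` and `drop_unknown` and the toy records are hypothetical.

```python
# Marker value that, per the dataset card, causes an example to be excluded.
UNKNOWN_MARKER = "__UNKNOWN__"


def drop_unknown(rows):
    """Keep only rows whose proof_serial is not the __UNKNOWN__ marker.

    `rows` is any iterable of dicts that have a "proof_serial" key.
    Hypothetical helper illustrating the card's exclusion rule.
    """
    return [row for row in rows if row["proof_serial"] != UNKNOWN_MARKER]


# Hypothetical toy records (not real JFLD data).
rows = [
    {"proof_serial": "example serialized proof"},
    {"proof_serial": "__UNKNOWN__"},
]
kept = drop_unknown(rows)
```

With the Hugging Face `datasets` library, the same predicate can be passed to `Dataset.filter`, for example `ds.filter(lambda ex: ex["proof_serial"] != "__UNKNOWN__")`.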
Provider:
JINIAC
Summary of the original information

Dataset overview

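Dataset configuration D1

  • Features:
    • 25 fields, including version, hypothesis, facts, and proofs; 16 are of dtype string and 6 of dtype int64.
    • proofs, proofs_formula, and negative_proofs are sequence fields (lists of strings).
  • Splits:
    • Train: 20143 examples, 108112082 bytes in total.
    • Validation: 3374 examples, 18040312 bytes in total.
    • Test: 3323 examples, 17777700 bytes in total.
  • Download size: 50969460 bytes.
  • Dataset size: 143930094 bytes.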
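To make the D1 schema concrete, here it is transcribed from the card's YAML as a plain Python dict, with a small helper that groups columns by type. The `"sequence[string]"` label is an ad-hoc notation chosen for this sketch. In the `datasets` library, the corresponding feature types would be `Value("string")`, `Value("int64")` and `Sequence(Value("string"))`. `D1_FEATURES` and `columns_of_type` are hypothetical names.

```python
# D1 feature schema transcribed from the dataset card.
# "sequence[string]" (ad-hoc notation) marks list-valued columns.
D1_FEATURES = {
    "version": "string",
    "hypothesis": "string",
    "hypothesis_formula": "string",
    "facts": "string",
    "facts_formula": "string",
    "proofs": "sequence[string]",
    "proofs_formula": "sequence[string]",
    "negative_hypothesis": "string",
    "negative_hypothesis_formula": "string",
    "negative_proofs": "sequence[string]",
    "negative_original_tree_depth": "int64",
    "original_tree_depth": "int64",
    "depth": "int64",
    "num_formula_distractors": "int64",
    "num_translation_distractors": "int64",
    "num_all_distractors": "int64",
    "proof_label": "string",
    "negative_proof_label": "string",
    "world_assump_label": "string",
    "negative_world_assump_label": "string",
    "prompt_serial": "string",
    "proof_serial": "string",
    "instruction": "string",
    "input": "string",
    "response": "string",
}


def columns_of_type(features, dtype):
    """Return the column names whose declared type equals `dtype`, in schema order."""
    return [name for name, declared in features.items() if declared == dtype]


sequence_cols = columns_of_type(D1_FEATURES, "sequence[string]")
int_cols = columns_of_type(D1_FEATURES, "int64")
```

Here `sequence_cols` holds the three list-valued columns. Code that tokenizes or concatenates fields should treat them as lists rather than as strings.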

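Dataset configuration D1_minus

  • Features:
    • Same 25 fields as D1, but six negative-side fields (negative_hypothesis, negative_hypothesis_formula, negative_proofs, negative_original_tree_depth, negative_proof_label, negative_world_assump_label) have dtype null.
  • Splits:
    • Train: 20082 examples, 23888799 bytes in total.
    • Validation: 3349 examples, 4007250 bytes in total.
    • Test: 3374 examples, 4032734 bytes in total.
  • Download size: 8783646 bytes.
  • Dataset size: 31928783 bytes.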
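In D1_minus the negative-side columns are declared null, so code shared across configurations has to guard them. The sketch below assumes that null columns arrive as `None` (or are missing) and collects the positive (hypothesis, label) pair, plus the negative pair only when it exists. `NEGATIVE_FIELDS`, `label_pairs` and the toy records are hypothetical.

```python
# The six columns declared with dtype 'null' in D1_minus (populated in D1, D3, D8).
NEGATIVE_FIELDS = (
    "negative_hypothesis",
    "negative_hypothesis_formula",
    "negative_proofs",
    "negative_original_tree_depth",
    "negative_proof_label",
    "negative_world_assump_label",
)


def label_pairs(example):
    """Return (hypothesis, label) pairs; add the negative pair only if it is present.

    Assumes null columns come back as None or are absent from the dict.
    """
    pairs = [(example["hypothesis"], example["proof_label"])]
    neg_hyp = example.get("negative_hypothesis")
    neg_label = example.get("negative_proof_label")
    if neg_hyp is not None and neg_label is not None:
        pairs.append((neg_hyp, neg_label))
    return pairs


# Hypothetical toy records shaped like D1 and D1_minus rows (not real data).
d1_like = {
    "hypothesis": "hyp-1",
    "proof_label": "label-1",
    "negative_hypothesis": "neg-hyp-1",
    "negative_proof_label": "label-2",
}
d1_minus_like = {"hypothesis": "hyp-1", "proof_label": "label-1"}
d1_minus_like.update({name: None for name in NEGATIVE_FIELDS})
```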

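Dataset configuration D3

  • Features:
    • Same 25 fields and dtypes as D1; no field has dtype null.
  • Splits:
    • Train: 20166 examples, 126666437 bytes in total.
    • Validation: 3340 examples, 20727548 bytes in total.
    • Test: 3310 examples, 20914914 bytes in total.
  • Download size: 59393588 bytes.
  • Dataset size: 168308899 bytes.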

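Dataset configuration D8

  • Features:
    • Same 25 fields and dtypes as D1; no field has dtype null.
  • Splits:
    • Train: 20190 examples, 167341010 bytes in total.
    • Validation: 3409 examples, 28183984 bytes in total.
    • Test: 3352 examples, 27682859 bytes in total.
  • Download size: 76744182 bytes.
  • Dataset size: 223207853 bytes.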
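For every configuration, the reported dataset size equals the sum of the per-split byte counts. The short check below recomputes it from the figures listed above and also totals the examples per configuration. The numbers are copied from the card. `SPLITS`, `DATASET_SIZE` and `totals` are hypothetical names.

```python
# Per-split figures copied from the dataset card: split -> (num_examples, num_bytes).
SPLITS = {
    "D1": {"train": (20143, 108112082), "validation": (3374, 18040312), "test": (3323, 17777700)},
    "D1_minus": {"train": (20082, 23888799), "validation": (3349, 4007250), "test": (3374, 4032734)},
    "D3": {"train": (20166, 126666437), "validation": (3340, 20727548), "test": (3310, 20914914)},
    "D8": {"train": (20190, 167341010), "validation": (3409, 28183984), "test": (3352, 27682859)},
}

# dataset_size as reported on the card, in bytes.
DATASET_SIZE = {"D1": 143930094, "D1_minus": 31928783, "D3": 168308899, "D8": 223207853}


def totals(splits):
    """Sum examples and bytes over all splits of one configuration."""
    n_examples = sum(count for count, _ in splits.values())
    n_bytes = sum(size for _, size in splits.values())
    return n_examples, n_bytes


for config, splits in SPLITS.items():
    n_examples, n_bytes = totals(splits)
    # The card's dataset_size should match the summed split sizes.
    assert n_bytes == DATASET_SIZE[config], config
    print(f"{config}: {n_examples} examples, {n_bytes} bytes")
```

For D1 this gives 26840 examples in total. The download size is smaller than the dataset size for every configuration, so the two figures should not be compared directly.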

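Dataset file paths (<split> is train, validation, or test)

  • D1: train, validation, and test splits; path pattern D1/<split>-*
  • D1_minus: train, validation, and test splits; path pattern D1_minus/<split>-*
  • D3: train, validation, and test splits; path pattern D3/<split>-*
  • D8: train, validation, and test splits; path pattern D8/<split>-*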
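The path layout above can be expressed as a small mapping from split names to glob patterns, matching the `configs:` section of the card. The sketch also wraps `datasets.load_dataset` for a single configuration, using the repository id `JINIAC/JFLD` from the download link. Calling `load_config` requires the `datasets` package and network access. `data_files_for` and `load_config` are hypothetical helper names.

```python
CONFIGS = ("D1", "D1_minus", "D3", "D8")
SPLIT_NAMES = ("train", "validation", "test")


def data_files_for(config):
    """Build the split -> glob mapping declared under `configs:` in the card."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return {split: f"{config}/{split}-*" for split in SPLIT_NAMES}


def load_config(config, split=None):
    """Load one configuration from the Hugging Face Hub.

    Imports `datasets` lazily so the path helper above works without it.
    """
    from datasets import load_dataset

    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return load_dataset("JINIAC/JFLD", config, split=split)
```

For example, `load_config("D3", split="validation")` would return only the D3 validation split. Without `split`, the call returns a `DatasetDict` containing all three splits.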

License

  • License: Apache-2.0