Name: AdaMLLab/indicxnli_repaired
Creator: AdaMLLab
Published: 2026-01-22 11:20:25
License: No description available

Download link:

https://hf-mirror.com/datasets/AdaMLLab/indicxnli_repaired

Resource summary:

---
dataset_info:
- config_name: as
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: bn
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: gu
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: hi
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: kn
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: ml
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: mr
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: or
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: pa
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: ta
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 3238
  - name: test
    num_examples: 5010
- config_name: te
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
configs:
- config_name: as
  data_files:
  - split: train
    path: data/as/train-*
  - split: validation
    path: data/as/validation-*
  - split: test
    path: data/as/test-*
- config_name: bn
  data_files:
  - split: train
    path: data/bn/train-*
  - split: validation
    path: data/bn/validation-*
  - split: test
    path: data/bn/test-*
- config_name: gu
  data_files:
  - split: train
    path: data/gu/train-*
  - split: validation
    path: data/gu/validation-*
  - split: test
    path: data/gu/test-*
- config_name: hi
  data_files:
  - split: train
    path: data/hi/train-*
  - split: validation
    path: data/hi/validation-*
  - split: test
    path: data/hi/test-*
- config_name: kn
  data_files:
  - split: train
    path: data/kn/train-*
  - split: validation
    path: data/kn/validation-*
  - split: test
    path: data/kn/test-*
- config_name: ml
  data_files:
  - split: train
    path: data/ml/train-*
  - split: validation
    path: data/ml/validation-*
  - split: test
    path: data/ml/test-*
- config_name: mr
  data_files:
  - split: train
    path: data/mr/train-*
  - split: validation
    path: data/mr/validation-*
  - split: test
    path: data/mr/test-*
- config_name: or
  data_files:
  - split: train
    path: data/or/train-*
  - split: validation
    path: data/or/validation-*
  - split: test
    path: data/or/test-*
- config_name: pa
  data_files:
  - split: train
    path: data/pa/train-*
  - split: validation
    path: data/pa/validation-*
  - split: test
    path: data/pa/test-*
- config_name: ta
  data_files:
  - split: train
    path: data/ta/train-*
  - split: validation
    path: data/ta/validation-*
  - split: test
    path: data/ta/test-*
- config_name: te
  data_files:
  - split: train
    path: data/te/train-*
  - split: validation
    path: data/te/validation-*
  - split: test
    path: data/te/test-*
---

# IndicXNLI (Repaired)

This is a repaired version of the [Divyanshu/indicxnli](https://huggingface.co/datasets/Divyanshu/indicxnli) dataset, converted to Parquet format for compatibility with Hugging Face `datasets` 4.x and later.

## Why this exists

The original dataset relies on a Python loading script (`indicxnli.py`), and loading scripts are no longer supported in Hugging Face `datasets` 4.x. This version stores the data as native Parquet files instead.

## Original Dataset

- **Paper**: [IndicXNLI: Evaluating Multilingual Inference for Indian Languages](https://arxiv.org/abs/2204.08776)
- **Original Repo**: [Divyanshu/indicxnli](https://huggingface.co/datasets/Divyanshu/indicxnli)

## Languages

| Code | Language |
|------|----------|
| as | Assamese |
| bn | Bengali |
| gu | Gujarati |
| hi | Hindi |
| kn | Kannada |
| ml | Malayalam |
| mr | Marathi |
| or | Oriya |
| pa | Punjabi |
| ta | Tamil |
| te | Telugu |

## Usage

```python
from datasets import load_dataset

# Load the Hindi ("hi") validation split
ds = load_dataset("AdaMLLab/indicxnli_repaired", "hi", split="validation")
```

## Schema

Each sample contains:

- `premise` (string): The premise sentence
- `hypothesis` (string): The hypothesis sentence
- `label` (int): `0` = entailment, `1` = neutral, `2` = contradiction

## Data Source

This dataset uses the "forward" translation direction from the original dataset, in which the English XNLI data was translated into the Indic languages.

## Citation

If you use this dataset, please cite the original paper:

```bibtex
@inproceedings{aggarwal-etal-2022-indicxnli,
    title = "{I}ndic{XNLI}: Evaluating Multilingual Inference for {I}ndian Languages",
    author = "Aggarwal, Divyanshu and Gupta, Vivek and Kunchukuttan, Anoop",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022"
}
```
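As a sketch of how the Usage and Schema sections fit together, the helpers below wrap `load_dataset` with a check against the config names from the Languages table and decode the integer `label` into the names listed under Schema. The names `LANGUAGES`, `LABEL_NAMES`, `label_name`, `add_label_name`, `label_counts`, and `load_split` are hypothetical, introduced only for illustration. The label order assumes the 0/1/2 mapping documented above, and loading assumes the `datasets` library is installed and the Hub is reachable.

```python
from collections import Counter

# Config names and language names, copied from the Languages table.
LANGUAGES = {
    "as": "Assamese", "bn": "Bengali", "gu": "Gujarati", "hi": "Hindi",
    "kn": "Kannada", "ml": "Malayalam", "mr": "Marathi", "or": "Oriya",
    "pa": "Punjabi", "ta": "Tamil", "te": "Telugu",
}

# Label ids as documented in the Schema section (assumed order: 0, 1, 2).
LABEL_NAMES = ("entailment", "neutral", "contradiction")
SPLITS = ("train", "validation", "test")


def label_name(label_id: int) -> str:
    """Map an integer label to its name, rejecting ids outside 0..2
    (so a stray -1 is not silently read as the last class)."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unexpected label id {label_id!r}")
    return LABEL_NAMES[label_id]


def add_label_name(example: dict) -> dict:
    """Return a dict with a `label_name` entry for one example;
    passed to Dataset.map, this adds a new column."""
    return {"label_name": label_name(example["label"])}


def label_counts(labels) -> dict:
    """Count how often each label name occurs in an iterable of integer labels."""
    counts = Counter(label_name(i) for i in labels)
    return {name: counts[name] for name in LABEL_NAMES}


def load_split(lang: str, split: str = "validation"):
    """Load one language/split after validating the arguments.

    `datasets` is imported lazily so the helpers above work without it.
    """
    if lang not in LANGUAGES:
        raise ValueError(f"unknown config {lang!r}; expected one of {sorted(LANGUAGES)}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    from datasets import load_dataset  # needs `pip install datasets` and network access
    return load_dataset("AdaMLLab/indicxnli_repaired", lang, split=split)
```

For example, `load_split("hi").map(add_label_name)` should return the Hindi validation split with an extra `label_name` column, because `Dataset.map` merges the dictionary returned by the function into each example; `label_counts(ds["label"])` then tallies the three classes. Argument validation runs before `datasets` is imported, so a typo in a config name fails fast without a network call.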
