
AdaMLLab/indicxnli_repaired

Hugging Face | Updated: 2026-01-22 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/AdaMLLab/indicxnli_repaired
Description:
---
dataset_info:
- config_name: as
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: bn
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: gu
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: hi
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: kn
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: ml
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: mr
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: or
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: pa
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: ta
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 3238
  - name: test
    num_examples: 5010
- config_name: te
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
configs:
- config_name: as
  data_files:
  - split: train
    path: data/as/train-*
  - split: validation
    path: data/as/validation-*
  - split: test
    path: data/as/test-*
- config_name: bn
  data_files:
  - split: train
    path: data/bn/train-*
  - split: validation
    path: data/bn/validation-*
  - split: test
    path: data/bn/test-*
- config_name: gu
  data_files:
  - split: train
    path: data/gu/train-*
  - split: validation
    path: data/gu/validation-*
  - split: test
    path: data/gu/test-*
- config_name: hi
  data_files:
  - split: train
    path: data/hi/train-*
  - split: validation
    path: data/hi/validation-*
  - split: test
    path: data/hi/test-*
- config_name: kn
  data_files:
  - split: train
    path: data/kn/train-*
  - split: validation
    path: data/kn/validation-*
  - split: test
    path: data/kn/test-*
- config_name: ml
  data_files:
  - split: train
    path: data/ml/train-*
  - split: validation
    path: data/ml/validation-*
  - split: test
    path: data/ml/test-*
- config_name: mr
  data_files:
  - split: train
    path: data/mr/train-*
  - split: validation
    path: data/mr/validation-*
  - split: test
    path: data/mr/test-*
- config_name: or
  data_files:
  - split: train
    path: data/or/train-*
  - split: validation
    path: data/or/validation-*
  - split: test
    path: data/or/test-*
- config_name: pa
  data_files:
  - split: train
    path: data/pa/train-*
  - split: validation
    path: data/pa/validation-*
  - split: test
    path: data/pa/test-*
- config_name: ta
  data_files:
  - split: train
    path: data/ta/train-*
  - split: validation
    path: data/ta/validation-*
  - split: test
    path: data/ta/test-*
- config_name: te
  data_files:
  - split: train
    path: data/te/train-*
  - split: validation
    path: data/te/validation-*
  - split: test
    path: data/te/test-*
---

# IndicXNLI (Repaired)

This is a repaired version of the [Divyanshu/indicxnli](https://huggingface.co/datasets/Divyanshu/indicxnli) dataset, converted to Parquet format for compatibility with version 4.x and later of the Hugging Face `datasets` library.

## Why this exists

The original dataset relies on a Python loading script (`indicxnli.py`), and loading scripts are no longer supported in `datasets` 4.x. This version stores the data as native Parquet files instead, so it loads without a script.

## Original Dataset

- **Paper**: [IndicXNLI: Evaluating Multilingual Inference for Indian Languages](https://arxiv.org/abs/2204.08776)
- **Original Repo**: [Divyanshu/indicxnli](https://huggingface.co/datasets/Divyanshu/indicxnli)

## Languages

| Code | Language |
|------|----------|
| as | Assamese |
| bn | Bengali |
| gu | Gujarati |
| hi | Hindi |
| kn | Kannada |
| ml | Malayalam |
| mr | Marathi |
| or | Oriya |
| pa | Punjabi |
| ta | Tamil |
| te | Telugu |

## Usage

```python
from datasets import load_dataset

# Load Hindi validation split
ds = load_dataset("AdaMLLab/indicxnli_repaired", "hi", split="validation")
```

## Schema

Each sample contains:

- `premise` (string): the premise sentence
- `hypothesis` (string): the hypothesis sentence
- `label` (int): 0 = entailment, 1 = neutral, 2 = contradiction

## Data Source

This dataset uses the "forward" translation direction from the original dataset, in which the English XNLI data was translated into the Indic languages.

## Citation

If you use this dataset, please cite the original paper:

```bibtex
@inproceedings{aggarwal-etal-2022-indicxnli,
    title = "{I}ndic{XNLI}: Evaluating Multilingual Inference for {I}ndian Languages",
    author = "Aggarwal, Divyanshu and Gupta, Vivek and Kunchukuttan, Anoop",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022"
}
```
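The metadata declares `label` as a plain `int64` rather than a named class feature, so label names are not attached to the loaded data. The following is a minimal sketch that makes the Schema mapping and the Languages table usable in code. It assumes, as the Schema section states, that every label is 0, 1, or 2. The names `ID2LABEL`, `LABEL2ID`, `LANGUAGE_CONFIGS`, `decode_label`, and `label_distribution` are hypothetical helpers. They are not part of this dataset or of the `datasets` library.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical helper: label ids as documented in the Schema section
# (assumed exhaustive: 0, 1, 2 only).
ID2LABEL: Dict[int, str] = {0: "entailment", 1: "neutral", 2: "contradiction"}
LABEL2ID: Dict[str, int] = {name: idx for idx, name in ID2LABEL.items()}

# Hypothetical helper: config names from the Languages table; each code is the
# second argument to load_dataset("AdaMLLab/indicxnli_repaired", <code>).
LANGUAGE_CONFIGS: Dict[str, str] = {
    "as": "Assamese",
    "bn": "Bengali",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "mr": "Marathi",
    "or": "Oriya",
    "pa": "Punjabi",
    "ta": "Tamil",
    "te": "Telugu",
}


def decode_label(label_id: int) -> str:
    """Return the label name for an integer id, rejecting unexpected ids."""
    if label_id not in ID2LABEL:
        raise ValueError(f"unexpected label id: {label_id!r}")
    return ID2LABEL[label_id]


def label_distribution(rows: Iterable[Mapping[str, object]]) -> Dict[str, int]:
    """Count how many rows carry each label name.

    `rows` can be any iterable of dicts with an integer "label" field,
    e.g. a list of records or a datasets.Dataset (which yields dicts).
    """
    counts = Counter(decode_label(int(row["label"])) for row in rows)
    # Report every class in id order, including classes that never occur.
    return {name: counts[name] for name in ID2LABEL.values()}
```

For example, `label_distribution(load_dataset("AdaMLLab/indicxnli_repaired", "hi", split="validation"))` would tally the Hindi validation labels. That call requires the `datasets` library and access to the Hugging Face Hub. Iterating over a `datasets.Dataset` yields one dict per row, which is why the helper accepts any iterable of mappings.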
Provided by:
AdaMLLab