AdaMLLab/indicxnli_repaired
Source: Hugging Face | Updated: 2026-01-22 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/AdaMLLab/indicxnli_repaired
Description:
---
dataset_info:
- config_name: as
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: bn
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: gu
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: hi
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: kn
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: ml
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: mr
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: or
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: pa
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
- config_name: ta
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 3238
  - name: test
    num_examples: 5010
- config_name: te
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 392702
  - name: validation
    num_examples: 2490
  - name: test
    num_examples: 5010
configs:
- config_name: as
  data_files:
  - split: train
    path: data/as/train-*
  - split: validation
    path: data/as/validation-*
  - split: test
    path: data/as/test-*
- config_name: bn
  data_files:
  - split: train
    path: data/bn/train-*
  - split: validation
    path: data/bn/validation-*
  - split: test
    path: data/bn/test-*
- config_name: gu
  data_files:
  - split: train
    path: data/gu/train-*
  - split: validation
    path: data/gu/validation-*
  - split: test
    path: data/gu/test-*
- config_name: hi
  data_files:
  - split: train
    path: data/hi/train-*
  - split: validation
    path: data/hi/validation-*
  - split: test
    path: data/hi/test-*
- config_name: kn
  data_files:
  - split: train
    path: data/kn/train-*
  - split: validation
    path: data/kn/validation-*
  - split: test
    path: data/kn/test-*
- config_name: ml
  data_files:
  - split: train
    path: data/ml/train-*
  - split: validation
    path: data/ml/validation-*
  - split: test
    path: data/ml/test-*
- config_name: mr
  data_files:
  - split: train
    path: data/mr/train-*
  - split: validation
    path: data/mr/validation-*
  - split: test
    path: data/mr/test-*
- config_name: or
  data_files:
  - split: train
    path: data/or/train-*
  - split: validation
    path: data/or/validation-*
  - split: test
    path: data/or/test-*
- config_name: pa
  data_files:
  - split: train
    path: data/pa/train-*
  - split: validation
    path: data/pa/validation-*
  - split: test
    path: data/pa/test-*
- config_name: ta
  data_files:
  - split: train
    path: data/ta/train-*
  - split: validation
    path: data/ta/validation-*
  - split: test
    path: data/ta/test-*
- config_name: te
  data_files:
  - split: train
    path: data/te/train-*
  - split: validation
    path: data/te/validation-*
  - split: test
    path: data/te/test-*
---
# IndicXNLI (Repaired)
This is a repaired version of the [Divyanshu/indicxnli](https://huggingface.co/datasets/Divyanshu/indicxnli) dataset, converted to Parquet format for compatibility with Hugging Face `datasets` 4.x and later.
## Why this exists
The original dataset relies on a Python loading script (`indicxnli.py`), and loading scripts are no longer supported in Hugging Face `datasets` 4.x. This version stores the data as native Parquet files instead.
## Original Dataset
- **Paper**: [IndicXNLI: Evaluating Multilingual Inference for Indian Languages](https://arxiv.org/abs/2204.08776)
- **Original Repo**: [Divyanshu/indicxnli](https://huggingface.co/datasets/Divyanshu/indicxnli)
## Languages
| Code | Language |
|------|----------|
| as | Assamese |
| bn | Bengali |
| gu | Gujarati |
| hi | Hindi |
| kn | Kannada |
| ml | Malayalam |
| mr | Marathi |
| or | Oriya |
| pa | Punjabi |
| ta | Tamil |
| te | Telugu |
## Usage
```python
from datasets import load_dataset
# Load Hindi validation split
ds = load_dataset("AdaMLLab/indicxnli_repaired", "hi", split="validation")
```
## Schema
Each sample contains:
- `premise` (string): The premise sentence
- `hypothesis` (string): The hypothesis sentence
- `label` (int64): 0 = entailment, 1 = neutral, 2 = contradiction
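Because `label` is stored as a plain integer, it can be handy to map it back to a class name when inspecting examples. A minimal sketch, assuming the label mapping listed above; `LABEL_NAMES`, `describe`, and the sample record are hypothetical and the sample text is an English placeholder, not data from the dataset:

```python
# Label ids as documented in the schema above.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

def describe(example: dict) -> str:
    """Format one premise/hypothesis pair with its label name."""
    label = LABEL_NAMES[example["label"]]
    return f'{example["premise"]} => {example["hypothesis"]} [{label}]'

# Hypothetical record following the dataset schema (placeholder text).
sample = {
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": 0,
}
print(describe(sample))
```

The same function can be applied to any row returned by `load_dataset`, since each row is a dictionary with these three keys.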
## Data Source
This dataset uses the "forward" translation direction from the original dataset, in which English XNLI was translated into the Indic languages.
## Citation
If you use this dataset, please cite the original paper:
```bibtex
@inproceedings{aggarwal-etal-2022-indicxnli,
title = "{I}ndic{XNLI}: Evaluating Multilingual Inference for {I}ndian Languages",
author = "Aggarwal, Divyanshu and Gupta, Vivek and Kunchukuttan, Anoop",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
year = "2022"
}
```
Provided by:
AdaMLLab



