ai4bharat/IndicIFEval
Source: Hugging Face. Updated 2026-02-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ai4bharat/IndicIFEval
Resource description:
---
language:
- as
- bn
- gu
- hi
- kn
- ml
- mr
- ne
- or
- pa
- sa
- ta
- te
- ur
- en
license: cc-by-4.0
task_categories:
- text-generation
dataset_info:
- config_name: indicifeval-ground
  features:
  - name: key
    dtype: int64
  - name: prompt
    dtype: string
  - name: instruction_id_list
    list: string
  - name: kwargs
    list:
    - name: capital_frequency
      dtype: 'null'
    - name: capital_relation
      dtype: 'null'
    - name: end_phrase
      dtype: 'null'
    - name: first_word
      dtype: string
    - name: forbidden_words
      list: string
    - name: frequency
      dtype: int64
    - name: keyword
      dtype: string
    - name: keywords
      dtype: 'null'
    - name: language
      dtype: 'null'
    - name: let_frequency
      dtype: 'null'
    - name: let_relation
      dtype: 'null'
    - name: letter
      dtype: 'null'
    - name: nth_paragraph
      dtype: int64
    - name: num_bullets
      dtype: 'null'
    - name: num_highlights
      dtype: 'null'
    - name: num_paragraphs
      dtype: int64
    - name: num_placeholders
      dtype: 'null'
    - name: num_sections
      dtype: 'null'
    - name: num_sentences
      dtype: int64
    - name: num_words
      dtype: 'null'
    - name: postscript_marker
      dtype: 'null'
    - name: prompt_to_repeat
      dtype: 'null'
    - name: relation
      dtype: 'null'
    - name: section_spliter
      dtype: 'null'
  - name: tags
    list: string
  - name: resp_lang
    dtype: string
  splits:
  - name: as
    num_bytes: 274433
    num_examples: 257
  - name: bn
    num_bytes: 512849
    num_examples: 409
  - name: gu
    num_bytes: 417855
    num_examples: 377
  - name: hi
    num_bytes: 565520
    num_examples: 457
  - name: kn
    num_bytes: 399010
    num_examples: 341
  - name: ml
    num_bytes: 492347
    num_examples: 384
  - name: mr
    num_bytes: 411112
    num_examples: 350
  - name: ne
    num_bytes: 426576
    num_examples: 341
  - name: or
    num_bytes: 384418
    num_examples: 368
  - name: sa
    num_bytes: 357504
    num_examples: 336
  - name: ta
    num_bytes: 582072
    num_examples: 430
  - name: te
    num_bytes: 456548
    num_examples: 375
  - name: ur
    num_bytes: 279076
    num_examples: 337
  - name: pa
    num_bytes: 456945
    num_examples: 409
  download_size: 2239855
  dataset_size: 6016265
- config_name: indicifeval-trans
  features:
  - name: key
    dtype: int64
  - name: prompt
    dtype: string
  - name: instruction_id_list
    list: string
  - name: kwargs
    list:
    - name: num_highlights
      dtype: int64
    - name: relation
      dtype: string
    - name: num_words
      dtype: int64
    - name: num_placeholders
      dtype: int64
    - name: prompt_to_repeat
      dtype: string
    - name: num_bullets
      dtype: int64
    - name: section_spliter
      dtype: string
    - name: num_sections
      dtype: int64
    - name: capital_relation
      dtype: string
    - name: capital_frequency
      dtype: int64
    - name: keywords
      list: string
    - name: num_paragraphs
      dtype: int64
    - name: language
      dtype: string
    - name: let_relation
      dtype: string
    - name: letter
      dtype: string
    - name: let_frequency
      dtype: int64
    - name: end_phrase
      dtype: string
    - name: forbidden_words
      list: string
    - name: keyword
      dtype: string
    - name: frequency
      dtype: int64
    - name: num_sentences
      dtype: int64
    - name: postscript_marker
      dtype: string
    - name: first_word
      dtype: string
    - name: nth_paragraph
      dtype: int64
  - name: resp_lang
    dtype: string
  - name: tags
    list: string
  splits:
  - name: as
    num_bytes: 458525
    num_examples: 490
  - name: bn
    num_bytes: 469369
    num_examples: 490
  - name: gu
    num_bytes: 452357
    num_examples: 490
  - name: hi
    num_bytes: 463815
    num_examples: 490
  - name: kn
    num_bytes: 495654
    num_examples: 490
  - name: ml
    num_bytes: 519668
    num_examples: 490
  - name: mr
    num_bytes: 458844
    num_examples: 490
  - name: ne
    num_bytes: 481230
    num_examples: 490
  - name: or
    num_bytes: 482960
    num_examples: 490
  - name: pa
    num_bytes: 459979
    num_examples: 490
  - name: sa
    num_bytes: 462530
    num_examples: 490
  - name: ta
    num_bytes: 537833
    num_examples: 490
  - name: te
    num_bytes: 477992
    num_examples: 490
  - name: ur
    num_bytes: 369380
    num_examples: 490
  - name: en
    num_bytes: 263622
    num_examples: 490
  download_size: 3145340
  dataset_size: 6853758
configs:
- config_name: indicifeval-ground
  data_files:
  - split: as
    path: indicifeval-ground/as-*
  - split: bn
    path: indicifeval-ground/bn-*
  - split: gu
    path: indicifeval-ground/gu-*
  - split: hi
    path: indicifeval-ground/hi-*
  - split: kn
    path: indicifeval-ground/kn-*
  - split: ml
    path: indicifeval-ground/ml-*
  - split: mr
    path: indicifeval-ground/mr-*
  - split: ne
    path: indicifeval-ground/ne-*
  - split: or
    path: indicifeval-ground/or-*
  - split: sa
    path: indicifeval-ground/sa-*
  - split: ta
    path: indicifeval-ground/ta-*
  - split: te
    path: indicifeval-ground/te-*
  - split: ur
    path: indicifeval-ground/ur-*
  - split: pa
    path: indicifeval-ground/pa-*
- config_name: indicifeval-trans
  data_files:
  - split: en
    path: indicifeval-trans/en-*
  - split: as
    path: indicifeval-trans/as-*
  - split: bn
    path: indicifeval-trans/bn-*
  - split: gu
    path: indicifeval-trans/gu-*
  - split: hi
    path: indicifeval-trans/hi-*
  - split: kn
    path: indicifeval-trans/kn-*
  - split: ml
    path: indicifeval-trans/ml-*
  - split: mr
    path: indicifeval-trans/mr-*
  - split: ne
    path: indicifeval-trans/ne-*
  - split: or
    path: indicifeval-trans/or-*
  - split: pa
    path: indicifeval-trans/pa-*
  - split: sa
    path: indicifeval-trans/sa-*
  - split: ta
    path: indicifeval-trans/ta-*
  - split: te
    path: indicifeval-trans/te-*
  - split: ur
    path: indicifeval-trans/ur-*
---
# IndicIFEval
[**Paper**](https://huggingface.co/papers/2602.22125) | [**GitHub**](https://github.com/ai4bharat/IndicIFEval)
Instruction-following benchmarks remain predominantly English-centric,
leaving a critical evaluation gap for the hundreds of millions of Indic
language speakers. We introduce IndicIFEval, a benchmark evaluating constrained
generation of LLMs across 14 Indic languages using automatically verifiable,
rule-based instructions. It combines two complementary tracks: IndicIFEval-Trans,
translated prompts from IFEval carefully localized for Indic
contexts, and IndicIFEval-Ground, synthetically generated instructions grounded
in native Indic content.
# Overview
14 Indic Languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam,
Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu, Urdu
We follow the same format as the original IFEval so that evaluation stays streamlined and compatible across all languages. The table below lists the column names, their data types, and a brief description of their contents.
| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| key | `int64` | Unique identifier for each evaluation instance. |
| prompt | `str` | The natural language instruction presented to the model. |
| instruction_id_list | List of `str` | Identifiers specifying which format or constraint checks apply to the prompt. |
| kwargs | List of `dict` | Parameter values associated with each constraint specified in the instruction list. |
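
As a rough illustration of how these columns fit together, the sketch below pairs each entry of `instruction_id_list` with the `kwargs` dict at the same position and drops parameters that are unset. The helper name `pair_constraints`, the sample record, and the instruction ID string in it are assumptions made for illustration; they are not taken from the released code.

```python
from typing import Any


def pair_constraints(record: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Pair each instruction ID with its non-null keyword arguments."""
    ids = record["instruction_id_list"]
    kwargs_list = record["kwargs"]
    if len(ids) != len(kwargs_list):
        raise ValueError("instruction_id_list and kwargs must have equal length")
    paired = []
    for instruction_id, kwargs in zip(ids, kwargs_list):
        # The metadata declares every parameter on each kwargs entry, so
        # unused parameters are expected to show up as None; drop them.
        active = {name: value for name, value in kwargs.items() if value is not None}
        paired.append((instruction_id, active))
    return paired


# Hypothetical record shaped like one dataset row (values are illustrative).
sample = {
    "key": 1,
    "prompt": "<prompt text>",
    "instruction_id_list": ["keywords:frequency"],
    "kwargs": [
        {"keyword": "ରହିଲା", "frequency": 2, "relation": None, "num_words": None}
    ],
}

constraints = pair_constraints(sample)
```

Real rows could be obtained with `datasets.load_dataset("ai4bharat/IndicIFEval", "indicifeval-ground", split="or")`, where the config and split names come from the metadata above.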
## IndicIFEval-Trans
Translated and localized prompts from the English
IFEval benchmark, carefully filtered and manually verified by native
speakers for cultural suitability and translation quality.
| Language | Constraint | Prompt | English Prompt | Original Prompt |
|---|---|---|---|---|
| Marathi | Placeholder Inclusion | तुम्ही भारताचे पंतप्रधान असल्यासारखे निबंध लिहा आणि मातांना तुमचे प्रेक्षक म्हणून लक्ष्य करा. 'तारे जमीन पर' हा चित्रपट प्रत्येक मुलाच्या अद्वितीय क्षमतांना कसे वाढवायचे याचे प्रतीक आहे हा विषय आहे. प्रतिसादात [पत्ता] सारखे चौकोनी कंसात दर्शविलेले किमान 1 प्लेसहोल्डर असले पाहिजेत. | Write an essay as if you are the Prime Minister of India targeting moms as your audience. The subject is how the movie 'Taare Zameen Par' symbolizes the need to nurture every child's unique abilities. The response must contain at least 1 placeholders represented by square brackets, such as [address]. | Write an essay as if you are the president of the United States targeting moms as your audience. The subject is how the float from the movie "It" symbolizes the spirit of the nineteen-fifties. The response must contain at least 1 placeholders represented by square brackets, such as [address]. |
| Bengali | Word Count | আমার কাছে একটা টাকা আছে। এই টাকা দিয়ে আমি কী করতে পারি? ভারতের একজন প্রধানমন্ত্রীর স্টাইলে আমাকে পরামর্শ দিন এবং নিশ্চিত করুন যে এতে কমপক্ষে ৬০০ শব্দ আছে। | I have a rupee. What can I do with this rupee? Give me advice in the style of a Prime Minister of India and make sure it has at least 600 words. | I have a dime. What can I do with this dime? Give me advice in the style of a President of the United States and make sure it has at least 600 words. |
| Tamil | JSON, Forbidden Keywords | வேண்டுகோள்: 1. சென்னைல பார்க்க சிறந்த இடங்கள் என்ன? 2. பரிந்துரைக்கப்பட்ட ஹோட்டல்களின் பட்டியலைச் சேர்க்கவும். 3. முழு வெளியீட்டையும் JSON வடிவமைப்பில் மூடவும். 4. மெரினா, பெசன்ட், மயிலாப்பூர், கபாலீஸ்வரர், சாந்தோம் ஆகிய முக்கிய வார்த்தைகளை சேர்க்க வேண்டாம் | Request: 1. What are the best places to visit in Chennai? 2. Include a list of recommended hotels. 3. Wrap the ENTIRE output in JSON format. 4. Do not include the following keywords: Marina, Besant, Mylapore, Kapaleeshwarar, Santhome | Request: 1. What are the best places to visit in Bohemia, Czech Republic? 2. Include a list of recommended hotels. 3. Wrap the ENTIRE output in JSON format. 4. Do not include the following keywords: Moser, Glassworks, Pravcice, Karlovy, Vary |
| Punjabi | Title Format | ਕੋਟਲੀਨ ਬਨਾਮ ਜਾਵਾ ਦੇ ਕੀ ਫਾਇਦੇ ਅਤੇ ਨੁਕਸਾਨ ਹਨ? ਤੁਹਾਡੇ ਜਵਾਬ ਵਿੱਚ ਇੱਕ ਸਿਰਲੇਖ ਹੋਣਾ ਚਾਹੀਦਾ ਹੈ ਜੋ ਦੋਹਰੇ ਕੋਣੀ ਬਰੈਕਟਾਂ ਵਿੱਚ ਹੋਵੇ, ਜਿਵੇਂ ਕਿ << ਕੋਟਲੀਨ ਬਨਾਮ ਜਾਵਾ >> | What are the pros and cons of kotlin vs java? Your answer must have a title contained in double angular brackets, such as <<kotlin vs java>>. | No Modification |
| Telugu | Bullet Point Format | ఈ ప్రకటన గురించి మీ అభిప్రాయం ఏమిటి: "యోగులు మాంత్రికుల కంటే గొప్పవారు ఎందుకంటే యోగులు తరచుగా శక్తిని కలిగి ఉంటారు కానీ విముక్తి చాలా ముఖ్యమైనదని తెలుసుకుని దానిని ఉపయోగించరు."? మీ సమాధానంలో కనీసం 30 వాక్యాలు మరియు సరిగ్గా 2 బుల్లెట పాయింట్లు ఉండాలి. అలాగే, కనీసం 8 ఖాళీలు చదరపు బ్రాకెట్లలో ఉండాలి, ఉదాహరణకు [address]. బుల్లెట్ పాయింట్లను ఈ విధంగా ఉపయోగించండి:* ఇది ఒక బుల్లెట్ పాయింట్ | What do you think about this statement: "Yogis are greater than sorcerers because yogis often possess power but choose not to use it, knowing that liberation is far more important."? Your response should contain at least 30 sentences and exactly 2 bullet points. Also, it must contain at least 8 placeholders represented by square brackets, such as [address]. Use the bullet points like:* This is a bullet point | What do you think about this statement: "Wizards are more powerful than sorcerers because they study magic instead of being born with it."? Your response should contain at least 30 sentences and exactly 2 bullet points. Also, it must contain at least 8 placeholders represented by square brackets, such as [address]. Use the bullet points like:* This is a bullet point |
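
Several of the constraints in the table above (placeholders, titles, bullet points) are purely structural, so they can be checked with simple patterns regardless of the response language. The following is a minimal sketch of such checks; the function names and regular expressions are assumptions chosen for illustration and may differ from the rules used by the official evaluation code.

```python
import re


def count_placeholders(response: str) -> int:
    """Count square-bracket placeholders such as [address]."""
    return len(re.findall(r"\[[^\[\]]+\]", response))


def has_title(response: str) -> bool:
    """Check for a title wrapped in double angular brackets, e.g. <<title>>."""
    return re.search(r"<<[^\n]+?>>", response) is not None


def count_bullets(response: str) -> int:
    """Count lines that start with a markdown bullet ('* ' or '- ')."""
    return len(re.findall(r"^[ \t]*[*-] \S", response, flags=re.MULTILINE))


# Illustrative response text (not taken from the dataset).
reply = (
    "<<kotlin vs java>>\n"
    "* This is a bullet point with [address]\n"
    "* Another point with [name]"
)

print(count_placeholders(reply), has_title(reply), count_bullets(reply))
```

Because the patterns only look at brackets and line prefixes, the same checks apply unchanged to responses written in any of the 14 scripts.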
## IndicIFEval-Ground
Synthetically generated instructions grounded
in native Indic topics and content, manually verified by native
speakers. Unlike translated prompts, these reflect naturalistic
constraints with more real-world contexts.
| Language | Constraint | Prompt | English Translation |
|---|---|---|---|
| Odia | Keyword Frequency | ୧୯୪୮ ମସିହାରେ କଟକ ଡିଭିଜନରେ ପ୍ରାଦେଶିକ ରାଜସ୍ୱ ବୋର୍ଡର ପ୍ରତିଷ୍ଠା ଉପରେ ଏକ ବୈଷୟିକ ବିବରଣୀ ଲେଖନ୍ତୁ। ଏହି ବିବରଣୀରେ, ବୋର୍ଡର ମୁଖ୍ୟ କାର୍ଯ୍ୟ ଏବଂ ପୂର୍ବତନ ପ୍ରଶାସନିକ ବ୍ୟବସ୍ଥାରେ ହୋଇଥିବା ପରିବର୍ତ୍ତନ ଉପରେ ଆଲୋକପାତ କରନ୍ତୁ। ଆପଣଙ୍କ ଉତ୍ତରରେ 'ରହିଲା' ଶବ୍ଦଟି ଠିକ୍ ଦୁଇଥର ରହିବା ଆବଶ୍ୟକ। | Write a technical account of the establishment of the Provincial Board of Revenue in Cuttack Division in 1948. In this account, highlight the main functions of the Board and the changes to the previous administrative system. Your answer must contain the word 'remained' exactly twice. |
| Kannada | Keyword Prohibition | ಬೆಂಗಳೂರಿನ ಪುರಸಭೆ ಕಚೇರಿಯಲ್ಲಿ ಹತಾಶೆಗೊಂಡಿರುವ ನಾಗರಿಕ, ಅರ್ಜುನ್, ಮತ್ತು ದಣಿದಿರುವ ಸರ್ಕಾರಿ ಅಧಿಕಾರಿ, ರಾವ್, ಇವರ ನಡುವೆ ಒಂದು ಸಣ್ಣ ಸಂಭಾಷಣೆಯನ್ನು ರಚಿಸಿ. ಅರ್ಜುನ್ ತನ್ನ ಹೊಸ ಮನೆ ನಿರ್ಮಾಣದ ಪರವಾನಗಿಗಾಗಿ ಆನ್ಲೈನ್ನಲ್ಲಿ ಸಲ್ಲಿಸಿದ ಅರ್ಜಿಯ ಸ್ಥಿತಿಯನ್ನು ತಿಳಿಯಲು ಬಂದಿದ್ದಾನೆ. ಆನ್ಲೈನ್ ವ್ಯವಸ್ಥೆಯು ಸ್ಥಗಿತಗೊಂಡಿರುವುದರಿಂದ, ಅರ್ಜಿಯನ್ನು ಕೈಯಾರೆ ಪುನಃ ಸಲ್ಲಿಸಬೇಕು ಎಂದು ರಾವ್ ವಿವರಿಸಬೇಕು. ಸಂಭಾಷಣೆಯ ಒಂದು ಪ್ರಮುಖ ಭಾಗದಲ್ಲಿ, ರಾವ್ ಅವರು ಹೊಸ ನಿಯಮದ ಪ್ರಕಾರ, ಅಧಿಕೃತ ಸಂವಹನದಲ್ಲಿ 'ಕಟ್ಟಡ' ಎಂಬ ಪದವನ್ನು ಬಳಸಬಾರದು ಎಂದು ಸ್ಪಷ್ಟವಾಗಿ ಒತ್ತಿಹೇಳಬೇಕು. ನಿಮ್ಮ ಉತ್ತರದಲ್ಲಿ 'ಕಟ್ಟಡ' ಎಂಬ ಪದವು ಎಲ್ಲೂ ಇರಬಾರದು. | Create a short conversation between a frustrated citizen, Arjun, and a tired government official, Rao, in a Bangalore municipal office. Arjun has come to find out the status of the application he submitted online for a construction permit for his new house. Rao should explain that since the online system is down, the application has to be re-submitted manually. In an important part of the conversation, Rao must clearly emphasize that the word 'building' should not be used in official communication as per the new rule. The word 'building' should not appear anywhere in your answer. |
| Malayalam | First Word | കോഴിക്കോട് ജില്ലയിലെ ഗ്രാമീണ മേഖലകളിൽ വർദ്ധിച്ചുവരുന്ന മനുഷ്യ-വന്യജീവി സംഘർഷം എന്ന വിഷയത്തിൽ ഒരു ഉപന്യാസം എഴുതുക. ഈ പ്രശ്നത്തിന്റെ സാമൂഹിക പ്രത്യാഘാതങ്ങളും പരിഹാര മാർഗ്ഗങ്ങളും ചർച്ച ചെയ്യണം. നിങ്ങളുടെ ഉപന്യാസം ഒരു പത്രത്തിലെ ഫീച്ചർ ലേഖനത്തിന്റെ രൂപത്തിലായിരിക്കണം, അതിനാൽ അത് 'കോഴിക്കോട്:' എന്ന വാക്കിൽ തന്നെ ആരംഭിക്കേണ്ടത് അത്യാവശ്യമാണ്. | Write an essay on the topic of increasing human-wildlife conflict in rural areas of Kozhikode district. Discuss the social implications of this problem and possible solutions. Your essay should be in the form of a feature article in a newspaper, so it is essential that it begins with the word 'Kozhikode:'. |
| Urdu | Sentence Count | ہندوستان کی تحریک آزادی کے گمنام ہیروز پر ایک تاریخی مضمون کے لیے، اشفاق اللہ خان کے بارے میں ایک تحقیقی سوال تیار کریں۔ براہ کرم اپنا سوال صرف ایک جملے میں لکھیں۔ | For a historical essay on the unsung heroes of the Indian Independence Movement, prepare a research question about Ashfaqulla Khan. Please write your question in only one sentence. |
| Gujarati | Paragraph Count | તમે એક ડિજિટલ સમાચાર પોર્ટલ માટે કન્ટેન્ટ એડિટર છો. તમને નીચેનો ડ્રાફ્ટ મળ્યો છે, જે એક હૃદયસ્પર્શી વાયરલ વીડિયોનું વર્ણન કરે છે. આ લખાણ પુનરાવર્તિત અને થોડું અવ્યવસ્થિત છે. તમારું કાર્ય તેને એક જ, સુસંગત અને ઔપચારિક ફકરામાં ફરીથી લખવાનું છે જે અમારા પોર્ટલ પર પ્રકાશિત કરવા માટે યોગ્ય હોય. <br><br>સોશિયલ મીડિયા પર એકથી વધુ વીડિયો વાયરલ થતા રહે છે. કેટલાક વીડિયો જોયા પછી આપણને ગમે છે, જ્યારે કેટલાક વીડિયો રિલેક્સ કરતા હોય છે. હાલમાં જ ઈન્સ્ટાગ્રામ પર એક વીડિયો પોસ્ટ કરવામાં આવ્યો હતો અને હવે તે દરેકના દિલને સ્પર્શી જવાને કારણે ઘણો પોપ્યુલર થઈ રહ્યો છે. વીડિયોમાં જોઈ શકાય છે કે કેવી રીતે એક ખાદ્ય વિક્રેતા મોચીને મફતમાં ખાવાનું ખવડાવવાનું કહે છે. આ વીડિયો સોશિયલ મીડિયા પર વાયરલ થઈ રહ્યો છે, લોકો આ વીડિયોને ખૂબ પસંદ કરી રહ્યા છે. સોશિયલ મીડિયા પર લોકો આ વીડિયોને ખૂબ પસંદ કરી રહ્યા છે. આ વાયરલ વીડિયો ઈન્સ્ટાગ્રામ પર શેર કરવામાં આવ્યો છે. લોકો આ વીડિયોને ખૂબ પસંદ કરી રહ્યા છે. આ વીડિયોને લાખો વ્યૂઝ મળ્યા છે. તો આ વીડિયો તમામ સોશિયલ મીડિયા પ્લેટફોર્મ પર શેર કરવામાં આવી રહ્યો છે. | You are a content editor for a digital news portal. You've got the draft below, which describes a heartwarming viral video. The text is repetitive and a bit disorganized. Your task is to rewrite it into a single, coherent and formal paragraph that is suitable for publishing on our portal.<br><br>More than one video keeps going viral on social media. After watching some videos we like it while some videos are relaxing. Recently a video was posted on Instagram and now it is becoming very popular as it has touched everyone's heart. In the video, one can see how a food vendor asks the cobbler to feed him free food. This video is going viral on social media and people are liking this video a lot. People are liking this video a lot on social media. This viral video has been shared on Instagram. People are liking this video a lot. This video has received millions of views. So this video is being shared on all social media platforms. |
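
The keyword-based constraints shown above (keyword frequency, keyword prohibition, first word) can be verified with plain string operations. The sketch below is one possible approach, not the benchmark's own checker: the function names are hypothetical, the parameter names mirror the `kwargs` fields in the metadata, and plain substring matching after Unicode NFC normalization is an assumption (regex word boundaries can be unreliable around the combining vowel signs used in Indic scripts).

```python
import unicodedata


def _norm(text: str) -> str:
    # NFC normalization so that equivalent code point sequences compare equal.
    return unicodedata.normalize("NFC", text)


def keyword_count(response: str, keyword: str) -> int:
    """Count non-overlapping substring occurrences of keyword in response."""
    return _norm(response).count(_norm(keyword))


def check_frequency(response: str, keyword: str, frequency: int) -> bool:
    """True if keyword appears exactly `frequency` times."""
    return keyword_count(response, keyword) == frequency


def check_forbidden(response: str, forbidden_words: list[str]) -> bool:
    """True if none of the forbidden words appear in the response."""
    return all(keyword_count(response, word) == 0 for word in forbidden_words)


def check_first_word(response: str, first_word: str) -> bool:
    """True if the first whitespace-delimited token equals first_word."""
    words = _norm(response).split()
    return bool(words) and words[0] == _norm(first_word)


# Keywords taken from the example prompts above; the replies are made up.
keyword = "ରହିଲା"
reply = f"A {keyword} B {keyword} C"
first = "കോഴിക്കോട്:"

print(check_frequency(reply, keyword, 2))
print(check_forbidden(reply, ["ಕಟ್ಟಡ"]))
print(check_first_word(f"{first} rest of the article", first))
```

Substring matching also counts inflected forms that contain the keyword, so a stricter checker might tokenize the response first; which behavior is appropriate depends on how the benchmark defines a match.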
## Citation
```bibtex
@article{jayakumar2026indicifeval,
title={IndicIFEval: A Benchmark for Verifiable Instruction-Following Evaluation in 14 Indic Languages},
author={Thanmay Jayakumar and Mohammed Safi Ur Rahman Khan and Raj Dabre and Ratish Puduppully and Anoop Kunchukuttan},
year={2026},
eprint={2602.22125},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.22125},
}
```
Provider:
ai4bharat



