
ai4bharat/IndicIFEval

Hugging Face · Updated 2026-02-26 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/ai4bharat/IndicIFEval
Resource description:
---
language:
- as
- bn
- gu
- hi
- kn
- ml
- mr
- ne
- or
- pa
- sa
- ta
- te
- ur
- en
license: cc-by-4.0
task_categories:
- text-generation
dataset_info:
- config_name: indicifeval-ground
  features:
  - name: key
    dtype: int64
  - name: prompt
    dtype: string
  - name: instruction_id_list
    list: string
  - name: kwargs
    list:
    - name: capital_frequency
      dtype: 'null'
    - name: capital_relation
      dtype: 'null'
    - name: end_phrase
      dtype: 'null'
    - name: first_word
      dtype: string
    - name: forbidden_words
      list: string
    - name: frequency
      dtype: int64
    - name: keyword
      dtype: string
    - name: keywords
      dtype: 'null'
    - name: language
      dtype: 'null'
    - name: let_frequency
      dtype: 'null'
    - name: let_relation
      dtype: 'null'
    - name: letter
      dtype: 'null'
    - name: nth_paragraph
      dtype: int64
    - name: num_bullets
      dtype: 'null'
    - name: num_highlights
      dtype: 'null'
    - name: num_paragraphs
      dtype: int64
    - name: num_placeholders
      dtype: 'null'
    - name: num_sections
      dtype: 'null'
    - name: num_sentences
      dtype: int64
    - name: num_words
      dtype: 'null'
    - name: postscript_marker
      dtype: 'null'
    - name: prompt_to_repeat
      dtype: 'null'
    - name: relation
      dtype: 'null'
    - name: section_spliter
      dtype: 'null'
  - name: tags
    list: string
  - name: resp_lang
    dtype: string
  splits:
  - name: as
    num_bytes: 274433
    num_examples: 257
  - name: bn
    num_bytes: 512849
    num_examples: 409
  - name: gu
    num_bytes: 417855
    num_examples: 377
  - name: hi
    num_bytes: 565520
    num_examples: 457
  - name: kn
    num_bytes: 399010
    num_examples: 341
  - name: ml
    num_bytes: 492347
    num_examples: 384
  - name: mr
    num_bytes: 411112
    num_examples: 350
  - name: ne
    num_bytes: 426576
    num_examples: 341
  - name: or
    num_bytes: 384418
    num_examples: 368
  - name: sa
    num_bytes: 357504
    num_examples: 336
  - name: ta
    num_bytes: 582072
    num_examples: 430
  - name: te
    num_bytes: 456548
    num_examples: 375
  - name: ur
    num_bytes: 279076
    num_examples: 337
  - name: pa
    num_bytes: 456945
    num_examples: 409
  download_size: 2239855
  dataset_size: 6016265
- config_name: indicifeval-trans
  features:
  - name: key
    dtype: int64
  - name: prompt
    dtype: string
  - name: instruction_id_list
    list: string
  - name: kwargs
    list:
    - name: num_highlights
      dtype: int64
    - name: relation
      dtype: string
    - name: num_words
      dtype: int64
    - name: num_placeholders
      dtype: int64
    - name: prompt_to_repeat
      dtype: string
    - name: num_bullets
      dtype: int64
    - name: section_spliter
      dtype: string
    - name: num_sections
      dtype: int64
    - name: capital_relation
      dtype: string
    - name: capital_frequency
      dtype: int64
    - name: keywords
      list: string
    - name: num_paragraphs
      dtype: int64
    - name: language
      dtype: string
    - name: let_relation
      dtype: string
    - name: letter
      dtype: string
    - name: let_frequency
      dtype: int64
    - name: end_phrase
      dtype: string
    - name: forbidden_words
      list: string
    - name: keyword
      dtype: string
    - name: frequency
      dtype: int64
    - name: num_sentences
      dtype: int64
    - name: postscript_marker
      dtype: string
    - name: first_word
      dtype: string
    - name: nth_paragraph
      dtype: int64
  - name: resp_lang
    dtype: string
  - name: tags
    list: string
  splits:
  - name: as
    num_bytes: 458525
    num_examples: 490
  - name: bn
    num_bytes: 469369
    num_examples: 490
  - name: gu
    num_bytes: 452357
    num_examples: 490
  - name: hi
    num_bytes: 463815
    num_examples: 490
  - name: kn
    num_bytes: 495654
    num_examples: 490
  - name: ml
    num_bytes: 519668
    num_examples: 490
  - name: mr
    num_bytes: 458844
    num_examples: 490
  - name: ne
    num_bytes: 481230
    num_examples: 490
  - name: or
    num_bytes: 482960
    num_examples: 490
  - name: pa
    num_bytes: 459979
    num_examples: 490
  - name: sa
    num_bytes: 462530
    num_examples: 490
  - name: ta
    num_bytes: 537833
    num_examples: 490
  - name: te
    num_bytes: 477992
    num_examples: 490
  - name: ur
    num_bytes: 369380
    num_examples: 490
  - name: en
    num_bytes: 263622
    num_examples: 490
  download_size: 3145340
  dataset_size: 6853758
configs:
- config_name: indicifeval-ground
  data_files:
  - split: as
    path: indicifeval-ground/as-*
  - split: bn
    path: indicifeval-ground/bn-*
  - split: gu
    path: indicifeval-ground/gu-*
  - split: hi
    path: indicifeval-ground/hi-*
  - split: kn
    path: indicifeval-ground/kn-*
  - split: ml
    path: indicifeval-ground/ml-*
  - split: mr
    path: indicifeval-ground/mr-*
  - split: ne
    path: indicifeval-ground/ne-*
  - split: or
    path: indicifeval-ground/or-*
  - split: sa
    path: indicifeval-ground/sa-*
  - split: ta
    path: indicifeval-ground/ta-*
  - split: te
    path: indicifeval-ground/te-*
  - split: ur
    path: indicifeval-ground/ur-*
  - split: pa
    path: indicifeval-ground/pa-*
- config_name: indicifeval-trans
  data_files:
  - split: en
    path: indicifeval-trans/en-*
  - split: as
    path: indicifeval-trans/as-*
  - split: bn
    path: indicifeval-trans/bn-*
  - split: gu
    path: indicifeval-trans/gu-*
  - split: hi
    path: indicifeval-trans/hi-*
  - split: kn
    path: indicifeval-trans/kn-*
  - split: ml
    path: indicifeval-trans/ml-*
  - split: mr
    path: indicifeval-trans/mr-*
  - split: ne
    path: indicifeval-trans/ne-*
  - split: or
    path: indicifeval-trans/or-*
  - split: pa
    path: indicifeval-trans/pa-*
  - split: sa
    path: indicifeval-trans/sa-*
  - split: ta
    path: indicifeval-trans/ta-*
  - split: te
    path: indicifeval-trans/te-*
  - split: ur
    path: indicifeval-trans/ur-*
---

# IndicIFEval

[**Paper**](https://huggingface.co/papers/2602.22125) | [**GitHub**](https://github.com/ai4bharat/IndicIFEval)

Instruction-following benchmarks remain predominantly English-centric, leaving a critical evaluation gap for the hundreds of millions of Indic language speakers. We introduce IndicIFEval, a benchmark evaluating constrained generation of LLMs across 14 Indic languages using automatically verifiable, rule-based instructions. It combines two complementary tracks: IndicIFEval-Trans, translated prompts from IFEval carefully localized for Indic contexts, and IndicIFEval-Ground, synthetically generated instructions grounded in native Indic content.

# Overview

14 Indic Languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu, Urdu

We implement the same format of the original IFEval to allow a streamlined and compatible evaluation framework across all the languages.
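A minimal sketch of loading one language split from either track, assuming the Hugging Face `datasets` library is installed. The config and split names are taken from the metadata above; the `TRACK_SPLITS` table and the `load_track` helper are hypothetical names introduced here for illustration and are not part of the dataset or its repository.

```python
# Split names per config, copied from the dataset card metadata.
TRACK_SPLITS = {
    "indicifeval-ground": [
        "as", "bn", "gu", "hi", "kn", "ml", "mr",
        "ne", "or", "sa", "ta", "te", "ur", "pa",
    ],
    "indicifeval-trans": [
        "en", "as", "bn", "gu", "hi", "kn", "ml", "mr",
        "ne", "or", "pa", "sa", "ta", "te", "ur",
    ],
}


def load_track(config: str, lang: str):
    """Load one language split of one track (hypothetical helper).

    The arguments are validated against the card metadata before any
    download is attempted.
    """
    if config not in TRACK_SPLITS:
        raise ValueError(f"unknown config: {config!r}")
    if lang not in TRACK_SPLITS[config]:
        raise ValueError(f"config {config!r} has no split {lang!r}")
    # Imported lazily so the validation above works without the library.
    from datasets import load_dataset

    return load_dataset("ai4bharat/IndicIFEval", config, split=lang)


if __name__ == "__main__":
    # Needs network access; downloads the Hindi split of the Ground track.
    hindi_ground = load_track("indicifeval-ground", "hi")
    print(hindi_ground[0]["prompt"])
```

Note that only `indicifeval-trans` lists an `en` split, so `load_track("indicifeval-ground", "en")` raises a `ValueError` before any download starts.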
The table below details the column names, their respective data types, and a brief description of their contents.

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| key | `int64` | Unique identifier for each evaluation instance. |
| prompt | `str` | The natural language instruction presented to the model. |
| instruction_id_list | List of `str` | Identifiers specifying which format or constraint checks apply to the prompt. |
| kwargs | List of `dict` | Parameter values associated with each constraint specified in the instruction list. |

## IndicIFEval-Trans

Translated and localized prompts from the English IFEval benchmark, carefully filtered and manually verified by native speakers for cultural suitability and translation quality.

| Language | Constraint | Prompt | English Prompt | Original Prompt |
|---|---|---|---|---|
| Marathi | Placeholder Inclusion | तुम्ही भारताचे पंतप्रधान असल्यासारखे निबंध लिहा आणि मातांना तुमचे प्रेक्षक म्हणून लक्ष्य करा. 'तारे जमीन पर' हा चित्रपट प्रत्येक मुलाच्या अद्वितीय क्षमतांना कसे वाढवायचे याचे प्रतीक आहे हा विषय आहे. प्रतिसादात [पत्ता] सारखे चौकोनी कंसात दर्शविलेले किमान 1 प्लेसहोल्डर असले पाहिजेत. | Write an essay as if you are the Prime Minister of India targeting moms as your audience. The subject is how the movie 'Taare Zameen Par' symbolizes the need to nurture every child's unique abilities. The response must contain at least 1 placeholders represented by square brackets, such as [address]. | Write an essay as if you are the president of the United States targeting moms as your audience. The subject is how the float from the movie "It" symbolizes the spirit of the nineteen-fifties. The response must contain at least 1 placeholders represented by square brackets, such as [address]. |
| Bengali | Word Count | আমার কাছে একটা টাকা আছে। এই টাকা দিয়ে আমি কী করতে পারি? ভারতের একজন প্রধানমন্ত্রীর স্টাইলে আমাকে পরামর্শ দিন এবং নিশ্চিত করুন যে এতে কমপক্ষে ৬০০ শব্দ আছে। | I have a rupee. What can I do with this rupee? Give me advice in the style of a Prime Minister of India and make sure it has at least 600 words. | I have a dime. What can I do with this dime? Give me advice in the style of a President of the United States and make sure it has at least 600 words. |
| Tamil | JSON, Forbidden Keywords | வேண்டுகோள்: 1. சென்னைல பார்க்க சிறந்த இடங்கள் என்ன? 2. பரிந்துரைக்கப்பட்ட ஹோட்டல்களின் பட்டியலைச் சேர்க்கவும். 3. முழு வெளியீட்டையும் JSON வடிவமைப்பில் மூடவும். 4. மெரினா, பெசன்ட், மயிலாப்பூர், கபாலீஸ்வரர், சாந்தோம் ஆகிய முக்கிய வார்த்தைகளை சேர்க்க வேண்டாம் | Request: 1. What are the best places to visit in Chennai? 2. Include a list of recommended hotels. 3. Wrap the ENTIRE output in JSON format. 4. Do not include the following keywords: Marina, Besant, Mylapore, Kapaleeshwarar, Santhome | Request: 1. What are the best places to visit in Bohemia, Czech Republic? 2. Include a list of recommended hotels. 3. Wrap the ENTIRE output in JSON format. 4. Do not include the following keywords: Moser, Glassworks, Pravcice, Karlovy, Vary |
| Punjabi | Title Format | ਕੋਟਲੀਨ ਬਨਾਮ ਜਾਵਾ ਦੇ ਕੀ ਫਾਇਦੇ ਅਤੇ ਨੁਕਸਾਨ ਹਨ? ਤੁਹਾਡੇ ਜਵਾਬ ਵਿੱਚ ਇੱਕ ਸਿਰਲੇਖ ਹੋਣਾ ਚਾਹੀਦਾ ਹੈ ਜੋ ਦੋਹਰੇ ਕੋਣੀ ਬਰੈਕਟਾਂ ਵਿੱਚ ਹੋਵੇ, ਜਿਵੇਂ ਕਿ << ਕੋਟਲੀਨ ਬਨਾਮ ਜਾਵਾ >> | What are the pros and cons of kotlin vs java? Your answer must have a title contained in double angular brackets, such as <<kotlin vs java>>. | No Modification |
| Telugu | Bullet Point Format | ఈ ప్రకటన గురించి మీ అభిప్రాయం ఏమిటి: "యోగులు మాంత్రికుల కంటే గొప్పవారు ఎందుకంటే యోగులు తరచుగా శక్తిని కలిగి ఉంటారు కానీ విముక్తి చాలా ముఖ్యమైనదని తెలుసుకుని దానిని ఉపయోగించరు."? మీ సమాధానంలో కనీసం 30 వాక్యాలు మరియు సరిగ్గా 2 బుల్లెట్ పాయింట్లు ఉండాలి. అలాగే, కనీసం 8 ఖాళీలు చదరపు బ్రాకెట్లలో ఉండాలి, ఉదాహరణకు [address]. బుల్లెట్ పాయింట్లను ఈ విధంగా ఉపయోగించండి:* ఇది ఒక బుల్లెట్ పాయింట్ | What do you think about this statement: "Yogis are greater than sorcerers because yogis often possess power but choose not to use it, knowing that liberation is far more important."? Your response should contain at least 30 sentences and exactly 2 bullet points. Also, it must contain at least 8 placeholders represented by square brackets, such as [address]. Use the bullet points like:* This is a bullet point | What do you think about this statement: "Wizards are more powerful than sorcerers because they study magic instead of being born with it."? Your response should contain at least 30 sentences and exactly 2 bullet points. Also, it must contain at least 8 placeholders represented by square brackets, such as [address]. Use the bullet points like:* This is a bullet point |

## IndicIFEval-Ground

Synthetically generated instructions grounded in native Indic topics and content, manually verified by native speakers. Unlike translated prompts, these reflect naturalistic constraints with more real-world contexts.

| Language | Constraint | Prompt | English Translation |
|---|---|---|---|
| Odia | Keyword Frequency | ୧୯୪୮ ମସିହାରେ କଟକ ଡିଭିଜନରେ ପ୍ରାଦେଶିକ ରାଜସ୍ୱ ବୋର୍ଡର ପ୍ରତିଷ୍ଠା ଉପରେ ଏକ বୈଷୟିକ ବିবରଣୀ ଲେଖନ୍ତୁ। ଏହି ବିবରଣୀରେ, ବୋର୍ଡର ମୁଖ୍ୟ କାର୍ଯ୍ୟ ଏବଂ ପୂର୍ବତନ ପ୍ରଶାସନିକ ବ୍ୟବସ୍ଥାରେ ହୋଇଥିବା ପରିବର୍ତ୍ତନ ଉପରେ ଆଲୋକପାତ କରନ୍ତୁ। ଆପଣଙ୍କ ଉତ୍ତରରେ 'ରହିଲା' ଶବ୍ଦଟି ଠିକ୍ ଦୁଇଥର ରହିବା ଆବଶ୍ୟକ। | Write a technical account on the establishment of Provincial Board of Revenue in Cuttack Division in 1948. In this detail, highlight the main functions of the Board and the changes in the previous administrative system. Your answer must contain the word 'remained' exactly twice. |
| Kannada | Keyword Prohibition | ಬೆಂಗಳೂರಿನ ಪುರಸಭೆ ಕಚೇರಿಯಲ್ಲಿ ಹತಾಶೆಗೊಂಡಿರುವ ನಾಗರಿಕ, ಅರ್ಜುನ್, ಮತ್ತು ದಣಿದಿರುವ ಸರ್ಕಾರಿ ಅಧಿಕಾರಿ, ರಾವ್, ಇವರ ನಡುವೆ ಒಂದು ಸಣ್ಣ ಸಂಭಾಷಣೆಯನ್ನು ರಚಿಸಿ. ಅರ್ಜುನ್ ತನ್ನ ಹೊಸ ಮನೆ ನಿರ್ಮಾಣದ ಪರವಾನಗಿಗಾಗಿ ಆನ್‌ಲೈನ್‌ನಲ್ಲಿ ಸಲ್ಲಿಸಿದ ಅರ್ಜಿಯ ಸ್ಥಿತಿಯನ್ನು ತಿಳಿಯಲು ಬಂದಿದ್ದಾನೆ. ಆನ್‌ಲೈನ್ ವ್ಯವಸ್ಥೆಯು ಸ್ಥಗಿತಗೊಂಡಿರುವುದರಿಂದ, ಅರ್ಜಿಯನ್ನು ಕೈಯಾರೆ ಪುನಃ ಸಲ್ಲಿಸಬೇಕು ಎಂದು ರಾವ್ ವಿವರಿಸಬೇಕು. ಸಂಭಾಷಣೆಯ ಒಂದು ಪ್ರಮುಖ ಭಾಗದಲ್ಲಿ, ರಾವ್ ಅವರು ಹೊಸ ನಿಯಮದ ಪ್ರಕಾರ, ಅಧಿಕೃತ ಸಂವಹನದಲ್ಲಿ 'ಕಟ್ಟಡ' ಎಂಬ ಪದವನ್ನು ಬಳಸಬಾರದು ಎಂದು ಸ್ಪಷ್ಟವಾಗಿ ಒತ್ತಿಹೇಳಬೇಕು. ನಿಮ್ಮ ಉತ್ತರದಲ್ಲಿ 'ಕಟ್ಟಡ' ಎಂಬ ಪದವು ಎಲ್ಲೂ ಇರಬಾರದು. | Create a short conversation between a frustrated citizen, Arjun, and a tired government official, Rao, in a Bangalore municipal office. Arjun comes to know the status of his online application for construction permit for his new house. Rao should explain that since the online system is down, the application has to be re-submitted manually. In an important part of the conversation, Rao must clearly emphasize that the word 'building' should not be used in official communication as per the new rule. The word 'building' should not appear anywhere in your answer. |
| Malayalam | First Word | കോഴിക്കോട് ജില്ലയിലെ ഗ്രാമീണ മേഖലകളിൽ വർദ്ധിച്ചുവരുന്ന മനുഷ്യ-വನ್ಯജീവി സംഘർഷം എന്ന വിഷയത്തിൽ ഒരു ഉപന്യാസം എഴുതുക. ഈ പ്രശ്നത്തിന്റെ സാമൂഹിക പ്രത്യാഘാതങ്ങളും പരിಹാര മാർഗ്ಗങ്ങളും ചർച്ച ചെയ്യണം. നിങ്ങളുടെ ഉപന്യാസം ഒരു പത്രത്തിലെ ഫീച്ചർ ലേഖനത്തിന്റെ രൂപത്തിലായിരിക്കണം, അതിനാൽ അത് 'കോഴിക്കോട്:' എന്ന വാക്കിൽ തന്നെ ആരംഭിക്കേണ്ടത് അത്യാവശ്യമാണ്. | Write an essay on the topic of increasing human-wildlife conflict in rural areas of Kozhikode district. Discuss the social implications of this problem and possible solutions. Your essay should be in the form of a feature article in a newspaper, so it is essential that it begins with the word 'Kozhikode:'. |
| Urdu | Sentence Count | ہندوستان کی تحریک آزادی کے گمنام ہیروز پر ایک تاریخی مضمون کے لیے، اشفاق اللہ خان کے بارے میں ایک تحقیقی سوال تیار کریں۔ براہ کرم اپنا سوال صرف ایک جملے میں لکھیں۔ | For a historical essay on the unsung heroes of the Indian Independence Movement, prepare a research question about Ashfaqulla Khan. Please write your question in only one sentence. |
| Gujarati | Paragraph Count | તમે એક ડિજિટલ સમાચાર પોર્ટલ માટે કન્ટેન્ટ એડિটর છો. તમને નીચેનો ડ્રાફ્ટ મળ્યો છે, જે એક હૃદયસ્પર્શી વાયરલ વીડિયોનું વર્ણನ કરે છે. આ લખાણ પુનરાವರ್તિત અને થોડું અવ્યವಸ್ಥಿತ છે. તમારું કાર્ય તેને એક જ, ಸುಸಂಗತ ಮತ್ತು ઔપચારિક ફકરામાં ફરીથી લખવાનું છે જે અમારા પોર્ટલ પર પ્રકાશિત કરવા માટે યોગ્ય હોય. <br><br>સોશિયલ મીડિયા પર એકથી વધુ વીડિયો વાયરલ થતા રહે છે. કેટલાક વીડિયો જોયા પછી આપણને ગમે છે, જ્યારે કેટલાક વીડિયો રિલેક્સ કરતા હોય છે. હાલમાં જ ઈન્સ્ટાગ್ರામ પર એક વીડિયો પોસ્ટ કરવામાં આવ્યો હતો અને હવે તે દરેકના દિલને સ્પર્શી જવાને કારણે ઘણો પોપ્યુલર થઈ રહ્યો છે. વીડિયોમાં જોઈ શકાય છે કે કેવી રીતે એક ખાદ્ય વિક્રેતા મોચીને મફતમાં ખાવાનું ખવડાવવાનું કહે છે. આ વીડિયો સોશિયલ મીડિયા પર વાયરલ થઈ રહ્યો છે, લોકો આ વીડિયોને ખૂબ પસંદ કરી રહ્યા છે. સોશિયલ મીડિયા પર લોકો આ વીડિયોને ખૂબ પસંદ કરી રહ્યા છે. આ વાયરલ વીડિયો ઈન્સ્ટાગ್ರામ પર શેર કરવામાં આવ્યો છે. લોકો આ વીડિયોને ખૂબ પસંદ કરી રહ્યા છે. આ વીડિયોને લાખો વ્યૂઝ મળ્યા છે. તો આ વીડિયો તમામ સોશિયલ મીડિયા પ્લેટફોર્મ પર શેર કરવામાં આવી રહ્યો છે. | You are a content editor for a digital news portal. You've got the draft below, which describes a heartwarming viral video. The text is repetitive and a bit disorganized. Your task is to rewrite it into a single, coherent and formal paragraph that is suitable for publishing on our portal.<br><br>More than one video keeps going viral on social media. After watching some videos we like it while some videos are relaxing. Recently a video was posted on Instagram and now it is becoming very popular as it has touched everyone's heart. In the video, one can see how a food vendor asks the cobbler to feed him free food. This video is going viral on social media and people are liking this video a lot. People are liking this video a lot on social media. This viral video has been shared on Instagram. People are liking this video a lot. This video has received millions of views. So this video is being shared on all social media platforms. |

## Citation

```bibtex
@article{jayakumar2026indicifeval,
  title={IndicIFEval: A Benchmark for Verifiable Instruction-Following Evaluation in 14 Indic Languages},
  author={Thanmay Jayakumar and Mohammed Safi Ur Rahman Khan and Raj Dabre and Ratish Puduppully and Anoop Kunchukuttan},
  year={2026},
  eprint={2602.22125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.22125},
}
```
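To illustrate how the `instruction_id_list` and `kwargs` columns fit together, the sketch below pairs each instruction identifier with its parameter dict and runs two simple rule-based checks. It is not the benchmark's official verifier. The record, the response, and the helper names are invented for illustration; the identifier strings are assumed to follow the original IFEval naming, and treating `frequency` as an exact count (as in the Odia example) with plain substring matching is likewise an assumption.

```python
import re

# Hypothetical record shaped like the dataset schema; values are invented.
record = {
    "key": 0,
    "prompt": "...",
    "instruction_id_list": [
        "keywords:frequency",                      # assumed IFEval-style id
        "detectable_content:number_placeholders",  # assumed IFEval-style id
    ],
    "kwargs": [
        {"keyword": "ରହିଲା", "frequency": 2, "num_placeholders": None},
        {"keyword": None, "frequency": None, "num_placeholders": 1},
    ],
}

# A placeholder is any non-empty text wrapped in square brackets.
PLACEHOLDER = re.compile(r"\[[^\[\]]+\]")


def active_kwargs(kwargs: dict) -> dict:
    """Keep only the parameters that are set for this constraint."""
    return {name: value for name, value in kwargs.items() if value is not None}


def check(instruction_id: str, kwargs: dict, response: str) -> bool:
    """Apply one rule-based check (illustrative subset of constraints)."""
    if instruction_id == "keywords:frequency":
        # Assumption: exact count via plain substring matching.
        return response.count(kwargs["keyword"]) == kwargs["frequency"]
    if instruction_id == "detectable_content:number_placeholders":
        return len(PLACEHOLDER.findall(response)) >= kwargs["num_placeholders"]
    raise NotImplementedError(instruction_id)


def evaluate(record: dict, response: str) -> list:
    """Return one pass/fail result per instruction, in list order."""
    pairs = zip(record["instruction_id_list"], record["kwargs"])
    return [check(iid, active_kwargs(kw), response) for iid, kw in pairs]


response = "ବୋର୍ଡ ରହିଲା। ବ୍ୟବସ୍ଥା ରହିଲା। [ଠିକଣା]"
print(evaluate(record, response))
```

Substring matching is used instead of `\b` word boundaries because Python's `\w` does not cover the combining vowel signs common in Indic scripts, so boundary-based matching can split a word in the middle.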
Provider:
ai4bharat