
mhardalov/exams

Hugging Face | Updated 2024-02-06 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/mhardalov/exams
Resource overview:
--- annotations_creators: - found language_creators: - found language: - ar - bg - de - es - fr - hr - hu - it - lt - mk - pl - pt - sq - sr - tr - vi license: - cc-by-sa-4.0 multilinguality: - monolingual - multilingual size_categories: - 10K<n<100K - 1K<n<10K - n<1K source_datasets: - original task_categories: - question-answering task_ids: - multiple-choice-qa paperswithcode_id: exams pretty_name: EXAMS config_names: - alignments - crosslingual_bg - crosslingual_hr - crosslingual_hu - crosslingual_it - crosslingual_mk - crosslingual_pl - crosslingual_pt - crosslingual_sq - crosslingual_sr - crosslingual_test - crosslingual_tr - crosslingual_vi - crosslingual_with_para_bg - crosslingual_with_para_hr - crosslingual_with_para_hu - crosslingual_with_para_it - crosslingual_with_para_mk - crosslingual_with_para_pl - crosslingual_with_para_pt - crosslingual_with_para_sq - crosslingual_with_para_sr - crosslingual_with_para_test - crosslingual_with_para_tr - crosslingual_with_para_vi - multilingual - multilingual_with_para dataset_info: - config_name: alignments features: - name: source_id dtype: string - name: target_id_list sequence: string splits: - name: full num_bytes: 1265256 num_examples: 10834 download_size: 184096 dataset_size: 1265256 - config_name: crosslingual_bg features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 1077329 num_examples: 2344 - name: validation num_bytes: 281771 num_examples: 593 download_size: 514922 dataset_size: 1359100 - config_name: crosslingual_hr features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para 
dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 807104 num_examples: 2341 - name: validation num_bytes: 176594 num_examples: 538 download_size: 450090 dataset_size: 983698 - config_name: crosslingual_hu features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 677535 num_examples: 1731 - name: validation num_bytes: 202012 num_examples: 536 download_size: 401455 dataset_size: 879547 - config_name: crosslingual_it features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 399312 num_examples: 1010 - name: validation num_bytes: 93175 num_examples: 246 download_size: 226376 dataset_size: 492487 - config_name: crosslingual_mk features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 825702 num_examples: 1665 - name: validation num_bytes: 204318 num_examples: 410 download_size: 394548 dataset_size: 1030020 - config_name: crosslingual_pl features: - name: id dtype: string - name: question struct: - name: stem dtype: 
string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 573410 num_examples: 1577 - name: validation num_bytes: 141633 num_examples: 394 download_size: 341925 dataset_size: 715043 - config_name: crosslingual_pt features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 374798 num_examples: 740 - name: validation num_bytes: 87714 num_examples: 184 download_size: 208021 dataset_size: 462512 - config_name: crosslingual_sq features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 423744 num_examples: 1194 - name: validation num_bytes: 110093 num_examples: 311 download_size: 247052 dataset_size: 533837 - config_name: crosslingual_sr features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 649560 num_examples: 1323 - name: validation num_bytes: 145724 num_examples: 314 download_size: 327466 dataset_size: 795284 - config_name: 
crosslingual_test features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: test num_bytes: 8402575 num_examples: 19736 download_size: 3438526 dataset_size: 8402575 - config_name: crosslingual_tr features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 717599 num_examples: 1571 - name: validation num_bytes: 182730 num_examples: 393 download_size: 440914 dataset_size: 900329 - config_name: crosslingual_vi features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 953167 num_examples: 1955 - name: validation num_bytes: 231976 num_examples: 488 download_size: 462940 dataset_size: 1185143 - config_name: crosslingual_with_para_bg features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 47066808 num_examples: 2344 - name: validation num_bytes: 11916026 num_examples: 
593 download_size: 15794611 dataset_size: 58982834 - config_name: crosslingual_with_para_hr features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 24889604 num_examples: 2341 - name: validation num_bytes: 5695066 num_examples: 538 download_size: 9839452 dataset_size: 30584670 - config_name: crosslingual_with_para_hu features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 19035663 num_examples: 1731 - name: validation num_bytes: 6043265 num_examples: 536 download_size: 9263625 dataset_size: 25078928 - config_name: crosslingual_with_para_it features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 16409235 num_examples: 1010 - name: validation num_bytes: 4018329 num_examples: 246 download_size: 6907617 dataset_size: 20427564 - config_name: crosslingual_with_para_mk features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: 
subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 38445894 num_examples: 1665 - name: validation num_bytes: 9673574 num_examples: 410 download_size: 12878474 dataset_size: 48119468 - config_name: crosslingual_with_para_pl features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 16373781 num_examples: 1577 - name: validation num_bytes: 4158832 num_examples: 394 download_size: 6539172 dataset_size: 20532613 - config_name: crosslingual_with_para_pt features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 12185383 num_examples: 740 - name: validation num_bytes: 3093712 num_examples: 184 download_size: 4956969 dataset_size: 15279095 - config_name: crosslingual_with_para_sq features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 17341277 num_examples: 1194 - name: validation num_bytes: 4449952 num_examples: 311 download_size: 7112236 dataset_size: 21791229 - config_name: crosslingual_with_para_sr features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: 
text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 24575845 num_examples: 1323 - name: validation num_bytes: 5772509 num_examples: 314 download_size: 8035415 dataset_size: 30348354 - config_name: crosslingual_with_para_test features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: test num_bytes: 207974374 num_examples: 13510 download_size: 62878029 dataset_size: 207974374 - config_name: crosslingual_with_para_tr features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 18597131 num_examples: 1571 - name: validation num_bytes: 4763097 num_examples: 393 download_size: 7346658 dataset_size: 23360228 - config_name: crosslingual_with_para_vi features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 40882999 num_examples: 1955 - name: validation num_bytes: 10260374 num_examples: 488 download_size: 13028078 dataset_size: 51143373 - config_name: multilingual features: - name: id 
dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 3381837 num_examples: 7961 - name: validation num_bytes: 1141687 num_examples: 2672 - name: test num_bytes: 5746781 num_examples: 13510 download_size: 4323915 dataset_size: 10270305 - config_name: multilingual_with_para features: - name: id dtype: string - name: question struct: - name: stem dtype: string - name: choices sequence: - name: text dtype: string - name: label dtype: string - name: para dtype: string - name: answerKey dtype: string - name: info struct: - name: grade dtype: int32 - name: subject dtype: string - name: language dtype: string splits: - name: train num_bytes: 127294567 num_examples: 7961 - name: validation num_bytes: 42711689 num_examples: 2672 - name: test num_bytes: 207974374 num_examples: 13510 download_size: 112597818 dataset_size: 377980630 configs: - config_name: alignments data_files: - split: full path: alignments/full-* - config_name: crosslingual_bg data_files: - split: train path: crosslingual_bg/train-* - split: validation path: crosslingual_bg/validation-* - config_name: crosslingual_hr data_files: - split: train path: crosslingual_hr/train-* - split: validation path: crosslingual_hr/validation-* - config_name: crosslingual_hu data_files: - split: train path: crosslingual_hu/train-* - split: validation path: crosslingual_hu/validation-* - config_name: crosslingual_it data_files: - split: train path: crosslingual_it/train-* - split: validation path: crosslingual_it/validation-* - config_name: crosslingual_mk data_files: - split: train path: crosslingual_mk/train-* - split: validation path: crosslingual_mk/validation-* - config_name: crosslingual_pl data_files: - split: train path: 
crosslingual_pl/train-* - split: validation path: crosslingual_pl/validation-* - config_name: crosslingual_pt data_files: - split: train path: crosslingual_pt/train-* - split: validation path: crosslingual_pt/validation-* - config_name: crosslingual_sq data_files: - split: train path: crosslingual_sq/train-* - split: validation path: crosslingual_sq/validation-* - config_name: crosslingual_sr data_files: - split: train path: crosslingual_sr/train-* - split: validation path: crosslingual_sr/validation-* - config_name: crosslingual_test data_files: - split: test path: crosslingual_test/test-* - config_name: crosslingual_tr data_files: - split: train path: crosslingual_tr/train-* - split: validation path: crosslingual_tr/validation-* - config_name: crosslingual_vi data_files: - split: train path: crosslingual_vi/train-* - split: validation path: crosslingual_vi/validation-* - config_name: crosslingual_with_para_bg data_files: - split: train path: crosslingual_with_para_bg/train-* - split: validation path: crosslingual_with_para_bg/validation-* - config_name: crosslingual_with_para_hr data_files: - split: train path: crosslingual_with_para_hr/train-* - split: validation path: crosslingual_with_para_hr/validation-* - config_name: crosslingual_with_para_hu data_files: - split: train path: crosslingual_with_para_hu/train-* - split: validation path: crosslingual_with_para_hu/validation-* - config_name: crosslingual_with_para_it data_files: - split: train path: crosslingual_with_para_it/train-* - split: validation path: crosslingual_with_para_it/validation-* - config_name: crosslingual_with_para_mk data_files: - split: train path: crosslingual_with_para_mk/train-* - split: validation path: crosslingual_with_para_mk/validation-* - config_name: crosslingual_with_para_pl data_files: - split: train path: crosslingual_with_para_pl/train-* - split: validation path: crosslingual_with_para_pl/validation-* - config_name: crosslingual_with_para_pt data_files: - split: train path: 
crosslingual_with_para_pt/train-* - split: validation path: crosslingual_with_para_pt/validation-* - config_name: crosslingual_with_para_sq data_files: - split: train path: crosslingual_with_para_sq/train-* - split: validation path: crosslingual_with_para_sq/validation-* - config_name: crosslingual_with_para_sr data_files: - split: train path: crosslingual_with_para_sr/train-* - split: validation path: crosslingual_with_para_sr/validation-* - config_name: crosslingual_with_para_test data_files: - split: test path: crosslingual_with_para_test/test-* - config_name: crosslingual_with_para_tr data_files: - split: train path: crosslingual_with_para_tr/train-* - split: validation path: crosslingual_with_para_tr/validation-* - config_name: crosslingual_with_para_vi data_files: - split: train path: crosslingual_with_para_vi/train-* - split: validation path: crosslingual_with_para_vi/validation-* - config_name: multilingual data_files: - split: train path: multilingual/train-* - split: validation path: multilingual/validation-* - split: test path: multilingual/test-* - config_name: multilingual_with_para data_files: - split: train path: multilingual_with_para/train-* - split: validation path: multilingual_with_para/validation-* - split: test path: multilingual_with_para/test-* default: true
---

# Dataset Card for EXAMS

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Repository:** https://github.com/mhardalov/exams-qa
- **Paper:** [EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering](https://arxiv.org/abs/2011.03080)
- **Point of Contact:** [hardalov@@fmi.uni-sofia.bg](hardalov@@fmi.uni-sofia.bg)

### Dataset Summary

EXAMS is a benchmark dataset for multilingual and cross-lingual question answering from high school examinations. It consists of more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others.

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

The languages in the dataset are:

- ar
- bg
- de
- es
- fr
- hr
- hu
- it
- lt
- mk
- pl
- pt
- sq
- sr
- tr
- vi

## Dataset Structure

### Data Instances

An example of a data instance (with support paragraphs, in Bulgarian) is:

```
{'answerKey': 'C', 'id': '35dd6b52-7e71-11ea-9eb1-54bef70b159e', 'info': {'grade': 12, 'language': 'Bulgarian', 'subject': 'Biology'}, 'question': {'choices': {'label': ['A', 'B', 'C', 'D'], 'para': ['Това води до наследствени изменения между организмите. Мирновременните вождове са наследствени. Черният, сивият и кафявият цвят на оцветяване на тялото се определя от пигмента меланин и възниква в резултат на наследствени изменения. Тези различия, според Монтескьо, не са наследствени. Те са и важни наследствени вещи в клана. Те са били наследствени архонти и управляват демократично. Реликвите са исторически, религиозни, семейни (наследствени) и технически.
Общо са направени 800 изменения. Не всички наследствени аномалии на хемоглобина са вредни, т.е. Моногенните наследствени болести, които водят до мигрена, са редки. Няма наследствени владетели. Повечето от тях са наследствени и се предават на потомството. Всичките синове са ерцхерцози на всичките наследствени земи и претенденти. През 1509 г. Фраунбергите са издигнати на наследствени имперски графове. Фамилията Валдбург заради постиженията са номинирани на „наследствени имперски трушсеси“. Фамилията Валдбург заради постиженията са номинирани на „наследствени имперски трушсеси“. Описани са единични наследствени случаи, но по-често липсва фамилна обремененост. Позициите им са наследствени и се предават в рамките на клана. Внесени са изменения в конструкцията на веригите. и са направени изменения в ходовата част. На храма са правени лоши архитектурни изменения. Изменения са предприети и вътре в двореца. Имало двама наследствени вождове. Имало двама наследствени вождове. Годишният календар, „компасът“ и биологичния часовник са наследствени и при много бозайници.', 'Постепенно задълбочаващите се функционални изменения довеждат и до структурни изменения. Те се дължат както на растягането на кожата, така и на въздействието на хормоналните изменения върху кожната тъкан. тези изменения се долавят по-ясно. Впоследствие, той претърпява изменения. Ширината остава без изменения. След тяхното издаване се налагат изменения в първоначалния Кодекс, защото не е съобразен с направените в Дигестите изменения. Еволюционният преход се характеризира със следните изменения: Наблюдават се и сезонни изменения в теглото. Приемат се изменения и допълнения към Устава. Тук се размножават и предизвикват възпалителни изменения. Общо са направени 800 изменения. Бронирането не претърпява съществени изменения. При животните се откриват изменения при злокачествената форма. Срещат се и дегенеративни изменения в семенните каналчета. ТАВКР „Баку“ се строи по изменения проект 1143.4. 
Трансът се съпровожда с определени изменения на мозъчната дейност. На изменения е подложен и Светия Синод. Внесени са изменения в конструкцията на веригите. На храма са правени лоши архитектурни изменения. Оттогава стиховете претърпяват изменения няколко пъти. Настъпват съществени изменения в музикалната култура. По-късно той претърпява леки изменения. Настъпват съществени изменения в музикалната култура. Претърпява сериозни изменения само носовата надстройка. Хоризонталното брониране е оставено без изменения.', 'Модификациите са обратими. Тези реакции са обратими. В началните стадии тези натрупвания са обратими. Всички такива ефекти са временни и обратими. Много от реакциите са обратими и идентични с тези при гликолизата. Ако в обращение има книжни пари, те са обратими в злато при поискване . Общо са направени 800 изменения. Непоследователността е представена от принципа на "симетрия", при който взаимоотношенията са разглеждани като симетрични или обратими. Откакто формулите в клетките на електронната таблица не са обратими, тази техника е с ограничена стойност. Ефектът на Пелтие-Зеебек и ефектът Томсън са обратими (ефектът на Пелтие е обратен на ефекта на Зеебек). Плазмолизата протича в три етапа, в зависимост от силата и продължителността на въздействието:\n\nПървите два етапа са обратими. Внесени са изменения в конструкцията на веригите. и са направени изменения в ходовата част. На храма са правени лоши архитектурни изменения. Изменения са предприети и вътре в двореца. Оттогава насетне екипите не са претърпявали съществени изменения. Изменения са направени и в колесника на машината. Тези изменения са обявени през октомври 1878 година. Последните изменения са внесени през януари 2009 година. В процеса на последващото проектиране са внесени някои изменения. Сериозните изменения са в края на Втората световна война. Внесени са изменения в конструкцията на погребите и подемниците. Внесени са изменения в конструкцията на погребите и подемниците. 
Внесени са изменения в конструкцията на погребите и подемниците. Постепенно задълбочаващите се функционални изменения довеждат и до структурни изменения.', 'Ерозионни процеси от масов характер липсват. Обновлението в редиците на партията приема масов характер. Тя обаче няма масов характер поради спецификата на формата. Движението против десятъка придобива масов характер и в Балчишка околия. Понякога екзекутирането на „обсебените от Сатана“ взимало невероятно масов характер. Укриването на дължими като наряд продукти в селата придобива масов характер. Периодичните миграции са в повечето случаи с масов характер и са свързани със сезонните изменения в природата, а непериодичните са премествания на животни, които настъпват след пожари, замърсяване на средата, висока численост и др. Имат необратим характер. Именно по време на двувековните походи на западните рицари използването на гербовете придобива масов характер. След присъединяването на Южен Кавказ към Русия, изселването на азербайджанци от Грузия придобива масов характер. Те имат нормативен характер. Те имат установителен характер. Освобождаването на работна сила обикновено има масов характер, защото обхваща големи контингенти от носителите на труд. Валежите имат подчертано континентален характер. Имат най-често издънков характер. Приливите имат предимно полуденонощен характер. Някои от тях имат мистериален характер. Тези сведения имат случаен, епизодичен характер. Те имат сезонен или годишен характер. Временните обезпечителни мерки имат временен характер. Други имат пожелателен характер (Здравко, Слава). Ловът и събирачеството имат спомагателен характер. Фактически успяват само малко да усилят бронирането на артилерийските погреби, другите изменения носят само частен характер. Някои карикатури имат само развлекателен характер, докато други имат политически нюанси. 
Поемите на Хезиод имат по-приложен характер.'], 'text': ['дължат се на фенотипни изменения', 'имат масов характер', 'са наследствени', 'са обратими']}, 'stem': 'Мутационите изменения:'}}
```

### Data Fields

A data instance contains the following fields:

- `id`: a question ID, unique across the dataset
- `question`: the question, which contains:
  - `stem`: the text of the question (the question stem)
  - `choices`: a set of 3 to 5 candidate answers, each of which has:
    - `text`: the text of the answer
    - `label`: a label in `['A', 'B', 'C', 'D', 'E']` used to match the `answerKey`
    - `para`: (optional) a supporting paragraph from Wikipedia in the same language as the question and answer
- `answerKey`: the key corresponding to the right answer's `label`
- `info`: additional information on the question, including:
  - `grade`: the school grade for the exam this question was taken from
  - `subject`: a free-text description of the academic subject
  - `language`: the English name of the language for this question

### Data Splits

Depending on the configuration, the dataset has different splits:

- "alignments": a single "full" split
- "multilingual" and "multilingual_with_para": "train", "validation" and "test" splits
- "crosslingual_test" and "crosslingual_with_para_test": a single "test" split
- the rest of the crosslingual configurations: "train" and "validation" splits

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

EXAMS was collected from official state exams prepared by the ministries of education of various countries. These exams are taken by students graduating from high school, and often require knowledge learned through the entire course. The questions cover a large variety of subjects and material based on the country’s education system.
They cover major school subjects such as Biology, Chemistry, Geography, History, and Physics, but also highly specialized ones such as Agriculture, Geology, Informatics, as well as some applied and profiled studies.

Some countries allow students to take official examinations in several languages. This dataset provides 9,857 parallel question pairs spread across seven languages coming from Croatia (Croatian, Serbian, Italian, Hungarian), Hungary (Hungarian, German, French, Spanish, Croatian, Serbian, Italian), and North Macedonia (Macedonian, Albanian, Turkish).

For all languages in the dataset, the first step in the process of data collection was to download the PDF files per year, per subject, and per language (when parallel languages were available in the same source), convert the PDF files to text, and select those that were well formatted and followed the document structure. Then, Regular Expressions (RegEx) were used to parse the questions, their corresponding choices and the correct answer choice. To ensure that all questions are answerable using textual input only, questions that contained visual information were removed; these were selected using a curated list of words such as map, table, picture, graph, etc., in the corresponding language.

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

The dataset, which contains paragraphs from Wikipedia, is licensed under CC-BY-SA 4.0.
The code in this repository is licensed according to the [LICENSE file](https://raw.githubusercontent.com/mhardalov/exams-qa/main/LICENSE).

### Citation Information

```
@inproceedings{hardalov-etal-2020-exams,
    title = "{EXAMS}: A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering",
    author = "Hardalov, Momchil and Mihaylov, Todor and Zlatkova, Dimitrina and Dinkov, Yoan and Koychev, Ivan and Nakov, Preslav",
    editor = "Webber, Bonnie and Cohn, Trevor and He, Yulan and Liu, Yang",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.438",
    doi = "10.18653/v1/2020.emnlp-main.438",
    pages = "5427--5444",
}
```

### Contributions

Thanks to [@yjernite](https://github.com/yjernite) for adding this dataset.
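
The keyword-based removal of questions with visual content, described under Initial Data Collection and Normalization, can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the English keyword list only repeats the examples named in the card, whereas the authors used curated lists in each language.

```python
import re

# Assumption: only the example keywords named in the card, in English.
VISUAL_KEYWORDS = ["map", "table", "picture", "graph"]


def mentions_visual_content(text, keywords=VISUAL_KEYWORDS):
    """Return True if any keyword occurs in `text` as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(word) for word in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def drop_visual_questions(questions):
    """Keep only the question texts that do not mention a visual keyword."""
    return [q for q in questions if not mentions_visual_content(q)]


questions = [
    "Which river is marked on the map?",
    "Mutations are:",
]
print(drop_visual_questions(questions))
```

Whole-word matching avoids false positives such as "stable" matching "table", but it also misses inflected forms such as "graphs"; a real keyword list would need to cover those per language.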
Provider:
mhardalov
Summary of original information

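Dataset overview

Basic information

  • Name: EXAMS
  • Languages: Arabic (ar), Bulgarian (bg), German (de), Spanish (es), French (fr), Croatian (hr), Hungarian (hu), Italian (it), Lithuanian (lt), Macedonian (mk), Polish (pl), Portuguese (pt), Albanian (sq), Serbian (sr), Turkish (tr), and Vietnamese (vi).
  • License: cc-by-sa-4.0
  • Multilinguality: monolingual and multilingual data.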

Dataset size

  • Scale: the configurations span three size categories: n<1K, 1K<n<10K, and 10K<n<100K records.

Data source

  • Source datasets: original.

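Task type

  • Task category: question answering (question-answering)
  • Task ID: multiple-choice question answering (multiple-choice-qa)

Configuration information

  • Config names: alignments, the per-language crosslingual configurations (for example crosslingual_bg and crosslingual_hr, each with a crosslingual_with_para_* counterpart), and the multilingual configurations (multilingual, multilingual_with_para).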
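
The naming pattern of the configurations can be reproduced with a few lines of Python. The sketch below rebuilds the 27 config names listed in the card metadata from the crosslingual language codes. The helper names are hypothetical, and `load_exams` is an untested convenience wrapper that assumes the Hugging Face `datasets` package and network access to the `mhardalov/exams` repository.

```python
# Language codes that appear in the crosslingual config names of the card metadata.
CROSSLINGUAL_LANGS = ["bg", "hr", "hu", "it", "mk", "pl", "pt", "sq", "sr", "tr", "vi"]


def exams_config_names():
    """Rebuild the configuration names following the card's naming pattern."""
    names = ["alignments"]
    for prefix in ("crosslingual", "crosslingual_with_para"):
        # One train/validation config per language, plus one shared test config.
        names += [f"{prefix}_{code}" for code in CROSSLINGUAL_LANGS + ["test"]]
    names += ["multilingual", "multilingual_with_para"]
    return names


def load_exams(config, split=None):
    """Hypothetical wrapper around `datasets.load_dataset` (needs network access)."""
    from datasets import load_dataset  # imported lazily so the helper above runs without it

    return load_dataset("mhardalov/exams", config, split=split)


print(len(exams_config_names()))
```

For instance, `load_exams("crosslingual_bg", split="validation")` would request the Bulgarian validation split, assuming the repository is reachable.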

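Dataset structure

  • Features: in every configuration except alignments, each record has an id, a question (with stem and choices), an answerKey, and info (grade, subject, language). The alignments configuration instead maps a source_id to a target_id_list.
  • Splits: most configurations have train and validation splits. The two multilingual configurations also include a test split, crosslingual_test and crosslingual_with_para_test have only a test split, and alignments has a single full split.
  • Size: varies by configuration, from roughly 0.46 MB (crosslingual_pt) to roughly 378 MB (multilingual_with_para).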
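
To make the record layout concrete, the sketch below resolves `answerKey` to the text of the matching choice and renders a question as a plain multiple-choice prompt. The function names are hypothetical, and the sample record is a trimmed copy of the card's Bulgarian example with the support paragraphs left out.

```python
def correct_answer(example):
    """Return the text of the choice whose label equals `answerKey`."""
    choices = example["question"]["choices"]
    # `label` and `text` are parallel lists, as in the card's example instance.
    index = choices["label"].index(example["answerKey"])
    return choices["text"][index]


def format_prompt(example):
    """Render the question stem followed by one 'label. text' line per choice."""
    choices = example["question"]["choices"]
    lines = [example["question"]["stem"]]
    for label, text in zip(choices["label"], choices["text"]):
        lines.append(f"{label}. {text}")
    return "\n".join(lines)


# Trimmed copy of the card's example instance (the `para` list is omitted).
example = {
    "id": "35dd6b52-7e71-11ea-9eb1-54bef70b159e",
    "answerKey": "C",
    "info": {"grade": 12, "language": "Bulgarian", "subject": "Biology"},
    "question": {
        "stem": "Мутационите изменения:",
        "choices": {
            "label": ["A", "B", "C", "D"],
            "text": [
                "дължат се на фенотипни изменения",
                "имат масов характер",
                "са наследствени",
                "са обратими",
            ],
        },
    },
}

print(format_prompt(example))
print(correct_answer(example))
```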

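Example configuration details

  • Config name: crosslingual_bg

    • Features: id, question (stem and choices), answerKey, info (grade, subject, language).
    • Splits: train and validation.
    • Size: train is 1,077,329 bytes and validation is 281,771 bytes.
  • Config name: multilingual

    • Features: same as above.
    • Splits: train, validation, and test.
    • Size: train is 3,381,837 bytes, validation is 1,141,687 bytes, and test is 5,746,781 bytes.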
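
As a quick consistency check, the per-split byte counts above should add up to the `dataset_size` values reported in the card metadata. The sketch below does that arithmetic; the dictionary and function names are hypothetical, and all numbers are copied from the metadata.

```python
# Per-split sizes in bytes, copied from the card metadata.
SPLIT_BYTES = {
    "crosslingual_bg": {"train": 1_077_329, "validation": 281_771},
    "multilingual": {"train": 3_381_837, "validation": 1_141_687, "test": 5_746_781},
}

# `dataset_size` values reported in the same metadata.
REPORTED_DATASET_SIZE = {
    "crosslingual_bg": 1_359_100,
    "multilingual": 10_270_305,
}


def total_bytes(config):
    """Sum the byte counts of all splits of one configuration."""
    return sum(SPLIT_BYTES[config].values())


for config in SPLIT_BYTES:
    total = total_bytes(config)
    matches = total == REPORTED_DATASET_SIZE[config]
    print(f"{config}: {total:,} bytes ({total / 1e6:.2f} MB), matches dataset_size: {matches}")
```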

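Conclusion

EXAMS is a multilingual, multi-configuration question answering dataset suited to training and evaluating multiple-choice question answering models.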
