taeminlee/CLIcK

Source: Hugging Face. Updated 2024-05-21; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/taeminlee/CLIcK
Description:
---
task_categories:
- multiple-choice
language:
- ko
tags:
- Culture
- Language
size_categories:
- 1K<n<10K
configs:
- config_name: KL_Grammar
  data_files:
  - path:
    - Dataset/Language/Grammar/Grammar_CSAT.json
    - Dataset/Language/Grammar/Grammar_TOPIK.json
    - Dataset/Language/Grammar/Grammar_Kedu.json
    split: test
- config_name: KL_Textual
  data_files:
  - path:
    - Dataset/Language/Textual/Textual_TOPIK.json
    - Dataset/Language/Textual/Textual_CSAT.json
    split: test
- config_name: KL_Functional
  data_files:
  - path:
    - Dataset/Language/Functional/Functional_Kedu.json
    - Dataset/Language/Functional/Functional_PSE.json
    - Dataset/Language/Functional/Functional_CSAT.json
    split: test
- config_name: KC_Law
  data_files:
  - path:
    - Dataset/Culture/Korean Law/Law_KIIP.json
    - Dataset/Culture/Korean Law/Law_PSAT.json
    split: test
- config_name: KC_Popular
  data_files:
  - path:
    - Dataset/Culture/Korean Popular/Popular_Kedu.json
    - Dataset/Culture/Korean Popular/Popular_KIIP.json
    split: test
- config_name: KC_Politics
  data_files:
  - path:
    - Dataset/Culture/Korean Politics/Politics_Kedu.json
    - Dataset/Culture/Korean Politics/Politics_KIIP.json
    split: test
- config_name: KC_Geography
  data_files:
  - path:
    - Dataset/Culture/Korean Geography/Geography_KIIP.json
    - Dataset/Culture/Korean Geography/Geography_Kedu.json
    - Dataset/Culture/Korean Geography/Geography_CSAT.json
    split: test
- config_name: KC_Economy
  data_files:
  - path:
    - Dataset/Culture/Korean Economy/Economy_KIIP.json
    - Dataset/Culture/Korean Economy/Economy_Kedu.json
    split: test
- config_name: KC_History
  data_files:
  - path:
    - Dataset/Culture/Korean History/History_Kedu.json
    - Dataset/Culture/Korean History/History_PSE.json
    - Dataset/Culture/Korean History/History_KHB.json
    split: test
- config_name: KC_Society
  data_files:
  - path:
    - Dataset/Culture/Korean Society/Society_Kedu.json
    - Dataset/Culture/Korean Society/Society_KIIP.json
    split: test
- config_name: KC_Tradition
  data_files:
  - path:
    - Dataset/Culture/Korean Tradition/Tradition_Kedu.json
    - Dataset/Culture/Korean Tradition/Tradition_KIIP.json
    split: test
---

## This dataset is the same as https://huggingface.co/datasets/EunsuKim/CLIcK. This dataset has been subdivided for simplified viewing and evaluation.

<div align="center">
  <h1>CLIcK 🇰🇷🧠</h1>
  <p>Evaluation of Cultural and Linguistic Intelligence in Korean</p>
  <p>
    <a href="https://huggingface.co/datasets/your_username/CLIcK"><img src="https://img.shields.io/badge/Dataset-CLIcK-blue" alt="Dataset"></a>
    <a href="https://arxiv.org/abs/2403.06412"><img src="https://img.shields.io/badge/Paper-LREC--COLING-green" alt="Paper"></a>
  </p>
</div>

## Introduction 🎉

CLIcK (Cultural and Linguistic Intelligence in Korean) is a comprehensive dataset designed to evaluate cultural and linguistic intelligence in the context of Korean language models. In an era where diverse language models are continually emerging, there is a pressing need for robust evaluation datasets, especially for non-English languages like Korean. CLIcK fills this gap by providing a rich, well-categorized dataset focusing on both cultural and linguistic aspects, enabling a nuanced assessment of Korean language models.

## News 📰

- **[LREC-COLING]** Our paper introducing CLIcK has been accepted to LREC-COLING 2024!🎉

## Dataset Description 📊

The CLIcK benchmark comprises two broad categories: Culture and Language, which are further divided into 11 fine-grained subcategories.

### Categories 📂

- **Language** 🗣️
  - Textual Knowledge
  - Grammatical Knowledge
  - Functional Knowledge
- **Culture** 🌍
  - Korean Society
  - Korean Tradition
  - Korean Politics
  - Korean Economy
  - Korean Law
  - Korean History
  - Korean Geography
  - Korean Popular Culture (K-Pop)

### Construction 🏗️

CLIcK was developed using two human-centric approaches:

1. Reclassification of **official and well-designed exam data** into our defined categories.
2. Generation of questions using ChatGPT, based on **official educational materials** from the Korean Ministry of Justice, followed by our own validation process.

### Structure 🏛️

The dataset is organized as follows, with each subcategory containing relevant JSON files:

```
📦CLIcK
└─ Dataset
   ├─ Culture
   │  ├─ [Each cultural subcategory with associated JSON files]
   └─ Language
      ├─ [Each language subcategory with associated JSON files]
```

### Exam Code Descriptions 📜

- KIIP: Korea Immigration & Integration Program ([Website](www.immigration.go.kr))
- CSAT: College Scholastic Ability Test for Korean ([Website](https://www.suneung.re.kr/))
- Kedu: Test of Teaching Korean as a Foreign Language exams ([Website](https://www.q-net.or.kr/man001.do?gSite=L&gId=36))
- PSE: Public Service Exam for 9th grade
- TOPIK: Test of Proficiency in Korean ([Website](https://www.topik.go.kr/))
- KHB: Korean History Exam Basic ([Website](https://www.historyexam.go.kr/))
- PSAT: Public Service Aptitude Test in Korea

## Results

| Models            | Average Accuracy (Korean Culture) | Average Accuracy (Korean Language) |
|-------------------|-----------------------------------|------------------------------------|
| Polyglot-Ko 1.3B  | 32.71%                            | 22.88%                             |
| Polyglot-Ko 3.8B  | 32.90%                            | 22.38%                             |
| Polyglot-Ko 5.8B  | 33.14%                            | 23.27%                             |
| Polyglot-Ko 12.8B | 33.40%                            | 22.24%                             |
| KULLM 5.8B        | 33.79%                            | 23.50%                             |
| KULLM 12.8B       | 33.51%                            | 23.78%                             |
| KoAlpaca 5.8B     | 32.33%                            | 23.87%                             |
| KoAlpaca 12.8B    | 33.80%                            | 22.42%                             |
| LLaMA-Ko 7B       | 33.26%                            | 25.69%                             |
| LLaMA 7B          | 35.44%                            | 27.17%                             |
| LLaMA 13B         | **36.22%**                        | **26.71%**                         |
| GPT-3.5           | 49.30%                            | 42.32%                             |
| Claude2           | **51.72%**                        | **45.39%**                         |

## Dataset Link 🔗

The CLIcK dataset is available on the Hugging Face Hub: [CLIcK Dataset](https://huggingface.co/datasets/your_username/CLIcK)

## Citation 📝

If you use CLIcK in your research, please cite our paper:

```bibtex
@misc{kim2024click,
  title={CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean},
  author={Eunsu Kim and Juyoung Suk and Philhoon Oh and Haneul Yoo and James Thorne and Alice Oh},
  year={2024},
  eprint={2403.06412},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

## Contact 📧

For any questions or inquiries, please contact [kes0317@kaist.ac.kr](mailto:kes0317@kaist.ac.kr).
Provider:
taeminlee
Summary of original information

Dataset overview

Task category

  • Multiple choice

Language

  • Korean

Tags

  • Culture
  • Language

Size category

  • 1K<n<10K

Configuration details

  • KL_Grammar

    • Data file paths:
      • Dataset/Language/Grammar/Grammar_CSAT.json
      • Dataset/Language/Grammar/Grammar_TOPIK.json
      • Dataset/Language/Grammar/Grammar_Kedu.json
    • Split: test
  • KL_Textual

    • Data file paths:
      • Dataset/Language/Textual/Textual_TOPIK.json
      • Dataset/Language/Textual/Textual_CSAT.json
    • Split: test
  • KL_Functional

    • Data file paths:
      • Dataset/Language/Functional/Functional_Kedu.json
      • Dataset/Language/Functional/Functional_PSE.json
      • Dataset/Language/Functional/Functional_CSAT.json
    • Split: test
  • KC_Law

    • Data file paths:
      • Dataset/Culture/Korean Law/Law_KIIP.json
      • Dataset/Culture/Korean Law/Law_PSAT.json
    • Split: test
  • KC_Popular

    • Data file paths:
      • Dataset/Culture/Korean Popular/Popular_Kedu.json
      • Dataset/Culture/Korean Popular/Popular_KIIP.json
    • Split: test
  • KC_Politics

    • Data file paths:
      • Dataset/Culture/Korean Politics/Politics_Kedu.json
      • Dataset/Culture/Korean Politics/Politics_KIIP.json
    • Split: test
  • KC_Geography

    • Data file paths:
      • Dataset/Culture/Korean Geography/Geography_KIIP.json
      • Dataset/Culture/Korean Geography/Geography_Kedu.json
      • Dataset/Culture/Korean Geography/Geography_CSAT.json
    • Split: test
  • KC_Economy

    • Data file paths:
      • Dataset/Culture/Korean Economy/Economy_KIIP.json
      • Dataset/Culture/Korean Economy/Economy_Kedu.json
    • Split: test
  • KC_History

    • Data file paths:
      • Dataset/Culture/Korean History/History_Kedu.json
      • Dataset/Culture/Korean History/History_PSE.json
      • Dataset/Culture/Korean History/History_KHB.json
    • Split: test
  • KC_Society

    • Data file paths:
      • Dataset/Culture/Korean Society/Society_Kedu.json
      • Dataset/Culture/Korean Society/Society_KIIP.json
    • Split: test
  • KC_Tradition

    • Data file paths:
      • Dataset/Culture/Korean Tradition/Tradition_Kedu.json
      • Dataset/Culture/Korean Tradition/Tradition_KIIP.json
    • Split: test

Dataset structure

📦CLIcK
└─ Dataset
   ├─ Culture
   │  ├─ [Each cultural subcategory with associated JSON files]
   └─ Language
      ├─ [Each language subcategory with associated JSON files]

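Exam code descriptions

  • KIIP: Korea Immigration & Integration Program
  • CSAT: College Scholastic Ability Test for Korean
  • Kedu: Test of Teaching Korean as a Foreign Language exams
  • PSE: Public Service Exam for 9th grade
  • TOPIK: Test of Proficiency in Korean
  • KHB: Korean History Exam Basic
  • PSAT: Public Service Aptitude Test in Korea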
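
Each data file name appears to end with one of these exam codes (for example `Law_PSAT.json`), so the source exam can be recovered from a path. The sketch below illustrates that idea under the assumption that every file follows the `<Topic>_<ExamCode>.json` pattern seen in the configuration list. The function name `describe_data_file` is hypothetical and not part of the dataset.

```python
from pathlib import PurePosixPath

# Exam codes and descriptions as given in the dataset card.
EXAM_CODES = {
    "KIIP": "Korea Immigration & Integration Program",
    "CSAT": "College Scholastic Ability Test for Korean",
    "Kedu": "Test of Teaching Korean as a Foreign Language exams",
    "PSE": "Public Service Exam for 9th grade",
    "TOPIK": "Test of Proficiency in Korean",
    "KHB": "Korean History Exam Basic",
    "PSAT": "Public Service Aptitude Test in Korea",
}


def describe_data_file(path):
    """Return (topic, exam_code, exam_description) for a data file path.

    Assumes the file stem looks like '<Topic>_<ExamCode>'; codes that are
    not in EXAM_CODES are reported as 'unknown'.
    """
    stem = PurePosixPath(path).stem          # e.g. 'Law_PSAT'
    topic, _, code = stem.rpartition("_")    # split on the last underscore
    return topic, code, EXAM_CODES.get(code, "unknown")


print(describe_data_file("Dataset/Culture/Korean Law/Law_PSAT.json"))
```

Splitting on the last underscore keeps the lookup working even if a topic name itself contained an underscore.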