
gordicaleksa/slovenian-llm-eval-v0

Hugging Face · Updated 2024-04-04 · Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/gordicaleksa/slovenian-llm-eval-v0
Resource description:
---
license: apache-2.0
language: sl
---

# Slovenian LLM eval 🇸🇮

This dataset should be used for Slovenian LLM evaluation.

Here is the [GitHub project](https://github.com/gordicaleksa/slovenian-llm-eval) used to build this dataset.

For the technical report of the project, see this in-depth [Weights & Biases report](https://wandb.ai/gordicaleksa/serbian_llm_eval/reports/First-Serbian-LLM-eval---Vmlldzo2MjgwMDA5). ❤️ Even though the report was written for the Serbian LLM eval, the same process was used to build the Slovenian LLM eval.

I'll give a TL;DR here:

## What is covered?

Common sense reasoning:

* Hellaswag, Winogrande, PIQA, OpenbookQA, ARC-Easy, ARC-Challenge

World knowledge:

* NaturalQuestions, TriviaQA

Reading comprehension:

* BoolQ

## How was the eval created?

3 steps (for this version, v0, we've only done the translation and are looking for donations to push through the whole pipeline):

1. Machine translation from English -> Slovenian using Google Translate
2. Refinement via GPT-4
3. Minor manual work by me (Aleksa Gordić) + we'll likely have a new version of Winogrande that was annotated by a human annotator

Please see [the report](https://wandb.ai/gordicaleksa/serbian_llm_eval/reports/First-Serbian-LLM-eval---Vmlldzo2MjgwMDA5) for more detail. Note that even though the report is for Serbian, the same process was used for Slovenian.

## Example of how to use

1. Create a Python environment and install HuggingFace datasets (`pip install datasets`).
2. Run:

```Python
import datasets

tasks = ["arc_challenge", "arc_easy", "boolq", "hellaswag", "nq_open", "openbookqa", "piqa", "triviaqa", "winogrande"]

for task in tasks:
    # Each task is a separate config; load_dataset returns a DatasetDict keyed by split name.
    dataset = datasets.load_dataset("gordicaleksa/slovenian-llm-eval-v0", task)
    for split in dataset.keys():
        # Use a separate name so the DatasetDict is not overwritten inside the loop.
        split_data = dataset[split]
        print(f"Task: {task}, Split: {split}")
        for example in split_data:
            print(example)
```

# Project Sponsors

Your name will be here if you support the project; we are still looking for GPT-4 credits! :)

## Credits

Thank you to the following individuals from my [Discord server](https://discord.gg/peBrCpheKE) who helped with donating Google Translate credits and running the machine translation part of the pipeline:

[Raphael Vienne](https://www.linkedin.com/in/raphael-vienne/), [Brian Pulfer](https://www.brianpulfer.ch/), [Timotej Petrič](https://si.linkedin.com/in/timopetric), [Aljaž Potočnik](https://www.linkedin.com/in/aljaž-potočnik-70325365/), [Damjan Kodre](https://www.linkedin.com/in/damjan-kodre-34063430)

## Citation

```
@article{slovenian-llm-eval,
    author = "Gordić Aleksa",
    title = "Slovenian LLM Eval",
    year = "2024",
    howpublished = {\url{https://huggingface.co/datasets/gordicaleksa/slovenian-llm-eval-v1}},
}
```

## License

Apache 2.0.
Provided by:
gordicaleksa
Summary of the original information

Slovenian LLM eval 🇸🇮

Overview

This dataset is intended for evaluating Slovenian large language models (LLMs).

What is covered

  • Common sense reasoning
    • Hellaswag
    • Winogrande
    • PIQA
    • OpenbookQA
    • ARC-Easy
    • ARC-Challenge
  • World knowledge
    • NaturalQuestions
    • TriviaQA
  • Reading comprehension
    • BoolQ
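To report scores per category, the config names from the loading example can be grouped under the three headings above. The sketch below is a minimal illustration, assuming each config name corresponds to the benchmark of the same name (with `nq_open` standing in for NaturalQuestions). `TASK_CATEGORIES`, `category_of`, and `ALL_TASKS` are hypothetical names, not part of the dataset or of the `datasets` library.

```python
# Hypothetical grouping of this dataset's config names by benchmark category.
# Assumption: `nq_open` is the NaturalQuestions config; all names here are illustrative.
TASK_CATEGORIES: dict[str, list[str]] = {
    "common_sense_reasoning": [
        "hellaswag", "winogrande", "piqa", "openbookqa", "arc_easy", "arc_challenge",
    ],
    "world_knowledge": ["nq_open", "triviaqa"],
    "reading_comprehension": ["boolq"],
}


def category_of(task: str) -> str:
    """Return the category a config name belongs to; raise KeyError if unknown."""
    for category, tasks in TASK_CATEGORIES.items():
        if task in tasks:
            return category
    raise KeyError(f"unknown task: {task}")


# Flattened, sorted list of every config name across all categories.
ALL_TASKS = sorted(t for tasks in TASK_CATEGORIES.values() for t in tasks)

if __name__ == "__main__":
    for task in ALL_TASKS:
        print(f"{task}: {category_of(task)}")
```

Keeping the mapping in one dictionary means the flattened `ALL_TASKS` list can be checked against the `tasks` list used for loading, so a config that is missing from either place shows up immediately.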

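Creation process

The creation pipeline has three steps (for v0, only the translation step has been completed):

  1. Machine translation from English to Slovenian using Google Translate.
  2. Refinement via GPT-4.
  3. Minor manual work by Aleksa Gordić, likely including a new version of Winogrande annotated by a human annotator.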

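Usage example

  1. Create a Python environment and install HuggingFace datasets (pip install datasets).
  2. Run the following code:

```Python
import datasets

tasks = ["arc_challenge", "arc_easy", "boolq", "hellaswag", "nq_open", "openbookqa", "piqa", "triviaqa", "winogrande"]

for task in tasks:
    # Each task is a separate config; load_dataset returns a DatasetDict keyed by split name.
    dataset = datasets.load_dataset("gordicaleksa/slovenian-llm-eval-v0", task)
    for split in dataset.keys():
        # Use a separate name so the DatasetDict is not overwritten inside the loop.
        split_data = dataset[split]
        print(f"Task: {task}, Split: {split}")
        for example in split_data:
            print(example)
```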

Citation

```
@article{slovenian-llm-eval,
    author = "Gordić Aleksa",
    title = "Slovenian LLM Eval",
    year = "2024",
    howpublished = {\url{https://huggingface.co/datasets/gordicaleksa/slovenian-llm-eval-v1}},
}
```

License

Apache 2.0.
