gordicaleksa/slovenian-llm-eval-v0

Name: gordicaleksa/slovenian-llm-eval-v0
Creator: gordicaleksa
Published: 2024-04-04 08:14:17
License: No description available

Hugging Face · Updated 2024-04-04 · Indexed 2024-06-11

Download link:

https://hf-mirror.com/datasets/gordicaleksa/slovenian-llm-eval-v0


Resource description:

License: apache-2.0
Language: sl

# Slovenian LLM eval 🇸🇮

This dataset should be used for Slovenian LLM evaluation. Here is the [GitHub project](https://github.com/gordicaleksa/slovenian-llm-eval) used to build this dataset.

For a technical report on the project, see this in-depth [Weights & Biases report](https://wandb.ai/gordicaleksa/serbian_llm_eval/reports/First-Serbian-LLM-eval---Vmlldzo2MjgwMDA5). ❤️ Although that report was written for the Serbian LLM eval, the same process was used to build the Slovenian LLM eval.

I'll give a TL;DR here:

## What is covered?

Common sense reasoning:
* Hellaswag, Winogrande, PIQA, OpenbookQA, ARC-Easy, ARC-Challenge

World knowledge:
* NaturalQuestions, TriviaQA

Reading comprehension:
* BoolQ

## How was the eval created?

3 steps (for this version, v0, we've only done the translation and are looking for donations to push through the whole pipeline):

1. Machine translation from English to Slovenian using Google Translate
2. Refinement via GPT-4
3. Minor manual work by me (Aleksa Gordić); we'll also likely have a new version of Winogrande annotated by a human annotator

Please see [the report](https://wandb.ai/gordicaleksa/serbian_llm_eval/reports/First-Serbian-LLM-eval---Vmlldzo2MjgwMDA5) for more detail.

## Example of how to use

1. Create a Python environment and install Hugging Face `datasets` (`pip install datasets`).
2. Run:

```python
import datasets

tasks = ["arc_challenge", "arc_easy", "boolq", "hellaswag", "nq_open", "openbookqa", "piqa", "triviaqa", "winogrande"]

for task in tasks:
    # Each task is a separate config of the dataset repository.
    dataset = datasets.load_dataset("gordicaleksa/slovenian-llm-eval-v0", task)
    for split in dataset.keys():
        # Use a new name so `dataset` still refers to the DatasetDict on the next split.
        split_data = dataset[split]
        print(f"Task: {task}, Split: {split}")
        for example in split_data:
            print(example)
```

## Project Sponsors

Your name will be here if you support the project; we are still looking for GPT-4 credits! :)

## Credits

Thank you to the following individuals from my [Discord server](https://discord.gg/peBrCpheKE) who helped by donating Google Translate credits and running the machine translation part of the pipeline: [Raphael Vienne](https://www.linkedin.com/in/raphael-vienne/), [Brian Pulfer](https://www.brianpulfer.ch/), [Timotej Petrič](https://si.linkedin.com/in/timopetric), [Aljaž Potočnik](https://www.linkedin.com/in/aljaž-potočnik-70325365/), [Damjan Kodre](https://www.linkedin.com/in/damjan-kodre-34063430)

## Citation

```
@article{slovenian-llm-eval,
  author = "Gordić Aleksa",
  title = "Slovenian LLM Eval",
  year = "2024",
  howpublished = {\url{https://huggingface.co/datasets/gordicaleksa/slovenian-llm-eval-v1}},
}
```

## License

Apache 2.0.

Provider:

gordicaleksa

Summary of original information

Slovenian LLM eval 🇸🇮

Overview

This dataset is intended for evaluating Slovenian large language models (LLMs).

What is covered

Common sense reasoning:
- Hellaswag
- Winogrande
- PIQA
- OpenbookQA
- ARC-Easy
- ARC-Challenge
World knowledge:
- NaturalQuestions
- TriviaQA
Reading comprehension:
- BoolQ
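The usage example refers to these benchmarks by their Hugging Face config names. As a minimal sketch, the grouping above can be written as a plain Python mapping so that a subset of tasks can be evaluated by category. The dictionary, its category keys and the `tasks_for` helper are hypothetical and not part of the dataset; pairing `nq_open` with NaturalQuestions is an assumption based on the config name.

```python
# Hypothetical grouping of the dataset's config names by benchmark category.
# Category keys are illustrative; "nq_open" is assumed to be NaturalQuestions.
TASKS_BY_CATEGORY: dict[str, list[str]] = {
    "common_sense_reasoning": [
        "hellaswag", "winogrande", "piqa", "openbookqa", "arc_easy", "arc_challenge",
    ],
    "world_knowledge": ["nq_open", "triviaqa"],
    "reading_comprehension": ["boolq"],
}


def tasks_for(*categories: str) -> list[str]:
    """Return config names for the given categories (all categories if none given)."""
    selected = categories or tuple(TASKS_BY_CATEGORY)
    return [task for category in selected for task in TASKS_BY_CATEGORY[category]]


if __name__ == "__main__":
    print(tasks_for("world_knowledge"))  # ['nq_open', 'triviaqa']
    print(len(tasks_for()))  # 9
```

For instance, looping over `tasks_for("world_knowledge")` instead of the full task list would load only the two world-knowledge configs.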

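Creation process

The dataset is created in three steps (for this version, v0, only the machine translation step has been completed):

1. Machine translation from English to Slovenian using Google Translate.
2. Refinement via GPT-4.
3. Minor manual work by Aleksa Gordić; a new version of Winogrande annotated by a human annotator is also likely.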

Usage example

1. Create a Python environment and install the Hugging Face datasets library (pip install datasets).
2. Run the following code:

```python
import datasets

tasks = ["arc_challenge", "arc_easy", "boolq", "hellaswag", "nq_open", "openbookqa", "piqa", "triviaqa", "winogrande"]

for task in tasks:
    # Each task is a separate config of the dataset repository.
    dataset = datasets.load_dataset("gordicaleksa/slovenian-llm-eval-v0", task)
    for split in dataset.keys():
        # Use a new name so `dataset` still refers to the DatasetDict on the next split.
        split_data = dataset[split]
        print(f"Task: {task}, Split: {split}")
        for example in split_data:
            print(example)
```
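The loop above prints every example of every split, which is a lot of output across nine tasks. A minimal sketch of a lighter preview, assuming only that a loaded task behaves like a mapping from split name to an iterable of example dicts (a `datasets.DatasetDict` does); the `preview_splits` helper is hypothetical:

```python
from itertools import islice
from typing import Iterable, Mapping


def preview_splits(task: str, splits: Mapping[str, Iterable[dict]], n: int = 3) -> list[str]:
    """Return printable lines for the first `n` examples of every split of one task."""
    lines: list[str] = []
    for split_name, rows in splits.items():
        lines.append(f"Task: {task}, Split: {split_name}")
        # islice stops after n rows instead of walking the whole split.
        lines.extend(repr(example) for example in islice(rows, n))
    return lines


if __name__ == "__main__":
    # Offline stand-in for a loaded DatasetDict; the field names are made up.
    fake_task = {"test": [{"id": i} for i in range(5)]}
    for line in preview_splits("boolq", fake_task, n=2):
        print(line)
```

With `datasets` installed, the same helper can be called as `preview_splits(task, datasets.load_dataset("gordicaleksa/slovenian-llm-eval-v0", task))`. Taking the first `n` rows with `itertools.islice` avoids iterating a whole split just to inspect its format.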

Citation

```
@article{slovenian-llm-eval,
  author = "Gordić Aleksa",
  title = "Slovenian LLM Eval",
  year = "2024",
  howpublished = {\url{https://huggingface.co/datasets/gordicaleksa/slovenian-llm-eval-v1}},
}
```

License

Apache 2.0.
