malhajar/hellaswag_tr-v0.2
Source: Hugging Face. Updated 2024-04-26; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/malhajar/hellaswag_tr-v0.2
Resource description:
---
dataset_info:
  features:
  - name: ctx
    dtype: string
  - name: ctx_a
    dtype: string
  - name: ctx_b
    dtype: string
  - name: endings
    sequence: string
  - name: ctx_en
    dtype: string
  - name: ctx_a_en
    dtype: string
  - name: ctx_b_en
    dtype: string
  - name: endings_en
    sequence: string
  - name: label
    dtype: string
  splits:
  - name: validation
    num_bytes: 18863391
    num_examples: 8857
  download_size: 10946714
  dataset_size: 18863391
configs:
- config_name: default
  data_files:
  - split: validation
    path: data/validation-*
---
This dataset is part of a series of datasets that aim to advance Turkish LLM development by establishing rigorous Turkish benchmarks for evaluating LLMs produced in the Turkish language.
# Dataset Card for Hellaswag-Turkish v0.2
`malhajar/hellaswag_tr-v0.2` is an updated version of the original `hellaswag-turkish`, intended specifically for use in the [`OpenLLMTurkishLeaderboard_v0.2`](https://huggingface.co/spaces/malhajar/OpenLLMTurkishLeaderboard_v0.2). Unlike its predecessor, which was a direct translation, this dataset was generated entirely by GPT-4, and each entry was reviewed by human experts to ensure that it matches the dataset definition given in the HellaSwag paper. This process makes the dataset more useful for testing the completion abilities of language models.
## Dataset Description
- **Homepage:** [Original Hellaswag Dataset](https://rowanzellers.com/hellaswag/)
- **Paper:** [HellaSwag: Can a Machine Really Finish Your Sentence?](https://arxiv.org/abs/1905.07830)
- **Leaderboard:** [OpenLLMTurkishLeaderboard_v0.2](https://huggingface.co/spaces/malhajar/OpenLLMTurkishLeaderboard_v0.2)
### Dataset Summary
`hellaswag_tr-v0.2` tests the completion abilities of Turkish LLMs with contextually rich, creative continuations. It is not merely a translation from English to Turkish: each prompt and completion was generated to reflect nuanced, culturally relevant contexts specific to the Turkish language.
### Supported Tasks and Leaderboards
This dataset is particularly suited to evaluating text completion and generation, probing both the creativity and the language understanding of Turkish language models.
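One common way to use HellaSwag-style data is multiple-choice accuracy: score every candidate ending against the context and check whether the top-scored ending is the labeled one. The sketch below illustrates that idea only; it is not necessarily how the leaderboard computes its numbers. The names `Scorer`, `pick_ending`, `accuracy`, and `toy_score` are hypothetical, and `toy_score` is a word-overlap stand-in for a real model score such as a log-likelihood.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical scorer signature: a higher value means the ending is preferred.
Scorer = Callable[[str, str], float]


def pick_ending(context: str, endings: Sequence[str], score: Scorer) -> int:
    """Return the index of the highest-scoring ending (first one wins on ties)."""
    scores = [score(context, ending) for ending in endings]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(examples: Sequence[Tuple[str, Sequence[str], int]], score: Scorer) -> float:
    """Fraction of examples whose top-scored ending matches the gold index."""
    if not examples:
        return 0.0
    correct = sum(
        pick_ending(context, endings, score) == gold
        for context, endings, gold in examples
    )
    return correct / len(examples)


def toy_score(context: str, ending: str) -> float:
    """Stand-in scorer: number of distinct words shared by context and ending."""
    return float(len(set(context.lower().split()) & set(ending.lower().split())))
```

In practice `toy_score` would be replaced by a function that queries the model under evaluation; the rest of the sketch stays the same.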
### Languages
The dataset is in Turkish. Its content is machine-generated and was crafted to be high-quality and context-aware.
## Dataset Structure
### Data Instances
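A typical data instance comprises a context and a set of ending choices; the model must select or generate the most appropriate ending for the given context. The example below is illustrative and uses simplified keys, whereas the published schema names the fields `ctx`, `ctx_a`, `ctx_b`, `endings`, and `label` (stored as a string), along with the `_en`-suffixed fields `ctx_en`, `ctx_a_en`, `ctx_b_en`, and `endings_en`.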
```python
{
'context': 'Bir grup öğrenci okul projeleri için deney yapıyor. Öğretmen onlara...',
'endings': [
'bir sonraki adımın ne olması gerektiğini söyler.',
'hangi malzemeleri kullanmaları gerektiğini anlatır.',
'deneyin sonuçlarını tahmin etmelerini ister.',
'projeleri için daha fazla fon sağlar.'
],
'correct_ending': 2
}
```
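A minimal sketch of reading records with the column names declared in the card metadata (`ctx`, `endings`, `label`) rather than the simplified keys shown in the example. It assumes that `label` holds a zero-based index written as a decimal string, as in the original HellaSwag release; the card does not confirm this, so verify it against the data. The helper names `to_choice_example` and `load_validation` are hypothetical, and loading requires the `datasets` library and network access.

```python
from typing import Any, Dict, List, Tuple


def to_choice_example(record: Dict[str, Any]) -> Tuple[str, List[str], int]:
    """Convert a schema-style record into (context, endings, gold_index)."""
    context = record["ctx"]
    endings = list(record["endings"])
    # Assumption: `label` is a zero-based index stored as a string, e.g. "2".
    gold = int(record["label"])
    if not 0 <= gold < len(endings):
        raise ValueError(f"label {gold} out of range for {len(endings)} endings")
    return context, endings, gold


def load_validation():
    """Load the validation split from the Hugging Face Hub."""
    # Imported lazily so that to_choice_example has no third-party dependency.
    from datasets import load_dataset

    return load_dataset("malhajar/hellaswag_tr-v0.2", split="validation")
```

A typical loop would be `for row in load_validation(): context, endings, gold = to_choice_example(row)`.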
## Licensing Information
This dataset is licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
## Citation Information
```bibtex
@misc{hellaswag_tr_v0.2,
title = "Hellaswag Turkish v0.2",
author = "Mohamad Alhajar",
year = 2024,
url = "https://huggingface.co/datasets/malhajar/hellaswag_tr-v0.2"
}
```
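Provided by:
malhajar
Summary of original information
Dataset overview
Dataset name
hellaswag_tr-v0.2
Dataset features
- ctx: string
- ctx_a: string
- ctx_b: string
- endings: sequence of strings
- ctx_en: string
- ctx_a_en: string
- ctx_b_en: string
- endings_en: sequence of strings
- label: string
Dataset splits
- Validation:
  - Bytes: 18863391
  - Examples: 8857
Dataset size
- Download size: 10946714
- Dataset size: 18863391
Dataset configuration
- Default configuration:
  - Data file path: data/validation-*
Dataset structure
- Data instance:
  - Context: string
  - Ending options: list of strings
  - Correct ending: integer index
License information
- License: Apache License, Version 2.0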