
malhajar/hellaswag_tr-v0.2

Hugging Face · Updated 2024-04-26 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/malhajar/hellaswag_tr-v0.2
Resource description:
---
dataset_info:
  features:
  - name: ctx
    dtype: string
  - name: ctx_a
    dtype: string
  - name: ctx_b
    dtype: string
  - name: endings
    sequence: string
  - name: ctx_en
    dtype: string
  - name: ctx_a_en
    dtype: string
  - name: ctx_b_en
    dtype: string
  - name: endings_en
    sequence: string
  - name: label
    dtype: string
  splits:
  - name: validation
    num_bytes: 18863391
    num_examples: 8857
  download_size: 10946714
  dataset_size: 18863391
configs:
- config_name: default
  data_files:
  - split: validation
    path: data/validation-*
---

This dataset is part of a series of datasets aimed at advancing Turkish LLM development by establishing rigorous Turkish benchmarks to evaluate the performance of LLMs produced in the Turkish language.

# Dataset Card for Hellaswag-Turkish v0.2

`malhajar/hellaswag_tr-v0.2` is an advanced version of the original `hellaswag-turkish`, intended specifically for use in the [`OpenLLMTurkishLeaderboard_v0.2`](https://huggingface.co/spaces/malhajar/OpenLLMTurkishLeaderboard_v0.2). Unlike its predecessor, which was a direct translation, this dataset has been generated entirely by GPT-4, with each entry crafted and reviewed by human experts to ensure it aligns with the paper's definition of the dataset. This process enhances the dataset's utility in testing the completion abilities of language models.

## Dataset Description

- **Homepage:** [Original Hellaswag Dataset](https://rowanzellers.com/hellaswag/)
- **Paper:** [Can a Machine Really Finish Your Sentence?](https://arxiv.org/abs/1905.07830)
- **Leaderboard:** [OpenLLMTurkishLeaderboard_v0.2](https://huggingface.co/spaces/malhajar/OpenLLMTurkishLeaderboard_v0.2)

### Dataset Summary

`hellaswag_tr-v0.2` provides contextually rich, creative continuations that test the completion abilities of Turkish LLMs. It is not merely a translation from English to Turkish but an enhancement: each prompt and completion is generated to reflect nuanced, culturally relevant contexts that are specific to the Turkish language.

### Supported Tasks and Leaderboards

This dataset is particularly suited to testing advanced text completion and generation tasks, evaluating both the creativity and the understanding of Turkish language models.

### Languages

The dataset is presented in Turkish, crafted to ensure high-quality, context-aware machine-generated content.

## Dataset Structure

### Data Instances

A typical data instance comprises a context and a set of ending choices, where the model needs to select or generate the most appropriate ending based on the given context.

```python
{
    'context': 'Bir grup öğrenci okul projeleri için deney yapıyor. Öğretmen onlara...',
    'endings': [
        'bir sonraki adımın ne olması gerektiğini söyler.',
        'hangi malzemeleri kullanmaları gerektiğini anlatır.',
        'deneyin sonuçlarını tahmin etmelerini ister.',
        'projeleri için daha fazla fon sağlar.'
    ],
    'correct_ending': 2
}
```

## Licensing Information

This dataset is licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).

## Citation Information

```bibtex
@misc{hellaswag_tr_v0.2,
  title = "Hellaswag Turkish v0.2",
  author = "Mohamad Alhajar",
  year = 2024,
  url = "https://huggingface.co/datasets/malhajar/hellaswag_tr-v0.2"
}
```
Provider:
malhajar
Summary of Original Information

Dataset Overview

Dataset Name

hellaswag_tr-v0.2

Dataset Features

  • ctx: string
  • ctx_a: string
  • ctx_b: string
  • endings: sequence of strings
  • ctx_en: string
  • ctx_a_en: string
  • ctx_b_en: string
  • endings_en: sequence of strings
  • label: string
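
The feature list above can be mirrored as a typed record. What follows is a minimal sketch rather than anything shipped with the dataset: `HellaswagTrRecord`, `STRING_FIELDS`, `SEQUENCE_FIELDS`, and `validate_record` are hypothetical names introduced for illustration, and only the field names and types come from the listing.

```python
from typing import List, TypedDict


class HellaswagTrRecord(TypedDict):
    """Hypothetical typed view of one row, mirroring the listed features."""
    ctx: str
    ctx_a: str
    ctx_b: str
    endings: List[str]
    ctx_en: str
    ctx_a_en: str
    ctx_b_en: str
    endings_en: List[str]
    label: str  # stored as a string, per the feature list


# Field groupings derived from the listing (names are assumptions of this sketch).
STRING_FIELDS = ("ctx", "ctx_a", "ctx_b", "ctx_en", "ctx_a_en", "ctx_b_en", "label")
SEQUENCE_FIELDS = ("endings", "endings_en")


def validate_record(row: dict) -> List[str]:
    """Return a list of problems; an empty list means the row fits the schema."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(row.get(name), str):
            problems.append(f"{name}: expected a string")
    for name in SEQUENCE_FIELDS:
        value = row.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{name}: expected a list of strings")
    return problems
```

A check like this can be useful before evaluation, since `label` is declared as a string and would need an explicit conversion if an integer index is wanted.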

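Dataset Splits

  • Validation split:
    • Size in bytes: 18863391
    • Number of examples: 8857

Dataset Size

  • Download size: 10946714 bytes
  • Dataset size: 18863391 bytes

Dataset Configuration

  • Default configuration:
    • Data file path: data/validation-*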
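
The split and configuration details above suggest how the data might be loaded. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable. `load_validation` and `check_size` are hypothetical helper names, and the expected row count is copied from the split statistics.

```python
REPO_ID = "malhajar/hellaswag_tr-v0.2"
SPLIT = "validation"       # the only split listed
EXPECTED_EXAMPLES = 8857   # from the split statistics


def load_validation():
    """Download and return the validation split (requires network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=SPLIT)


def check_size(num_rows: int) -> bool:
    """Compare a loaded row count with the count reported in the listing."""
    return num_rows == EXPECTED_EXAMPLES
```

Typical use would be `ds = load_validation()` followed by `check_size(len(ds))`. A mismatch would suggest the hosted data has changed since this listing was indexed.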

Dataset Structure

  • Data instance:
    • Context: string
    • Ending choices: list of strings
    • Correct ending: integer index
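
The instance layout above (a context, candidate endings, and the index of the correct one) lends itself to a simple multiple-choice scoring loop. This is a minimal sketch under stated assumptions, not the leaderboard's evaluation procedure, which is not described here. `pick_ending` and `accuracy` are hypothetical helpers, and `score` stands in for any caller-supplied function, such as a model's log-likelihood of an ending given the context.

```python
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[str, str], float]  # (context, ending) -> higher is better


def pick_ending(context: str, endings: List[str], score: Scorer) -> int:
    """Return the index of the highest-scoring ending (first one wins ties)."""
    scores = [score(context, ending) for ending in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples: Iterable[Tuple[str, List[str], int]], score: Scorer) -> float:
    """Fraction of examples where the top-scored ending is the correct index."""
    total = 0
    correct = 0
    for context, endings, correct_index in examples:
        total += 1
        if pick_ending(context, endings, score) == correct_index:
            correct += 1
    return correct / total if total else 0.0
```

Passing the scorer in as an argument keeps the selection logic independent of any particular model, so a trivial baseline (for example, ending length) can be swapped in to sanity-check the loop.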

License Information

  • License: Apache License, Version 2.0

Citation Information

@misc{hellaswag_tr_v0.2,
  title = "Hellaswag Turkish v0.2",
  author = "Mohamad Alhajar",
  year = 2024,
  url = "https://huggingface.co/datasets/malhajar/hellaswag_tr-v0.2"
}
