
CIIRC-NLP/truthful_qa-cs

Source: Hugging Face. Updated: 2024-09-03. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/CIIRC-NLP/truthful_qa-cs
Resource description:
---
language:
- cs
license: apache-2.0
size_categories:
- n<1K
task_categories:
- multiple-choice
- question-answering
pretty_name: Czech TruthfulQA
dataset_info:
- config_name: multiple_choice
  features:
  - name: question
    dtype: string
  - name: mc1_targets
    struct:
    - name: choices
      sequence: string
    - name: labels
      sequence: int32
  - name: mc2_targets
    struct:
    - name: choices
      sequence: string
    - name: labels
      sequence: int32
  splits:
  - name: validation
    num_bytes: 650313
    num_examples: 817
  download_size: 312789
  dataset_size: 650313
- config_name: shuffled_mc1
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: target_idx
    dtype: int64
  splits:
  - name: validation
    num_bytes: 286901
    num_examples: 817
  download_size: 150893
  dataset_size: 286901
configs:
- config_name: multiple_choice
  data_files:
  - split: validation
    path: multiple_choice/validation-*
- config_name: shuffled_mc1
  data_files:
  - split: validation
    path: shuffled_mc1/validation-*
---

# Czech TruthfulQA

This is a Czech translation of the original [TruthfulQA](https://huggingface.co/datasets/truthful_qa) dataset, created using the [WMT 21 En-X](https://huggingface.co/facebook/wmt21-dense-24-wide-en-x) model.
Only the multiple-choice variant of the dataset is included.

The translation was completed for use within the [Czech-Bench](https://gitlab.com/jirkoada/czech-bench) evaluation framework.
The script used for translation can be reviewed [here](https://gitlab.com/jirkoada/czech-bench/-/blob/main/benchmarks/dataset_translation.py?ref_type=heads).

## Citation

Original dataset:

```bibtex
@misc{lin2021truthfulqa,
    title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},
    author={Stephanie Lin and Jacob Hilton and Owain Evans},
    year={2021},
    eprint={2109.07958},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

Czech translation:

```bibtex
@masterthesis{jirkovsky-thesis,
    author = {Jirkovský, Adam},
    title = {Benchmarking Techniques for Evaluation of Large Language Models},
    school = {Czech Technical University in Prague, Faculty of Electrical Engineering},
    year = 2024,
    URL = {https://dspace.cvut.cz/handle/10467/115227}
}
```
Provider:
CIIRC-NLP
Summary of original information

Dataset overview

Basic information

  • Dataset name: Czech TruthfulQA
  • Config name: multiple_choice
  • Task categories:
    • multiple-choice
    • question-answering
  • Language: cs
  • Size category: n<1K
  • License: apache-2.0

Data structure

  • Features:
    • question: string
    • mc1_targets:
      • choices: sequence of strings
      • labels: sequence of integers (int32)
    • mc2_targets:
      • choices: sequence of strings
      • labels: sequence of integers (int32)
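
To make the feature layout above concrete, here is a minimal Python sketch of how a record with this structure might be read. It assumes that a label of 1 marks a true answer and 0 a false one, that `mc1_targets` carries exactly one true answer, and that `mc2_targets` may carry several, as in the original TruthfulQA. The helper names and the `sample` record are hypothetical placeholders, not taken from the dataset.

```python
from typing import Dict, List


def mc1_answer_index(example: Dict) -> int:
    """Return the position of the choice labelled 1 in mc1_targets.

    Assumes exactly one label equals 1 (single-true-answer setting).
    """
    labels: List[int] = example["mc1_targets"]["labels"]
    return labels.index(1)


def mc2_true_choices(example: Dict) -> List[str]:
    """Return every choice labelled 1 in mc2_targets (multi-true setting)."""
    targets = example["mc2_targets"]
    return [
        choice
        for choice, label in zip(targets["choices"], targets["labels"])
        if label == 1
    ]


# Hypothetical record that only mirrors the documented feature names;
# the strings are placeholders, not real dataset content.
sample = {
    "question": "Placeholder question?",
    "mc1_targets": {
        "choices": ["answer A", "answer B", "answer C"],
        "labels": [1, 0, 0],
    },
    "mc2_targets": {
        "choices": ["answer A", "answer A2", "answer B"],
        "labels": [1, 1, 0],
    },
}

print(mc1_answer_index(sample))
print(mc2_true_choices(sample))
```

With the Hugging Face `datasets` library installed, real records could presumably be obtained with `load_dataset("CIIRC-NLP/truthful_qa-cs", "multiple_choice", split="validation")` and passed to the same helpers.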

Dataset splits

  • Validation:
    • Size: 650313 bytes
    • Number of examples: 817

Download information

  • Download size: 312789 bytes
  • Dataset size: 650313 bytes