CIIRC-NLP/truthful_qa-cs

Name: CIIRC-NLP/truthful_qa-cs
Creator: CIIRC-NLP
Published: 2024-09-03 12:32:04
License: apache-2.0 (per the dataset card)

Source: Hugging Face, updated 2024-09-03, indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/CIIRC-NLP/truthful_qa-cs

Description:

---
language:
- cs
license: apache-2.0
size_categories:
- n<1K
task_categories:
- multiple-choice
- question-answering
pretty_name: Czech TruthfulQA
dataset_info:
- config_name: multiple_choice
  features:
  - name: question
    dtype: string
  - name: mc1_targets
    struct:
    - name: choices
      sequence: string
    - name: labels
      sequence: int32
  - name: mc2_targets
    struct:
    - name: choices
      sequence: string
    - name: labels
      sequence: int32
  splits:
  - name: validation
    num_bytes: 650313
    num_examples: 817
  download_size: 312789
  dataset_size: 650313
- config_name: shuffled_mc1
  features:
  - name: question
    dtype: string
  - name: choices
    sequence: string
  - name: target_idx
    dtype: int64
  splits:
  - name: validation
    num_bytes: 286901
    num_examples: 817
  download_size: 150893
  dataset_size: 286901
configs:
- config_name: multiple_choice
  data_files:
  - split: validation
    path: multiple_choice/validation-*
- config_name: shuffled_mc1
  data_files:
  - split: validation
    path: shuffled_mc1/validation-*
---

# Czech TruthfulQA

This is a Czech translation of the original [TruthfulQA](https://huggingface.co/datasets/truthful_qa) dataset, created using the [WMT 21 En-X](https://huggingface.co/facebook/wmt21-dense-24-wide-en-x) model. Only the multiple-choice variant of the dataset is included.

The translation was completed for use within the [Czech-Bench](https://gitlab.com/jirkoada/czech-bench) evaluation framework. The script used for translation can be reviewed [here](https://gitlab.com/jirkoada/czech-bench/-/blob/main/benchmarks/dataset_translation.py?ref_type=heads).
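The card declares two configs with 817 validation examples each: `multiple_choice` (with nested `mc1_targets` / `mc2_targets`) and `shuffled_mc1` (with flat `question`, `choices`, `target_idx`). The card does not say how `shuffled_mc1` was derived, so the following is only a hypothetical Python sketch of how an `mc1_targets` record could map onto the flat schema. The function name `to_shuffled_mc1`, the seed, the "exactly one correct MC1 choice" check, and the toy record are illustrative assumptions, not the authors' procedure.

```python
import random
from typing import Any, Dict, List


def to_shuffled_mc1(record: Dict[str, Any], seed: int = 0) -> Dict[str, Any]:
    """Turn a multiple_choice record into a shuffled_mc1-style record.

    Hypothetical reconstruction: the dataset card does not say how the
    shuffled_mc1 config was generated or which random seed was used.
    """
    choices: List[str] = list(record["mc1_targets"]["choices"])
    labels: List[int] = list(record["mc1_targets"]["labels"])
    # Assumption: MC1 marks exactly one choice as correct (label 1).
    if labels.count(1) != 1:
        raise ValueError("expected exactly one correct MC1 choice")
    correct = labels.index(1)

    # Shuffle positions with a local RNG so the result is reproducible.
    order = list(range(len(choices)))
    random.Random(seed).shuffle(order)
    return {
        "question": record["question"],
        "choices": [choices[i] for i in order],
        # New position of the originally correct choice.
        "target_idx": order.index(correct),
    }


# Toy record with invented text, not taken from the dataset.
example = {
    "question": "Toy question?",
    "mc1_targets": {"choices": ["right", "wrong 1", "wrong 2"], "labels": [1, 0, 0]},
    "mc2_targets": {"choices": ["right", "wrong 1"], "labels": [1, 0]},
}
shuffled = to_shuffled_mc1(example, seed=42)
print(shuffled["choices"][shuffled["target_idx"]])  # right
```

Shuffling a list of indices rather than the strings themselves keeps `target_idx` correct even if two choices happen to share the same text.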
## Citation

Original dataset:

```bibtex
@misc{lin2021truthfulqa,
      title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},
      author={Stephanie Lin and Jacob Hilton and Owain Evans},
      year={2021},
      eprint={2109.07958},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

Czech translation:

```bibtex
@mastersthesis{jirkovsky-thesis,
    author = {Jirkovský, Adam},
    title = {Benchmarking Techniques for Evaluation of Large Language Models},
    school = {Czech Technical University in Prague, Faculty of Electrical Engineering},
    year = 2024,
    URL = {https://dspace.cvut.cz/handle/10467/115227}
}
```

Provided by:

CIIRC-NLP

Summary of the original information

Dataset overview

Basic information

Dataset name: Czech TruthfulQA
Configuration name: multiple_choice
Task categories:
- multiple-choice
- question-answering
Language: cs (Czech)
Size category: n<1K
License: apache-2.0

Data structure

Features:
- question: string
- mc1_targets:
  - choices: sequence of strings
  - labels: sequence of integers (int32)
- mc2_targets:
  - choices: sequence of strings
  - labels: sequence of integers (int32)
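The `labels` lists mark which `choices` are correct. As a minimal sketch, assuming the MC1 and MC2 definitions from the original TruthfulQA paper (MC1: whether the single highest-scoring choice carries label 1; MC2: the normalized probability mass assigned to the choices labeled 1), the per-question scores could be computed from per-choice log-likelihoods produced by the model under evaluation. The helper names `mc1_score` / `mc2_score` and all numbers below are hypothetical, and Czech-Bench's actual evaluation code may differ.

```python
import math
from typing import List


def mc1_score(labels: List[int], logprobs: List[float]) -> float:
    """MC1: 1.0 if the single highest-scoring choice is labeled correct."""
    best = max(range(len(logprobs)), key=lambda i: logprobs[i])
    return float(labels[best] == 1)


def mc2_score(labels: List[int], logprobs: List[float]) -> float:
    """MC2: share of normalized probability mass on choices labeled 1."""
    # Subtract the max log-prob before exponentiating to avoid underflow;
    # the shift cancels out in the ratio below.
    m = max(logprobs)
    probs = [math.exp(lp - m) for lp in logprobs]
    true_mass = sum(p for p, y in zip(probs, labels) if y == 1)
    return true_mass / sum(probs)


# Invented log-likelihoods for one toy question (not real model output).
mc1_labels = [1, 0, 0]
mc1_logprobs = [-1.0, -2.0, -3.0]
mc2_labels = [1, 1, 0, 0]
mc2_logprobs = [-1.5, -2.5, -1.0, -4.0]

print(mc1_score(mc1_labels, mc1_logprobs))  # 1.0
print(round(mc2_score(mc2_labels, mc2_logprobs), 3))
```

Dataset-level scores would then be the mean of these per-question values over the 817 validation examples.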

Dataset splits

Validation:
- Size: 650,313 bytes
- Number of examples: 817

Download information

Download size: 312,789 bytes
Dataset size: 650,313 bytes