sorenmulli/citizenship-test-da

Name: sorenmulli/citizenship-test-da
Creator: sorenmulli
Published: 2024-01-15 19:34:15
License: No description available

Hugging Face, updated 2024-01-15, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/sorenmulli/citizenship-test-da

Resource description:

---
dataset_info:
- config_name: default
  features:
  - name: question
    dtype: string
  - name: index
    dtype: int64
  - name: option-A
    dtype: string
  - name: option-B
    dtype: string
  - name: option-C
    dtype: string
  - name: correct
    dtype: string
  - name: origin
    dtype: string
  splits:
  - name: train
    num_bytes: 103251.0
    num_examples: 605
  download_size: 43667
  dataset_size: 103251.0
- config_name: raw
  features:
  - name: question
    dtype: string
  - name: index
    dtype: int64
  - name: option-A
    dtype: string
  - name: option-B
    dtype: string
  - name: option-C
    dtype: string
  - name: correct
    dtype: string
  - name: origin
    dtype: string
  splits:
  - name: train
    num_bytes: 103906
    num_examples: 605
  download_size: 45297
  dataset_size: 103906
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: raw
  data_files:
  - split: train
    path: raw/train-*
---

# [WIP] Dataset Card for "citizenship-test-da"

*Please note that this dataset and its dataset card are both works in progress. For now, refer to the related [thesis](https://sorenmulli.github.io/thesis/thesis.pdf) for all details.*

This dataset contains questions and answers scraped from Danish citizenship tests (Danish: *indfødsretsprøver* and *medborgerskabsprøver*) from June 2019 to May 2023, taken from PDFs produced by "Styrelsen for International Rekruttering og Integration" (SIRI).

The dataset is released as an appendix to the thesis ["Are GLLMs Danoliterate? Benchmarking Generative NLP in Danish"](https://sorenmulli.github.io/thesis/thesis.pdf), with permission from SIRI for this specific purpose. The PDFs are available on [SIRI's website](https://siri.dk/nyheder/?categorizations=9115).

The `default` configuration has been semi-automatically cleaned to remove PDF artifacts using the [Alvenir 3gram DSL language model](https://github.com/danspeech/danspeech/releases/tag/v0.02-alpha). The examples were not deduplicated.
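Since the card states that the examples were not deduplicated, a reader may want to load a configuration and drop repeated questions before using it. The following is a minimal sketch, not the author's method: `load_config` and `drop_duplicate_questions` are hypothetical helper names, the sample rows are invented for illustration, and the values shown for `correct` are an assumption about how that field is encoded. Loading uses the third-party `datasets` library and needs network access, so its import is kept inside the function.

```python
from typing import Iterable, List

REPO_ID = "sorenmulli/citizenship-test-da"


def load_config(config_name: str = "default"):
    """Load the train split of one configuration ("default" or "raw").

    Requires the third-party `datasets` package and network access.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")


def drop_duplicate_questions(rows: Iterable[dict]) -> List[dict]:
    """Keep the first occurrence of each (question, options) combination."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["question"], row["option-A"], row["option-B"], row["option-C"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows, invented for illustration only (not taken from the dataset).
sample = [
    {"question": "Q1", "option-A": "a", "option-B": "b", "option-C": "c", "correct": "A"},
    {"question": "Q1", "option-A": "a", "option-B": "b", "option-C": "c", "correct": "A"},
    {"question": "Q2", "option-A": "x", "option-B": "y", "option-C": "z", "correct": "B"},
]
unique_rows = drop_duplicate_questions(sample)
print(len(unique_rows))
```

With the real data, `drop_duplicate_questions(load_config("default"))` would apply the same filter to the cleaned configuration. Keying on the question plus all three options is one possible choice; questions that differ only by PDF artifacts would still count as distinct.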

Provided by:

sorenmulli

Summary of original information

Dataset overview

Dataset configurations

default
- Features:
  - question: string
  - index: int64
  - option-A: string
  - option-B: string
  - option-C: string
  - correct: string
  - origin: string
- Splits:
  - train
    - Bytes: 103251.0
    - Examples: 605
- Download size: 43667
- Dataset size: 103251.0
raw
- Features:
  - question: string
  - index: int64
  - option-A: string
  - option-B: string
  - option-C: string
  - correct: string
  - origin: string
- Splits:
  - train
    - Bytes: 103906
    - Examples: 605
- Download size: 45297
- Dataset size: 103906
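Both configurations share the same seven fields, so a single check can be run against a record from either one. The sketch below is an illustration under assumptions: `schema_problems` is a hypothetical helper, `int64` is mapped to Python's `int`, and the example records are invented rather than taken from the dataset.

```python
from typing import List

# Field names and types as listed above; int64 is mapped to Python's int.
EXPECTED_FEATURES = {
    "question": str,
    "index": int,
    "option-A": str,
    "option-B": str,
    "option-C": str,
    "correct": str,
    "origin": str,
}


def schema_problems(record: dict) -> List[str]:
    """Return a list of human-readable schema violations (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FEATURES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    for name in record:
        if name not in EXPECTED_FEATURES:
            problems.append(f"unexpected field: {name}")
    return problems


# Hypothetical records, invented for illustration only.
good = {
    "question": "Q", "index": 0, "option-A": "a", "option-B": "b",
    "option-C": "c", "correct": "A", "origin": "source",
}
bad = {
    "question": "Q", "index": "0", "option-A": "a", "option-B": "b",
    "option-C": "c", "correct": "A",
}
print(schema_problems(good))
print(schema_problems(bad))
```

The second record is flagged twice: once because `index` is a string, and once because `origin` is missing.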

Data files

default
- Splits:
  - train: data/train-*
raw
- Splits:
  - train: raw/train-*
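The `data/train-*` and `raw/train-*` entries are glob patterns that select the files belonging to each configuration. As a rough illustration, the sketch below mimics that selection with Python's standard `fnmatch` module. The shard file names are hypothetical, `config_for` is an invented helper, and the actual pattern resolution performed by the Hugging Face tooling may differ in detail.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns copied from the data file listing above.
DATA_FILES = {
    "default": "data/train-*",
    "raw": "raw/train-*",
}


def config_for(path: str) -> Optional[str]:
    """Return the first configuration whose pattern matches `path`, if any."""
    for config_name, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return config_name
    return None


# Hypothetical repository file names, used only to exercise the patterns.
for path in [
    "data/train-00000-of-00001.parquet",
    "raw/train-00000-of-00001.parquet",
    "README.md",
]:
    print(path, "->", config_for(path))
```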
