sorenmulli/citizenship-test-da

Source: Hugging Face · updated 2024-01-15 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/sorenmulli/citizenship-test-da
Resource description:
---
dataset_info:
- config_name: default
  features:
  - name: question
    dtype: string
  - name: index
    dtype: int64
  - name: option-A
    dtype: string
  - name: option-B
    dtype: string
  - name: option-C
    dtype: string
  - name: correct
    dtype: string
  - name: origin
    dtype: string
  splits:
  - name: train
    num_bytes: 103251.0
    num_examples: 605
  download_size: 43667
  dataset_size: 103251.0
- config_name: raw
  features:
  - name: question
    dtype: string
  - name: index
    dtype: int64
  - name: option-A
    dtype: string
  - name: option-B
    dtype: string
  - name: option-C
    dtype: string
  - name: correct
    dtype: string
  - name: origin
    dtype: string
  splits:
  - name: train
    num_bytes: 103906
    num_examples: 605
  download_size: 45297
  dataset_size: 103906
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: raw
  data_files:
  - split: train
    path: raw/train-*
---

# [WIP] Dataset Card for "citizenship-test-da"

*Please note that this dataset and its dataset card are both works in progress. For now, refer to the related [thesis](https://sorenmulli.github.io/thesis/thesis.pdf) for all details.*

This dataset contains questions and answers scraped from Danish citizenship tests (Danish: *indfødsretsprøver* and *medborgerskabsprøver*) from June 2019 to May 2023. The source PDFs were produced by "Styrelsen for International Rekruttering og Integration" (SIRI, the Danish Agency for International Recruitment and Integration).

The dataset is released as an appendix to the thesis ["Are GLLMs Danoliterate? Benchmarking Generative NLP in Danish"](https://sorenmulli.github.io/thesis/thesis.pdf), with permission from SIRI for this specific purpose. The PDFs are available on [SIRI's website](https://siri.dk/nyheder/?categorizations=9115).

The `default` configuration has been semi-automatically cleaned of PDF artifacts using the [Alvenir 3gram DSL language model](https://github.com/danspeech/danspeech/releases/tag/v0.02-alpha). The examples were not deduplicated.
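Because the card notes that the examples were not deduplicated, a user may want to check for repeated questions before using the data for evaluation. The following is a minimal stdlib-only sketch. It assumes each row is a plain dict keyed by the card's feature names. The normalization rule (lower-casing and collapsing whitespace) is an illustrative choice, not something the card specifies.

```python
from collections import defaultdict

# Columns compared when deciding whether two rows are the same question.
KEY_COLUMNS = ("question", "option-A", "option-B", "option-C")


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace (illustrative rule, an assumption)."""
    return " ".join(text.lower().split())


def find_duplicates(rows: list[dict]) -> list[list[int]]:
    """Return groups of row positions whose question and options match after normalization."""
    groups: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for pos, row in enumerate(rows):
        key = tuple(normalize(str(row[col])) for col in KEY_COLUMNS)
        groups[key].append(pos)
    # Keep only keys that occurred more than once.
    return [positions for positions in groups.values() if len(positions) > 1]
```

For example, with hypothetical placeholder rows, `find_duplicates([{"question": "Q one", "option-A": "a", "option-B": "b", "option-C": "c"}, {"question": "q  ONE", "option-A": "A", "option-B": "b", "option-C": "c"}])` returns `[[0, 1]]`. A stricter check could also compare `correct`, since the same question might appear with its options in a different order.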
Provider:
sorenmulli
Summary of original information

Dataset overview

Dataset configurations

  • default

    • Features:
      • question: string
      • index: int64
      • option-A: string
      • option-B: string
      • option-C: string
      • correct: string
      • origin: string
    • Splits:
      • train
        • Size (bytes): 103251.0
        • Examples: 605
    • Download size (bytes): 43667
    • Dataset size (bytes): 103251.0
  • raw

    • Features:
      • question: string
      • index: int64
      • option-A: string
      • option-B: string
      • option-C: string
      • correct: string
      • origin: string
    • Splits:
      • train
        • Size (bytes): 103906
        • Examples: 605
    • Download size (bytes): 45297
    • Dataset size (bytes): 103906
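Both configurations share the same seven features. A typed record can therefore make the schema explicit and turn a row into a multiple-choice prompt. The sketch below is hypothetical throughout. The class name, the mapping of `option-A` to `option_a`, and the prompt layout are all illustrative. It also *assumes* that `correct` holds the letter of the right option ("A", "B" or "C"); the card only says the column is a string, so check the actual values before relying on this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitizenshipQuestion:
    """One dataset row; hyphenated column names are mapped to snake_case (hypothetical)."""
    question: str
    index: int
    option_a: str
    option_b: str
    option_c: str
    correct: str  # ASSUMPTION: a letter "A", "B" or "C"
    origin: str

    @classmethod
    def from_row(cls, row: dict) -> "CitizenshipQuestion":
        """Build a record from a dict keyed by the card's feature names."""
        return cls(
            question=str(row["question"]),
            index=int(row["index"]),
            option_a=str(row["option-A"]),
            option_b=str(row["option-B"]),
            option_c=str(row["option-C"]),
            correct=str(row["correct"]),
            origin=str(row["origin"]),
        )

    def to_prompt(self) -> str:
        """Render the question with lettered options (illustrative layout)."""
        return (
            f"{self.question}\n"
            f"A. {self.option_a}\n"
            f"B. {self.option_b}\n"
            f"C. {self.option_c}\n"
            "Answer:"
        )

    def is_correct(self, answer: str) -> bool:
        """Case-insensitive comparison of a predicted letter with `correct`."""
        return answer.strip().upper() == self.correct.strip().upper()
```

A frozen dataclass keeps records hashable and immutable, which is convenient if they are later grouped or deduplicated.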

Data files

  • default

    • Splits:
      • train: data/train-*
  • raw

    • Splits:
      • train: raw/train-*
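Each configuration points its `train` split at a glob pattern in the repository. With the Hugging Face `datasets` library, one normally selects the configuration by name, for example `load_dataset("sorenmulli/citizenship-test-da", "raw", split="train")`. The offline sketch below only shows how the two patterns separate the files. The repository file names in the usage example are hypothetical. Python's `fnmatch` is a stand-in for the Hub's own glob resolution and is not identical to it: `fnmatch`'s `*` also matches `/`.

```python
from fnmatch import fnmatch

# data_files patterns per configuration and split, as listed in the card.
DATA_FILES = {
    "default": {"train": "data/train-*"},
    "raw": {"train": "raw/train-*"},
}


def resolve(config: str, split: str, repo_files: list[str]) -> list[str]:
    """Return the repository files matched by a config/split pattern, sorted."""
    pattern = DATA_FILES[config][split]
    return sorted(path for path in repo_files if fnmatch(path, pattern))
```

Given hypothetical files `["README.md", "data/train-00000-of-00001.parquet", "raw/train-00000-of-00001.parquet"]`, `resolve("raw", "train", files)` returns only the file under `raw/`.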