
kyle-obrien/multilingual-squad

Source: Hugging Face. Updated: 2023-08-16. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kyle-obrien/multilingual-squad
Resource description:
---
license: cc-by-sa-4.0
configs:
- config_name: default
  data_files:
  - split: en
    path: data/en-*
  - split: de
    path: data/de-*
  - split: es
    path: data/es-*
  - split: it
    path: data/it-*
dataset_info:
  features:
  - name: id
    dtype: string
  - name: context
    dtype: string
  - name: question
    dtype: string
  - name: answers
    sequence:
    - name: text
      dtype: string
    - name: answer_start
      dtype: int32
  splits:
  - name: en
    num_bytes: 729684.8924369748
    num_examples: 778
  - name: de
    num_bytes: 838982.7781512605
    num_examples: 778
  - name: es
    num_bytes: 832574.4117647059
    num_examples: 778
  - name: it
    num_bytes: 739646.5447480896
    num_examples: 778
  download_size: 729382
  dataset_size: 3140888.627101031
---

# Dataset Card

### Dataset Summary

This dataset contains multilingual, parallel SQuAD dataset examples across EN, DE, ES, and IT. To construct the dataset, identifiers were aligned across the following SQuAD-related datasets:

* EN, DE, ES: [XQuAD (Cross-lingual Question Answering Dataset)](https://huggingface.co/datasets/xquad)
* IT: [SQuAD-it](https://huggingface.co/datasets/squad_it)

See citation information below.

### Citation Information

XQuAD:

```
@article{Artetxe:etal:2019,
  author        = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
  title         = {On the cross-lingual transferability of monolingual representations},
  journal       = {CoRR},
  volume        = {abs/1910.11856},
  year          = {2019},
  archivePrefix = {arXiv},
  eprint        = {1910.11856}
}
```

SQuAD-it:

```
@InProceedings{10.1007/978-3-030-03840-3_29,
  author="Croce, Danilo and Zelenanska, Alexandra and Basili, Roberto",
  editor="Ghidini, Chiara and Magnini, Bernardo and Passerini, Andrea and Traverso, Paolo",
  title="Neural Learning for Question Answering in Italian",
  booktitle="AI*IA 2018 -- Advances in Artificial Intelligence",
  year="2018",
  publisher="Springer International Publishing",
  address="Cham",
  pages="389--402",
  isbn="978-3-030-03840-3"
}
```
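The Dataset Summary says identifiers were aligned across the source datasets but does not describe how. As a rough illustration only, one plausible approach is to keep the ids that appear in every language. The sketch below assumes each language's records are already keyed by a shared `id`; the helper name `align_by_id` and the toy records are hypothetical and are not taken from the dataset or its build script.

```python
from typing import Dict, List

Record = Dict[str, object]


def align_by_id(per_language: Dict[str, Dict[str, Record]]) -> Dict[str, List[Record]]:
    """Keep only records whose id occurs in every language (hypothetical helper)."""
    if not per_language:
        return {}
    # Ids shared by all languages.
    shared_ids = set.intersection(*(set(rows) for rows in per_language.values()))
    # Sort so that position k refers to the same id in every language.
    ordered_ids = sorted(shared_ids)
    return {
        lang: [rows[i] for i in ordered_ids]
        for lang, rows in per_language.items()
    }


# Toy input, invented for illustration: language -> id -> record.
toy = {
    "en": {
        "q1": {"id": "q1", "question": "Who?"},
        "q2": {"id": "q2", "question": "What?"},
        "q3": {"id": "q3", "question": "Where?"},
    },
    "it": {
        "q2": {"id": "q2", "question": "Che cosa?"},
        "q3": {"id": "q3", "question": "Dove?"},
        "q4": {"id": "q4", "question": "Quando?"},
    },
}

aligned = align_by_id(toy)
```

Under this assumption every language ends up with the same number of records in the same order, which would be consistent with the equal per-split example counts reported in the metadata.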
Provider:
kyle-obrien
Summary of original information

Dataset overview

Dataset information

  • License: cc-by-sa-4.0
  • Configs:
    • Default config:
      • Data files:
        • Splits (languages): en, de, es, it
        • Paths: data/en-*, data/de-*, data/es-*, data/it-*
  • Dataset features:
    • id: string
    • context: string
    • question: string
    • answers:
      • text: string
      • answer_start: integer (int32)
  • Dataset splits:
    • en: 778 examples, 729684.8924369748 bytes
    • de: 778 examples, 838982.7781512605 bytes
    • es: 778 examples, 832574.4117647059 bytes
    • it: 778 examples, 739646.5447480896 bytes
  • Download size: 729382 bytes
  • Dataset size: 3140888.627101031 bytes
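
To make the feature schema above concrete, here is a minimal sketch of one record and a sanity check on it. It assumes the usual SQuAD convention that `answer_start` is a character offset into `context`, and that the `answers` sequence is represented as a dictionary of parallel lists, as the Hugging Face `datasets` library does for a sequence of named fields. The record and the helper `answer_spans_match` are hypothetical and invented for illustration.

```python
from typing import Dict

# Hypothetical record following the listed features; not taken from the dataset.
example: Dict[str, object] = {
    "id": "example-0001",
    "context": "The Rhine flows through Basel and reaches the North Sea.",
    "question": "Which sea does the Rhine reach?",
    "answers": {"text": ["North Sea"], "answer_start": [46]},
}


def answer_spans_match(record: Dict[str, object]) -> bool:
    """Return True if every answer text is found at its stated character offset."""
    context = record["context"]
    answers = record["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


is_consistent = answer_spans_match(example)
```

A check like this can be useful with translated SQuAD-style data, where offsets may drift if the context and the answer were translated separately.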

Dataset source
