KennethTM/squad_pairs_danish
Source: Hugging Face | Updated: 2024-02-12 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/KennethTM/squad_pairs_danish
---
dataset_info:
  features:
  - name: query
    dtype: string
  - name: passage
    dtype: string
  splits:
  - name: train
    num_bytes: 69338889
    num_examples: 87599
  download_size: 11644151
  dataset_size: 69338889
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
language:
- da
task_categories:
- feature-extraction
- question-answering
license: cc-by-sa-4.0
---
# SQuAD question-answer pairs in Danish
## About
This dataset is a Danish machine translation of the [SQuAD question-answer pairs](https://huggingface.co/datasets/sentence-transformers/embedding-training-data) from the sentence-transformers embedding training data, which are derived from the original [SQuAD dataset](https://huggingface.co/datasets/squad).
Translation from English to Danish was performed with the Helsinki NLP [English-to-Danish OPUS-MT model](https://huggingface.co/Helsinki-NLP/opus-mt-en-da).
The dataset contains 87,599 (~87k) question-answer pairs in a single `train` split and can be used to train embedding and question-answering models. Each pair consists of one question (`query`) and one passage that contains the answer (`passage`).
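The card does not prescribe a training objective. One common choice for query/passage pairs is an in-batch negatives loss, similar in spirit to sentence-transformers' `MultipleNegativesRankingLoss` (default scale 20): each query must score its own passage above every other passage in the same batch. The sketch below is an illustration under assumptions, not the author's training setup: the embedding model is replaced by a hypothetical word-overlap similarity (`toy_similarity`), and the Danish pairs are made up rather than taken from the data.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def toy_similarity(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors.
    Hypothetical stand-in for an embedding model's cosine similarity."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(count * cb[word] for word, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def in_batch_negatives_loss(pairs: List[Tuple[str, str]], scale: float = 20.0) -> float:
    """Mean cross-entropy in which query i must rank its own passage i above
    every other passage in the batch (the in-batch negatives)."""
    passages = [passage for _, passage in pairs]
    total = 0.0
    for i, (query, _) in enumerate(pairs):
        logits = [scale * toy_similarity(query, p) for p in passages]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_norm - logits[i]
    return total / len(pairs)


# Hypothetical Danish pairs in the dataset's (query, passage) shape; not taken from the data.
pairs = [
    ("Hvad er hovedstaden i Danmark?", "København er hovedstaden i Danmark."),
    ("Hvornår blev universitetet grundlagt?", "Universitetet blev grundlagt i 1479."),
]
# Pair each query with the wrong passage to show the loss rising.
swapped = [(pairs[0][0], pairs[1][1]), (pairs[1][0], pairs[0][1])]
print(in_batch_negatives_loss(pairs) < in_batch_negatives_loss(swapped))  # True
```

With a real model, `toy_similarity` would be replaced by cosine similarity between query and passage embeddings; the loss structure stays the same.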
## Usage
Load the dataset with the Hugging Face `datasets` library:
```python
from datasets import load_dataset
dataset = load_dataset("KennethTM/squad_pairs_danish")
```
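Iterating over `dataset["train"]` yields one dict per row with `query` and `passage` keys. SQuAD typically attaches several questions to the same paragraph, so identical passages probably recur in this dataset as well (an assumption worth checking on the data). With in-batch negatives, two rows that share a passage in one batch would turn a true positive into a "negative". The following minimal sketch uses a hypothetical helper, `batches_without_duplicate_passages`, that greedily packs rows so that no batch repeats a passage. It runs on toy rows that only mimic the schema.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]  # one record: {"query": ..., "passage": ...}


def batches_without_duplicate_passages(rows: Iterable[Row], batch_size: int) -> List[List[Row]]:
    """Hypothetical helper: greedily pack rows into batches so that no batch
    contains the same passage twice."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    full: List[List[Row]] = []
    open_batches: List[Tuple[List[Row], Set[str]]] = []  # (rows, passages already in that batch)
    for row in rows:
        for idx, (batch, passages) in enumerate(open_batches):
            if row["passage"] not in passages:
                batch.append(row)
                passages.add(row["passage"])
                if len(batch) == batch_size:
                    full.append(batch)
                    del open_batches[idx]  # safe: we leave the loop right after
                break
        else:
            # No open batch can take this passage, so start a new one.
            if batch_size == 1:
                full.append([row])
            else:
                open_batches.append(([row], {row["passage"]}))
    # Flush partially filled batches at the end.
    full.extend(batch for batch, _ in open_batches)
    return full


# Toy rows mimicking the schema; not taken from the dataset.
rows = [
    {"query": "q1", "passage": "A"},
    {"query": "q2", "passage": "A"},  # same passage as q1
    {"query": "q3", "passage": "B"},
    {"query": "q4", "passage": "C"},
]
for batch in batches_without_duplicate_passages(rows, batch_size=2):
    print([r["query"] for r in batch])
# ['q1', 'q3']
# ['q2', 'q4']
```

After loading as shown above, `dataset["train"]` can be passed in place of `rows`. Shuffle it first if batch composition should vary between epochs.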
Provided by:
KennethTM