ucsahin/TR-Extractive-QA-82K

Name: ucsahin/TR-Extractive-QA-82K
Creator: ucsahin
Published: 2024-07-06 09:17:06
License: No description available

Source: Hugging Face. Updated 2024-07-06; indexed 2024-07-06.

Download link:

https://hf-mirror.com/datasets/ucsahin/TR-Extractive-QA-82K

Resource description:

The dataset consists of nearly 82K {Context, Question, Answer} triplets in Turkish. Because most answers are only a few words long and are taken directly from the provided context, the dataset is better suited to fine-tuning encoder-only models such as BERT for extractive question answering, or embedding models for retrieval. It is a filtered and combined version of multiple Turkish QA datasets.
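Since the answers are described as being taken directly from the context, a common preprocessing step for extractive QA is to locate each answer's character offset in its context. The following is a minimal sketch of that step, not the dataset author's own pipeline. The helper `locate_answer` and the example row are hypothetical, and the sketch assumes an exact, case-sensitive match is good enough.

```python
from typing import Optional, TypedDict


class Span(TypedDict):
    text: str
    answer_start: int


def locate_answer(context: str, answer: str) -> Optional[Span]:
    """Return the first character-level span of `answer` in `context`, or None."""
    answer = answer.strip()
    if not answer:
        return None
    # Exact, case-sensitive search; no case folding is attempted here.
    start = context.find(answer)
    if start == -1:
        return None
    return {"text": answer, "answer_start": start}


# Made-up row for illustration only; it is not taken from the dataset.
row = {
    "context": "Ankara, Türkiye'nin başkentidir.",
    "question": "Türkiye'nin başkenti neresidir?",
    "answer": "Ankara",
}
span = locate_answer(row["context"], row["answer"])
```

Rows for which the helper returns `None` would need separate handling (for example, dropping them or matching more loosely), since a span-prediction model needs a start offset for every training example.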

Provided by:

ucsahin

Summary of original information

Dataset overview

Dataset information

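Features:
- question: the question, data type string.
- context: the context, data type string.
- answer: the answer, data type string.
Splits:
- train: training set, 65,594 examples, 65,343,587 bytes.
- test: test set, 16,399 examples, 16,441,554 bytes.
Download size: 50,866,268 bytes
Dataset size: 81,785,141 bytes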
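As a quick consistency check on the figures above, the split counts and byte sizes can be summed and compared with the stated totals. This is a small arithmetic sketch that uses only the numbers listed here; the variable names are illustrative.

```python
# Row counts and byte sizes as listed for each split.
train_rows, test_rows = 65_594, 16_399
train_bytes, test_bytes = 65_343_587, 16_441_554

total_rows = train_rows + test_rows      # total number of triplets
total_bytes = train_bytes + test_bytes   # should equal the stated dataset size

train_fraction = train_rows / total_rows
avg_bytes_per_row = total_bytes / total_rows

print(f"rows: {total_rows:,}  bytes: {total_bytes:,}")
print(f"train fraction: {train_fraction:.3f}")
print(f"average bytes per row: {avg_bytes_per_row:.0f}")
```

The two splits sum to 81,993 rows, in line with the "nearly 82K" description, and their byte sizes sum to exactly the stated dataset size of 81,785,141 bytes. The train split holds very close to 80% of the rows.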

Configuration

Configuration name: default
- Data files:
  - train: path data/train-*
  - test: path data/test-*

Language

Dataset language: Turkish

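Dataset description

The dataset contains nearly 82,000 {context, question, answer} triplets and is suitable for fine-tuning encoder models such as BERT for extractive question answering, or embedding models for retrieval.
The dataset is a filtered and combined version of multiple Turkish question-answering datasets.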