csebuetnlp/squad_bn

Name: csebuetnlp/squad_bn
Creator: csebuetnlp
Published: 2024-09-10 13:28:27
License: CC BY-NC-SA 4.0

Source: Hugging Face. Updated: 2024-09-10. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/csebuetnlp/squad_bn

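Summary:

A question answering (QA) dataset for Bengali, extracted from the SQuAD 2.0 and TyDI-QA datasets and translated with a state-of-the-art English-to-Bengali translation model. It is designed for tasks such as open-domain QA and extractive QA. The dataset is monolingual, containing only Bengali data, and is released for non-commercial research use under the CC BY-NC-SA 4.0 license.

Provider: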

csebuetnlp

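Summary of original information

Dataset overview

Name: squad_bn
Language: Bengali
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)
Multilinguality: monolingual
Size: 100K < n < 1M
Task category: question answering
Task IDs:
- open-domain-qa
- extractive-qa

The data is stored in JSON format with the following fields: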

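Annotation process: Language-Agnostic BERT Sentence Embeddings (LaBSE) are used to compute the similarity between each translated sentence and its original; only data points with a similarity above 0.7 are accepted.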
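
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the similarity is cosine similarity between sentence embeddings, and it uses toy 3-dimensional vectors in place of real LaBSE embeddings. The names `cosine_similarity`, `filter_pairs`, `src_vec`, and `tgt_vec` are hypothetical.

```python
import math

# Threshold taken from the annotation description: keep pairs scoring above 0.7.
SIMILARITY_THRESHOLD = 0.7


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(pairs, threshold=SIMILARITY_THRESHOLD):
    """Keep only pairs whose source/translation similarity is strictly above the threshold."""
    return [
        p for p in pairs
        if cosine_similarity(p["src_vec"], p["tgt_vec"]) > threshold
    ]


# Toy vectors standing in for LaBSE embeddings (assumption for illustration only).
toy_pairs = [
    {"id": "a", "src_vec": [1.0, 0.0, 0.0], "tgt_vec": [0.9, 0.1, 0.0]},  # near-parallel
    {"id": "b", "src_vec": [1.0, 0.0, 0.0], "tgt_vec": [0.0, 1.0, 0.0]},  # orthogonal
]
kept_ids = [p["id"] for p in filter_pairs(toy_pairs)]
```

In a real setting, the vectors would come from a LaBSE encoder applied to the English source sentence and its Bengali translation; only the thresholding logic is shown here.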

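License information: The contents of this dataset are restricted to non-commercial research use.
Citation: If you use this dataset, please cite the following paper: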

@misc{bhattacharjee2021banglabert, title={BanglaBERT: Combating Embedding Barrier in Multilingual Models for Low-Resource Language Understanding}, author={Abhik Bhattacharjee and Tahmid Hasan and Kazi Samin and Md Saiful Islam and M. Sohel Rahman and Anindya Iqbal and Rifat Shahriyar}, year={2021}, eprint={2101.00204}, archivePrefix={arXiv}, primaryClass={cs.CL} }
