
SQuAD-sr

Published on arXiv: 2024-04-13 · Updated: 2024-06-21
Download link:
https://www.kaggle.com/datasets/aleksacvetanovic/squad-sr
Description:
SQuAD-sr is a Serbian question answering dataset developed by the School of Electrical Engineering, University of Belgrade. It contains over 87,000 samples and is available in both Cyrillic and Latin script. It was derived from the English SQuAD v1.1 dataset using the Translate-Align-Retrieve method, whose translation, alignment, and retrieval steps are intended to preserve the quality and usability of the data. The dataset provides training and evaluation resources for Serbian question answering models and is primarily used to fine-tune Transformer models for Serbian question answering. It is particularly valuable because manually annotated Serbian QA data is scarce.
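The card does not spell out how an answer span is located again once a SQuAD context and its answer have been translated. The full Translate-Align-Retrieve method relies on word alignment between the source and translated text; the sketch below swaps that for plain string matching (an exact match first, then a fuzzy token-window match using Python's `difflib`) just to illustrate the "retrieve" idea of recovering a SQuAD-style `answer_start` offset. The function name `retrieve_answer_start` and the 0.6 similarity threshold are hypothetical choices, not part of SQuAD-sr.

```python
from difflib import SequenceMatcher


def retrieve_answer_start(context: str, answer: str, threshold: float = 0.6) -> int:
    """Return the character offset of `answer` in `context`, or -1 if not found.

    Hypothetical, simplified stand-in for alignment-based retrieval:
    exact substring match first, then the token window most similar
    to the translated answer (useful when translation inflects words).
    """
    exact = context.find(answer)
    if exact != -1:
        return exact

    tokens = context.split()
    n = len(answer.split())

    # Record the character offset at which each whitespace token starts.
    offsets = []
    pos = 0
    for tok in tokens:
        pos = context.index(tok, pos)
        offsets.append(pos)
        pos += len(tok)

    best_ratio, best_start = 0.0, -1
    # Slide a window of the answer's token length over the context.
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n])
        ratio = SequenceMatcher(None, window.lower(), answer.lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, offsets[i]

    return best_start if best_ratio >= threshold else -1


if __name__ == "__main__":
    ctx = "Beograd je glavni grad Srbije."
    print(retrieve_answer_start(ctx, "glavni grad"))    # exact match
    print(retrieve_answer_start(ctx, "glavnog grada"))  # inflected form, fuzzy match
```

With Serbian's rich inflection, a translated answer often fails to appear verbatim in the translated context, which is why a fallback beyond exact matching is needed at all; a real pipeline would use word alignment rather than this character-similarity heuristic.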
Provided by:
School of Electrical Engineering, University of Belgrade
Created:
2024-04-13