makcedward/hudocvqa
Source: Hugging Face. Updated: 2024-12-06. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/makcedward/hudocvqa
Description:
HuDocVQA is a Hungarian document visual question answering dataset for training, evaluating, and analyzing Hungarian natural language understanding systems. Questions and answers are generated from seed documents drawn from the Hungarian Wikipedia corpus, using Llama 3.1 from SambaNova Cloud. To diversify the input data, random images (from ImageNet) and text (such as person names and page numbers) are inserted into the documents. Document styling is also varied: document size and orientation, paragraph font and font size, and the alignment and font formatting of headers and footers. The dataset's features include images, text, questions, and answers. It is divided into training, testing, and validation sets, with detailed statistics provided.
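The style variation mentioned in the description (document size and orientation, paragraph font and font size, header and footer alignment) could be sketched roughly as below. This is only an illustrative sketch, not the dataset authors' generation code: the `DocumentStyle` class, the `sample_style` function, and every option list are hypothetical names and values chosen for the example.

```python
import random
from dataclasses import dataclass

# Hypothetical option lists: the values actually used to build the dataset
# are not stated in the description.
PAGE_SIZES = ["A4", "A5", "Letter"]
ORIENTATIONS = ["portrait", "landscape"]
FONTS = ["Arial", "Times New Roman", "Courier New"]
FONT_SIZES = [10, 11, 12, 14]
ALIGNMENTS = ["left", "center", "right"]


@dataclass(frozen=True)
class DocumentStyle:
    """One randomly drawn combination of document style settings."""
    page_size: str
    orientation: str
    paragraph_font: str
    paragraph_font_size: int
    header_alignment: str
    footer_alignment: str


def sample_style(rng: random.Random) -> DocumentStyle:
    """Draw one independent random choice for each style setting."""
    return DocumentStyle(
        page_size=rng.choice(PAGE_SIZES),
        orientation=rng.choice(ORIENTATIONS),
        paragraph_font=rng.choice(FONTS),
        paragraph_font_size=rng.choice(FONT_SIZES),
        header_alignment=rng.choice(ALIGNMENTS),
        footer_alignment=rng.choice(ALIGNMENTS),
    )


rng = random.Random(0)  # fixed seed so the sampled styles are reproducible
styles = [sample_style(rng) for _ in range(5)]
for style in styles:
    print(style)
```

Sampling each setting independently keeps the sketch simple; a real pipeline might instead restrict combinations (for example, limiting large fonts on small pages).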
Provided by:
makcedward



