
LocalDoc/LDQuAd

Hugging Face | Updated 2024-06-01 | Indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/LocalDoc/LDQuAd
Description:
---
language:
- az
license: cc-by-nc-nd-4.0
size_categories:
- 100K<n<1M
task_categories:
- question-answering
pretty_name: LocalDoc Question Answer Dataset in Azerbaijani
dataset_info:
  features:
  - name: id
    dtype: string
  - name: title
    dtype: string
  - name: context
    dtype: string
  - name: question
    dtype: string
  - name: answer_text
    dtype: string
  - name: answer_start
    dtype: int64
  splits:
  - name: train
    num_bytes: 174936179
    num_examples: 154198
  download_size: 31812425
  dataset_size: 174936179
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# LDQuAd (LocalDoc Question Answer Dataset)

## Overview

**LDQuAd** is a dataset of about 154,000 question-answer pairs (154,198 in the `train` split) in Azerbaijani. It is the first dataset of its kind in Azerbaijani at this scale, and LocalDoc is proud to present it.

## Dataset Features

- **Total Q&A pairs**: 154,198, all in a single `train` split.
- **Questions without answers**: Approximately 30% of the questions have no answer in their context. Including them trains models to recognize when the context does not contain an answer instead of always extracting a span.

## Significance

**LDQuAd** is a significant contribution to the NLP community, especially for Azerbaijani language processing. In size it is comparable to other major datasets such as <a target="_blank" href="https://huggingface.co/datasets/rajpurkar/squad_v2">SQuAD v2</a>, which contains about 150,000 questions and, like LDQuAd, includes unanswerable ones.

## Dataset Structure

Each record has the following fields:

- **ID** (`id`): identifier of the record.
- **Title** (`title`): title associated with the context passage.
- **Context** (`context`): the text passage from which the question is derived.
- **Question** (`question`): the question posed about the context; it may not be answerable from the context.
- **Answer** (`answer_text`): the answer to the question, or an indication that the question is unanswerable based on the context.
- **Answer Start** (`answer_start`): starting index of the answer within `context`.

## License

**LDQuAd** is licensed under the CC BY-NC-ND 4.0 license, which permits copying and redistributing the material under the following terms:
- **Attribution**: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
- **NonCommercial**: You may not use the material for commercial purposes.
- **NoDerivatives**: If you remix, transform, or build upon the material, you may not distribute the modified material.

For more information, please refer to the <a target="_blank" href="https://creativecommons.org/licenses/by-nc-nd/4.0/">CC BY-NC-ND 4.0 license</a>.

## Contact

For more information, questions, or issues, please contact LocalDoc at v.resad.89@gmail.com.
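The Dataset Features and Dataset Structure sections describe roughly 30% unanswerable questions and an `answer_start` index into `context`. The following is a minimal Python sketch, not part of LocalDoc's card, for checking both claims on the data. It rests on assumptions the card does not state: that `answer_start` is a character offset (the SQuAD convention), and that an unanswerable question is encoded with an empty `answer_text` or a negative `answer_start`. The helper names `is_unanswerable`, `span_matches`, `summarize`, and `load_train_split` are hypothetical; only `datasets.load_dataset` is a real library call.

```python
from typing import Iterable, Mapping


def is_unanswerable(record: Mapping) -> bool:
    """ASSUMED encoding (not documented in the card): an unanswerable
    question has an empty/None answer_text or a negative answer_start."""
    return not record["answer_text"] or record["answer_start"] < 0


def span_matches(record: Mapping) -> bool:
    """True if answer_text appears in context starting at answer_start,
    ASSUMING answer_start is a character offset (SQuAD convention)."""
    start = record["answer_start"]
    answer = record["answer_text"]
    return record["context"][start:start + len(answer)] == answer


def summarize(records: Iterable[Mapping]) -> dict:
    """Count records, unanswerable questions, and answer spans that do not line up."""
    total = unanswerable = mismatched = 0
    for rec in records:
        total += 1
        if is_unanswerable(rec):
            unanswerable += 1
        elif not span_matches(rec):
            mismatched += 1
    return {
        "total": total,
        "unanswerable": unanswerable,
        "unanswerable_ratio": unanswerable / total if total else 0.0,
        "span_mismatches": mismatched,
    }


def load_train_split():
    """Load the train split from the Hugging Face Hub.
    Requires `pip install datasets` and network access."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("LocalDoc/LDQuAd", split="train")
```

For example, `summarize(load_train_split())` reports counts for the whole split. If `unanswerable_ratio` is far from the stated ~30%, or `span_mismatches` is large, the assumed encoding of unanswerable questions or of `answer_start` is probably wrong and should be checked against a few raw rows.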
Provider:
LocalDoc