five

ColombiaRACFullyCurated

收藏
arXiv2024-05-15 更新2024-06-24 收录
下载链接:
https://huggingface.co/somosnlp/gemma-1.1-2b-it_ColombiaRAC_FullyCurated_format_chatML_V1
下载链接
链接失效反馈
官方服务:
资源简介:
本研究创建的ColombiaRACFullyCurated数据集,由自由大学Los Libertadores基金会开发,旨在简化哥伦比亚航空法规(RAC)的理解。该数据集包含24,478个专家标注的问答对,来源于RAC的前五份文档。创建过程涉及自动化PDF转换为文本,并通过GPT API处理,最终由航空工程专家使用Argilla框架在Hugging Face环境中进行质量评估和标注。此数据集主要应用于提升RAC的普及性和可访问性,帮助航空专业人士和公众更好地理解和遵守航空法规。

The ColombiaRACFullyCurated dataset, developed by Fundación Universidad Los Libertadores for this research, aims to simplify comprehension of Colombian aviation regulations (RAC). It consists of 24,478 expert-annotated question-answer pairs derived from the first five documents of the RAC. The dataset development workflow includes automated PDF-to-text conversion, processing via the GPT API, and final quality assessment and annotation conducted by aerospace engineering experts using the Argilla framework within the Hugging Face environment. This dataset is primarily intended to improve the accessibility and public awareness of RAC, helping aviation professionals and the general public better understand and comply with Colombian aviation regulations.
提供机构:
自由大学Los Libertadores基金会
创建时间:
2024-05-15
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作