
SailGenie Dataset

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/7794130
Description:
The dataset provides data for generating a Knowledge Graph in the sailing domain. It includes the original corpus, i.e., the set of textual excerpts extracted from several information sources; a ground truth, i.e., a set of domain triplets manually inferred from the corpus; and a set of triplets generated with OIE-based tools, annotated and evaluated by human assessors.

The two folders correspond to the two sailing sub-domains:
- `BASICS`: fundamental knowledge for beginners about navigation, behavior, and maneuvers;
- `SAFETY`: information on measures, legal requirements, instruments, and best practices specifically required to ensure everyone's safety during sailing.

Each folder contains the following files:
- `sentences.txt` contains the collected sailing excerpts as an ordered list of sentences;
- `ground_truth_triplets.csv` contains the triplets manually identified from the collected sentences;
- `annotated_triplets.csv` contains a manually supervised sample of automatically extracted triplets, each labeled as valid or invalid.

The `sen_index` field in both `ground_truth_triplets.csv` and `annotated_triplets.csv` is a zero-based line number in `sentences.txt` that identifies the sentence each triplet relates to.
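
The `sen_index` lookup described above can be sketched in Python as follows. This is a minimal illustration, not code shipped with the dataset. It assumes the CSV files are comma-separated, UTF-8 encoded, and start with a header row. Only the `sen_index` column name comes from the description; the `subject`, `predicate`, and `object` columns and the demo sentences are hypothetical, and the helper names (`read_sentences`, `attach_sentences`, `load_folder`) are made up for this example.

```python
import csv
import io
from pathlib import Path


def read_sentences(text):
    """Split the contents of sentences.txt into a list; index 0 is the first line."""
    return text.splitlines()


def attach_sentences(sentences, csv_text):
    """Pair every triplet row with the sentence its sen_index points to."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        idx = int(row["sen_index"])  # zero-based line number in sentences.txt
        if not 0 <= idx < len(sentences):
            raise IndexError(f"sen_index {idx} is outside sentences.txt")
        rows.append({**row, "sentence": sentences[idx]})
    return rows


def load_folder(folder, triplet_file="ground_truth_triplets.csv"):
    """Read one sub-domain folder (e.g. BASICS or SAFETY) from disk.

    Assumes comma-separated, UTF-8 files with a header row (not confirmed
    by the dataset description).
    """
    folder = Path(folder)
    sentences = read_sentences((folder / "sentences.txt").read_text(encoding="utf-8"))
    csv_text = (folder / triplet_file).read_text(encoding="utf-8")
    return attach_sentences(sentences, csv_text)


# Tiny in-memory demo with made-up content (not taken from the dataset).
# The subject/predicate/object column names are hypothetical.
demo_sentences = read_sentences("A sloop has one mast.\nA life jacket keeps you afloat.\n")
demo_csv = "sen_index,subject,predicate,object\n1,life jacket,keeps,you afloat\n"
demo_rows = attach_sentences(demo_sentences, demo_csv)
print(demo_rows[0]["sentence"])
```

With the archive unpacked locally, a call such as `load_folder("SAFETY")` would return one dictionary per ground-truth triplet with the matching sentence attached, provided the files follow the layout assumed here. Passing a different `triplet_file` applies the same lookup to the annotated triplets.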
Created:
2023-04-03