SailGenie Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7794130
Description:
The dataset contains data suitable for generating a knowledge graph of the sailing domain.
It includes the original corpus, i.e., the set of textual excerpts extracted from several information sources; a ground truth, i.e., a set of domain triplets manually inferred from the corpus; and a set of triplets generated with Open Information Extraction (OIE) tools, annotated and evaluated by human assessors.
The data are split into two folders, one for each sailing sub-domain:
- `BASICS`: fundamental knowledge for beginners about navigation, behavior, and maneuvers;
- `SAFETY`: measures, legal requirements, instruments, and best practices required to ensure everyone's safety while sailing.
Each folder contains the following files:
- `sentences.txt` contains the collected sailing excerpts as an ordered list of sentences, one per line;
- `ground_truth_triplets.csv` contains the manually identified triplets from the collected sentences;
- `annotated_triplets.csv` contains a manually supervised sample of automatically extracted triplets, each labeled as valid or invalid.
The `sen_index` field in both `ground_truth_triplets.csv` and `annotated_triplets.csv` is the zero-based line number in `sentences.txt` of the sentence each triplet was derived from.
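To illustrate how `sen_index` links the files, here is a minimal Python sketch that attaches the source sentence to each triplet row. It assumes the CSV files have a header row containing a `sen_index` column, use commas as delimiters, and are UTF-8 encoded; the remaining column names are not documented here, and the function names `load_sentences` and `attach_sentences` are hypothetical.

```python
import csv


def load_sentences(path):
    """Return sentences.txt as a list, so that index i holds line i (zero-based)."""
    # Assumption: the file is UTF-8 encoded with one sentence per line.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def attach_sentences(triplets_csv, sentences):
    """Yield each triplet row (as a dict) with an extra 'sentence' key.

    Assumption: the CSV is comma-delimited and has a header row that
    includes a 'sen_index' column; other columns are passed through as-is.
    """
    with open(triplets_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            idx = int(row["sen_index"])  # zero-based line number in sentences.txt
            yield {**row, "sentence": sentences[idx]}
```

For example, `attach_sentences("BASICS/ground_truth_triplets.csv", load_sentences("BASICS/sentences.txt"))` would pair each ground-truth triplet in the `BASICS` folder with its sentence, assuming the record has been downloaded and unpacked with that folder layout. Indexing the loaded list directly keeps the zero-based convention explicit, so no off-by-one adjustment is needed.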
Created:
2023-04-03



