WWTP-HDQA: A dataset for hazard detection and question answering in wastewater treatment plants in China
Source: DataCite Commons | Updated: 2026-02-02 | Indexed: 2026-02-09
Download link:
https://figshare.com/articles/dataset/WWTP-HDQA_A_dataset_for_hazard_detection_and_question_answering_in_wastewater_treatment_plants_in_China/30948518/1
Resource description:
At the top level, the dataset is organized into two folders: hazard detection/ and qa/. Each folder contains one subdirectory per subset.
Under hazard detection/, the secondary clarifier/ and confined space/ directories share the same file structure, each containing images.zip, labels.zip, train.txt, test.txt, and classes.txt. In secondary clarifier/, images.zip contains 1,600 secondary clarifier images and labels.zip contains the corresponding 1,600 YOLO-format annotation files (.txt). The files train.txt and test.txt list the image paths for the training and test sets, respectively. In confined space/, images.zip contains 3,500 confined space images and labels.zip contains the corresponding annotations; train.txt and test.txt likewise list the image paths for the training and test sets. In both directories, classes.txt provides the label map by listing class names in the order of their numeric IDs.
Under qa/, the policy qa/ directory contains train.json, val.json, and test.json with 19,074, 2,119, and 2,355 policy QA samples, respectively. The knowledge qa/ directory contains the same three files, with 12,168, 1,513, and 1,578 knowledge QA samples. All QA samples follow an Alpaca-style instruction-tuning format and are stored as JSON objects with three fields: instruction, input, and output. Here, instruction corresponds to the question, output corresponds to the answer, and input provides optional supplementary context (set to the empty string "" for all samples in this dataset).
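The file formats described above can be read with a few lines of Python. The following is a minimal sketch, not code shipped with the dataset. It assumes that each YOLO annotation line uses the common five-field detection layout (class ID, then normalized x-center, y-center, width, and height), that classes.txt holds one class name per line, and that each QA file is a single JSON array of sample objects. The description does not confirm any of these details, and the function names (parse_class_names, parse_yolo_label, qa_pair, load_qa_file) are hypothetical.

```python
import json
from typing import List, NamedTuple, Tuple


class YoloBox(NamedTuple):
    """One annotation line; coordinates are assumed normalized to [0, 1]."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_class_names(classes_text: str) -> List[str]:
    # Assumption: one class name per line, so list index == numeric class ID.
    return [line.strip() for line in classes_text.splitlines() if line.strip()]


def parse_yolo_label(label_text: str) -> List[YoloBox]:
    # Assumption: standard detection layout "class x_center y_center width height".
    boxes = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        boxes.append(YoloBox(int(parts[0]), *(float(p) for p in parts[1:])))
    return boxes


def qa_pair(sample: dict) -> Tuple[str, str]:
    # Fields named in the description: instruction (question), input
    # (optional context, "" throughout this dataset), output (answer).
    question = sample["instruction"]
    context = sample.get("input", "")
    if context:
        question = f"{question}\n\n{context}"
    return question, sample["output"]


def load_qa_file(path: str) -> List[Tuple[str, str]]:
    # Assumption: the file is one JSON array of sample objects (not JSON Lines).
    with open(path, encoding="utf-8") as f:
        return [qa_pair(sample) for sample in json.load(f)]
```

After extracting labels.zip, one would typically pass the text of each .txt annotation file to parse_yolo_label and use parse_class_names on the contents of classes.txt to map class IDs back to names. If the QA files turn out to be JSON Lines rather than a single array, load_qa_file would need to parse each line separately.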
Provider:
figshare
Created:
2025-12-25



