
Flash Flood Information Retrieval System: Machine Learning Dataset

Source: DataCite Commons. Updated 2025-06-02; indexed 2025-04-16.
Download link:
https://www.designsafe-ci.org/data/browser/public/designsafe.storage.published/PRJ-3429/?version=2
Description:
This published dataset was used to train the Flash Flood Information Retrieval (FF-IR) system. FF-IR is a domain-specific search engine that accepts a past flash flood as an event of interest, fetches candidate webpages from the web, converts each webpage into numerical features, and uses those features to predict whether the webpage contains relevant information about the event. The system then returns a list of the webpages predicted to be relevant. FF-IR uses a Random Forest machine learning (ML) binary classification model to make the final relevance prediction; the dataset in this publication was used to train and test multiple ML techniques, and Random Forest was selected as the best-performing one. FF-IR outperforms direct Google searches by over 100%, as measured by the F2-score. Natural hazard researchers and practitioners can use FF-IR to facilitate flash flood (FF) risk assessments and mitigation planning. Details of the system are given in the Related Work listed on this landing page (https://doi.org/10.1016/j.envsoft.2023.105734). The public web software for FF-IR, called Flood Finder 360, is under development and continuously updated at https://www.floodfinder360.com/.

We created this dataset by randomly selecting 500 past flash floods, fetching their candidate webpages, manually assigning a relevance label to each webpage, and computing a set of numerical features to represent it. In the dataset, the ‘Label’ column is the relevance label, and the remaining columns are the numerical features generated for each webpage using the code available at https://doi.org/10.17603/ds2-ed3t-b759. Further details about the dataset (feature descriptions) are available in the readme file. Researchers and practitioners can use this published dataset to improve on the performance of the FF-IR system.
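The F2-score used for the comparison with direct Google searches is the F-beta score with beta = 2, which weights recall more heavily than precision. The following is a minimal sketch of how such a comparison could be computed from binary relevance labels. It is not the authors' evaluation code: the function names and the label lists are hypothetical and made up purely for illustration, and they do not come from the dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives
    for binary labels (1 = relevant webpage, 0 = not relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts:
    (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP).
    Returns 0.0 when the denominator is zero."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]      # manual relevance labels
baseline = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]    # e.g. a plain search result list
classifier = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # e.g. a trained classifier

f2_base = fbeta_score(*confusion_counts(y_true, baseline))
f2_clf = fbeta_score(*confusion_counts(y_true, classifier))

# An improvement "by over 100%" means the F2-score more than doubled.
relative_gain = (f2_clf - f2_base) / f2_base
print(f"baseline F2={f2_base:.3f}, classifier F2={f2_clf:.3f}, "
      f"relative gain={relative_gain:.0%}")
```

Because beta = 2 puts four times as much weight on false negatives as on false positives, a system that misses relevant webpages is penalized more than one that returns a few irrelevant ones, which suits a retrieval task where missing information about an event is the costlier error.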
Provider:
Designsafe-CI
Created:
2022-03-10