Flash Flood BERT Text Classification Model: Dataset
Source: DataCite Commons | Updated: 2025-06-02 | Indexed: 2025-04-16
Download link:
https://www.designsafe-ci.org/data/browser/public/designsafe.storage.published/PRJ-4111
Resource description:
This published dataset is a resource for researchers and practitioners who want to improve the performance of the Flash Flood BERT (FF-BERT) model. FF-BERT is a multi-label text classification model that assigns each paragraph of a flash flood (FF) event-related webpage to one or more of seven categories: (1) Damage and Economic Impact (DI), (2) Fatalities, Injuries, and Rescue (FIR), (3) Hydrometeorology (HM), (4) Warning and Emergency (WE), (5) Response and Recovery (RR), (6) Public Health (PH), and (7) Mitigation (MG).

To develop FF-BERT, we labeled 21,180 paragraphs from FF-related webpages and experimented with multiple model architectures based on the widely used Bidirectional Encoder Representations from Transformers (BERT). The FF-related webpages are those labeled 1 in the FF-IR ML dataset (https://doi.org/10.17603/ds2-rwg3-v337). Our final model outperforms the baseline by 11.83% in micro-F1 score. FF-BERT also significantly improves prediction of the minority labels: RR by 32.1%, PH by 260.4%, and MG by 138.6%. The associated study is currently in the publication process.

FF-BERT is designed to be used together with FF-IR, published at https://doi.org/10.1016/j.envsoft.2023.105734. Used jointly, the two models let researchers and practitioners uncover information about past flash flood events from the web and enrich the information contained in existing databases.

The dataset contains two CSV files, ‘FF_BERT_Train.csv’ and ‘FF_BERT_Test.csv’, which hold 80% and 20%, respectively, of the 21,180 manually labeled paragraphs used to develop FF-BERT. We used the training file to train multiple BERT-based models and the test file to compare the trained models on unseen data. Details about the dataset are available in the readme file.
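An 80/20 split of 21,180 paragraphs works out to 16,944 training and 4,236 test paragraphs, so a quick record count is a reasonable first sanity check after downloading the two files. The sketch below uses Python's standard `csv` module; the helper names `inspect_csv` and `split_report` are hypothetical, and it assumes the files are UTF-8 encoded and begin with a header row. Column names are documented in the readme and are deliberately not assumed here.

```python
from __future__ import annotations

import csv
from pathlib import Path

TOTAL_PARAGRAPHS = 21_180  # total stated in the dataset description


def inspect_csv(path: Path) -> tuple[list[str], int]:
    """Return (header row, number of data records) for a CSV file.

    csv.reader is used instead of counting newlines because a quoted
    paragraph field may itself contain line breaks. The UTF-8 encoding
    and the presence of a header row are assumptions.
    """
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_records = sum(1 for _ in reader)
    return header, n_records


def split_report(train_path: Path, test_path: Path) -> str:
    """Summarize record counts and the train/test proportions."""
    _, n_train = inspect_csv(train_path)
    _, n_test = inspect_csv(test_path)
    n_total = n_train + n_test
    if n_total == 0:
        raise ValueError("both files are empty")
    report = (f"train: {n_train} ({n_train / n_total:.1%}), "
              f"test: {n_test} ({n_test / n_total:.1%})")
    # Flag any mismatch with the paragraph count given in the description.
    if n_total != TOTAL_PARAGRAPHS:
        report += f"; note: {n_total} records found, {TOTAL_PARAGRAPHS} expected"
    return report
```

For example, `print(split_report(Path("FF_BERT_Train.csv"), Path("FF_BERT_Test.csv")))` run next to the downloaded files reports both counts and their proportions; `inspect_csv` also returns the header so the columns can be checked against the readme.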
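Because FF-BERT is multi-label, each paragraph's labels can be represented as a seven-element 0/1 (multi-hot) vector, and micro-F1 pools true positives (TP), false positives (FP), and false negatives (FN) over every paragraph-label pair before computing a single score, micro-F1 = 2TP / (2TP + FP + FN). The sketch below is illustrative only: the helper names `to_multi_hot` and `micro_f1` are hypothetical, the label order simply follows the numbered category list in the description, and it is not the authors' evaluation code.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical label order: follows the numbered category list in the description.
LABELS = ["DI", "FIR", "HM", "WE", "RR", "PH", "MG"]


def to_multi_hot(categories: Sequence[str]) -> list[int]:
    """Encode one paragraph's category codes as a 7-element 0/1 vector."""
    unknown = set(categories) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown category codes: {sorted(unknown)}")
    return [1 if label in categories else 0 for label in LABELS]


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 over all (paragraph, label) pairs: 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of paragraphs")
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("label vectors must have equal length")
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # label present and predicted
            elif p:
                fp += 1  # predicted but not present
            elif t:
                fn += 1  # present but missed
    denom = 2 * tp + fp + fn
    # If no label is present or predicted anywhere, return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    y_true = [to_multi_hot(["DI", "HM"]), to_multi_hot(["FIR"])]
    y_pred = [to_multi_hot(["DI"]), to_multi_hot(["FIR", "WE"])]
    print(round(micro_f1(y_true, y_pred), 4))  # 0.6667
```

In the usage example there are 2 true positives (DI, FIR), 1 false positive (WE), and 1 false negative (HM), giving 4/6 ≈ 0.667. On multi-hot arrays, scikit-learn's `f1_score(..., average="micro")` computes the same quantity.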
Provided by:
Designsafe-CI
Created:
2023-08-23



