CrisisBench-english | Social Media Analysis Dataset | Crisis Response Dataset

ModelScope community · Updated 2025-06-20 · Indexed 2025-06-21

Social media analysis

Crisis response

Download link:

https://modelscope.cn/datasets/QCRI/CrisisBench-english


Resource description:

# [CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing](https://ojs.aaai.org/index.php/ICWSM/article/view/18115/17918)

The crisis benchmark dataset consists of data from several different sources: CrisisLex ([CrisisLexT26](http://crisislex.org/data-collections.html#CrisisLexT26), [CrisisLexT6](http://crisislex.org/data-collections.html#CrisisLexT6)), [CrisisNLP](https://crisisnlp.qcri.org/lrec2016/lrec2016.html), [SWDM2013](http://mimran.me/papers/imran_shady_carlos_fernando_patrick_practical_2013.pdf), [ISCRAM13](http://mimran.me/papers/imran_shady_carlos_fernando_patrick_iscram2013.pdf), Disaster Response Data (DRD), [Disasters on Social Media (DSM)](https://data.world/crowdflower/disasters-on-social-media), [CrisisMMD](https://crisisnlp.qcri.org/crisismmd), and data from [AIDR](http://aidr.qcri.org/). The purpose of this work was to map the class labels, remove duplicates, and provide benchmark results for the community.

## Dataset

This is the English-language subset of the full CrisisBench dataset. Please see the [CrisisBench Collection](https://huggingface.co/collections/QCRI/crisisbench-672c4b82bcc344d504d775fc) for the complete collection.

## Data format

Each JSON object contains the following fields:

* **id:** Corresponds to the user tweet ID from Twitter.
* **event:** Event name associated with the respective dataset.
* **source:** Source of the dataset.
* **text:** Tweet text.
* **lang:** Language tag obtained either from Twitter or from the Google Language Detection API.
* **lang_conf:** Confidence score obtained from the Google Language Detection API. In some cases this value is "NA", indicating that the language tag was obtained from Twitter rather than from the API.
* **class_label:** Class label assigned to a given tweet text.
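The field list above can be exercised with a short Python sketch. It is only an illustration under stated assumptions: the records are assumed to be stored one JSON object per line, and the sample records, event names, source names, and class labels below are hypothetical placeholders, not values taken from the dataset. The helper names `parse_lang_conf` and `load_records` are likewise invented for this example.

```python
import json
from collections import Counter
from typing import Iterable, Optional

# Hypothetical sample records that follow the documented field names.
# The values (events, sources, labels) are placeholders, not real data.
SAMPLE_LINES = [
    '{"id": "1", "event": "example_event", "source": "example_source", '
    '"text": "Bridge closed due to flooding", "lang": "en", '
    '"lang_conf": "0.98", "class_label": "example_label_a"}',
    '{"id": "2", "event": "example_event", "source": "example_source", '
    '"text": "Thinking of everyone affected", "lang": "en", '
    '"lang_conf": "NA", "class_label": "example_label_b"}',
    '{"id": "3", "event": "other_event", "source": "other_source", '
    '"text": "Shelter open at the school gym", "lang": "en", '
    '"lang_conf": 0.91, "class_label": "example_label_a"}',
]


def parse_lang_conf(value) -> Optional[float]:
    """Return the confidence as a float, or None when it is "NA" or missing.

    Per the field description, "NA" means the language tag came from
    Twitter, so no API confidence score exists for that record.
    """
    if value is None or value == "NA":
        return None
    return float(value)


def load_records(lines: Iterable[str]) -> list:
    """Parse one JSON object per line (assumed layout), skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        record["lang_conf"] = parse_lang_conf(record.get("lang_conf"))
        records.append(record)
    return records


records = load_records(SAMPLE_LINES)

# Count tweets per class label, a common first check before training.
label_counts = Counter(record["class_label"] for record in records)

# Records whose language tag came from Twitter have no confidence score.
from_twitter = [record["id"] for record in records if record["lang_conf"] is None]
```

To use this on real data, replace `SAMPLE_LINES` with an open file handle, provided the downloaded files really are line-delimited JSON; if a file turns out to hold a single JSON array, `json.load` on the whole file would be the appropriate call instead.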
## **Downloads (Alternate Links)**

Labeled data and other resources

- **Crisis dataset version v1.0:** [Download](https://crisisnlp.qcri.org/data/crisis_datasets_benchmarks/crisis_datasets_benchmarks_v1.0.tar.gz)
- **Alternate download link:** [Dataverse](https://doi.org/10.7910/DVN/G98BQG)

## Experimental Scripts

Source code to run the experiments is available at [https://github.com/firojalam/crisis_datasets_benchmarks](https://github.com/firojalam/crisis_datasets_benchmarks)

## License

This version of the dataset is distributed under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)**. The full license text can be found in the accompanying `licenses_by-nc-sa_4.0_legalcode.txt` file.

## Citation

If you use this data in your research, please consider citing the following paper:

[1] Firoj Alam, Hassan Sajjad, Muhammad Imran and Ferda Ofli. CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing. In ICWSM, 2021. [Paper](https://ojs.aaai.org/index.php/ICWSM/article/view/18115/17918)

```
@inproceedings{firojalamcrisisbenchmark2020,
  Author = {Firoj Alam and Hassan Sajjad and Muhammad Imran and Ferda Ofli},
  Keywords = {Social Media, Crisis Computing, Tweet Text Classification, Disaster Response},
  Booktitle = {15th International Conference on Web and Social Media (ICWSM)},
  Title = {{CrisisBench:} Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing},
  Year = {2021}
}
```

and the following associated papers:

* Muhammad Imran, Prasenjit Mitra, Carlos Castillo. Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the 10th Language Resources and Evaluation Conference (LREC), 2016, Slovenia.
* A. Olteanu, S. Vieweg, C. Castillo. 2015. What to Expect When the Unexpected Happens: Social Media Communications Across Crises. In Proceedings of the ACM 2015 Conference on Computer Supported Cooperative Work and Social Computing (CSCW '15). ACM, Vancouver, BC, Canada.
* A. Olteanu, C. Castillo, F. Diaz, S. Vieweg. 2014. CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. In Proceedings of the AAAI Conference on Weblogs and Social Media (ICWSM'14). AAAI Press, Ann Arbor, MI, USA.
* Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier. Practical Extraction of Disaster-Relevant Information from Social Media. In Social Web for Disaster Management (SWDM'13), co-located with WWW, May 2013, Rio de Janeiro, Brazil.
* Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier. Extracting Information Nuggets from Disaster-Related Messages in Social Media. In Proceedings of the 10th International Conference on Information Systems for Crisis Response and Management (ISCRAM), May 2013, Baden-Baden, Germany.

```
@inproceedings{imran2016lrec,
  author = {Muhammad Imran and Prasenjit Mitra and Carlos Castillo},
  title = {Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages},
  booktitle = {Proc. of the LREC, 2016},
  year = {2016},
  month = {5},
  publisher = {ELRA},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}

@inproceedings{olteanu2015expect,
  title = {What to expect when the unexpected happens: Social media communications across crises},
  author = {Olteanu, Alexandra and Vieweg, Sarah and Castillo, Carlos},
  booktitle = {Proc. of the 18th ACM Conference on Computer Supported Cooperative Work \& Social Computing},
  pages = {994--1009},
  year = {2015},
  organization = {ACM}
}

@inproceedings{olteanu2014crisislex,
  title = {CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises.},
  author = {Olteanu, Alexandra and Castillo, Carlos and Diaz, Fernando and Vieweg, Sarah},
  booktitle = {Proc. of the 8th ICWSM, 2014},
  publisher = {AAAI press},
  year = {2014}
}

@inproceedings{imran2013practical,
  title = {Practical extraction of disaster-relevant information from social media},
  author = {Imran, Muhammad and Elbassuoni, Shady and Castillo, Carlos and Diaz, Fernando and Meier, Patrick},
  booktitle = {Proc. of the 22nd WWW},
  pages = {1021--1024},
  year = {2013},
  organization = {ACM}
}

@inproceedings{imran2013extracting,
  title = {Extracting information nuggets from disaster-related messages in social media},
  author = {Imran, Muhammad and Elbassuoni, Shady Mamoon and Castillo, Carlos and Diaz, Fernando and Meier, Patrick},
  booktitle = {Proc. of the 10th ISCRAM},
  year = {2013}
}
```

Provided by:

maas

Created:

2025-06-17

