Mega-COV

Name: Mega-COV
Creator: Natural Language Processing Lab, University of British Columbia
Published: 2021-02-06 06:19:06
License: No description available

arXiv · Updated 2021-02-06 · Indexed 2024-06-21

Download link:

https://github.com/UBC-NLP/megacov

Overview:

Mega-COV is a large-scale dataset created by the Natural Language Processing Lab at the University of British Columbia for COVID-19 research. It comprises more than 1.5 billion tweets from 268 countries in more than 100 languages, including approximately 169 million geotagged tweets. In addition to its scale, the dataset is longitudinal: the data reach back to 2007, which lets researchers compare activity before and during the pandemic. Mega-COV is intended as a content repository that captures details of the lives of tens of millions of people during the pandemic, and it is particularly suited to studying pandemic-related phenomena such as information dissemination, public sentiment, and behavioral change.

Provider:

Natural Language Processing Lab, University of British Columbia

Created:

2020-05-02

Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

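Mega-COV is a billion-scale, multilingual dataset collected from Twitter for COVID-19 research. It covers 234 countries and more than 100 languages, reaches back to 2007, and includes approximately 32 million geotagged tweets, making it both diverse and longitudinal. Only tweet IDs are released. Use is subject to the Twitter Terms of Service and the CC BY-NC-SA 4.0 license, and the release stresses ethical use, in particular avoiding the inference of sensitive user information.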

The summary above was collected and generated by 遇见数据集.
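
Because only tweet IDs are distributed, the tweet text has to be retrieved ("hydrated") separately through the Twitter API by anyone who wants to work with the content. The following is a minimal sketch of the preparatory step only: cleaning a list of IDs and grouping them into request-sized batches. It assumes the IDs arrive as plain text with one ID per line, which may not match the actual file layout in the repository. The function names `read_tweet_ids` and `batch_ids` are hypothetical. The batch size of 100 reflects the per-request limit of Twitter's tweet lookup endpoints. No network call is made here.

```python
from typing import Iterable, Iterator, List


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Collect unique numeric tweet IDs, one per line, preserving order.

    Assumption: the input is plain text with one ID per line; blank or
    non-numeric lines are skipped.
    """
    seen = set()
    ids: List[str] = []
    for line in lines:
        tweet_id = line.strip()
        if tweet_id.isdigit() and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


def batch_ids(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each batch could then be passed to a hydration client of the reader's choice, subject to the Twitter Terms of Service and the dataset license noted above.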
