COVID-19 India Dataset
arXiv. Updated 2021-12-07. Indexed 2024-07-31.
Download link:
http://ibm.biz/covid-data-india
Resource description:
The COVID-19 India Dataset was created jointly by IBM Research and Arizona State University to automate the extraction of COVID-19 data from the daily health bulletins published by Indian states. It contains detailed state-level indicators, such as hospitalization figures and the age and gender distribution of cases. Extraction combines classic PDF parsers with machine learning techniques: bulletins are downloaded from each state's website, a data table structure is defined, and the data is then extracted using several techniques, including OCR and deep learning. Intended applications include real-time pandemic analysis, policy-making support, and model validation, helping researchers and policymakers better understand and respond to the pandemic.
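The "define a table structure, then fill it from extracted text" step can be illustrated with a minimal sketch. It is not the dataset authors' pipeline: the `DailyStateRecord` schema, the `FIELD_PATTERNS` label patterns, and the `parse_bulletin_text` helper are all hypothetical, and the sketch assumes the bulletin has already been converted to plain text by a PDF parser or OCR step.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DailyStateRecord:
    """Hypothetical table row: one state's figures for one bulletin date."""
    state: str
    date: str  # ISO date string, e.g. "2021-05-01"
    confirmed: Optional[int] = None
    hospitalized: Optional[int] = None
    deceased: Optional[int] = None


# Hypothetical label patterns; real bulletins differ from state to state.
FIELD_PATTERNS = {
    "confirmed": re.compile(r"confirmed\s+cases?\s*[:\-]?\s*(\d[\d,]*)", re.I),
    "hospitalized": re.compile(r"hospitali[sz]ed\s*[:\-]?\s*(\d[\d,]*)", re.I),
    "deceased": re.compile(r"(?:deaths?|deceased)\s*[:\-]?\s*(\d[\d,]*)", re.I),
}


def parse_bulletin_text(state: str, date: str, text: str) -> DailyStateRecord:
    """Fill the schema from already-extracted bulletin text.

    Fields whose label is not found in the text are left as None.
    """
    record = DailyStateRecord(state=state, date=date)
    for field_name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Drop thousands separators (including Indian-style 1,23,456).
            setattr(record, field_name, int(match.group(1).replace(",", "")))
    return record


if __name__ == "__main__":
    # Made-up text for illustration only; not real bulletin data.
    sample = "Total confirmed cases: 1,23,456\nHospitalized - 56\nDeaths: 7"
    print(parse_bulletin_text("Kerala", "2021-05-01", sample))
```

Keeping the schema separate from the extraction rules means that a change in one state's bulletin layout would only require updating that state's patterns, not the table definition.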
Provided by:
IBM Research and Arizona State University
Created:
2021-09-28



