3W Dataset | Oil Well Safety Dataset | Machine Learning Dataset
3W Dataset Overview
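Dataset Description
The 3W dataset is the first public, realistic dataset containing rare undesirable real events in oil wells. It can serve as a benchmark for developing machine learning techniques that address the difficulties inherent in real-world data. The dataset comprises instances of eight types of undesirable events characterized by eight process variables. It includes historical instances validated by experts, as well as simulated and hand-drawn instances.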
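Dataset Structure
The 3W dataset consists of 1,984 CSV files, stored in 7z archives under the data directory. Each file represents one instance, and its filename indicates the instance's origin. Within a file, each row is one observation and each column is one series. Columns are separated by commas, and the decimal separator is a period. The first column holds the timestamp, the last column holds the observation label, and the remaining columns hold the multivariate time series.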
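The file layout described above can be sketched as a small loader. This is a minimal illustration using only the Python standard library, not the dataset authors' tooling. It assumes each CSV starts with a header row and that an empty cell marks a missing measurement; neither is stated above. The function name `load_instance` and the column names in the sample (`var_a`, `var_b`, `label`) are hypothetical, and the sample rows are made up.

```python
import csv
import io


def load_instance(text_stream):
    """Split one instance into series names, timestamps, values, and labels.

    Follows the layout described above: timestamp in the first column,
    observation label in the last column, comma-separated fields, and a
    period as the decimal separator. A header row is assumed.
    """
    reader = csv.reader(text_stream)
    header = next(reader)          # assumption: the first row names the columns
    series_names = header[1:-1]    # everything between timestamp and label
    timestamps, values, labels = [], [], []
    for row in reader:
        if not row:                # skip blank lines
            continue
        timestamps.append(row[0])  # kept as text; the timestamp format is not specified
        # Assumption: an empty cell means a missing measurement.
        values.append([float(cell) if cell else None for cell in row[1:-1]])
        labels.append(row[-1])
    return series_names, timestamps, values, labels


# Made-up sample; column names and values are purely illustrative.
sample = io.StringIO(
    "timestamp,var_a,var_b,label\n"
    "2017-01-01 00:00:00,1.5,20.25,0\n"
    "2017-01-01 00:00:01,1.6,,0\n"
)
names, timestamps, values, labels = load_instance(sample)
print(names)
print(values)
```

To read a real instance, the same function can be given an open file handle (for example `open(path, newline="")`) once the 7z archives have been extracted. Labels are left as strings because their encoding is not described here.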
Citation
When using the 3W dataset, cite the following article:
@article{VARGAS2019106223,
  title = "A realistic and public dataset with rare undesirable real events in oil wells",
  journal = "Journal of Petroleum Science and Engineering",
  volume = "181",
  pages = "106223",
  year = "2019",
  issn = "0920-4105",
  doi = "https://doi.org/10.1016/j.petrol.2019.106223",
  url = "http://www.sciencedirect.com/science/article/pii/S0920410519306357",
  author = "Ricardo Emanuel Vaz Vargas and Celso José Munaro and Patrick Marques Ciarelli and André Gonçalves Medeiros and Bruno Guberfain do Amaral and Daniel Centurion Barrionuevo and Jean Carlos Dias de Araújo and Jorge Lins Ribeiro and Lucas Pierezan Magalhães",
  keywords = "Fault detection and diagnosis, Oil well monitoring, Abnormal event management, Multivariate time series classification",
  abstract = "Detection of undesirable events in oil and gas wells can help prevent production losses, environmental accidents, and human casualties and reduce maintenance costs. The scarcity of measurements in such processes is a drawback due to the low reliability of instrumentation in such hostile environments. Another issue is the absence of adequately structured data related to events that should be detected. To contribute to providing a priori knowledge about undesirable events for diagnostic algorithms in offshore naturally flowing wells, this work presents an original and valuable dataset with instances of eight types of undesirable events characterized by eight process variables. Many hours of expert work were required to validate historical instances and to produce simulated and hand-drawn instances that can be useful to distinguish normal and abnormal actual events under different operating conditions. The choices made during this dataset's preparation are described and justified, and specific benchmarks that practitioners and researchers can use together with the published dataset are defined. This work has resulted in two relevant contributions. A challenging public dataset that can be used as a benchmark for the development of (i) machine learning techniques related to inherent difficulties of actual data, and (ii) methods for specific tasks associated with detecting and diagnosing undesirable events in offshore naturally flowing oil and gas wells. The other contribution is the proposal of the defined benchmarks."
}
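Dataset Usage
The dataset is accompanied by results for several benchmark experiments, including:
- Benchmark 1: Impact of using simulated and hand-drawn instances (links to code and results)
- Benchmark 2: Anomaly detection (links to code and results)
These results can serve as a baseline for researchers and practitioners.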
