Dataset of Six Gases: Ammonia, Acetaldehyde, Acetone, Ethylene, Ethanol, and Toluene
Payititi, indexed 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-26469.html
Resource description:
This archive contains 13910 measurements from 16 chemical sensors, designed for drift compensation tasks in discriminating 6 gases at various concentration levels. The goal is to achieve robust performance (or minimal degradation) over time, as described in the paper referenced in Section 2: Data Collection. The primary purpose of providing this dataset is to make it freely and publicly accessible to the chemical sensor research community and artificial intelligence researchers for developing strategies to address sensor/concept drift. This dataset is exclusively for research purposes; commercial use is strictly prohibited.
This dataset was collected between January 2007 and February 2011 (36 months) at the gas delivery platform facility of the ChemoSignals Laboratory in the BioCircuits Institute, University of California San Diego. The platform was operated in a fully computerized environment, controlled by LabVIEW (National Instruments) software running on a PC fitted with the appropriate serial data acquisition boards. The measurement platform is versatile enough to deliver the desired concentrations of the target chemical substances with high accuracy and in a highly reproducible manner, which minimizes the common errors caused by human intervention and makes it possible to concentrate exclusively on the chemical sensors when compensating for real drift.
The resulting dataset includes recordings from six distinct pure gaseous substances, namely ammonia, acetaldehyde, acetone, ethylene, ethanol, and toluene, each dosed at concentrations ranging from 5 to 1000 ppmv. For detailed information on the gas identities, concentration values, and time distribution of the measurement recordings in this dataset, please refer to Tables 1 and 2 of the manuscript cited below. Batch10.dat was updated on October 14, 2013 to correct some corrupted values in the last 120 lines of the file. An extension of this dataset that includes the concentration values is available as the Gas Sensor Array Drift Dataset at Different Concentrations.
Attribute Information:
The sensor responses are read out as the resistance across the active layer of each sensor; thus, each measurement yields a 16-channel time series, and each channel is represented by a set of features that reflect the dynamic processes occurring at the sensor surface in response to the chemical substance being evaluated. Two distinct types of features were considered when creating this dataset: (i) the so-called steady-state feature (ΔR), defined as the difference between the maximal resistance change and the baseline, together with its normalized version, expressed as the ratio of the maximal resistance to the baseline value while the chemical vapor is present in the test chamber; and (ii) a set of features reflecting the sensor dynamics of the rising and decaying transient portions of the sensor response over the entire measurement under controlled conditions, namely the exponential moving average (emaα). This transform, borrowed from the field of econometrics and first introduced to the chemical sensing community by Muezzinoglu et al. (2009), converts each transient portion into a real scalar by taking the maximum value of its exponential moving average (the minimum value for the decaying portion of the sensor response), with the initial condition set to 0. The scalar smoothing parameter α of the operator, which determines both the quality of the feature and the time of its occurrence along the time series, is set between 0 and 1. Specifically, three different values of α were used to obtain three feature values from the rising portion of the sensor response, and the same three α values were used to obtain three additional features from the decaying portion, thereby covering the entire sensor response dynamics.
For a more detailed analysis and discussion of these features, and for their graphical illustration, please refer to Section 2.3 and Figure 2 of the referenced manuscript, respectively.
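The emaα features described above can be illustrated with a short sketch. The recursion used here, y[k] = (1 - α)·y[k-1] + α·(r[k] - r[k-1]) with y[0] = 0, is an assumption about the exact form of the operator; the description above only fixes the initial condition, the range of α, and the use of the maximum (rising) and minimum (decaying) values. The function names are hypothetical, and the sketch applies the transform to a whole recording rather than to separately segmented rising and decaying portions.

```python
from typing import List, Sequence, Tuple


def ema_transform(r: Sequence[float], alpha: float) -> List[float]:
    """Assumed recursion: y[k] = (1 - alpha) * y[k-1] + alpha * (r[k] - r[k-1]), y[0] = 0."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    y = [0.0]  # initial condition set to 0, as stated in the description
    for k in range(1, len(r)):
        y.append((1.0 - alpha) * y[-1] + alpha * (r[k] - r[k - 1]))
    return y


def ema_features(
    r: Sequence[float], alphas: Sequence[float] = (0.001, 0.01, 0.1)
) -> Tuple[List[float], List[float]]:
    """Return (rising, decaying) features: max and min of the transform per alpha."""
    rising = [max(ema_transform(r, a)) for a in alphas]
    decaying = [min(ema_transform(r, a)) for a in alphas]
    return rising, decaying


if __name__ == "__main__":
    # Toy resistance trace (made-up values): baseline, rise, plateau, decay.
    trace = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 1.0]
    print(ema_features(trace))
```

With the three α values listed in the description, this yields three rising and three decaying scalars per sensor, which together with ΔR and |ΔR| would make up the 8 features per sensor.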
Once these features are calculated, a feature vector is formed from the 8 features extracted from each of the 16 sensors considered here. The resulting 128-dimensional feature vector (8 features × 16 sensors) is organized as follows: ΔR_1, |ΔR|_1, EMAi0.001_1, EMAi0.01_1, EMAi0.1_1, EMAd0.001_1, EMAd0.01_1, EMAd0.1_1, ΔR_2, |ΔR|_2, EMAi0.001_2, EMAi0.01_2, EMAi0.1_2, EMAd0.001_2, EMAd0.01_2, EMAd0.1_2, ..., ΔR_16, |ΔR|_16, EMAi0.001_16, EMAi0.01_16, EMAi0.1_16, EMAd0.001_16, EMAd0.01_16, EMAd0.1_16. Here, "ΔR_1" and "|ΔR|_1" are the ΔR and normalized ΔR features, respectively; "EMAi0.001_1", "EMAi0.01_1", and "EMAi0.1_1" are the emaα values of the rising transient portion of the sensor response for α equal to 0.001, 0.01, and 0.1, respectively; and "EMAd0.001_1", "EMAd0.01_1", and "EMAd0.1_1" are the emaα values of the decaying transient portion of the sensor response for α equal to 0.001, 0.01, and 0.1, respectively, all corresponding to Sensor #1. The same eight features, with the suffix "_2", follow for Sensor #2 ("ΔR_2", "|ΔR|_2", "EMAi0.001_2", "EMAi0.01_2", "EMAi0.1_2", "EMAd0.001_2", "EMAd0.01_2", "EMAd0.1_2"), and so on until Sensor #16, thus forming the 128-dimensional feature vector to be fed to the classifier for training.
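As an illustration of this layout, the following sketch generates the 128 feature names in order and maps a 1-based feature index back to its sensor and feature type. The helper names (`feature_names`, `locate`) are hypothetical and are not part of the dataset.

```python
from typing import List, Tuple

# The eight per-sensor features, in the order given in the description.
PER_SENSOR = [
    "ΔR", "|ΔR|",
    "EMAi0.001", "EMAi0.01", "EMAi0.1",
    "EMAd0.001", "EMAd0.01", "EMAd0.1",
]
N_SENSORS = 16


def feature_names() -> List[str]:
    """All 128 names, sensor by sensor: ΔR_1, |ΔR|_1, ..., EMAd0.1_16."""
    return [f"{name}_{s}" for s in range(1, N_SENSORS + 1) for name in PER_SENSOR]


def locate(index: int) -> Tuple[int, str]:
    """Map a 1-based feature index (1..128) to (sensor number, feature name)."""
    total = N_SENSORS * len(PER_SENSOR)
    if not 1 <= index <= total:
        raise ValueError(f"feature index must be in 1..{total}")
    sensor, pos = divmod(index - 1, len(PER_SENSOR))
    return sensor + 1, PER_SENSOR[pos]


if __name__ == "__main__":
    names = feature_names()
    print(len(names), names[0], names[-1])
    print(locate(9))  # first feature of Sensor #2
```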
For processing purposes, the data is organized into ten batches, each grouping the measurements of one or more months. This reorganization was performed so that each batch contains a sufficient number of measurements, distributed as uniformly as possible across classes and months, for training a classifier. The table below lists the months combined into each batch:
| Batch ID | Months Included |
|----------|-----------------|
| Batch 1 | Months 1 and 2 |
| Batch 2 | Months 3, 4, 8, 9, and 10 |
| Batch 3 | Months 11, 12, and 13 |
| Batch 4 | Months 14 and 15 |
| Batch 5 | Month 16 |
| Batch 6 | Months 17, 18, 19, and 20 |
| Batch 7 | Month 21 |
| Batch 8 | Months 22 and 23 |
| Batch 9 | Months 24 and 30 |
| Batch 10 | Month 36 |
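
For scripts that need this grouping, the batch-to-month table can be written down as a small lookup. This is only a convenience sketch; the names `BATCH_MONTHS` and `batch_for_month` are hypothetical, and months that do not appear in the table map to `None`.

```python
from typing import Dict, Optional, Tuple

# Months combined into each batch, copied from the table above.
BATCH_MONTHS: Dict[int, Tuple[int, ...]] = {
    1: (1, 2),
    2: (3, 4, 8, 9, 10),
    3: (11, 12, 13),
    4: (14, 15),
    5: (16,),
    6: (17, 18, 19, 20),
    7: (21,),
    8: (22, 23),
    9: (24, 30),
    10: (36,),
}

# Reverse lookup: month ID -> batch ID.
MONTH_TO_BATCH: Dict[int, int] = {
    month: batch for batch, months in BATCH_MONTHS.items() for month in months
}


def batch_for_month(month: int) -> Optional[int]:
    """Return the batch containing the given month, or None if it is not listed."""
    return MONTH_TO_BATCH.get(month)


if __name__ == "__main__":
    print(batch_for_month(9))  # month 9 belongs to batch 2
    print(batch_for_month(5))  # month 5 is not listed in the table
```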
The data format follows the same encoding style as libsvm: the first integer on each line is the class that the data point belongs to (1: ethanol; 2: ethylene; 3: ammonia; 4: acetaldehyde; 5: acetone; 6: toluene), followed by the set of features in the format x:v, where x is the feature index and v is the actual value of the feature. For example, in the line "1 1:15596.162100 2:1.868245 3:2.371604 4:2.803678 5:7.512213 … 128:-2.654529", the number "1" is the class label (ethanol in this case), while the remaining 128 columns list the actual feature values of the measurement recording, organized as described above.
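A minimal sketch of reading one such line in Python is shown below. The function name `parse_line` is hypothetical, and the sample line uses only the values visible in the example above; following the usual libsvm sparse convention, feature indices that are absent from a line are left at 0.0.

```python
from typing import List, Tuple

# Class labels as listed in the description.
CLASS_NAMES = {
    1: "ethanol", 2: "ethylene", 3: "ammonia",
    4: "acetaldehyde", 5: "acetone", 6: "toluene",
}


def parse_line(line: str, n_features: int = 128) -> Tuple[int, List[float]]:
    """Parse 'label idx:value idx:value ...' into (label, dense feature list)."""
    parts = line.split()
    label = int(parts[0])
    values = [0.0] * n_features
    for token in parts[1:]:
        idx, val = token.split(":", 1)
        values[int(idx) - 1] = float(val)  # feature indices are 1-based
    return label, values


if __name__ == "__main__":
    sample = "1 1:15596.162100 2:1.868245 3:2.371604 128:-2.654529"
    label, features = parse_line(sample)
    print(CLASS_NAMES[label], features[0], features[127])
```

A whole batch file could then be read by applying `parse_line` to each non-empty line.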
Finally, to enable readers to reproduce the results presented in the related article, the following parameter values should be used in the training task:
| Batch ID | C | Gamma (γ) | Accuracy Rate |
|----------|---------|-----------|---------------|
| 1 | 256.0 | 0.03125 | 98.8764 |
| 2 | 64.0 | 0.00390625| 99.7588 |
| 3 | 128.0 | 0.03125 | 100.0 |
| 4 | 1.0 | 0.25 | 100.0 |
| 5 | 2.0 | 0.015625 | 99.4924 |
| 6 | 256.0 | 0.0009765625 | 99.5217 |
| 7 | 64.0 | 0.0625 | 99.9723 |
| 8 | 1024.0 | 0.0078125 | 99.6599 |
| 9 | 2.0 | 0.00390625| 100.0 |
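
One way to use these values, sketched here under stated assumptions, is to pass them to LIBSVM's `svm-train` command-line tool, whose `-c` and `-g` options set the cost and gamma parameters. The description above does not name the tool or the kernel, so the use of `svm-train` (with its default RBF kernel) and the file names `batch1.dat` to `batch9.dat` are assumptions, and `svm_train_command` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# (C, gamma) per batch, copied from the table above.
SVM_PARAMS: Dict[int, Tuple[float, float]] = {
    1: (256.0, 0.03125),
    2: (64.0, 0.00390625),
    3: (128.0, 0.03125),
    4: (1.0, 0.25),
    5: (2.0, 0.015625),
    6: (256.0, 0.0009765625),
    7: (64.0, 0.0625),
    8: (1024.0, 0.0078125),
    9: (2.0, 0.00390625),
}


def svm_train_command(batch_id: int, data_file: Optional[str] = None) -> List[str]:
    """Build an svm-train argument list for one batch (file name is an assumption)."""
    c, gamma = SVM_PARAMS[batch_id]
    data_file = data_file or f"batch{batch_id}.dat"
    return ["svm-train", "-c", str(c), "-g", str(gamma), data_file]


if __name__ == "__main__":
    for batch_id in sorted(SVM_PARAMS):
        print(" ".join(svm_train_command(batch_id)))
```

The argument lists are only printed here; they could be handed to `subprocess.run` on a machine where LIBSVM is installed.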
Citation Request: Please cite: Alexander Vergara, Shankar Vembu, Tuba Ayhan, Margaret A. Ryan, Margie L. Homer, and Ramón Huerta, Chemical gas sensor drift compensation using classifier ensembles, Sensors and Actuators B: Chemical (2012), doi: 10.1016/j.snb.2012.01.074.
Provider:
Payititi
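Background overview
This dataset contains 13910 measurements from 16 chemical sensors exposed to 6 gases (ammonia, acetaldehyde, acetone, ethylene, ethanol, and toluene) and is intended for studying drift compensation in gas discrimination tasks. It was collected between 2007 and 2011 and is meant for academic research, not for commercial use.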