Microsensor Beverage Tasting (MicroBeTa)
Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/5457500
Resource description:
MicroBeTa is a dataset for automatic, "electronic tongue" beverage classification. It includes temporal multivariate readings simultaneously acquired from a temperature sensor and solid-state electrochemical microsensors developed and manufactured by the Chemical Transducers Group at the Institute of Microelectronics of Barcelona (IMB-CNM), CSIC:
http://gtq.imb-cnm.csic.es/en
Citing the MicroBeTa dataset
MicroBeTa is released under a Creative Commons Attribution license, so please cite it if you use it in your work in any form. Published academic papers should cite our Frontiers in Neuroscience paper. Personal works, such as machine learning projects or blog posts, should provide a URL to this Zenodo page, although a reference to our research paper is also appreciated.
Academic paper citation
LeBow N, Rueckauer B, Sun P, Rovira M, Jiménez-Jorquera C, Liu S-C and Margarit-Taulé JM (2021)
Real-Time Edge Neuromorphic Tasting From Chemical Microsensor Arrays. Front. Neurosci. 15:771480.
http://doi.org/10.3389/fnins.2021.771480
Personal use citation
Include a link to this Zenodo page: http://doi.org/10.5281/zenodo.5457501
Description
The dataset includes seven hours of readings from a sensor array, sampled once per second during three sessions held over three days at the IMB-CNM. The array comprises one Pt-100 temperature sensor, one microelectrode each for electrical conductivity and oxidation-reduction potential (ORP), and six ISFET sensors, each sensitive to a specific ion (H+, Na+, K+, Ca2+, Cl-, and NO3-).
The beverages selected for MicroBeTa are five commercial varieties: white wine, red wine, still water, sparkling water, and cava. This selection covers a wide range of characteristics within a limited set of classes, with several partially overlapping sets of attributes that were expected to give insight into how the classifier could use the data from the various sensors. For example, still water (neither), sparkling water (carbonation only), red wine (fermentation byproducts only), and cava (both) cover the four general cases arising from the presence or absence of carbonation and of fermentation byproducts.
All sensors were read out continuously and concurrently during each session, while the sensor array was moved from one beverage sample to the next at fixed five-minute intervals. The order of beverage samples was chosen to cover every possible transition from one beverage to another. During each transfer, the sensor array was washed with deionized water before being placed in the next sample, to minimize cross-contamination between consecutive beverages in the series.
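A sequence in which every ordered pair of distinct beverages occurs exactly once as consecutive samples is an Eulerian circuit of the complete directed graph on the five classes. The sketch below builds one such order with Hierholzer's algorithm; it is only an illustration of the idea, and the order actually used in the recording sessions is not given here. The names `transition_sequence` and `covers_all_transitions` are hypothetical.

```python
from itertools import permutations

# The five MicroBeTa classes; their order in this list is arbitrary.
BEVERAGES = ["white wine", "red wine", "still water", "sparkling water", "cava"]


def transition_sequence(items):
    """Visiting order in which every ordered pair (a, b) with a != b occurs
    exactly once as consecutive elements: an Eulerian circuit of the complete
    directed graph on `items`, found with Hierholzer's algorithm."""
    unused = {a: [b for b in items if b != a] for a in items}  # outgoing edges
    stack, circuit = [items[0]], []
    while stack:
        node = stack[-1]
        if unused[node]:
            stack.append(unused[node].pop())  # walk along an unused edge
        else:
            circuit.append(stack.pop())  # no edges left: node is finished
    return circuit[::-1]


def covers_all_transitions(sequence, items):
    """True if every ordered pair of distinct items appears consecutively."""
    seen = set(zip(sequence, sequence[1:]))
    return seen == set(permutations(items, 2))


seq = transition_sequence(BEVERAGES)
print(len(seq), covers_all_transitions(seq, BEVERAGES))  # prints: 21 True
```

With five classes there are 5 × 4 = 20 ordered transitions, so such a circuit visits 21 samples and starts and ends on the same beverage. At one reading per second, each five-minute immersion yields roughly 300 readings before any cleaning.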
Data Files
clean_dataset.h5: Contains a pandas DataFrame with the readings from all sensor channels and the labels (columns 'Time', 'H+', 'K+', 'Na+', 'Cl-', 'NO3-', 'Ca2+', 'Conductivity', 'ORP', 'Temperature', and 'Label'). The washing and transfer periods, as well as transient instabilities of individual sensors, have been discarded.
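A minimal sketch for loading and inspecting clean_dataset.h5 with pandas. It assumes that `pandas.read_hdf` can read the file without an explicit key (true only if the file stores a single pandas object; the actual key is not documented here) and that PyTables is installed. The helper names are hypothetical.

```python
import pandas as pd

# Columns listed for clean_dataset.h5.
CLEAN_COLUMNS = ["Time", "H+", "K+", "Na+", "Cl-", "NO3-", "Ca2+",
                 "Conductivity", "ORP", "Temperature", "Label"]


def load_clean_dataset(path="clean_dataset.h5", key=None):
    """Read the stored DataFrame (needs PyTables). key=None only works if the
    file holds a single pandas object; pass the real key otherwise."""
    df = pd.read_hdf(path, key=key)
    missing = [c for c in CLEAN_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df


def readings_per_label(df):
    """Number of retained readings (roughly seconds) per beverage label."""
    return df.groupby("Label").size()


def immersion_segments(df):
    """Collapse runs of consecutive rows with the same label into one row per
    run. Assumes rows are in time order and that two consecutive immersions
    never share a label (a run may otherwise merge two immersions)."""
    run_id = (df["Label"] != df["Label"].shift()).cumsum().rename("run")
    return df.groupby(run_id).agg(
        label=("Label", "first"),
        start=("Time", "first"),
        n_readings=("Label", "count"),
    ).reset_index(drop=True)
```

For example, `df = load_clean_dataset()` followed by `immersion_segments(df)` gives one row per contiguous labeled stretch, which makes shortened or missing immersions easy to spot.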
preprocessed_dataset_9cols.h5: Contains a pandas DataFrame (columns ['n_output_classes', 'samples_train', 'labels_train', 'samples_test', 'labels_test']) of sensor samples for training and testing a classifier model. Each data sample is a fixed-length time window, overlapping with its neighbors, that holds the signal values of all nine sensors ('Temperature', 'H+', 'K+', 'Na+', 'Cl-', 'NO3-', 'Ca2+', 'Conductivity', and 'ORP', in that order) over 16 contiguous timestamps. The samples are preprocessed as follows:
Incomplete measurement cycles in which not all beverages are recorded, or measurements of specific beverage samples much shorter than others, are removed entirely. Any measurements lasting significantly longer than five minutes are truncated to that length.
A high-pass filter with a cut-off frequency of 0.5 mHz is used to attenuate level offsets in the input signals while emphasizing their dynamic components.
Outliers in which at least one sensor channel contains a value further than four standard deviations from the mean are deleted.
Each sensor channel is normalized independently using quantile normalization.
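The filtering, outlier, and normalization steps above, followed by the 16-timestamp windowing, can be sketched in NumPy as follows. This is a minimal sketch under several assumptions, not the authors' pipeline: the filter is a first-order RC high-pass (the filter family and order are not stated), the four-standard-deviation rule is applied per timestamp, "quantile normalization" is read as mapping each channel through its empirical CDF onto (0, 1], and `stride=1` is a hypothetical choice because the amount of overlap is not given. The first step (dropping incomplete cycles, truncating long measurements) needs cycle boundaries and is left out. All function names are illustrative.

```python
import numpy as np

FS_HZ = 1.0          # one reading per second
CUTOFF_HZ = 0.5e-3   # 0.5 mHz high-pass cut-off
WINDOW = 16          # contiguous timestamps per sample


def high_pass(x, cutoff_hz=CUTOFF_HZ, fs_hz=FS_HZ):
    """First-order (RC) high-pass filter along axis 0 of a (T, C) array.
    Assumption: the actual filter design is not stated. The state starts at
    rest, so the initial level of each channel is treated as an offset."""
    x = np.asarray(x, dtype=float)
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / fs_hz)
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = alpha * (y[t - 1] + x[t] - x[t - 1])
    return y


def outlier_rows(x, n_std=4.0):
    """True for timestamps where any channel lies more than n_std standard
    deviations from that channel's mean."""
    std = x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant channels
    z = (x - x.mean(axis=0)) / std
    return (np.abs(z) > n_std).any(axis=1)


def quantile_normalize(x, keep):
    """Map each channel independently onto (0, 1] through its empirical CDF,
    fitted on the kept rows (ties broken by position); removed rows become NaN."""
    out = np.full(x.shape, np.nan)
    n = int(keep.sum())
    for c in range(x.shape[1]):
        col = x[keep, c]
        ranks = np.argsort(np.argsort(col, kind="stable"), kind="stable")
        out[keep, c] = (ranks + 1) / n
    return out


def make_windows(x, labels, bad, window=WINDOW, stride=1):
    """Overlapping windows of `window` contiguous rows. Windows that contain a
    removed row or straddle two labels are skipped."""
    samples, targets = [], []
    for start in range(0, len(x) - window + 1, stride):
        stop = start + window
        if bad[start:stop].any() or len(set(labels[start:stop])) != 1:
            continue
        samples.append(x[start:stop])
        targets.append(labels[start])
    return np.array(samples), np.array(targets)


def preprocess(signals, labels, window=WINDOW, stride=1):
    """signals: (T, C) readings at 1 Hz; labels: length-T beverage labels."""
    x = high_pass(signals)
    bad = outlier_rows(x)
    x = quantile_normalize(x, ~bad)
    return make_windows(x, np.asarray(labels), bad, window, stride)
```

For example, `samples, targets = preprocess(signals, labels)` on a `(T, 9)` array returns windows of shape `(n_windows, 16, 9)` and one label per window. Applying the outlier rule per timestamp and then skipping any window that touches a removed row keeps every retained window contiguous in time.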
preprocessed_dataset_7cols.h5: Contains a pandas DataFrame with the same structure as preprocessed_dataset_9cols.h5, but excluding the two least informative sensors ('Temperature' and 'NO3-').
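If the window arrays are stored with shape `(n_samples, 16, 9)` (an assumption; the on-disk layout is not documented here), a 7-channel view can be derived from the 9-channel windows by dropping the 'Temperature' and 'NO3-' channels, as in this hypothetical helper. Because outlier deletion depends on all channels, windows derived this way may not match preprocessed_dataset_7cols.h5 exactly if that file was preprocessed from the seven channels alone.

```python
import numpy as np

# Channel order documented for preprocessed_dataset_9cols.h5.
CHANNELS_9 = ["Temperature", "H+", "K+", "Na+", "Cl-", "NO3-", "Ca2+",
              "Conductivity", "ORP"]
DROPPED = {"Temperature", "NO3-"}


def drop_channels(windows, channels=CHANNELS_9, dropped=DROPPED):
    """Remove the named channels from an (n_samples, time, n_channels) array
    (axis order assumed) and return the array with the remaining names."""
    keep = [i for i, name in enumerate(channels) if name not in dropped]
    return windows[:, :, keep], [channels[i] for i in keep]
```

The remaining channels keep their original relative order: 'H+', 'K+', 'Na+', 'Cl-', 'Ca2+', 'Conductivity', 'ORP'.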
Contact
Further details on the creation and validation of MicroBeTa are given in our Frontiers in Neuroscience paper. If you have any questions or comments about the dataset, please write to:
josepmaria.margarit@imb-cnm.csic.es
Created: 2023-05-15



