
Annotations and associated frequency signals

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://figshare.com/articles/dataset/Annotations_and_associated_frequency_signals/24470620
Resource description:
Labelled industry datasets are the most valuable asset in prognostics and health management (PHM) research. However, creating labelled industry datasets is both difficult and expensive, so publicly available industry datasets are rare at best. While labels are generally unavailable, many industry datasets contain annotations, maintenance work orders, or logbooks, whose free-form text holds technical-language descriptions of component properties, which is valuable information for any PHM model. Publicly available annotated industry datasets are also scarce, in particular ones with associated signals. Therefore, we release data from an annotated process industry dataset, consisting of 21090 pairs of signals and annotations from one year of kraftliner production. The annotations are written in Swedish by on-site Swedish experts, and the signals consist of accelerometer vibration measurements from two large (80 x 10 x 10 m) paper machines. The data is cleaned and structured so that each annotation is associated with ten days of signal measurements leading up to the annotation date. One signal measurement consists of 8192 samples over 6.4 seconds, which becomes 3200 samples spanning 500 Hz in the frequency domain. The associated annotation is attached to each signal measurement, so the list of annotations is as long as the list of signals. In total there are 43 unique annotations, though most are associated with multiple signals from different machines due to commonalities in fault descriptions. The language data is pre-processed so that all letters are lower case, numbers are removed, and names are replaced with the Swedish word "egennamn", meaning "name of a person" in English. Also included are pre-computed embeddings, which make testing faster and easier for researchers who want to investigate training signal encoders through technical language supervision.
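The sampling figures quoted above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not part of the dataset's own tooling: the constant names and the helper `bin_to_frequency` are hypothetical, and it assumes the 3200 spectral values are evenly spaced and start at 0 Hz, which the description does not state explicitly.

```python
# Figures taken from the dataset description.
N_SAMPLES = 8192    # time-domain samples per measurement
DURATION_S = 6.4    # length of one measurement, in seconds
N_BINS = 3200       # spectral values kept per measurement

# Derived quantities.
sampling_rate_hz = N_SAMPLES / DURATION_S   # about 1280 Hz
nyquist_hz = sampling_rate_hz / 2           # about 640 Hz
bin_width_hz = 1.0 / DURATION_S             # about 0.15625 Hz per bin
max_frequency_hz = N_BINS * bin_width_hz    # about 500 Hz, matching the description


def bin_to_frequency(index: int) -> float:
    """Frequency in Hz of spectral bin `index`.

    Assumption (not confirmed by the source): bin 0 corresponds to 0 Hz
    and bins are spaced 1 / DURATION_S apart.
    """
    return index * bin_width_hz
```

Under these assumptions the 500 Hz span is consistent with the stated sample count and duration, and it sits below the roughly 640 Hz Nyquist limit, so the stored spectrum is a truncated portion of the full one-sided spectrum.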
The data presented here was used in the article "Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry" (https://papers.phmsociety.org/index.php/ijphm/article/view/3137). Please cite this article if you use this dataset. To use this dataset without understanding Swedish, please consult "Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study" (https://www.papers.phmsociety.org/index.php/phme/article/view/3356) on how to augment the technical data to make language model translation to other languages easier, and don't hesitate to contact me if you have questions regarding the data.
Accessing the data is simple. To load the spectra and annotation pairs:
    import pandas as pd
    TL_spectra_note_df = pd.read_pickle("TL_spectra_note_df_big.pkl")
    all_spectra = TL_spectra_note_df['Spectra']
    all_annotations = TL_spectra_note_df['Notes']
Pre-computed embeddings can be accessed through:
    all_embeddings = TL_spectra_note_df['Embeddings']
Created: 2023-11-01
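Because each of the 43 unique annotations is repeated across many signals, it can help to group signal positions by annotation text once the 'Notes' column is loaded. The following is a minimal sketch using only the standard library; the function name `group_indices_by_annotation` is hypothetical and not part of the dataset, and it accepts any iterable of strings, including the `all_annotations` column.

```python
from collections import defaultdict


def group_indices_by_annotation(annotations):
    """Map each distinct annotation string to the positions where it occurs.

    `annotations` is any iterable of strings, e.g. the 'Notes' column.
    Positions are 0-based and positional, so with a pandas Series they
    should be used with .iloc rather than label-based indexing.
    """
    groups = defaultdict(list)
    for position, note in enumerate(annotations):
        groups[note].append(position)
    return dict(groups)


# Illustrative stand-in data; these strings are placeholders, not real annotations.
example_notes = ["note a", "note b", "note a", "note c", "note b"]
example_groups = group_indices_by_annotation(example_notes)
```

Applied to the real data, `len(group_indices_by_annotation(all_annotations))` would be expected to equal 43 if the description's count holds, and each list of positions could then be used to select the matching rows of `all_spectra` or `all_embeddings`.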