
BOOM

ModelScope community · updated 2025-11-27 · indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/Datadog/BOOM
Description:
# Dataset Card for BOOM (Benchmark of Observability Metrics)

## Dataset Summary

**BOOM** (**B**enchmark **o**f **O**bservability **M**etrics) is a large-scale, real-world time series dataset designed for evaluating models on forecasting tasks in complex observability environments. Composed of real-world metrics data collected from Datadog, a leading observability platform, the benchmark captures the irregularity, structural complexity, and heavy-tailed statistics typical of production observability data. Unlike synthetic or curated benchmarks, BOOM reflects the full diversity and unpredictability of operational signals observed in distributed systems, covering infrastructure, networking, databases, security, and application-level metrics.

Note: the metrics comprising BOOM were generated from internal monitoring of pre-production environments, and **do not** include any customer data.

![Dataset](intro_figure_boom.png)
*<center>Figure 1: (A) BOOM consists of data from various domains. (B) Example series from three of the domains. From left to right, these series represent: sum of failed requests on a backend API, grouped by error type and source (Application); CPU limits on a multi-tenant service deployed on a Kubernetes cluster, grouped by tenant (Infrastructure); and sum of command executions on a Redis cache, grouped by command (Database).</center>*

BOOM consists of approximately 350 million time-series points across 32,887 variates. The dataset is split into 2,807 individual time series with one or multiple variates. Each represents a metric query extracted from user-generated dashboards, notebooks, and monitors. These series vary widely in sampling frequency, temporal length, and number of variates.
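
As a rough sense of scale, the headline figures above imply the following averages. This is only back-of-the-envelope arithmetic on the quoted totals (the point count is itself approximate); the card states that series vary widely, so per-series values will differ substantially from these means.

```python
# Headline figures quoted in the dataset card (the point count is approximate).
TOTAL_POINTS = 350_000_000
NUM_VARIATES = 32_887
NUM_SERIES = 2_807

# Simple averages; the real distributions are reported to be highly uneven.
variates_per_series = NUM_VARIATES / NUM_SERIES
points_per_variate = TOTAL_POINTS / NUM_VARIATES
points_per_series = TOTAL_POINTS / NUM_SERIES

print(f"mean variates per series: {variates_per_series:.1f}")
print(f"mean points per variate:  {points_per_variate:,.0f}")
print(f"mean points per series:   {points_per_series:,.0f}")
```

On average this works out to roughly 12 variates per series and on the order of 10,000 points per variate.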
Looking beyond the basic characteristics of the series, we highlight a few of the typical challenging properties of observability time series (several of which are illustrated in Figure 2):

- Zero-inflation: Many metrics track infrequent events (e.g., system errors), resulting in sparse series dominated by zeros with rare, informative spikes.
- Highly dynamic patterns: Some series fluctuate rapidly, exhibiting frequent sharp transitions that are difficult to model and forecast.
- Complex seasonal structure: Series are often modulated by carrier signals exhibiting non-standard seasonal patterns that differ from conventional cyclic behavior.
- Trends and abrupt shifts: Metrics may feature long-term trends and sudden structural breaks, which, when combined with other properties, increase forecasting difficulty.
- Stochasticity: Some metrics appear pseudo-random or highly irregular, with minimal discernible temporal structure.
- Heavy-tailed and skewed distributions: Outliers due to past incidents or performance anomalies introduce significant skew.
- High cardinality: Observability data is often segmented by tags such as service, region, or instance, producing large families of multivariate series with high dimensionality but limited history per variate.

![Dataset](series_examples.png)
*<center>Figure 2: Examples from the BOOM dataset showing the diversity of its series.</center>*

## Evaluating Models on BOOM

We provide code with example evaluations of existing models; see the [code repository](https://github.com/DataDog/toto).

## Dataset Structure

Each entry in the dataset consists of:

- A multivariate or univariate time series (one metric query with up to 100 variates)
- Metadata including sampling start time, frequency, series length, and number of variates. Figure 3 shows the metadata decomposition of the dataset by number of series.
- Taxonomy labels for dataset stratification:
  - **Metric Type** (e.g., count, rate, gauge, histogram)
  - **Domain** (e.g., infrastructure, networking, security)

![Metadata](metadata.png)
*<center>Figure 3: Representative figure showing the metadata breakdown by variate in the dataset: (left) sampling frequency distribution, (middle) series length distribution, and (right) number of variates distribution.</center>*

## Collection and Sources

The data is sourced from an internal Datadog deployment monitoring pre-production systems and was collected using a standardized query API. The data underwent a basic preprocessing pipeline to remove constant or empty series and to impute missing values.

## Comparison with Other Benchmarks

The BOOM Benchmark diverges significantly from traditional time series datasets, including those in the [GiftEval](https://huggingface.co/datasets/Salesforce/GiftEval) suite, when analyzed using 6 standard and custom diagnostic features computed on normalized series. These features capture key temporal and distributional characteristics:

- Spectral entropy (unpredictability)
- Skewness and kurtosis (distribution shape)
- Autocorrelation coefficients (temporal structure)
- Unit root tests (stationarity)
- Flat spots (sparsity)

![Metadata](density_plots.png)
*<center>Figure 4: Distributional comparison of 6 statistical features computed on normalized time series from the BOOM, GIFT-Eval, and LSF benchmark datasets. The broader and shifted distributions in the BOOM series reflect the increased diversity, irregularity, and nonstationarity characteristic of observability data.</center>*

BOOM series exhibit substantially higher spectral entropy, indicating greater irregularity in temporal dynamics. Distributions show heavier tails and more frequent structural breaks, as reflected by shifts in skewness and stationarity metrics.
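
Several of the diagnostic features listed above can be approximated with a few lines of NumPy. The sketch below is illustrative only: the function names and exact definitions are assumptions, not the authors' implementation, the unit root test is omitted, and "flat spots" is simplified to the longest run of identical consecutive values. Both series are synthetic, not drawn from BOOM.

```python
import numpy as np

def normalize(x):
    """Z-score a series; a constant series maps to all zeros."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def spectral_entropy(x):
    """Shannon entropy of the power spectrum, scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    power = power[1:]  # drop the DC bin
    if len(power) < 2 or power.sum() == 0:
        return 0.0
    p = power / power.sum()
    p = p[p > 0]  # avoid log(0)
    return float(-(p * np.log(p)).sum() / np.log(len(power)))

def skewness(x):
    """Third standardized moment."""
    return float(np.mean(normalize(x) ** 3))

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    return float(np.mean(normalize(x) ** 4) - 3.0)

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1."""
    z = normalize(x)
    return float(np.sum(z[:-1] * z[1:]) / len(z))

def longest_flat_run(x):
    """Length of the longest run of identical consecutive values."""
    x = np.asarray(x)
    if x.size == 0:
        return 0
    change = np.flatnonzero(np.diff(x) != 0)
    edges = np.concatenate(([-1], change, [x.size - 1]))
    return int(np.diff(edges).max())

# Two synthetic series: a clean seasonal signal and a zero-inflated spike train.
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n)
smooth = np.sin(2 * np.pi * t / 64)
sparse = np.where(rng.random(n) < 0.02, rng.exponential(10.0, n), 0.0)

for name, series in [("smooth", smooth), ("sparse", sparse)]:
    print(
        name,
        f"entropy={spectral_entropy(series):.3f}",
        f"skew={skewness(series):.2f}",
        f"kurt={excess_kurtosis(series):.2f}",
        f"acf1={lag1_autocorrelation(series):.3f}",
        f"flat={longest_flat_run(series)}",
    )
```

The zero-inflated series should score much higher on spectral entropy, skewness, kurtosis, and flat-run length than the sinusoid, which is the kind of shift the feature comparison is meant to expose.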
A wider range of transience scores highlights the presence of both persistent and highly volatile patterns, which are common in operational observability data but largely absent from curated academic datasets.

Principal Component Analysis (PCA) applied to the full feature set (Figure 1) reveals a clear separation between BOOM and [GiftEval](https://huggingface.co/datasets/Salesforce/GiftEval) datasets. BOOM occupies a broader and more dispersed region of the feature space, reflecting greater diversity in signal complexity and temporal structure. This separation reinforces the benchmark’s relevance for evaluating models under realistic, deployment-aligned conditions.

## Links

- [Research Paper](https://arxiv.org/abs/2505.14766)
- [Codebase](https://github.com/DataDog/toto)
- [Leaderboard 🏆](https://huggingface.co/spaces/Datadog/BOOM)
- [Toto model (Datadog's open-weights model with state-of-the-art performance on BOOM)](https://huggingface.co/Datadog/Toto-Open-Base-1.0)
- [Blogpost](https://www.datadoghq.com/blog/ai/toto-boom-unleashed/)

## Citation

```bibtex
@misc{cohen2025timedifferentobservabilityperspective,
  title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
  author={Ben Cohen and Emaad Khwaja and Youssef Doubli and Salahidine Lemaachi and Chris Lettieri and Charles Masson and Hugo Miccinilli and Elise Ramé and Qiqi Ren and Afshin Rostamizadeh and Jean Ogier du Terrail and Anna-Monica Toon and Kan Wang and Stephan Xie and Zongzhe Xu and Viktoriya Zhukova and David Asker and Ameet Talwalkar and Othmane Abou-Amal},
  year={2025},
  eprint={2505.14766},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.14766},
}
```
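
The PCA comparison described under "Comparison with Other Benchmarks" can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the `pca` helper is hypothetical, and the two feature matrices are synthetic stand-ins for per-series feature vectors, not values computed from BOOM or GiftEval.

```python
import numpy as np

def pca(features, n_components=2):
    """Project standardized rows of `features` onto the top principal components."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    scale = X.std(axis=0)
    X = X / np.where(scale > 0, scale, 1.0)  # leave constant columns untouched
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)      # explained variance ratio per component
    scores = X @ Vt[:n_components].T
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic 6-dimensional feature vectors for two hypothetical benchmarks.
group_a = rng.normal(loc=0.0, scale=0.5, size=(200, 6))  # compact cluster
group_b = rng.normal(loc=1.5, scale=1.5, size=(200, 6))  # shifted and more dispersed

scores, explained = pca(np.vstack([group_a, group_b]))
a_scores, b_scores = scores[:200], scores[200:]

print("explained variance ratio:", np.round(explained, 3))
print("PC1 spread, group A:", round(float(a_scores[:, 0].std()), 3))
print("PC1 spread, group B:", round(float(b_scores[:, 0].std()), 3))
```

In this toy setup the second group is both offset from and more spread out than the first along the leading component, mirroring the kind of separation and dispersion the card reports for BOOM relative to GiftEval.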

Provider:
maas
Created:
2025-05-23