Magma Experiment for Code/Bug-based Coverage Benchmarking
Source: DataCite Commons. Updated 2026-01-12; indexed 2026-05-04.
Download link:
https://edmond.mpg.de/citation?persistentId=doi:10.17617/3.UQGK4A
Resource description:
<h1>Magma Experiment for Code- and Bug-based Coverage Benchmarking</h1>
<h2>Overview</h2>
<p>
This dataset contains the <b>raw experimental artifacts</b> for the Magma portion of our study on the
concordance (split-half reliability) of <b>coverage-based</b> and <b>bug-based</b> fuzzer benchmarking procedures.
</p>
<p>
The artifacts are intentionally released at a low level (queues, logs, and bug information) so that others can
<b>reproduce</b>, <b>audit</b>, and <b>recompute</b> outcomes under alternative analysis choices.
</p>
<p>
Note: This dataset is <b>not linked directly from the paper</b>. The paper links to a companion code repository, and
that repository links to this dataset.
</p>
<hr/>
<h2>Experimental Setup (Summary)</h2>
<ul>
<li><b>Benchmark suite:</b> Magma v1.2.0</li>
<li><b>Benchmarks used:</b> 18 fuzz drivers (benchmarks) across multiple programs</li>
<li><b>Fuzzers:</b> 8 fuzzers</li>
<li><b>Trials:</b> 20 independent trials per (fuzzer, benchmark) combination, for supported combinations only</li>
<li><b>Campaign length:</b> 23 hours per trial</li>
<li><b>Bug evaluation:</b> Magma canary-based ground truth; we count <b>triggered bugs</b></li>
<li><b>Coverage evaluation:</b> branch coverage (LLVM tooling)</li>
</ul>
<hr/>
<h2>What This Dataset Contains</h2>
<p>
This dataset includes the raw outputs needed to compute bug-based results directly and to compute coverage-based
results via offline replay:
</p>
<ul>
<li><b>Execution queues</b> produced during fuzzing campaigns (per fuzzer, benchmark, trial)</li>
<li><b>Raw bug information</b> from Magma’s canary-based bug reporting (including logs/outputs per run)</li>
<li><b>Per-run metadata</b> required to group results and reproduce the analysis workflow</li>
</ul>
<hr/>
<h2>Important: How Coverage Results Are Obtained</h2>
<p>
<b>Coverage is not precomputed in this dataset.</b> This dataset includes the execution queues and other artifacts,
but <b>coverage values must be derived</b> using the companion code repository.
</p>
<p>
To extract coverage:
</p>
<ol>
<li>Obtain the companion code repository referenced by the paper.</li>
<li>Configure it to point to this dataset on disk.</li>
<li>Run the provided pipeline to replay the execution queues and compute <b>branch coverage</b> via LLVM tooling.</li>
</ol>
<p>
If you download only this dataset without the companion code, you can still analyze <b>bug-based</b> outcomes,
but you <b>cannot</b> reproduce the paper’s coverage-based results.
</p>
<hr/>
<h2>Directory Layout</h2>
<p>
The dataset is organized hierarchically by <b>fuzzer</b>, <b>benchmark</b>, and <b>trial</b>.
The exact naming is intended to match the assumptions of the companion analysis scripts.
</p>
<p>
Conceptually:
</p>
<pre>
&lt;fuzzer&gt;/&lt;benchmark&gt;/&lt;trial&gt;/...
</pre>
<p>
Each trial directory corresponds to one 23-hour campaign run and contains the artifacts produced by that run
(e.g., queues, logs, bug-triggering information, and auxiliary metadata).
</p>
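<p>
The layout above can be indexed with a short script. The following is a minimal sketch that assumes the
conceptual <code>&lt;fuzzer&gt;/&lt;benchmark&gt;/&lt;trial&gt;/</code> nesting holds literally on disk; the function name
<code>index_trials</code> is hypothetical, and the actual directory names in the dataset may differ.
</p>

```python
from collections import defaultdict
from pathlib import Path


def index_trials(root):
    """Map (fuzzer, benchmark) to a sorted list of trial directory names.

    Assumption: the dataset root follows <root>/<fuzzer>/<benchmark>/<trial>/.
    Plain files found at the trial level are ignored.
    """
    index = defaultdict(list)
    for trial_dir in Path(root).glob("*/*/*"):
        if trial_dir.is_dir():
            fuzzer, benchmark, trial = trial_dir.relative_to(root).parts
            index[(fuzzer, benchmark)].append(trial)
    return {key: sorted(trials) for key, trials in index.items()}
```

<p>
Combinations with fewer entries than expected are then easy to spot by checking the length of each list.
</p>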
<hr/>
<h2>File Count</h2>
<p>
At release time, this dataset contains exactly <b>2,860 files</b>.
</p>
<hr/>
<h2>Known Omissions / Unsupported Combinations</h2>
<ul>
<li><b>SymCC × exif</b> is not present because SymCC does not compile on the <b>exif</b> benchmark; therefore, no such run exists.</li>
</ul>
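<p>
As a sanity check on the trial matrix, the expected number of runs can be computed from the setup summary.
This sketch assumes that <b>exif</b> corresponds to exactly one of the 18 fuzz drivers and that SymCC × exif is the
only unsupported combination; the function name <code>expected_runs</code> is hypothetical.
</p>

```python
def expected_runs(fuzzers, benchmarks, trials, unsupported_combinations=0):
    """Number of runs in the trial matrix after removing unsupported combinations."""
    return (fuzzers * benchmarks - unsupported_combinations) * trials


# Values taken from the setup summary: 8 fuzzers, 18 benchmarks, 20 trials.
full_matrix = expected_runs(8, 18, 20)            # 2880
# Assumption: SymCC x exif removes exactly one (fuzzer, benchmark) combination.
supported_matrix = expected_runs(8, 18, 20, 1)    # 2860
```

<p>
Under these assumptions the supported matrix has 2,860 runs, which equals the file count stated above. Whether the
dataset stores exactly one file per run is not stated here and should be checked against the download.
</p>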
<hr/>
<h2>Data Completeness Note (Minimal)</h2>
<p>
Seven runs were re-executed after a temporary server failure in order to complete the expected trial matrix; these re-executed runs were excluded from the paper.
The released dataset is intended to be <b>complete</b> for the supported fuzzer–benchmark combinations.
</p>
<hr/>
<h2>Relationship to Paper and Code</h2>
<ul>
<li><b>Paper:</b> defines the research questions, metrics, and statistical analysis.</li>
<li><b>Companion code repository:</b> performs coverage extraction (queue replay), aggregation, ranking, and concordance computations.</li>
<li><b>This dataset:</b> provides the raw artifacts consumed by that code.</li>
</ul>
<hr/>
<h2>Intended Use</h2>
<ul>
<li>Reproduction of bug-based outcomes and reanalysis of trial variability</li>
<li>Recomputation of coverage-based outcomes via replay using the companion code</li>
<li>Alternative analyses of the bug and code-coverage data</li>
</ul>
<p>
This dataset is <b>not</b> a pre-aggregated results table; it is raw experimental output.
</p>
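<p>
For readers who want to explore trial variability, one generic way to measure split-half agreement is sketched
below. It is an illustration only and not necessarily the procedure used in the paper: it splits the trials into
two random halves, averages each fuzzer's outcome per half, and compares the two resulting orderings with
Kendall's tau-a. The function names and the input format (a mapping from fuzzer name to a list of per-trial
outcomes, such as triggered-bug counts on one benchmark) are assumptions.
</p>

```python
from itertools import combinations
from statistics import mean


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists (tied pairs count as 0)."""
    pairs = list(combinations(range(len(x)), 2))
    total = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        total += (product > 0) - (product < 0)  # +1 concordant, -1 discordant
    return total / len(pairs)


def split_half_agreement(scores, rng):
    """Rank agreement between two random halves of the trials.

    scores: fuzzer name -> list of per-trial outcomes (equal lengths, at least 2 fuzzers).
    rng:    a random.Random instance, passed in so that splits are reproducible.
    """
    fuzzers = sorted(scores)
    n_trials = len(scores[fuzzers[0]])
    order = list(range(n_trials))
    rng.shuffle(order)
    half_a, half_b = order[: n_trials // 2], order[n_trials // 2:]
    mean_a = [mean(scores[f][t] for t in half_a) for f in fuzzers]
    mean_b = [mean(scores[f][t] for t in half_b) for f in fuzzers]
    return kendall_tau(mean_a, mean_b)
```

<p>
Repeating the call with many different seeds (for example, <code>random.Random(seed)</code>) gives a distribution of
agreement values rather than a single number.
</p>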
<hr/>
<h2>License</h2>
<p>
This dataset is released under <b>CC BY 4.0</b>.
</p>
<hr/>
<h2>Citation</h2>
<p>
Please cite this dataset via its DOI (doi:10.17617/3.UQGK4A). When referencing the methodology or results, also cite the accompanying paper, <i>In Bugs We Trust? On Measuring the Randomness of a Fuzzer Benchmarking Outcome</i>.
</p>
Provider:
Edmond
Created:
2025-07-09



