MSR 2023 Dataset
Source: Figshare | Updated: 2023-03-13 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/MSR_2023_Dataset/22264000
Description:
This is the artifact that accompanies the paper "Wasmizer: Curating WebAssembly-driven Projects on GitHub". It contains:

- the scripts we used to produce a dataset of WebAssembly binaries
- a dataset of WebAssembly binaries (8915 `.wasm` files and 1384 `.wat` files)
- the WASMIZER tool, which can be used to automatically collect a new dataset with refined information

# Scripts

The `Scripts/` folder contains the scripts that we used to collect our dataset in December 2022. Since then, we have refined our scripts into a tool, WASMIZER, described at the end of this README and available in the `WASMIZER/` folder.

The `Scripts/` folder has three subfolders:

- `Collector/`, which contains the scripts that collect GitHub projects that may use WebAssembly as a compilation target
- `Compiler/`, which tries to compile a project and extracts the `.wasm` and `.wat` files found after compilation
- `SmellsChecker/`, which contains the checkers for the smells used in the case study of the paper

Each of these directories contains a README providing more details.

# Dataset

The `Dataset/` folder contains our dataset of WebAssembly binaries collected in December 2022. It is structured as follows:

- The `wasm/` folder contains WebAssembly binaries in their binary format (`.wasm`)
- The `wat/` folder contains WebAssembly binaries in their textual format (`.wat`)

The basename of each file is its SHA checksum. Each binary is accompanied by a `.meta` file of the same name, which records the project and the path within the project where the binary was found.

For example, for the binary `00047ad76615715bb2b36fa2102135b8dc32ac3c17f3488451168f808e2039f0.wasm`, there is a `00047ad76615715bb2b36fa2102135b8dc32ac3c17f3488451168f808e2039f0.meta` file containing:

```
./JuiceFV-Emscripten_OpenGL/application/dependencies/lib_sources/GLM/test/gtx/test-gtx_easing.wasm
```

This indicates that we found a WebAssembly binary in the `JuiceFV/Emscripten_OpenGL` GitHub project, at location `application/dependencies/lib_sources/GLM/test/gtx/test-gtx_easing.wasm`.

Some metadata were not collected when this dataset was assembled and are therefore missing from this initial snapshot. However, we have bundled a tool, described below, that scrapes repositories, extracts `.wasm` files after compilation, and collects more metadata.

# WASMIZER

The WASMIZER tool is provided in the `WASMIZER/` directory. It is a refined version of the scripts we used to collect the dataset. More details can be found in `WASMIZER/README.md`.

To obtain the latest version, one can run `git pull` from the `WASMIZER/` directory, or access the repository online at [https://github.com/arash-mazidi/WASMIZER](https://github.com/arash-mazidi/WASMIZER).

The tool is deployed and regularly pushes newly found projects, `.wasm` files, and `.wat` files to the following shared folder: https://tucloud.tu-clausthal.de/index.php/s/MMRQMEZm66GRGXI (password: wasmizer).
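
The `.meta` layout described in the Dataset section can be read with a few lines of Python. The sketch below is not part of the artifact; the function names `parse_meta`, `guess_repo`, and `name_matches_sha256` are hypothetical. It rests on two assumptions that the README does not state: that the checksum is SHA-256 (the 64-hex-character basename is consistent with it, but the README only says "SHA checksum"), and that the first hyphen in the project directory separates the GitHub owner from the repository name (this holds for the `JuiceFV-Emscripten_OpenGL` example but is ambiguous when the owner name itself contains a hyphen).

```python
import hashlib
from pathlib import Path


def parse_meta(meta_text: str) -> tuple[str, str]:
    """Split a .meta entry into (project directory, path inside the project)."""
    # Drop surrounding whitespace and the leading "./" seen in the README example.
    entry = meta_text.strip().removeprefix("./")
    project_dir, sep, inner_path = entry.partition("/")
    if not sep or not project_dir or not inner_path:
        raise ValueError(f"unexpected .meta entry: {meta_text!r}")
    return project_dir, inner_path


def guess_repo(project_dir: str) -> str:
    """Heuristic (assumption): the first '-' separates owner from repository."""
    owner, sep, repo = project_dir.partition("-")
    return f"{owner}/{repo}" if sep else project_dir


def name_matches_sha256(binary_path: Path) -> bool:
    """Check the file's basename against the SHA-256 of its contents.

    SHA-256 is an assumption; the README does not name the SHA variant.
    """
    digest = hashlib.sha256(binary_path.read_bytes()).hexdigest()
    return binary_path.stem == digest


if __name__ == "__main__":
    example = (
        "./JuiceFV-Emscripten_OpenGL/application/dependencies/"
        "lib_sources/GLM/test/gtx/test-gtx_easing.wasm"
    )
    project_dir, inner_path = parse_meta(example)
    print(guess_repo(project_dir))  # JuiceFV/Emscripten_OpenGL
    print(inner_path)
```

If `name_matches_sha256` returns `False` for files in `Dataset/wasm/`, the most likely explanation is that a different SHA variant was used, in which case the `hashlib` call would need to be swapped accordingly.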
Created:
2023-03-13



