jm1

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://zenodo.org/records/268514

Resource description:

jm1', the cleaned version by Shepperd et al., described as jm1' here.
jm1'', the cleaned version by Shepperd et al., described as jm1'' here.

Title/Topic: JM1/software defect prediction

Sources:
- Creators: NASA, then the NASA Metrics Data Program
- Contacts: Mike Chapman, Galaxy Global Corporation (Robert.Chapman@ivv.nasa.gov), +1-304-367-8341; Pat Callis, NASA, NASA project manager for MDP (Patrick.E.Callis@ivv.nasa.gov), +1-304-367-8309

Past usage:

"How Good is Your Blind Spot Sampling Policy?"; 2003; Tim Menzies and Justin S. Di Stefano; 2004 IEEE Conference on High Assurance Software Engineering (http://menzies.us/pdf/03blind.pdf).
Results:
- Very simple learners (ROCKY) perform as well in this domain as more sophisticated methods (e.g. J48, model trees) for predicting defects.
- Many learners have very low false alarm rates.
- Probability of detection (PD) rises with effort and rarely rises above it.
- High PDs are associated with high PFs (probability of false alarm).
- PD, PF, and effort can change significantly while accuracy remains essentially stable.
- With two notable exceptions, detectors learned from one data set (e.g. KC2) have nearly the same properties when applied to another (e.g. PC2, KC2).
- Exceptions: LinesOfCode measures generate wider inter-data-set variances; precision's inter-data-set variances vary wildly.

"Assessing Predictors of Software Defects", T. Menzies and J. DiStefano and A. Orrego and R. Chapman, 2004, Proceedings, workshop on Predictive Software Models, Chicago. Available from http://menzies.us/pdf/04psm.pdf
Results:
- From JM1, Naive Bayes generated PDs of 25% with PF of 20%.
- Naive Bayes out-performs J48 for defect detection.
- When learning on more and more data, little improvement is seen after processing 300 examples.
- PDs are much higher from data collected below the sub-sub-system level.
- Accuracy is a surprisingly uninformative measure of success for a defect detector. Two detectors with the same accuracy can have widely varying PDs and PFs.
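The last result above (equal accuracy, very different PD and PF) is easy to reproduce with arithmetic. The sketch below is illustrative only: it assumes the usual confusion-matrix definitions (PD as recall on defective modules, PF as the false-alarm rate on defect-free modules), and the two detectors and their counts are made up, not taken from JM1 or from the cited papers.

```python
def rates(tp, fn, fp, tn):
    """Return (pd, pf, accuracy) from confusion-matrix counts.

    Assumed definitions, not quoted from the papers:
      pd  = tp / (tp + fn)   probability of detection
      pf  = fp / (fp + tn)   probability of false alarm
      acc = (tp + tn) / all
    """
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return pd, pf, acc


# Two hypothetical detectors scored on the same 1000 modules,
# 200 of which are defective.
detector_a = rates(tp=100, fn=100, fp=50, tn=750)
detector_b = rates(tp=50, fn=150, fp=0, tn=800)

for name, (pd, pf, acc) in (("A", detector_a), ("B", detector_b)):
    print(f"detector {name}: PD={pd:.2%} PF={pf:.2%} accuracy={acc:.2%}")
```

Both detectors classify 850 of the 1000 modules correctly, yet A finds twice as many defective modules as B at the cost of some false alarms. Accuracy alone cannot tell the two apart.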
Relevant information:
- JM1 is written in "C" and is a real-time predictive ground system that uses simulations to generate predictions.
- Data comes from McCabe and Halstead feature extractors of source code. These features were defined in the 70s in an attempt to objectively characterize code features that are associated with software quality. The nature of the association is under dispute.
- Notes on McCabe can be found here and notes on Halstead can be found here.
- These metrics are widely used for defect prediction purposes; a quick tutorial on defect prediction is given here.

Number of instances: 10885

Number of attributes: 22 (5 different lines-of-code measures, 3 McCabe metrics, 4 base Halstead measures, 8 derived Halstead measures, a branch count, and 1 goal field)

Attribute information:
  loc              : numeric, McCabe's line count of code
  v(g)             : numeric, McCabe "cyclomatic complexity"
  ev(g)            : numeric, McCabe "essential complexity"
  iv(g)            : numeric, McCabe "design complexity"
  n                : numeric, Halstead total operators + operands
  v                : numeric, Halstead "volume"
  l                : numeric, Halstead "program length"
  d                : numeric, Halstead "difficulty"
  i                : numeric, Halstead "intelligence"
  e                : numeric, Halstead "effort"
  b                : numeric, Halstead
  t                : numeric, Halstead's time estimator
  lOCode           : numeric, Halstead's line count
  lOComment        : numeric, Halstead's count of lines of comments
  lOBlank          : numeric, Halstead's count of blank lines
  lOCodeAndComment : numeric
  uniq_Op          : numeric, unique operators
  uniq_Opnd        : numeric, unique operands
  total_Op         : numeric, total operators
  total_Opnd       : numeric, total operands
  branchCount      : numeric, branch count of the flow graph
  defects          : {false,true}, module has/has not one or more reported defects

Missing attributes: none

Class distribution: the class value (defects) is discrete
  false: 2106 = 19.35%
  true:  8779 = 80.65%

Comments on the Data

Martin Shepperd | August 19th, 2011 at 10:52 am:
I've realised this version of the dataset differs from the version available from the NASA MDP website.
The following tabulates the differences for all 13 NASA datasets:

{{{
Dataset         n cases   p features
CM1 – NASA          505           40
CM1 – Promise       498           21
JM1 – NASA        10878           21
JM1 – Promise     10885           21
KC1 – NASA         2107           21
KC1 – Promise      2109           21
KC3 – NASA          458           40
KC3 – Promise       458           39
MC1 – NASA         9466           39
MC1 – Promise      9466           38
MC2 – NASA          161           40
MC2 – Promise       161           39
MW1 – NASA          403           40
MW1 – Promise       403           37
PC1 – NASA         1107           40
PC1 – Promise      1109           21
PC2 – NASA         5589           40
PC2 – Promise      5589           36
PC3 – NASA         1563           40
PC3 – Promise      1563           37
PC4 – NASA         1458           40
PC4 – Promise      1458           37
PC5 – NASA        17186           39
PC5 – Promise     17186           38
KC4 – NASA          125           40
KC2 – Promise       522           22
}}}

Therefore researchers are strongly advised to indicate which version of these files they are using!

Martin Shepperd | August 19th, 2011 at 10:54 am:
It is also puzzling how features such as loc can contain values like 1.1 (see feature 1, case 1).

Martin Shepperd | August 20th, 2011 at 7:03 am:
Sorry, small error: the two versions of this dataset have the following sizes:

{{{
Version, Cases(n), Features(p)
JM1 – NASA, 10878, 24
JM1 – Promise, 10885, 22
}}}

Reference:
M. Shepperd, Q. Song, Z. Sun and C. Mair, Data Quality: Some Comments on the NASA Software Defect Data Sets (under review), 2011
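Following the advice above to state which version of a file is in use, a quick check of the case count, the feature count, and any fractional values in a count-like column such as loc can be sketched as below. This is a minimal sketch under assumptions: it expects a CSV with a header row (the actual distribution format of JM1 is not stated here), `describe_version` is a hypothetical helper name, and the three sample rows are made up rather than real JM1 data.

```python
import csv
import io


def describe_version(csv_text, count_columns=("loc",)):
    """Return (n_cases, p_features, fractional) for CSV text with a header.

    p_features counts every column, including the goal field.
    fractional maps each column in count_columns to the number of rows
    whose value is not a whole number (e.g. a loc of 1.1).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    p_features = len(reader.fieldnames or [])
    fractional = {
        col: sum(1 for row in rows if not float(row[col]).is_integer())
        for col in count_columns
    }
    return len(rows), p_features, fractional


# Made-up sample for illustration only; not real JM1 rows.
sample = """loc,v(g),defects
1.1,1.4,false
24,5,true
31,4,false
"""

n_cases, p_features, fractional = describe_version(sample)
print(f"n cases = {n_cases}, p features = {p_features}, fractional = {fractional}")
```

Reporting these three figures alongside any result makes it clear whether the 10878-case or the 10885-case variant was used.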

Created:

2020-01-24