For the effective prevention and elimination of defects and failures in a
software system, it is important to know which parts of the software are more
likely to contain errors and can therefore be considered "risky". To
increase reliability and quality, more effort should be spent on risky
components during design, implementation, and testing.
Examining the version archive and the code of a large open-source project, we
have investigated the relation between the risk of components, as measured by
post-release failures, and different code structures such as method calls,
variables, exception handling expressions, and inheritance statements. We have
analyzed the different types of usage relations between components and their
effects on failures. We utilized three commonly used statistical techniques
to build failure prediction models. As a realistic baseline for our models, we
introduced a "simple prediction model" that uses the riskiness information of
the available components rather than making random guesses.
While the results from the classification experiments supported the use of code
structures to predict failure-proneness, our regression analyses showed that
design-time decisions also affected component riskiness. Our models were able
to make precise predictions even with knowledge of the inheritance relations
alone. Since inheritance relations are defined earliest, at design time, the
results of this study suggest that it may be possible to initiate preventive
actions against failures as early as the design phase of a project.