CMU-ISRI-05-121
Institute for Software Research International
School of Computer Science, Carnegie Mellon University
Finding Predictors of Field Defects for Open Source
Software Systems in Commonly Available Data Sources:
A Case Study of OpenBSD
Paul Luo Li, Mary Shaw, Jim Herbsleb
June 2005
This paper is an expanded version of the paper titled:
Finding Predictors of Field Defects for Open Source Software Systems
in Commonly Available Data Sources: A Case Study of OpenBSD,
in METRICS, 2005.
CMU-ISRI-05-121.ps
CMU-ISRI-05-121.pdf
Keywords: Process metrics, product metrics, software science,
software quality assurance, measurement, documentation, reliability,
experimentation, field defect prediction, open source software,
reliability modeling, CVS repository, request tracking system,
mailing list archives, deployment and usage metrics, software and
hardware configurations metrics
Open source software systems are important components of many business
software applications. Field defect predictions for open source
software systems may allow organizations to make informed decisions
regarding open source software components. In this paper, we remotely
measure and analyze predictors (metrics available before release) mined
from established data sources (the code repository and the request tracking
system) as well as a novel source of data (mailing list archives) for
nine releases of OpenBSD. First, we attempt to predict field defects by
extending a software reliability model fitted to development defects.
We find this approach to be infeasible, which motivates examining
metrics-based field defect prediction. Then, we evaluate 139 predictors
using established statistical methods: Kendall's rank correlation,
Spearman's rank correlation, and forward AIC model selection. The
metrics we collect include product metrics, development metrics,
deployment and usage metrics, and software and hardware configurations
metrics. We find the number of messages to the technical discussion
mailing list during the development period (a deployment and usage
metric captured from mailing list archives) to be the best predictor
of field defects. Our work identifies predictors of field defects in
commonly available data sources for open source software systems and
is a step towards metrics-based field defect prediction for
quantitatively-based decision making regarding open source software
components.
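
The rank-correlation step named above can be sketched as follows. This is
a minimal illustration, not the report's actual analysis: the function
names are hypothetical, the two nine-element lists are made-up placeholder
values (not OpenBSD measurements), and both functions assume the data
contain no tied values.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for paired samples; assumes no tied values."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def ranks(values):
    """Return 1-based ranks of the values; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Hypothetical values for nine releases, for illustration only.
    mailing_list_messages = [120, 150, 135, 180, 210, 190, 240, 260, 230]
    field_defects = [4, 6, 5, 7, 10, 9, 11, 14, 12]
    print("Kendall tau:", kendall_tau(mailing_list_messages, field_defects))
    print("Spearman rho:", spearman_rho(mailing_list_messages, field_defects))
```

A predictor whose coefficients are close to 1 or -1 across releases would
be a candidate for the subsequent model selection step; with only nine
releases, any such coefficient should be read cautiously.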
31 pages