Tuesday, November 25, 2014

Are mutants a valid substitute for real faults in software testing?

My colleagues and I won an ACM Distinguished Paper award for our paper “Are mutants a valid substitute for real faults in software testing?”.

Both practitioners and researchers need to evaluate the quality of test suites; for example, researchers want to know whether a new testing technique improves a test suite.  The true measure of a test suite's quality is how many real faults it detects.  The set of real faults is typically unknown, so the state-of-the-art approach is to measure how many artificial faults, called mutants, a test suite detects.  Hundreds of research papers make the assumption that if a test suite detects more mutants, then it will detect more real faults as well.  Amazingly, no one knew whether this assumption is true!  At least, no one did until our research.
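To make the idea concrete, here is a minimal, hypothetical Java sketch of a mutant and a test that detects ("kills") it.  The class, method, and test names are invented for illustration; this is not code from the paper.

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    class Account {
        // Original code: a withdrawal is allowed when funds are sufficient.
        static boolean canWithdraw(int balance, int amount) {
            return balance >= amount;   // a mutant replaces ">=" with ">"
        }
    }

    public class AccountTest {
        @Test
        public void allowsWithdrawingTheExactBalance() {
            // The original code returns true here, but the ">" mutant returns
            // false, so this test kills that mutant.  A suite that only tried
            // balance=100, amount=50 could not tell the two versions apart.
            assertTrue(Account.canWithdraw(100, 100));
        }
    }

A test suite's mutation score is the fraction of mutants it kills; the question the paper answers is whether that score tracks the suite's ability to detect real faults.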

The paper reports on extensive experiments showing that mutant detection is the best available proxy for test suite quality.  We also showed how mutation analysis can be improved, and we identified the fundamental limitations that prevent it from perfectly predicting real fault detection.  Our analysis accounts for confounding factors such as code coverage.  In addition to these experimental results, the real faults and test suites we assembled can be used in future testing research.


The paper was presented on November 20 at FSE, one of the two top software engineering conferences.  The paper was authored by René Just, Darioush Jalali, and Michael Ernst of UW CSE, Laura Inozemtseva and Reid Holmes (a former postdoc at UW) of the University of Waterloo, and Gordon Fraser of the University of Sheffield.  This is the second ACM Distinguished Paper award this year for René, Gordon, and me.

You can read the paper at http://homes.cs.washington.edu/~mernst/pubs/mutation-effectiveness-fse2014.pdf.  You can obtain the tools and experimental data at http://mutation-testing.org and http://defects4j.org.

Friday, July 25, 2014

UW CSE at ISSTA

René Just and Michael Ernst of UW CSE, along with their colleague Gordon Fraser of the University of Sheffield, have been awarded an ACM Distinguished Paper award for their paper “Efficient mutation analysis by propagating and partitioning infected execution states”.  This paper will be presented on July 25 at ISSTA, the premier conference in software testing and analysis.  The paper speeds up mutation analysis by 40% over the previous state of the art.  Mutation analysis is widely used in testing research, because it is the most precise known approach to measure the quality of a test suite.  This is Michael Ernst's 6th ACM Distinguished Paper award, not to mention other best paper awards he has received.  The photo shows Ernst and Just flanking the conference chairs Corina Pasareanu and Darko Marinov.

UW CSE is well-represented elsewhere in the conference, with 2 other technical papers and 3 tool demos.  One other paper is “Empirically revisiting the test independence assumption” by Sai Zhang, Darioush Jalali, Jochen Wuttke, Kıvanç Muşlu, Wing Lam, Michael D. Ernst, and David Notkin, which shows how and when test cases depend on one another, so that whether a test fails can depend on which other tests are run before it.  Another paper is “A type system for format strings” by Konstantin Weitz, Gene Kim, Siwakorn Srisakaokul, and Michael D. Ernst, which shows how to verify correct use of format routines such as printf.

Wednesday, April 16, 2014

ISSTA 2014 papers

My research group had 3 papers accepted to ISSTA, the International Symposium on Software Testing and Analysis.  Here are brief descriptions of them.

“Empirically revisiting the test independence assumption”

by Sai Zhang, Darioush Jalali, Jochen Wuttke, Kıvanç Muşlu, Wing Lam, Michael D. Ernst, and David Notkin.

Two tests are independent (non-interfering) if running one test does not change whether the other test fails or succeeds.  If tests are dependent, then reordering them (as in test prioritization or test selection) could cause a previously-passing test to suddenly start failing, even though the program has not changed.
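As a concrete (and entirely hypothetical) illustration, consider two JUnit tests that share static state; the class and test names are invented and this is not an example from the paper.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.ArrayList;
    import java.util.List;

    // The shared mutable list makes the two tests below order-dependent.
    class Registry {
        static final List<String> USERS = new ArrayList<>();
    }

    public class RegistryTest {
        @Test
        public void startsEmpty() {
            // Passes when run first or alone, but fails if addUser ran
            // earlier in the same JVM, because the tests share Registry.USERS.
            assertTrue(Registry.USERS.isEmpty());
        }

        @Test
        public void addUser() {
            Registry.USERS.add("alice");
            assertEquals(1, Registry.USERS.size());
        }
    }

Running startsEmpty before addUser makes both tests pass; the reverse order makes startsEmpty fail, even though the code under test is unchanged.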

Both practitioners and researchers generally ignore the problem of test dependence:  they assume that test cases are independent.  We developed algorithms and tools for detecting dependent tests.  Our paper shows that test dependence does occur in real-world test suites, and that it can cause both false alarms and missed alarms in prioritized (reordered) test suites.

You can also read the abstract.

“Efficient mutation analysis by propagating and partitioning infected execution states”

by René Just, Michael D. Ernst, and Gordon Fraser.

Given two test suites, mutation analysis ranks one of them as better than the other.  Mutation analysis is notoriously compute-intensive, because it runs many variants of the program over each test suite.  This paper reduces the cost of mutation analysis by 40%, by eliminating redundant executions that reveal no information.

The technique works by first running an instrumented version of the program over the test suite.  This pre-processing step more than pays for itself, because it allows the following executions to be skipped (a small code sketch appears after the list):

  • executions of mutants whose mutated expression produces the same value as the original expression did,
  • executions of mutants whose mutated expression produces a different value, but where an enclosing expression produces the same value as it did in the original program, and
  • executions of different mutants that all compute the same value (all but one of which can be skipped).
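As a small, hypothetical sketch of the first two cases (the code and values are invented for illustration and are not taken from the paper), consider:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    class Pricing {
        static int discountedPrice(int base) {
            int adjusted = base + 0;        // mutant m1 replaces "base + 0" with "base - 0"
            int doubled  = adjusted * 2;    // mutant m2 replaces "adjusted * 2" with "adjusted * 3"
            return Math.min(doubled, 100);  // prices are capped at 100
        }
    }

    public class PricingTest {
        @Test
        public void capsLargePrices() {
            // Instrumented original run: adjusted = 80, doubled = 160, result = 100.
            //
            // m1 computes 80, the same value as the original expression, so
            // executing m1 against this test reveals nothing and can be skipped.
            //
            // m2 computes 240 instead of 160, but Math.min(240, 100) is still
            // 100: the infected value does not propagate to the result, so this
            // execution can be skipped as well.
            assertEquals(100, Pricing.discountedPrice(80));
        }
    }

The third case is analogous: if several mutants of the same expression compute identical values during the pre-pass, only one representative of the partition needs to be executed.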

You can also read the abstract.

“A type system for format strings”

by Konstantin Weitz, Gene Kim, Siwakorn Srisakaokul, and Michael D. Ernst.

Most programming languages support format strings, but their use is error-prone. Using the wrong format string syntax or passing the wrong number or type of arguments leads to unintelligible text output, program crashes, or security vulnerabilities.

We have designed and implemented a type system that indicates erroneous usage, or statically guarantees that calls to format routines will never fail.  The annotation burden for the user of our type system is low, and the type system found over 100 bugs in open-source projects.
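As a small, hypothetical illustration of the kinds of mistakes involved (plain Java, not the checker itself), the commented-out calls below compile but fail at run time; a format-string type system instead rejects them at compile time.

    public class FormatDemo {
        public static void main(String[] args) {
            // Correct: one %s conversion, one String argument.
            System.out.println(String.format("user %s logged in", "alice"));

            // Wrong argument type: %d requires an integer but gets a String.
            // Throws IllegalFormatConversionException at run time.
            // System.out.println(String.format("%d items", "three"));

            // Missing argument: two conversions but only one argument.
            // Throws MissingFormatArgumentException at run time.
            // System.out.println(String.format("%s bought %d items", "alice"));
        }
    }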

You can also read the abstract.

Reading the papers

The papers are not publicly available yet, because we are editing our submissions in response to suggestions from the referees.  (Whether or not a paper is accepted, it is always a pleasure to receive feedback from experts who have read your paper and thought about ways to improve the research.)  The camera-ready deadline is June 6.  In the meantime, if you are interested in a paper and are willing to provide feedback, I would be happy to send you a preprint; just ask.