CSCI 301 10/17/2023 - Testing Oracle & Testing Coverage
The Infeasibility Problem
Adequacy criteria are sometimes impossible to satisfy:
• Syntactically indicated behaviors (paths, data flows, etc.) are sometimes impossible to execute
•• Infeasible control flow, data flow, and data states
• Unsatisfactory approaches:
•• Manual justification for omitting each impossible test case (esp. for more demanding criteria)
•• Adequacy "scores" based on coverage
••• example: 95% statement coverage, 80% def-use coverage
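A minimal sketch of an infeasible path, assuming a hypothetical function `classify` (not from the slides) whose two decisions depend on the same value. The control flow graph suggests four paths, but two of them can never execute, so 100% path coverage is out of reach here.

```python
from itertools import product


def classify(x: int) -> str:
    """Hypothetical example: records which branch each decision takes."""
    trace = []
    if x > 0:
        trace.append("A")
    else:
        trace.append("B")
    if x <= 0:          # logically the negation of the first condition
        trace.append("C")
    else:
        trace.append("D")
    return "".join(trace)


# Paths suggested by the syntax alone: AC, AD, BC, BD.
syntactic_paths = {"".join(p) for p in product("AB", "CD")}

# Paths actually observed when running the code on sample inputs.
observed_paths = {classify(x) for x in range(-100, 101)}

# AC and BD need x > 0 and x <= 0 at the same time, which is impossible.
infeasible = syntactic_paths - observed_paths
print(sorted(infeasible))  # prints ['AC', 'BD']
```

Sampling inputs only gives evidence; in this sketch the infeasibility also follows directly from the two conditions being mutually exclusive.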
Observation 1: We pick the question!
Some questions may be easier to answer than others.
Possible sources of information:
• Documentation, specification
• Published results, reference values
• Subject matter expert
For certain questions, the answer is feasible:
• with manual calculations:
•• Values such as 10,000 are as good as 17,845
•• Constants can be 0, coefficients can be 1
• with analytical formulas that are known and tractable:
•• exact answer, or lower/upper bound
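As an illustration of picking an easy question, the sketch below tests a hypothetical loop-based `sum_to` (an assumed system under test, not from the slides) with the round input 10,000, for which the closed-form answer n(n+1)/2 can be checked by hand.

```python
def sum_to(n: int) -> int:
    """Hypothetical system under test: sums 1..n with a loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def gauss(n: int) -> int:
    """Oracle: the known analytical formula n(n+1)/2."""
    return n * (n + 1) // 2


# A round value is as good a test input as an arbitrary one,
# and 10,000 * 10,001 / 2 = 50,005,000 is easy to verify manually.
expected = 50_005_000
actual = sum_to(10_000)
print(actual == expected, actual == gauss(10_000))
```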
Observation 3: There is a lot of code out there!
Use existing code as an oracle. Examples:
• Use other dedicated frameworks that you cannot use in production code, e.g. MATLAB, Mathematica
•• Use something much more general
•• Use something much more special and limited
• Use a different programming environment that is useful
•• e.g., scripting may be more productive but not necessarily efficient, platform independent, portable, ...
• Use inefficient but readily available data structures
• Use inefficient brute-force naïve algorithms from a
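One possible way to use existing code as an oracle is sketched below: a hand-written `merge_sort` (a hypothetical stand-in for production code) is compared against Python's built-in `sorted` on randomly generated inputs. The helper name `check_against_builtin` and the input ranges are assumptions made for illustration.

```python
import random


def merge_sort(xs):
    """Hypothetical production code under test."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def check_against_builtin(trials=200, seed=0):
    """Return the first input where the results differ, or None."""
    rng = random.Random(seed)  # fixed seed keeps the test repeatable
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        if merge_sort(data) != sorted(data):  # built-in sort is the oracle
            return data
    return None


print(check_against_builtin())
```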
Summary Testing is never complete, so when is enough?
Various coverage criteria:
• Statement / Branch / Path coverage
• Tool support:
•• EclEmma delivers stats + color highlighting
• Code execution does not mean code is correct!
• Code coverage is a "best effort" approach at best:
•• a compromise between the impossible and the inadequate
•• a trade-off between time, effort, effectiveness
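A small sketch of why executed code is not necessarily correct code, using a hypothetical buggy function `absolute`. Calling it with one negative and one positive input executes every statement, yet the bug only shows once the result is compared against an expected value.

```python
def absolute(x: int) -> int:
    """Hypothetical function under test: should return |x|."""
    if x < 0:
        return x      # bug: should be -x
    return x


# These two calls reach 100% statement coverage but check nothing.
absolute(-3)
absolute(3)


def passes_with_oracle() -> bool:
    """Same inputs, now with a pass/fail criterion (the oracle)."""
    return absolute(-3) == 3 and absolute(3) == 3


print(passes_with_oracle())  # prints False: the oracle exposes the bug
```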
How do we decide if the observed program behavior is correct? (key question) (The Oracle Problem)
Very often textbooks dance around this problem:
• Correct solutions are trivial: the length of a queue with 3 entries is 3.
• Manual calculations to obtain a solution are feasible or known: 2 + 2 = 4
• Observation of failure is trivial: SUT crashes
An Oracle for Testing
• Definition: "A test case is a set of inputs, execution conditions, and a pass/fail criterion", plus some documentation.
• A test case requires a pass/fail criterion, which the testing literature calls an oracle.
Observation 4: Consistency
Consistency is weaker than correctness but still useful.
• Often the solution to one task helps to find the solution for another:
•• Relations: reflexive, symmetric, transitive, ...
•• Algebraic: associative, commutative, distributive, ...
•• Geometric: reflection, rotation
•• Permutation of parameter values may lead to the same answer or the inverse answer:
••• compare(A,B) vs compare(B,A)
Compute solutions to a set of related problems and check if the answers are consistent.
Risk: the answers are all wrong but consistent.
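The consistency idea can be sketched as follows, assuming a hypothetical case-insensitive `compare` that returns -1, 0, or 1. No expected values are needed; the check only relates answers to each other (reflexivity, inverse answer under swapped arguments, transitivity).

```python
def compare(a: str, b: str) -> int:
    """Hypothetical SUT: case-insensitive three-way comparison."""
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)


def consistent(values) -> bool:
    """Check relations between answers instead of the answers themselves."""
    for a in values:
        if compare(a, a) != 0:                      # reflexive
            return False
        for b in values:
            if compare(a, b) != -compare(b, a):     # swapped arguments
                return False
            for c in values:
                if (compare(a, b) <= 0 and compare(b, c) <= 0
                        and compare(a, c) > 0):     # transitive
                    return False
    return True


print(consistent(["b", "A", "c", "a"]))
```

As noted above, passing this check does not prove correctness: a `compare` that always returns 0 would also be consistent.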
Other Challenges in Structural Coverage
• Inter-procedural and cross-level coverage
•• e.g., inter-procedural data flow, call-graph coverage
• Regression testing
•• maintenance of test suite: adjust vs delete
• Late binding (OO programming languages)
•• coverage of actual and apparent polymorphism
• Fundamental challenge: infeasible behaviors
•• underlies problems in inter-procedural and polymorphic coverage
•• obstacle to adoption of more sophisticated coverage criteria and dependence analysis
Coverage Criteria for Software Testing Testing is never complete, so when is enough?
Obviously:
• If significant parts of the program structure are not tested, testing is surely inadequate.
• This leads to the idea of "coverage".
• Statement coverage:
•• A test executes SUT code; we monitor which lines are executed and which are not.
• This leads to:
•• Coverage % values for methods, classes, packages.
•• Color highlighting as a visualization in the code editor.
Observation 2: Redundancy in Algorithms
Often many algorithms are known to solve a problem.
• Start with a "naïve" version that is straightforward:
•• Little effort to implement
•• Less effort to evaluate
•• Low risk of mistakes
• Implement the algorithm you want to use in production code
• Use the simple algorithm as an oracle for the production version
•• Basically a check for consistency
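A sketch of this redundancy idea, using the maximum subarray sum as an assumed example problem (it is not named in the slides). The brute-force version serves as the oracle; Kadane's algorithm stands in for the production version.

```python
import random


def max_subarray_naive(xs):
    """Oracle: tries every non-empty contiguous slice, O(n^2)."""
    best = xs[0]
    for i in range(len(xs)):
        running = 0
        for j in range(i, len(xs)):
            running += xs[j]
            best = max(best, running)
    return best


def max_subarray_fast(xs):
    """Production candidate: Kadane's algorithm, O(n)."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def first_disagreement(trials=300, seed=1):
    """Return the first input where both versions differ, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        if max_subarray_naive(xs) != max_subarray_fast(xs):
            return xs
    return None


print(first_disagreement())
```

If the two versions disagree, the returned input is a small, reproducible failing case to investigate in either implementation.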
An oracle requires some kind of redundant solution for specific test questions:
• Manual solution for simple cases
• Subject matter expert
• Specification, documentation
• 2nd implementation:
•• own implementation of a simple algorithm, alternative code for a different environment, etc.
•• external existing code (more general or more specialized)
The outcome of the oracle needs to be documented in the test case:
• If code fails a test, the prime suspect of the failure is the test case, not the code
• Test code must be simple enough that it needs no testing of its own (otherwise a vicious cycle)
Control Flow Coverage Criteria: Statement, Branch, Path
Statement coverage covers all nodes (lines of code). We could also ask to cover:
• all branches
• all paths, subpaths, ... (see slides for more info)
Path coverage:
• each execution path is considered:
•• ac df,
•• abc df,
•• ac dedf,
•• abc dedf, ...
• subject to combinatorial explosion
A typical loop coverage criterion would require:
• zero iterations (cdf),
• one iteration (cdedf),
• and multiple iterations (cdededed...df)
Rationale: An untested def-use association could hide an erroneous computation
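The zero / one / many loop criterion can be sketched as a table of test cases. The function `count_positive` and the chosen inputs are hypothetical and only serve to show one case per iteration count.

```python
def count_positive(xs):
    """Hypothetical function under test containing a single loop."""
    count = 0
    for x in xs:
        if x > 0:
            count += 1
    return count


# One test case per loop behaviour: (input, expected result).
loop_cases = {
    "zero iterations": ([], 0),
    "one iteration": ([5], 1),
    "multiple iterations": ([5, -1, 7, 0], 2),
}

results = {
    name: count_positive(data) == expected
    for name, (data, expected) in loop_cases.items()
}
print(results)
```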
Measuring Statement Coverage in Eclipse with EclEmma
• Free Java code coverage tool for Eclipse
• After installation, simply run the program via "Coverage As"
• Produces statistics & highlights code (in different colors; check slides for more info)
Branch vs Condition Coverage
• Branch coverage
•• test with a = b = true for the if branch (c)
•• test with a = false for the else branch (d)
• Condition coverage
•• test with a = true, b = false
•• test with a = false, b = true
•• to cover both cases {true,false} for each basic condition {a,b}
• Modified condition/decision coverage (MC/DC)
•• requires that each basic condition be shown to independently affect the outcome of each decision
•• required e.g. by RTCA/DO-178B "Software considerations in airborne Systems ...", with EUROCAE ED-12B as the European equivalent
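The three criteria can be compared on a small sketch. It assumes the decision in the slide example is `a and b` (the slides do not show the code), and the MC/DC check uses the unique-cause reading: for each condition there must be a pair of tests that differ only in that condition and produce different outcomes. All function names are hypothetical.

```python
def decision(a: bool, b: bool) -> bool:
    """Assumed decision from the example: if (a && b) c else d."""
    return a and b


def branch_covered(tests) -> bool:
    """Both outcomes of the decision occur."""
    return {decision(a, b) for a, b in tests} == {True, False}


def condition_covered(tests) -> bool:
    """Each basic condition takes both truth values."""
    return ({a for a, _ in tests} == {True, False}
            and {b for _, b in tests} == {True, False})


def mcdc_covered(tests) -> bool:
    """Each condition is shown to flip the outcome on its own."""
    cases = set(tests)

    def shown_independent(index: int) -> bool:
        for case in cases:
            flipped = list(case)
            flipped[index] = not flipped[index]
            partner = tuple(flipped)
            if partner in cases and decision(*case) != decision(*partner):
                return True
        return False

    return shown_independent(0) and shown_independent(1)


branch_tests = [(True, True), (False, True)]
condition_tests = [(True, False), (False, True)]
mcdc_tests = [(True, True), (False, True), (True, False)]

for name, tests in [("branch", branch_tests),
                    ("condition", condition_tests),
                    ("mcdc", mcdc_tests)]:
    print(name, branch_covered(tests), condition_covered(tests),
          mcdc_covered(tests))
```

Under these assumptions the condition-coverage test set never takes the if branch, which shows that condition coverage does not imply branch coverage.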
Various Control Flow Coverage Criteria Exist:
1. Statement (node, basic block) coverage
• Node: single statement
• Basic block: several statements in sequence
2. Branch (edge) coverage
• Each branch of if/switch conditions
3. Condition coverage
• Each basic condition evaluated to true and false, e.g. in (A && B && C)
4. Path coverage (structured basis or cyclomatic testing)
• Execution paths through a method
5. Data flow (syntactic dependency) coverage
• Definition-use (def-use) pairs of data
6. Function coverage
... and there are more.
Coverage is typically measured as a percentage.
Control Flow Coverage Criteria in Practice
• Statement or branch coverage is used in practice
•• Simple lower bounds on adequate testing
• Additional control flow heuristics are sometimes used
•• Loops (never, once, many), combinations of conditions
• 100% coverage is hard to achieve in practice
• So what do you do if your coverage statistics are too low?
•• Bad idea: quickly create/adapt tests to improve stats
•• Good idea: check what is not covered and why
Observation 5: Bounds and necessary conditions
• Bounds give limits to correct results:
•• smallest possible value <= lower bound <= smallest correct value
•• largest correct value <= upper bound <= largest possible value
• Example:
•• Maze on a w * h board:
••• Lower bound for path length to exit: Manhattan distance
••• Upper bound for path length to exit
• Bounds help to identify clear error cases:
•• Result outside of [lb, ub] => result is wrong
• In general: necessary conditions
•• weaker than "<=>" but easier to obtain & implement
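A sketch of a bounds check for the maze example. The breadth-first search `shortest_path_len`, the grid encoding, and the sample maze are hypothetical. The lower bound is the Manhattan distance from the slides; the upper bound `w * h - 1` is an assumption of this sketch (a shortest path never visits a cell twice), since the slides do not state one.

```python
from collections import deque


def shortest_path_len(grid, start, goal):
    """Hypothetical SUT: BFS over open cells ('.'), 4-neighbour moves."""
    h, w = len(grid), len(grid[0])
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w
                    and grid[nr][nc] == "." and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None  # goal not reachable


def within_bounds(grid, start, goal, result) -> bool:
    """Necessary condition only: passing does not prove correctness."""
    h, w = len(grid), len(grid[0])
    lower = abs(start[0] - goal[0]) + abs(start[1] - goal[1])  # Manhattan
    upper = w * h - 1  # assumption: no cell is visited twice
    return lower <= result <= upper


maze = ["..#.",
        ".##.",
        "...."]
length = shortest_path_len(maze, (0, 0), (0, 3))
print(length, within_bounds(maze, (0, 0), (0, 3), length))
```

A result of 2 for this maze would fall below the Manhattan distance of 3 and be flagged as wrong without knowing the true answer.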