SER330 - Software Testing
Running Tests: Test Results
Test results include: • Number of test cases that were run • Number of test cases that failed • For each test case that failed, details about how it failed
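A minimal sketch of how a runner could gather those three results, assuming each test case is a named `Runnable` that signals failure by throwing (as JUnit test methods do). `MiniRunner` and `Result` are hypothetical names, not JUnit's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal runner, not JUnit: a test fails if it throws anything.
class MiniRunner {
    static final class Result {
        int run = 0;                                   // number of test cases that were run
        int failed = 0;                                // number of test cases that failed
        final List<String> failureDetails = new ArrayList<>(); // how each failure happened
    }

    static Result runAll(Map<String, Runnable> tests) {
        Result result = new Result();
        for (Map.Entry<String, Runnable> test : tests.entrySet()) {
            result.run++;
            try {
                test.getValue().run();
            } catch (Throwable t) {                    // AssertionError or any other exception
                result.failed++;
                result.failureDetails.add(test.getKey() + ": " + t);
            }
        }
        return result;
    }
}
```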
What are some types of static analysis?
Inspections and finite-state verification
Assertions are becoming widely used in industry
Microsoft strongly encourages the use of assertions
Structural strategies:
Modules constructed, integrated and tested based on a hierarchical project structure • Top-down, Bottom-up, Sandwich, Backbone
Treat Each Participant the Same Way
• "Clear your mental slate" prior to each test session • Leave time for breaks between sessions
Independent Groups Design AKA
(Between-Subjects Design)
Regression test case selection techniques
• Code-based (~white-box): control-flow based, data-flow based • Specification/requirements-based (~black-box)
Fault aka
Defect/bug
Cycle
path whose start node and end node are the same
Reaching Definitions
• x_i means that the definition of variable x at node i might (or must) reach the current node • "Might" is also stated as possible, some, any, or may; "must" as definite, all, or must
What Not to Say to Participants
• 10. Saying "Remember, we're not testing you" more than three times. • 9. Don't worry, the last participant couldn't do it, either. • 8. No one's ever done that before. • 7. HA! HA! HA! • 6. That's impossible! I didn't know it could go in upside down! • 5. Could we stop for a while? Watching you struggle like this is making me tired. • 4. I didn't really mean you could press any button. • 3. Yes, it's very natural for observers to cry during a test. • 2. Don't feel bad, many people take 15 or 16 tries. • 1. Are you sure you've used computers before?
Writing Tests with JUnit4: Writing Test Cases
"@Test" annotation designates a method that is a test case. @org.junit.Test: nominal behavior expected (i.e., an exception is NOT expected to be thrown). @org.junit.Test(expected=MyException.class): exceptional behavior expected (i.e., an exception is expected to be thrown). Suppose you want to test method foo. By convention, the method that will test foo should be named testFoo.
Benefits of Starting Testing Activities "Early"
- Tests generated independently from code, when requirements are fresh in the mind of analysts - Faults in requirements/designs can be found during the creation of test cases - e.g., incompleteness faults, ambiguity issues - Cheaper to fix faults earlier! - Test cases can be used by programmers in addition to specifications to understand the intent of the system - Separating test generation from test execution can help with scheduling/reduce total amount of time taken by the project
Test Case Prioritization Approaches
• Random sampling • Prioritized rotating selection • Basic idea: execute all test cases eventually, but execute some sooner than others • Possible priority schemes: • Round robin: priority to least-recently-run test cases • Track record: priority to test cases that have detected faults before (they probably execute code with a high fault density) • Structural: priority for executing elements that have not been recently executed • Can be coarse-grained: features, methods, files, ...
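A sketch of the round-robin and track-record schemes as sort orders, assuming each test case carries two hypothetical bookkeeping fields: the cycle in which it last ran and the number of faults it has revealed. `selectForCycle` illustrates prioritized rotating selection: with a limited budget, tests that are skipped keep an old `lastRunCycle`, so they move to the front next time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-test bookkeeping.
final class TestInfo {
    final String name;
    final int lastRunCycle;    // test cycle in which the test was last executed
    final int faultsDetected;  // faults this test revealed in earlier cycles

    TestInfo(String name, int lastRunCycle, int faultsDetected) {
        this.name = name;
        this.lastRunCycle = lastRunCycle;
        this.faultsDetected = faultsDetected;
    }
}

class Prioritizer {
    // Round robin: least-recently-run test cases first.
    static List<TestInfo> roundRobin(List<TestInfo> tests) {
        List<TestInfo> ordered = new ArrayList<>(tests);
        ordered.sort(Comparator.comparingInt((TestInfo t) -> t.lastRunCycle));
        return ordered;
    }

    // Track record: test cases that detected the most faults before come first.
    static List<TestInfo> trackRecord(List<TestInfo> tests) {
        List<TestInfo> ordered = new ArrayList<>(tests);
        ordered.sort(Comparator.comparingInt((TestInfo t) -> t.faultsDetected).reversed());
        return ordered;
    }

    // Rotating selection: run only `budget` tests this cycle, least recently run first.
    static List<TestInfo> selectForCycle(List<TestInfo> tests, int budget) {
        List<TestInfo> ordered = roundRobin(tests);
        return ordered.subList(0, Math.min(budget, ordered.size()));
    }
}
```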
How can the development process be improved? Give an example.
1. Data collection improvement 2. Find ways to reduce bugs 3. Continuous testing 4. Change dev process (more testing) 5. Go through process again to see if # of bugs reduced
A four step process to analyze and improve a development process
1. Define the data to be collected and implement procedures for collecting them 2. Analyze collected data to identify important fault classes 3. Analyze selected fault classes to identify weaknesses in development and quality measures 4. Adjust the quality and development process
An example of development process improvement
1. Faults that affect security were given highest priority 2. During QA we identified several buffer overflow problems that may affect security 3. Faults were due to bad programming practice and were revealed late due to lack of analysis 4. Action plan: Modify programming discipline and environment and add specific entries to inspection checklists
Components of a Test Plan
1. QA plan identifier 2. Introduction 3. Items to be verified 4. Features to be verified 5. Features not to be verified 6. Approach 7. Item pass/fail criteria 8. Suspension criteria and resumption requirements 9. QA deliverables 10. QA tasks 11. Environmental needs 12. Responsibilities 13. Staffing and training needs 14. Schedule 15. Risks and contingencies 16. Approvals
Test Plan Components (explain each)
1. Test plan identifier: unique 2. Introduction: objectives of document 3. Test items: high-level description of inputs to testing 4. Features to be tested: list of features or requirements 5. Features not to be tested: features you cannot test yet 6. Approach: how you will test; may include phases of testing (unit, system, acceptance), performance testing, regression testing, use of problem tracking system 7. Item pass/fail criteria: when does a test pass? When does the product pass? 8. Suspension criteria and resumption requirements: what to do if bugs prevent progress 9. Test deliverables: all outputs of testing 10. Testing tasks: from your schedule 11. Environmental needs: hardware and software 12. Responsibilities: who does what 13. Staffing and training needs: list people or types of people 14. Schedule: milestones 15. Risks and contingencies: potential problems and potential approaches to address them 16. Approvals: signatures
Five Basic Questions of Software Quality
1. When should quality assurance activities start? When should they stop? 2. What particular techniques should be applied during development? 3. How can we assess the readiness of a product? 4. How can we control the quality of successive releases? 5. How can the development process itself be improved?
How much does it cost to maintain software?
2-3 times as much as development
Software inspections consist of...
4-6 participants in a 5-stage process with significant preparation
Thread
A "thread" is a portion of several modules that together provide a user-visible program feature. Integrating one thread, then another, etc., we maximize visibility for the user. As in sandwich integration testing, we can minimize stubs and drivers, but the integration plan may be complex
Configuration
A collection of specific versions of software items (documentation, test cases), of hardware, and/or firmware
Another definition of SE
A field of computer science that deals with the building of software systems that: • are so large & complex as to require teams of developers • exist in multiple versions • are used for many years • undergo changes/evolution
Thinking Like a Tester
A good test case is one that has a high probability of finding a bug, if a bug exists. Come up with ways to demonstrate that a product does not work • Try to break the product • Don't try to come up with ways to demonstrate that the product works
Software Regression
A new version of a software component regresses with respect to the previous version when the new version no longer provides functionality that should be preserved
A path
A path through a directed graph is given by a sequence of edges in which each edge ends at the node where the next edge starts
What is Quality Assurance (Evaluation of Product)?
A planned and systematic pattern of all activities necessary to provide adequate confidence that the product conforms to established requirements
What is Quality Assurance (Evaluation of Process)?
A set of activities designed to evaluate the process by which products are developed (evaluation of the process)
Independent Groups Design (Between-Subjects Design) Pros and cons
Advantages • Mitigates transfer of learning effects • Appropriate for lengthy tasks: can prevent participant fatigue Disadvantages • Requires a large number of participants
Within-subjects design pros and cons
Advantages • Requires fewer participants than Independent Groups Design Disadvantages • Prone to transfer of learning effect
Advantages and Disadvantages of "Thinking Aloud"
Advantages: • Can capture performance and preference information simultaneously • Can receive cues about misconceptions and confusions before they manifest as errors Disadvantages: • Could be unnatural and distracting • Could be tiring • Slows the thought process, can increase mindfulness
Big Bang Integration Test
An extreme and desperate approach: Test only after integrating all modules +Does not require scaffolding • The only excuse, and a bad one - Minimum observability, diagnosability, efficacy, feedback - High cost of repair • Recall: Cost of repairing a fault rises as a function of time between error and fault discovery
What is an oracle?
An oracle is any human or mechanical agent that decides whether a program behaved correctly in a given test and, accordingly, results in a verdict of "pass" or "fail."
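A sketch of two kinds of mechanical oracle, under the assumption that a verdict is just PASS or FAIL. `compare` needs the exact expected output stored with the test case; `sorted` only checks a property of the output, which is useful when the exact answer is unknown. The names are hypothetical.

```java
import java.util.Objects;

class Oracle {
    enum Verdict { PASS, FAIL }

    // Comparison oracle: expected output is recorded in the test case.
    static <T> Verdict compare(T expected, T actual) {
        return Objects.equals(expected, actual) ? Verdict.PASS : Verdict.FAIL;
    }

    // Property oracle: decides correctness ("output is sorted") without
    // knowing the exact expected value.
    static Verdict sorted(int[] actual) {
        for (int i = 0; i + 1 < actual.length; i++) {
            if (actual[i] > actual[i + 1]) return Verdict.FAIL;
        }
        return Verdict.PASS;
    }
}
```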
When should QA activities start?
As soon as we decide to build a software product
Assertions versus Exceptions
Assertion violation => fault • Predefined response • Error report • Terminate or continue • More expressive notation (e.g. All, Some, old, class invariant) • Can be turned on and off during deployment Exception thrown => unusual case • Style guideline • exceptions should be reserved for truly exceptional situations • Outer context knows how to deal with the situation • Program-defined response • Handler • Different choices for resuming execution • Complex exception flow • Always part of the deployed code
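A sketch contrasting the two mechanisms in Java, using a hypothetical `Account` class. The exception handles an unusual but legitimate situation (a bad amount from the caller) and is always part of the deployed code; the `assert` checks a class invariant whose violation would mean a fault inside the class. Java assertions are disabled by default and enabled with the `-ea` JVM flag, which matches "can be turned on and off during deployment".

```java
// Hypothetical class used only to contrast exceptions and assertions.
class Account {
    private int balance;

    void deposit(int amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("invalid amount: " + amount);
        }
        balance += amount;
    }

    void withdraw(int amount) {
        // Exception: unusual input; the outer context decides how to recover.
        if (amount <= 0 || amount > balance) {
            throw new IllegalArgumentException("invalid amount: " + amount);
        }
        balance -= amount;
        // Assertion: only checked when run with `java -ea`; a violation signals a fault here.
        assert balance >= 0 : "class invariant violated: negative balance";
    }

    int balance() {
        return balance;
    }
}
```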
Software Inspection Process - Rework
Author fixes all faults
Writing Tests with JUnit4: Preparing the Test Environment (Test Fixture)
Before/After annotation designates a method that deals with the test fixture: @org.junit.Before - Sets up the objects in the test fixture (usually allocates the objects and sets their initial values) @org.junit.After - Tears down the objects in the test fixture (usually "deallocates" the objects by setting their references to null) Important to execute both methods for each test case so that the test cases are isolated from each other; thus can execute the test cases in any order
Why black and white box?
Black box • May not have access to the source code • Often do not care how s/w is implemented, only how it performs White box • Want to take advantage of all the information • Looking inside indicates structure => helps determine weaknesses
Approaches to testing
Black box (functional, requirements based) White box (structural, implementation based)
Why Engineer Software (Quality Context)?
Can save lives and money. Software is now an integral part of every facet of our societal infrastructure: • Transportation • Communication • Financial • Healthcare Poor quality software menaces the maintenance of that infrastructure!
Mutation Testing Assumptions
Competent Programmer Hypothesis • programmers write programs that are reasonably close to the desired program • e.g., sorting program is not written as a hash table Coupling Effect • detecting simple atomic faults will lead to the detection of more complex faults
Version vs. Configuration
A configuration is a set of versions (different files in different states). There could be multiple versions of a configuration
White Box/Structural Test Data Selection Types
Coverage based Fault-based Failure based
White Box/Structural Test Data Selection
Coverage based Fault-based • error seeding (e.g., mutation testing) Failure-based • domain and computation based • use representations created by symbolic execution
UTP - Test Environment, Equipment, and Logistics
Describes the environment we will attempt to simulate during the test, necessary equipment, and set-up
UTP - Moderator's role
Describes what the moderator will be doing during the usability test
Test Plan
Details the testing steps to be taken in a project, project-specific
Determining Feasibility of paths through CFG
Determining if a path is feasible or not requires additional semantic information • In general, unsolvable • In practice, intractable
Test Case Creation Process
Develop test case • Test case objective • Test case environment • Test case preconditions • Test case procedure • Expected results • Pass/Fail criterion Verify/Debug test case
DAG
Directed acyclic graph
Software Inspection Process - General Guidelines
Distribute material ahead of time • Use a written checklist of what should be considered (functional testing guidelines) • Criticize the product, not the author
Problem with coverage criteria
Do not take into account the operational profile • Companies care more about faults that occur in frequently executed code • Could try to weigh coverage criteria for frequently executed code more heavily Fault detection may depend upon • Specific combinations of statements, not just coverage of those statements • Astutely selected test data that reveals the fault, not just test data that executes the statement/branch/path
Software Inspection Process - Planning
Done by authors (prepare documents and overview), moderator (gather materials, arrange participants, arrange meeting)
What kind of approach is testing?
Dynamic analysis
Configuration Management Technology
Enables the process of Configuration Management ex. version control tools, build handling tools
Bad Reasons to Conduct a Usability Test
Everyone else has a usability testing program • The meeting rooms used for testing are available the third week of the month • You want to see if there is a need for this type of product in the marketplace
Example test case procedure
Example: • Requirement: the web site should have a menu item for calculating shipping cost. Once this menu item is selected, the user should be prompted to enter shipping weight (users can only ship items that weigh between 0 and 100 oz). • Test procedure: 1. Click "Shipping Cost" in Main Menu 2. Enter "101" in shipping weight field 3. Enter "0" in shipping weight field 4. Enter "100" in shipping weight field
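The procedure above could be automated roughly as follows. This sketch assumes a hypothetical `isValid` check standing in for the web form, integer weights, and that "between 0 and 100 oz" includes both endpoints; the requirement does not say, which is exactly the kind of ambiguity that writing test cases early tends to expose.

```java
class ShippingWeight {
    // ASSUMPTION: "between 0 and 100 oz" is read as inclusive on both ends.
    static final int MIN_OZ = 0;
    static final int MAX_OZ = 100;

    // Hypothetical stand-in for the shipping-weight field validation.
    static boolean isValid(int weightOz) {
        return weightOz >= MIN_OZ && weightOz <= MAX_OZ;
    }

    public static void main(String[] args) {
        // Steps 2-4 of the procedure: just above the range, then each boundary.
        int[] inputs = {101, 0, 100};
        boolean[] expected = {false, true, true};
        for (int i = 0; i < inputs.length; i++) {
            boolean actual = isValid(inputs[i]);
            String verdict = (actual == expected[i]) ? "pass" : "fail";
            System.out.println("weight " + inputs[i] + " oz: " + verdict);
        }
    }
}
```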
Mutation Testing Process
Execute original program P on test set T (P is taken as the correct program) • Save results R = P(T) • Decide what mutations to do (select mutation operators) • Generate mutants: each inserted fault results in a new program, giving mutant programs P1, ..., Pk • Distinguish mutants: if Pi(T) != R, then mutant Pi is killed
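A small sketch of these steps, assuming a hypothetical program P that returns the larger of two ints and three hand-written mutants, each produced by one relational-operator replacement. It saves R = P(T) once, then reports which mutants produce a different result on some test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;

class MutationDemo {
    // Original program P, assumed correct: returns the larger of a and b.
    static final IntBinaryOperator P = (a, b) -> a > b ? a : b;

    // Mutants P1..P3, each created by one relational-operator replacement in P.
    static Map<String, IntBinaryOperator> mutants() {
        Map<String, IntBinaryOperator> m = new LinkedHashMap<>();
        m.put("P1 (> becomes <)", (a, b) -> a < b ? a : b);
        m.put("P2 (> becomes >=)", (a, b) -> a >= b ? a : b);
        m.put("P3 (> becomes ==)", (a, b) -> a == b ? a : b);
        return m;
    }

    // Returns the names of mutants killed by test set T (each test is {a, b}).
    static List<String> killed(int[][] tests) {
        int[] r = new int[tests.length];               // R = P(T), saved once
        for (int i = 0; i < tests.length; i++) {
            r[i] = P.applyAsInt(tests[i][0], tests[i][1]);
        }
        List<String> killedNames = new ArrayList<>();
        for (Map.Entry<String, IntBinaryOperator> mutant : mutants().entrySet()) {
            for (int i = 0; i < tests.length; i++) {
                if (mutant.getValue().applyAsInt(tests[i][0], tests[i][1]) != r[i]) {
                    killedNames.add(mutant.getKey());  // Pi(T) != R: mutant killed
                    break;
                }
            }
        }
        return killedNames;
    }
}
```

With T = {(1, 2)} only P1 is killed; adding (2, 1) also kills P3. P2 is never killed by any test because it computes the same function as P (an equivalent mutant).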
Define each of the following and provide an example: Failure Fault Dynamic Analysis Robustness of software component
Failure: result that deviates from expected, ex. toLowerCase results in an upper-case string • Fault: defect/bug, a flaw that could lead to a failure, ex. calling toUpperCase instead of toLowerCase • Dynamic Analysis: process of evaluating software based on its execution, ex. testing • Robustness: behaves reasonably even in unexpected circumstances ("fails gracefully"), ex. the system expects a number, the user enters "sdisfs", and the system displays an error instead of crashing
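A sketch of the card's examples in Java. `normalize` contains the fault (toUpperCase where toLowerCase was intended), and a test that compares its output with the expected value observes the failure. `readNumber` shows the robustness example. Both method names are hypothetical.

```java
class FaultVsFailure {
    // Fault: the developer called toUpperCase where toLowerCase was intended.
    static String normalize(String s) {
        return s.toUpperCase(); // should be s.toLowerCase()
    }

    // Robustness: unexpected input yields an error message instead of a crash.
    static String readNumber(String input) {
        try {
            return "read " + Integer.parseInt(input.trim());
        } catch (NumberFormatException e) {
            return "error: please enter a number";
        }
    }

    public static void main(String[] args) {
        String expected = "hello";
        String actual = normalize("Hello");
        // Failure: the observed result deviates from the expected one.
        System.out.println(actual.equals(expected) ? "pass" : "failure: got " + actual);
        System.out.println(readNumber("sdisfs"));
    }
}
```

Note that the fault does not cause a failure for every input: `normalize("123")` returns "123", which is also the expected result.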
Bug aka
Fault/defect
Why use manual reviews?
Find defects earlier • Finding and fixing after delivery could be 100 times more expensive than during requirements and design • It is estimated that peer reviews find 60 percent of defects • Inability to test all artifacts • How do you test a non-executable artifact? • Inability to fully test some artifacts • Exhaustive testing is infeasible Education • Project understanding • Technical skills Enhance maintainability • Improves internal documentation • Promotes standardization
Software Inspection Process - Inspection
Find/report faults (Do not discuss alternative solutions)
Software inspections
Formal, multi-stage process • significant background & preparation before a meeting • meetings led by moderator according to a set of rules • discovered defects are recorded and meeting report is produced • many variations of this approach
Goal of Regression testing
Gain confidence that changes made to the system are correct • New functionality and corrected/modified functionality should behave as they should • Unchanged functionality is indeed unchanged
Test Strategy
General guidelines, project-independent
Error
Human action that results in software containing a fault
How are development costs measured?
Hundreds to thousands of dollars per delivered LOC • Testing and analysis account for 50% of this cost
White box testing: What type of models do you ideally want?
Ideally want general models • One model that can deal with • different languages • e.g., Ada, C++, Java • different levels of abstraction/detail • e.g., detailed design, arch. design • different kinds of artifacts • e.g., code, designs, requirements
Acyclic
A graph is acyclic if it has no path that is a cycle
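One way to check this definition mechanically is a depth-first search that looks for a back edge (an edge to a node still on the current search path). This sketch assumes a directed graph given as a hypothetical adjacency list from node to successors; a graph that passes the check is a DAG.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Acyclic {
    private static final int ON_PATH = 1; // node is on the current DFS path
    private static final int DONE = 2;    // node and all its successors explored

    // Returns true if no path in the directed graph is a cycle.
    static boolean isAcyclic(Map<Integer, List<Integer>> g) {
        Map<Integer, Integer> state = new HashMap<>(); // absent = not yet visited
        for (Integer n : g.keySet()) {
            if (!state.containsKey(n) && hasCycleFrom(n, g, state)) return false;
        }
        return true;
    }

    private static boolean hasCycleFrom(Integer n, Map<Integer, List<Integer>> g,
                                        Map<Integer, Integer> state) {
        state.put(n, ON_PATH);
        for (Integer m : g.getOrDefault(n, List.of())) {
            Integer s = state.get(m);
            if (s != null && s == ON_PATH) return true;            // back edge closes a cycle
            if (s == null && hasCycleFrom(m, g, state)) return true;
        }
        state.put(n, DONE);
        return false;
    }
}
```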
Annotate Control Flow Graph with Events
If no execution of the program (a path from the start to the end node of the CFG) leaves the property in a non-accepting state, then the property is satisfied.
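A sketch of the idea for a CFG with finitely many start-to-end paths, whose event sequences are listed explicitly (real finite-state verification explores the CFG and automaton together, since loops make the set of paths infinite). The hypothetical property is "a file is opened before it is read, and closed before the program ends"; CLOSED is the only accepting state and ERROR is a trap state.

```java
import java.util.List;

class PropertyCheck {
    enum State { CLOSED, OPEN, ERROR }

    // Transition function of the hypothetical property automaton.
    static State step(State s, String event) {
        if (s == State.CLOSED && event.equals("open")) return State.OPEN;
        if (s == State.OPEN && event.equals("read")) return State.OPEN;
        if (s == State.OPEN && event.equals("close")) return State.CLOSED;
        return State.ERROR; // any other event, and every event in ERROR, leads to the trap state
    }

    // Runs the automaton over the events along one start-to-end path.
    static boolean accepts(List<String> events) {
        State s = State.CLOSED;
        for (String e : events) s = step(s, e);
        return s == State.CLOSED;
    }

    // The property is satisfied if no path leaves the automaton in a non-accepting state.
    static boolean holds(List<List<String>> pathEvents) {
        for (List<String> path : pathEvents) {
            if (!accepts(path)) return false;
        }
        return true;
    }
}
```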
High-level goals of Software Engineering
Improve Productivity Improve Predictability Improve Maintainability Improve Quality
Peer Reviews
Informal process • Pass an artifact to a coworker to obtain feedback • Review might be done face-to-face or remotely
Observations about regression testing
It is often unnecessary to re-execute all former test cases: if a system has changed, only test cases related to the changed part of the system should be executed
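A sketch of code-based (white-box) selection: keep only the test cases whose recorded coverage on the previous version touches a changed element. It assumes a hypothetical coverage map from test name to covered methods; the selection is only as trustworthy as that coverage data, and new code paths still need new test cases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RegressionSelector {
    // coverage: test name -> code elements (e.g., methods) it executed on the old version.
    // changed: code elements modified in the new version.
    static List<String> select(Map<String, Set<String>> coverage, Set<String> changed) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : coverage.entrySet()) {
            // Re-run a test only if it executed at least one changed element.
            if (!Collections.disjoint(e.getValue(), changed)) selected.add(e.getKey());
        }
        return selected;
    }
}
```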
Main Areas of a QA Plan
Items and features to be verified • Scope and target of the plan Activities and resources • Constraints imposed by resources on activities Approaches to be followed • Methods and tools • Criteria for evaluating results
Nature of Software for DoD Cyber Physical Systems
• Large scale • Conflicting performance requirements • Implementation diversity • Complex deployment architectures • Long system life cycles • Certification standards
Availability
Measures the quality of service in terms of running versus down time
Software inspections were developed by
Michael Fagan in 1972 for IBM
Functional strategies:
Modules integrated according to application characteristics or features • Threads, Critical module
When can manual reviews be performed?
Most can be applied at any step in the lifecycle
No configuration management =
No safety net
Software Inspection Process - Preparation
Participants study the material
Describe 3 black box techniques
• Partition the input space and choose samples to test from each partition • Test boundary conditions • Test exceptional conditions
What are some types of manual reviews?
Peer reviews Walkthroughs Inspections
Regression testing - test cases used
Primarily selecting from existing test cases Plus adding some new test cases Possibly delete some old test cases
Post-Test Questionnaire
Purpose: Gather preference information from the participants to clarify and deepen understanding of the product's strengths and weaknesses
Testing relationship diagram
Quality assurance should be a process
Example of applying different quality measures
Release Chipmunk online store feature when: • All critical features are implemented (completeness) • All functional requirements have corresponding test cases and all these tests pass (functional correctness) • On average, no more than 30 minutes of down time per month (availability) • Mean time to failure at least 1 week (reliability) • There is at most 1 failure per 1000 user sessions (reliability)
Branch Coverage
Requires that each branch in a program (each edge in a control flow graph) be executed at least once • e.g., Each predicate must evaluate to each of its possible outcomes Branch coverage is stronger than statement coverage
Hidden Branch Coverage
Requires that each condition in a compound predicate be tested
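A sketch showing why this is stronger than branch coverage, interpreting "tested" as each condition evaluating to both true and false. The hypothetical `classify` method records every condition and branch outcome it sees. Inputs (1, 1) and (0, 0) cover both branches, but because `&&` short-circuits, `b > 0` is never evaluated to false; adding (1, 0) completes condition coverage.

```java
import java.util.HashSet;
import java.util.Set;

class ConditionCoverage {
    // Outcomes observed so far, e.g. "a>0=true" or "branch=false".
    static final Set<String> seen = new HashSet<>();

    static void reset() {
        seen.clear();
    }

    // Records the outcome of one condition and passes its value through.
    static boolean cond(String name, boolean value) {
        seen.add(name + "=" + value);
        return value;
    }

    // Program under test: one compound predicate with two conditions.
    static String classify(int a, int b) {
        if (cond("a>0", a > 0) && cond("b>0", b > 0)) { // && short-circuits as usual
            seen.add("branch=true");
            return "both positive";
        }
        seen.add("branch=false");
        return "not both positive";
    }

    static boolean branchCovered() {
        return seen.contains("branch=true") && seen.contains("branch=false");
    }

    static boolean conditionsCovered() {
        return seen.containsAll(Set.of("a>0=true", "a>0=false", "b>0=true", "b>0=false"));
    }
}
```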
Assertions sometimes referred to as
Self-checking software or Design by Contract
Walkthroughs
Semi-formal process • Author of an artifact presents it to coworkers • Coworkers provide feedback
Test Case
Set of inputs, execution conditions, and a pass/fail criterion Basic, lowest-level component of testing
Why Engineer Software (Economic Context)?
Software is an important industry • Worldwide competition • Global development models • Well-designed software easier to maintain • Poorly designed legacy s/w may be a hindrance
Why Engineer Software (Societal Progress Context)?
Software is the "Grand Enabler" holding the key to scientific and engineering challenges • Human genome project • Space exploration • Weather prediction
Software Inspection Process - Follow-up
Team certifies faults fixed and no new faults introduced
Overview of Testing Techniques
Testing Processes Testing Approaches • Black Box • Test case selection criteria • Representations for considering combinations of events/states • White Box/Structural • Coverage based • Fault-based • Failure-based
Test Case Procedure
The actions/instructions that will be executed to test the system under test
Early start: from feasibility study
The feasibility study of a new project must take into account the required qualities and their impact on the overall cost • At this stage, quality related activities include • risk analysis • assessment of the impact of new features and new quality requirements • measures needed to assess and control quality at each stage of development • contribution of quality control activities to development cost and schedule
How should the quality of software products be assured?
There are no fixed recipes. Software quality assurance specialists must • choose and schedule the right blend of techniques • to reach the required level of quality • within cost constraints • design a specific solution that suits • the problem • the requirements • the development environment
How can Software Quality be improved?
Treat software as a PRODUCT produced in a systematic way according to a well-defined PROCESS designed to achieve explicit quality objectives • Build quality in • Define software product • Reason about the product • Incorporate validation as an integral part of the process
Automobile Analogy
U.S. automobile industry used to be very complacent about quality • Lost a significant amount of market share • Will complacency about s/w quality lead to the same result? • There are many recalls for automobiles • Some fixed for free • There are many defects in software • Some "fixed" for free • Some fixed in "the next" release • With the customer paying for the upgrade
Common Testing Processes
Unit Integration System Acceptance Regression
Unreachable vs. Dead Code
Unreachable code is never executed. Dead code may be executed, but it is irrelevant (e.g., its results are never used)
Test Case Automation
What can be automated? • execution of test case procedure • evaluation of test case results (the oracle) Guidelines for automating test cases • Automate only test cases that will be repeated sufficiently many times for the automation to be cost effective • Develop a manual version of the test case first • Manual test case procedure needs to be unambiguous • Define and follow standard coding practice for writing automated test cases
Configuration Management
a discipline of identifying the configuration of a system at distinct points in time for the purpose of systematically controlling changes to the configuration and maintaining the integrity and traceability of the configuration throughout the system life cycle
What is a fault?
a flaw that could cause a failure
Quality assurance should not be...
a phase in the development lifecycle, rather it should be performed throughout the lifecycle
Configuration Management refers to both...
a process and a technology
Correctness
a product is correct if it satisfies all the requirement specifications
Reliability is sometimes stated as...
a property of time (mean time to failure)
Test suite
a set of test cases
What is Software?
associated artifacts to assist with the development, operation, validation, and maintenance of programs/software systems
Robustness
behaves "reasonably" even in circumstances that were not expected • making a system robust more than doubles development costs • a system that is correct may not be robust, and vice versa
Manual reviews have been shown to be...
beneficial for software quality
Acceptance Testing
customer's evaluation of a system (usually a form of system testing)
Simple cycle
cycle such that all of its nodes are different except for the start and end nodes
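A sketch of the cycle and simple-cycle definitions, assuming a path is represented by its node sequence n0, n1, ..., nk and that each consecutive pair is already known to be an edge of the graph. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Cycles {
    // Cycle: a path (at least one edge) whose start node and end node are the same.
    static boolean isCycle(List<String> path) {
        return path.size() >= 2 && path.get(0).equals(path.get(path.size() - 1));
    }

    // Simple cycle: a cycle whose nodes are all different except the start/end node.
    static boolean isSimpleCycle(List<String> path) {
        if (!isCycle(path)) return false;
        Set<String> distinct = new HashSet<>(path.subList(0, path.size() - 1));
        return distinct.size() == path.size() - 1;
    }
}
```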
Reliability measures the...
dependability of the product ·The probability that the product will perform as expected
Configuration Management Process
encourages developers to work in such a way that changes to the configurations are tracked
What is Engineering?
established, scientifically sound practices that well-trained practitioners follow
System testing (high-level)
evaluating the overall functionality and performance
Regression testing
exercise a changed system (focus on modifications and their impact)
Integration testing
exercise a collection of inter-dependent components (focus on interfaces between components)
System testing
exercise a complete, stand-alone system
Unit testing
exercise a single component (procedure, class)
A product is behaviorally and functionally correct if...
it satisfies all the specified behavioral requirements
JUnit Assert Methods
• assertFalse/assertTrue • assertNull/assertNotNull • assertSame/assertNotSame • assertEquals • It also provides the fail method, which is usually used to signal that an exception should have been thrown
When should QA activities stop?
They last far beyond product delivery: as long as the software is in use, to cope with evolution and adaptation to new conditions
Manual Reviews are
manual static analysis methods
Interior test
of a loop causes the loop to be entered and its body traversed at least once; interior tests are selected for each unique path through the loop
Boundary test
of a loop causes that loop to be reached but not entered
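A sketch of boundary and interior tests for one loop, using a hypothetical `sumPositive` method whose loop body has two unique paths (x > 0 and x <= 0). The boundary test reaches the loop without entering it; the interior tests enter the body once per unique path.

```java
class LoopTests {
    // Program under test: the loop body has two unique paths.
    static int sumPositive(int[] a) {
        int sum = 0;
        for (int x : a) {
            if (x > 0) {
                sum += x; // body path 1
            }             // body path 2: x <= 0, nothing added
        }
        return sum;
    }

    public static void main(String[] args) {
        // Boundary test: the loop is reached but not entered.
        System.out.println(sumPositive(new int[] {}) == 0);
        // Interior tests: the body is traversed, once for each unique path through it.
        System.out.println(sumPositive(new int[] {5}) == 5);
        System.out.println(sumPositive(new int[] {-3}) == 0);
    }
}
```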
Microsoft Assertion Effectiveness Study
paper presents an empirical case study of two commercial software components at Microsoft Corporation • Applied to two development tools that are part of Visual Studio™ • Written in C and C++ • For each component, analyzed two internal releases, thus providing four data sets • Releases: A-1, A-2, B-1, B-2 • The developers systematically employed assertions • For each component, measured the number of assertions at the time of release • Used a rank correlation technique, so removed from the analysis all the files that have no assertions and no faults • Such files skew the results because they cannot be used to evaluate the efficacy of assertions and inflate the size of the study • Included files that have faults but no assertions as well as files with assertions but no faults • Component size in KLOC; assertion density • A-1: 104.03 KLOC, 26.63 assertions/KLOC • A-2: 105.63 KLOC, 33.09 assertions/KLOC • B-1: 372.04 KLOC, 37.09 assertions/KLOC • B-2: 365.43 KLOC, 39.49 assertions/KLOC
Integration testing (high-level)
putting the pieces together
Waterfall Advantages/Disadvantages
Advantages: recognizes distinct activities. Disadvantages: clearly oversimplifies the process • "wait, wait, wait, surprise" model • actual processes are more complex • numerous iterations among phases • not purely top-down • decomposition into subsystems. Many variations of the waterfall model exist • prototyping • re-engineering • risk reduction
Given a correct specification, a correct product is...
reliable, but not necessarily vice versa
Performance data
represent measures of participant behavior, includes error rates, number of accesses of the help by task, time to perform a task, and so on
Preference data
represent measures of participant opinion or thought process, includes participant rankings, answers to questions, and so forth.
Control Flow Graph
represents the flow of executable behavior
Failure
result that deviates from the expected or specified intent
Selecting paths (i.e., test cases) that satisfy these criteria
static selection • some of the associated paths may be infeasible dynamic selection • monitors coverage and displays areas that have not been satisfactorily covered
Major objection to using assertions
storage and runtime overhead • often shown not to be a problem • need more empirical data optimization techniques could remove many of the assertions • basically proving that the assertion is valid • would expect that many of the assertions could be eliminated • preconditions are often redundant checks on the validity of the parameters
Graphs are...
suggestive devices that help in the visualization of relations. The edges in the graph are visual representations of the ordered pairs (source, destination) that compose relations. Graphs provide a mathematical basis for reasoning about software
Beta test
tests performed by real users in their own environment, performing actual tasks without interference or close monitoring
Alpha test
tests performed by users in a controlled environment, observed by the development organization
Test/Test Execution
the activity of executing a test case and evaluating the results
Software Engineering
the application of scientific knowledge to the development and maintenance of software systems
Define Quality (software)
the degree to which a software product meets established requirements; however, quality depends upon the degree to which those established requirements accurately represent stakeholder needs, wants, and expectations
Validation
the process of evaluating a system or a component to determine if it satisfies its intended purpose Did we build the right product?
Verification
the process of evaluating a system or a component to determine whether the products of a given development phase satisfy the specifications imposed at the start of the phase ·Usually achieved via some form of static analysis ·Did we build it right?
Dynamic Analysis
the process of evaluating software based on its execution
Static Analysis
the process of evaluating software without executing it
Debugging
the search for the cause of a failure and subsequent repair
Regression testing
the selective retesting of a software component to verify that modifications have not caused unintended effects and that the component still complies with its specified requirements
What is testing?
the systematic selection and subsequent execution of sample inputs from a product's input space in order to infer information about the product's behavior ·Usually trying to uncover failures ·the most common form of dynamic analysis
Why could regression testing be difficult?
·Large systems take a long time to retest ·Some former test cases may no longer be executable ·New test cases might need to be created ·Cost of testing can prevent software improvements
How to obtain values for the different quality measures?
·Randomly generated tests following an operational profile ·Alpha and beta tests
Reliability vs. Correctness
·Reliability is relative, while correctness is absolute
Correctness is a...
·mathematical property ·requires a specification of intent ·specifications are rarely complete ·difficult to prove poorly-quantified qualities such as "user-friendly"
Quality assurance activities vary and depend on:
·nature of the product ·quality requirements ·construction process ·the engineering discipline
How is quality of non-software products assured?
·Products from highly automated production lines ·Custom products
Systems testing analogies
• "For the software testing professional, the system test phase is like game day for an athlete or show time for an actor" • "If product development is like a relay race, the test team runs the last leg of the race"
Example of Quantification
• --ASSERT for all I,(0≤ I<N), A[I]≤ A[I+1] • --ASSERT for some I,(0≤I<N), A[I]≤ A[I+1] • quantification not always supported since it can result in expensive computation
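A sketch of the two quantified assertions as executable Java checks, assuming a 0-based Java array so that the adjacent pairs are (a[i], a[i+1]) for i + 1 < a.length. Each check loops over the whole array, which is the kind of runtime cost the card warns about; `sortInPlace` only evaluates its postcondition when assertions are enabled with `-ea`. The names are hypothetical.

```java
import java.util.Arrays;

class Quantified {
    // "for all I, A[I] <= A[I+1]": every adjacent pair is in order (array is sorted).
    static boolean forAllOrdered(int[] a) {
        for (int i = 0; i + 1 < a.length; i++) {
            if (a[i] > a[i + 1]) return false;
        }
        return true;
    }

    // "for some I, A[I] <= A[I+1]": at least one adjacent pair is in order.
    static boolean forSomeOrdered(int[] a) {
        for (int i = 0; i + 1 < a.length; i++) {
            if (a[i] <= a[i + 1]) return true;
        }
        return false;
    }

    static void sortInPlace(int[] a) {
        Arrays.sort(a);
        // O(N) postcondition check, evaluated only when run with `java -ea`.
        assert forAllOrdered(a) : "postcondition violated: array not sorted";
    }
}
```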
Assertions - Examples of using old and current values
• --ASSERT for all I,(1≤ I≤N), old(A[I])= A[I] • Value of the variables in the array have not changed • --ASSERT for all J,(1≤ J≤N) (for some I,(1≤I≤N),old(A[J])= A[I]) • Permutation of the array
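A sketch of checking assertions over old and current values in Java, which has no built-in `old(...)`: the method takes a copy of the array on entry and compares against it afterwards. The nested quantifier on the card requires every old value to appear somewhere in the new array; with duplicate values that is weaker than a true permutation check (old {1, 1, 2} versus new {1, 2, 2} passes it), so the sketch also includes a stricter check that compares sorted copies. The names are hypothetical.

```java
import java.util.Arrays;

class OldValues {
    // for all I, old(A[I]) = A[I]: the array was not modified.
    static boolean unchanged(int[] oldA, int[] a) {
        return Arrays.equals(oldA, a);
    }

    // for all J (for some I, old(A[J]) = A[I]): every old value still appears in A.
    static boolean everyOldValuePresent(int[] oldA, int[] a) {
        for (int v : oldA) {
            boolean found = false;
            for (int x : a) {
                if (x == v) {
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }

    // Stricter permutation check: same values with the same multiplicities.
    static boolean isPermutation(int[] oldA, int[] a) {
        int[] s1 = oldA.clone();
        int[] s2 = a.clone();
        Arrays.sort(s1);
        Arrays.sort(s2);
        return Arrays.equals(s1, s2);
    }

    static void sortWithContract(int[] a) {
        int[] oldA = a.clone(); // snapshot of old(A) taken on entry
        Arrays.sort(a);
        assert isPermutation(oldA, a) : "sort lost or invented elements";
    }
}
```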
UTP - Method
• A detailed description of how the test session(s) will be conducted
Counterbalancing
• A technique to mitigate transfer of learning effects in within-subjects design • The order in which tasks are assigned to participants is randomized or balanced out • Issue: what if the tasks are naturally performed in sequence in the real world? • Could use prerequisite training—train participants on preceding tasks before testing task of interest
Writing Tests with JUnit4: Test Suite
• A test suite may be composed of: • Test cases • Other test suites • A test suite is defined as a class or a set of classes • Single class test suite • Multiple class test suite • Criteria for grouping test cases • A single class test suite usually contains tests for the methods of a single class from the software system • E.g., the AccountTest class contains tests for the methods of the Account.java class • If the class under test is too large, additional test grouping strategies can be used, e.g., exceptional vs. normal behavior, or based on fixture reuse • Multiple class test suites can contain all tests related to a given package from the software system • E.g., the OverallBankingTest class invokes tests for the methods of all classes in the banking package
Graph
• A graph, G = (N, E), is an ordered pair consisting of a node set N and an edge set E = {(ni, nj)} • If the pairs are ordered, the graph is called directed; otherwise, undirected
History of Assertions
• Alan Turing discussed using assert statements in algorithms, ~1947 • Assert statements used in formal verification to indicate what should be true at points in a program, ~1967 • Assertions were advocated for finding faults during execution, ~1972 • Based on preprocessors • Assertions introduced as part of programming and specification languages, ~1975 • Euclid, Alphard, ... • Bertrand Meyer popularized Design by Contract and included assertions as an integral part of Eiffel, an OO language • Assertion capabilities for common programming languages are available, but limited • C and Java have very limited assertion capabilities • Sophisticated assertion tools available in the market or public domain • E.g., Parasoft, JML • Assertions widely used in industry (e.g., Microsoft and Google) • Experimental evidence of effectiveness
Heuristic for Path Coverage
• All acyclic paths • Will include the fall-through cases for loops • Boundary-Interior loop coverage • Stress test loop bounds when appropriate • E.g., manipulating bounded data structures
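The "all acyclic paths" heuristic can be sketched as a depth-first search that never revisits a node already on the current path. The CFG below is a hypothetical example: an if/else (nodes 2 and 3) followed by a loop whose header is node 4; because going around the loop would revisit node 4, only the loop's fall-through exit appears in the acyclic paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical enumerator of acyclic (simple) paths in a CFG. */
final class AcyclicPaths {
    private AcyclicPaths() {}

    /** Returns every path from start to exit that visits no node twice. */
    static List<List<Integer>> enumerate(Map<Integer, List<Integer>> cfg, int start, int exit) {
        List<List<Integer>> paths = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        current.add(start);
        dfs(cfg, exit, current, paths);
        return paths;
    }

    private static void dfs(Map<Integer, List<Integer>> cfg, int exit,
                            List<Integer> current, List<List<Integer>> paths) {
        int node = current.get(current.size() - 1);
        if (node == exit) {
            paths.add(new ArrayList<>(current));
            return;
        }
        for (int next : cfg.getOrDefault(node, List.of())) {
            if (!current.contains(next)) {      // skip back edges: keeps the path acyclic
                current.add(next);
                dfs(cfg, exit, current, paths);
                current.remove(current.size() - 1);
            }
        }
    }

    /** 1 -> {2, 3} (if/else), 2 and 3 -> 4 (loop header), 4 -> 5 (body) -> 4, 4 -> 6 (exit). */
    static Map<Integer, List<Integer>> exampleCfg() {
        return Map.of(
                1, List.of(2, 3),
                2, List.of(4),
                3, List.of(4),
                4, List.of(5, 6),
                5, List.of(4));
    }
}
```

For this CFG the sketch yields [1, 2, 4, 6] and [1, 3, 4, 6]; boundary-interior loop coverage would then add paths that enter the loop body.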
Probe and Interact with Participant as Appropriate
• Amount of interaction varies depending on stage of product • More interaction early in the development cycle • Less interaction later in the development cycle • If novice, err on the side of interacting too little
All or None Properties
• An all property is a behavior that must always happen on all possible executions • A none property is a behavior that must never happen Why not a some property?
White box testing - Formal models
• Analysis is usually done on a model of an artifact • The textual representation of the artifact is translated into a model that is more amenable to analysis than the original representation • The translation may require syntactic and semantic analysis so that the model is as accurate as possible, e.g., x := y + foo.bar • The model must be appropriate for the intended analysis • Graphs are the most common form of models used • E.g., abstract syntax trees, control flow graphs, call graphs, reachability graphs, Petri nets, program dependence graphs
More Tips on Data Collection
• Anticipate what events might occur during the test to expedite data collection • E.g., use checkboxes or abbreviations instead of writing in full • If more than one observer is available, have them collect different data
Common Mistakes Using Assertions
• Assertions too general • Assert that I is a positive integer when it should be between 1 and 10 • Assert that some matching element in an array is returned when it should be the first matching element • Assertions too specific • Assert that I is a positive integer when it can actually also be 0 • Assertions tied to the current implementation instead of the specification • Want class invariants to be as general as possible • Sometimes the assertions associated with a method must be implementation specific • Using assertions instead of exceptions
Assist Participants Only as a Last Resort
• Assisting is not the same as probing • Assisting affects test results! • When to assist? • When a participant is very lost or confused; exceptionally frustrated and may give up • When performing the task makes a participant feel uncomfortable • When the product is not in a final state and missing information needs to be provided to participant • When a failure or a malfunction occurs • How to assist? • Never blame participants for a problem • Clarify the participant's concerns • Gradually provide hints, rather than revealing everything at once • Be aware of upcoming tasks and refrain from comments that might affect participant's performance on these tasks
Tips for Using "Thinking Aloud" Protocol
• Avoid using it for very short tests • Demonstrate the technique • Do not force technique if there is strong resistance from participants • Pay attention to where participants become quiet • Periodically acknowledge listening to participant's thinking aloud
Guidelines for Determining Content of Post-Test Questionnaire
• Base content on research questions from test plan • Be concise and precise • How will the answer to a post-test question bring you closer to a design decision? • Ask questions related to what cannot be directly observed • Use pilot test to refine questionnaire • Develop general topics the post-test questionnaire should cover
Control flow graph regression techniques
• Based on the differences between the control flow of the old and the new versions of the software • Re-execute only test cases that exercise paths through changed portions of the CFG
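A simplified sketch of CFG-based selection: record which CFG nodes each test executed on the old version, then re-execute only the tests that touched a changed node. Recording node coverage rather than full paths is a simplifying assumption, and the test names and node ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical regression test selector based on recorded CFG node coverage. */
final class CfgRegressionSelector {
    private CfgRegressionSelector() {}

    /**
     * @param coverage     test name -> CFG nodes it executed on the old version
     * @param changedNodes CFG nodes whose code differs in the new version
     * @return tests to re-execute, in the iteration order of {@code coverage}
     */
    static List<String> select(Map<String, Set<Integer>> coverage, Set<Integer> changedNodes) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : coverage.entrySet()) {
            for (int node : entry.getValue()) {
                if (changedNodes.contains(node)) {
                    selected.add(entry.getKey());
                    break;                      // one changed node is enough to select the test
                }
            }
        }
        return selected;
    }
}
```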
integration testing strategies
• Big bang • Structure-based • Top-down • Bottom-up • Sandwich • Functionality-based • Thread-based • Critical modules
What is a Usability Test Plan?
• Blueprint of entire usability test • "Not writing a usability test plan will come back to haunt you"
UTP - Task List - Prioritizing Tasks
• By frequency • By criticality • By vulnerability • By readiness
Why System Testing?
• Check if the entire system works as expected • Some system requirements are inherently global • E.g. system response time, mean time to failure • Can be tested only when the entire system is available!
System Testing
• Checking whether the software system (as a whole) meets its requirements/specification • Form of verification • Comprehensive, based on a specification of observable behavior • Independent of design and implementation • System test cases are based on requirements • Ways to achieve independence: • Different organization performs system testing • Same organization, but develop system tests early
VCSs vs. DVCSs
• Clients have the entire repository • There may be many servers • Micro commits possible (two step commit) • No latency (all is local)
Main Components of a QA Strategy
• Common quality requirements • Set of documents produced during the QA process • Set of QA activities that should be part of the QA process • Standard tools and practices to be used in the QA process • Guidelines for project staffing, roles and responsibilities
Goals of Usability Test Materials
• Communicate with participants • Collect data • Satisfy legal requirements
Orientation Script
• Communication tool meant to be read to participants • Describes test session • Intended to put participants at ease • The product, not the participant, is being tested
Background Questionnaire
• Composed of questions that will reveal the participant's experience, attitudes, and preferences in areas relevant to the product • Derived from the participants' characteristics in the usability test plan • Purpose: • Should help understand the participants' performance • Helps confirm the "right" people show up
UTP - Task List
• Consists of the tasks that the participants will perform during the test • Components of a task: • Description • Materials and machine states • Description of successful completion of a task • Timing and other benchmarks
Atomic faults (mutations): Operand mutations
• Constant replacement e.g., x := x + 5; would replace 5 with each constant of the appropriate type that appears in the program • Scalar variable replacement e.g., y := x + 5; would replace x with each scalar variable of the appropriate type that appears in the program and then would replace y with each scalar variable of the appropriate type that appears in the program
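Constant replacement can be sketched as a purely textual transformation: for one statement, substitute each other constant from the program's constant pool for the original constant. The pre-split token list and the constant pool {0, 1, 5, 10} are assumptions for illustration; a real mutation tool would work on the syntax tree and respect types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical constant-replacement mutation operator over a tokenized statement. */
final class ConstantReplacement {
    private ConstantReplacement() {}

    /**
     * Produces one mutant per replacement of a constant token by a different constant
     * from the pool. Tokens are assumed to be pre-split, e.g. ["x", ":=", "x", "+", "5"].
     */
    static List<String> mutants(List<String> tokens, List<String> constantPool) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (!constantPool.contains(token)) {
                continue;                       // this operator only mutates constants
            }
            for (String replacement : constantPool) {
                if (replacement.equals(token)) {
                    continue;                   // skip the identity "mutant"
                }
                List<String> mutant = new ArrayList<>(tokens);
                mutant.set(i, replacement);
                result.add(String.join(" ", mutant));
            }
        }
        return result;
    }
}
```

For `x := x + 5` with that pool, the sketch produces `x := x + 0`, `x := x + 1`, and `x := x + 10`.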
Ensure that Participants Are Finished Before Continuing with the Next Task
• Continuing too soon can bias participant/lead to loss of information • Have the participants signal when they are finished, as part of the test protocol
Examples of Performance Data
• Counts and rates: • Number of errors • Percentage of tasks completed successfully • Number and type of hints or prompts needed to complete task • Number of omitted steps or procedures • Scores on a comprehension test • Time durations • Time to complete a task • Time to recover from an error • Time to achieve a criterion level of competence • Training time to achieve benchmark performance • Time spent reading vs. "doing"
Writing Tests with JUnit4: Initial Preparation
• Create a new Java class that will contain individual test cases • Suppose the name of the class you want to test is "Foo". By convention, the name of the class that contains the test cases for Foo should be FooTest • Include the following imports: • import org.junit.*; • import static org.junit.Assert.*;
Example Uses of Data-Flow Analysis
• DFA used extensively in program optimization • Determine if a variable definition is dead (and can be removed) • Determine what variables are live (and need to be kept in registers) • DFA can also be used to find anomalies in the code • Use of an uninitialized variable (Undef->Ref) • Redefinition of a variable (Def -> Def) • All of the above are generic properties of program behavior
Example Applications of DFA Techniques
• DFA used extensively in program optimization • E.g., determine if a definition is dead (and can be removed); determine if a variable always has a constant value; determine if an assertion is always true (and can be removed) • DFA can also be used to find anomalies in the code • Find def/ref anomalies [Osterweil and Fosdick] • Cecil/Cesar system demonstrated the ability to prove general user-specified properties [Olender and Osterweil] • FLAVERS demonstrated applicability to concurrent systems [Dwyer and Clarke] • Why "anomalies" and not faults? • Because these anomalies may be on non-executable (infeasible) paths
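Along a single path, the def/ref anomalies can be detected by tracking the last action performed on each variable. A minimal sketch, assuming the path is given as a list of (action, variable) events; it checks one path only, whereas a DFA-based tool reasons over all CFG paths at once, which is also why it reports anomalies rather than faults. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-path detector for Undef->Ref and Def->Def anomalies. */
final class DefRefAnomalies {
    private DefRefAnomalies() {}

    enum Action { DEF, REF }

    static final class Event {
        final Action action;
        final String variable;

        Event(Action action, String variable) {
            this.action = action;
            this.variable = variable;
        }
    }

    /** Reports use before definition (Undef->Ref) and redefinition without use (Def->Def). */
    static List<String> findAnomalies(List<Event> path) {
        Map<String, Action> last = new HashMap<>();   // absent key means "undefined"
        List<String> anomalies = new ArrayList<>();
        for (Event e : path) {
            Action previous = last.get(e.variable);
            if (e.action == Action.REF && previous == null) {
                anomalies.add("Undef->Ref: " + e.variable);
            } else if (e.action == Action.DEF && previous == Action.DEF) {
                anomalies.add("Def->Def: " + e.variable);
            }
            last.put(e.variable, e.action);
        }
        return anomalies;
    }
}
```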
Techniques for Reviewing with Participants
• Debriefing should feel like discussion among peers • Never make participants feel defensive about their actions • Do not react to participant's answers in one way or another
Overview of Debriefing
• Debriefing: Exploring and reviewing the participant's actions • Usability test session exposes problems, debriefing sheds light on why problems occurred and how to fix them • Analogy: • usability testing ≈ solving puzzle • debriefing ≈ all the pieces come together
Common representations for selecting sequences of events
• Decision tables • Use cases • State diagrams/finite-state automata
Task Scenarios
• Describe what the participants should do during the test • Based on the task list from the usability testing plan • Add context and rationale for performing tasks
UTP - Data to Be Collected and Evaluation Measures
• Describes the performance and preference data to be collected • Performance data: • Preference data:
Mutation Testing process
• Distinguish mutants: execute each mutant Pi with T and compare the results to P(T) • If Pi(T) ≠ P(T), the mutant is killed ("distinguished") • It has to fail for at least one t ∈ T • If Pi(T) = P(T), then either • Pi and P are equivalent, or • the test cases do not reveal the seeded fault, and we need to find a new test case that does • Apply the test data and compare the output with R, where R = P(T)
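The kill criterion can be sketched by modelling the program and its mutants as functions from a test input to an output. In this minimal sketch P is a hypothetical `x -> x + 5`, the mutants are hand-written, and the "mutation score" (killed / total) is a common summary measure added for illustration rather than something stated above.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Hypothetical mutation-analysis helpers over integer functions. */
final class MutationRunner {
    private MutationRunner() {}

    /** Pi is killed if Pi(t) differs from P(t) for at least one t in T. */
    static boolean isKilled(IntUnaryOperator original, IntUnaryOperator mutant, int[] tests) {
        for (int t : tests) {
            if (original.applyAsInt(t) != mutant.applyAsInt(t)) {
                return true;
            }
        }
        return false;                           // equivalent mutant, or T is inadequate
    }

    /** Fraction of mutants killed by the test set T (0 if there are no mutants). */
    static double mutationScore(IntUnaryOperator original, List<IntUnaryOperator> mutants, int[] tests) {
        if (mutants.isEmpty()) {
            return 0.0;
        }
        int killed = 0;
        for (IntUnaryOperator m : mutants) {
            if (isKilled(original, m, tests)) {
                killed++;
            }
        }
        return (double) killed / mutants.size();
    }
}
```

With T = {0, 3}, the mutant `x -> x + 1` is killed, `x -> Math.abs(x) + 5` survives until a negative input is added to T, and `x -> 5 + x` is equivalent and can never be killed.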
Specification-based Regression Testing Techniques
• Do not require recording of executed CFG paths during testing • Identify regression test cases based on the correspondence between test cases and the specification (requirements) • E.g., test cases based on coverage of states and transitions in a UML state diagram that specifies the software system ⇒ identify the states and transitions affected by the change, and re-execute the test cases related to these states and transitions
Tips for Probing and Interacting
• Don't show surprise • Focus on what the participants expected to happen • Say as little as possible, offer assistance only as a last resort • Ask neutral questions rather than "loaded" ones that imply an answer
UTP - Appropriate research questions
• E.g., research questions related to the usability of a web site: • How easily do users understand what is clickable? • How easily and successfully do users find the products or information they are looking for? • How easily can users return to the home page? • How well do users understand the symbols and icons? Which ones are problematic? Why? • How quickly can users perform common tasks? • How closely does the flow of the software reflect how the user thinks of the work flow? • What are the major usability flaws that prevent users from completing the most common tasks?
Independent Groups Design
• Each participant tests a single feature of a user interface
Within-Subjects Design
• Each participant tests multiple features of a user interface and the same feature is tested by different participants
Benefits of structural coverage criteria
• Easy to monitor and measure • Provides some guidance for evaluation of test cases
The primary reasons for combining QA techniques during development are:
• Effectiveness for different classes of faults • Applicability at different points in a project • Differences in purpose • Tradeoffs in cost and level of assurance
Data Flow Analysis (DFA)
• Efficient technique for proving properties about programs • Not as powerful as automated theorem provers, but requires less human expertise • Uses an annotated control flow graph model of the program • Compute facts for each node • Use the flow in the graph to compute facts about the whole program • We'll focus on single units
Don't Rescue Participants When They Struggle
• Encourage participants to verbalize their feelings (thinking aloud) • Alternatively, probe participants when they experience difficulty
Integration Testing Objectives
• Ensure proper interaction among components • Uncover module interaction and compatibility problems • Gain confidence in the integrity of overall system design
Objectives of Error Seeding
• Evaluate the adequacy of a test suite • Guide selection of additional test cases • Estimate the number of faults in a program
Coincidental Correctness
• Executing a statement does not guarantee that a fault on that path will be revealed
How can the development process itself be improved?
• Faults often result from human error • Problems in the development process can increase the probability of human errors or the non-detection of their consequences • Improving the development process can improve the quality of the developed product
Why Do Debriefing?
• Final opportunity to understand why every error or difficulty occurred for every participant for every session • Allows to resolve residual questions • Illuminates thought process and rationale of participants • Especially useful if thinking aloud not used during task scenarios
Evolution of Version Control Systems
• First there was an ad hoc approach (copy dirs) • Then local version control systems (VCSs) • Then centralized (means there is a server) • CVS, SVN, ClearCase, Perforce (proprietary) • Then distributed • Git, Mercurial, BitKeeper (proprietary), Bazaar, Darcs
Data Flow Analysis Steps
• First, determine local information that is true at each node in the CFG • E.g., what variables are defined, what variables are referenced • Usually stored in sets • E.g., ref(n) is the set of variables referenced at node n • Second, use this local information and control flow information to compute global information about the whole program • Done incrementally by looking at each node's successors or predecessors • Use a fixed point algorithm: continue to update global information until a fixed point is reached
Running Tests: Test Case Pass/Fail Semantics
• For a given test suite, all methods annotated with @Test (with or without parameters) will be run • @Test: nominal behavior • When all assertX method invocations succeed and no exception is thrown: succeeds • Otherwise: fails • @Test(expected=MyException.class): exceptional behavior • When all assertX method invocations succeed and an exception of class MyException is thrown: succeeds • Otherwise: fails
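The pass/fail rule above can be restated as a small plain-Java function, so that it can be read (and run) without JUnit on the classpath. This is a sketch of the rule only, not JUnit's implementation; `expected == null` stands for a plain @Test, and a failing assertX call is modelled as a thrown AssertionError. JUnit 4 also accepts a subclass of the expected exception, which the sketch mirrors with `isInstance`.

```java
/** Hypothetical restatement of the JUnit 4 pass/fail rule for a single test method. */
final class TestOutcome {
    private TestOutcome() {}

    /**
     * @param testBody the test method body; failing assertions throw AssertionError
     * @param expected null for @Test (nominal), or the class given in @Test(expected = ...)
     * @return true if the test case succeeds
     */
    static boolean passes(Runnable testBody, Class<? extends Throwable> expected) {
        try {
            testBody.run();
        } catch (Throwable thrown) {
            // Exceptional behavior succeeds only if the thrown exception has the expected class.
            return expected != null && expected.isInstance(thrown);
        }
        // Completed normally: succeeds only if no exception was expected.
        return expected == null;
    }
}
```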
Initial Facts about a node: GEN and KILL sets (DF)
• For each node i associate sets • GEN(i): what is to be added (generated) • KILL(i): what is to be eliminated (killed) • The definitions of GEN and KILL depend on the problem that is being solved • Often the GEN and KILL sets can be derived from the abstract syntax tree • E.g., variables defined in a node, variables referenced in a node
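Reaching definitions is a standard first example of GEN and KILL. A minimal sketch, assuming each CFG node defines at most one variable and a definition is named by its variable and node (e.g. "x1" for x defined at node 1): GEN(i) is the node's own definition, KILL(i) is every other definition of the same variable, and IN/OUT are iterated to a fixed point with OUT(i) = GEN(i) ∪ (IN(i) - KILL(i)) and union as the merge operator (a forward, any-path problem). The four-node CFG in the usage note is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical iterative solver for reaching definitions. */
final class ReachingDefinitions {
    private ReachingDefinitions() {}

    /**
     * @param nodes all CFG nodes, start node first
     * @param defs  node -> variable defined at that node (absent if none)
     * @param preds node -> predecessor nodes
     * @return node -> IN set (definitions that may reach the node's entry)
     */
    static Map<Integer, Set<String>> solve(List<Integer> nodes, Map<Integer, String> defs,
                                           Map<Integer, List<Integer>> preds) {
        // Local information: GEN and KILL for each node.
        Map<Integer, Set<String>> gen = new HashMap<>();
        Map<Integer, Set<String>> kill = new HashMap<>();
        for (int n : nodes) {
            gen.put(n, new TreeSet<>());
            kill.put(n, new TreeSet<>());
            String v = defs.get(n);
            if (v == null) {
                continue;
            }
            gen.get(n).add(v + n);
            for (int m : nodes) {
                if (m != n && v.equals(defs.get(m))) {
                    kill.get(n).add(v + m);     // other definitions of the same variable
                }
            }
        }
        // Global information: update IN/OUT until nothing changes (fixed point).
        Map<Integer, Set<String>> in = new HashMap<>();
        Map<Integer, Set<String>> out = new HashMap<>();
        for (int n : nodes) {
            in.put(n, new TreeSet<>());
            out.put(n, new TreeSet<>());        // initial OUT: empty set
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int n : nodes) {
                Set<String> newIn = new TreeSet<>();
                for (int p : preds.getOrDefault(n, List.of())) {
                    newIn.addAll(out.get(p));   // merge operator: union (any path)
                }
                Set<String> newOut = new TreeSet<>(newIn);
                newOut.removeAll(kill.get(n));
                newOut.addAll(gen.get(n));      // OUT = GEN ∪ (IN - KILL)
                if (!newIn.equals(in.get(n)) || !newOut.equals(out.get(n))) {
                    in.put(n, newIn);
                    out.put(n, newOut);
                    changed = true;
                }
            }
        }
        return in;
    }
}
```

For a loop 1: x = ..., 2: y = ... (loop header), 3: x = ... (back edge to 2), 4: exit after 2, the sketch finds that x1, x3, and y2 may all reach node 4.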
Results comparing assertions to faults
• For files that have low assertion density, there is a higher fault density • Files with a low fault density have a higher assertion density • By analyzing the faults, observed that on average 14.5% of the faults for component A and 6.5% of the faults for component B were detected by assertions
Selecting a Data Collection Method
• Fully Automated Data Loggers • Keeps track of mouse clicks, keystrokes, timing, "location on user interface" • Semi-Automated Data Collection by Moderator/Observer • Test moderator/observer enters events as they occur simply by choosing from predetermined choices on the screen • Manual Data Collection by Moderator/Observer • User-Generated Data Collection • User fills out a questionnaire after the completion of each task
Choosing a Strategy (top, bottom, sandwich)
• Functional strategies require more planning • Structural strategies (bottom up, top down, sandwich) are simpler • But thread and critical modules testing provide better process visibility, especially in complex systems • Possible to combine • Top-down, bottom-up, or sandwich are reasonable for relatively small components and subsystems • Combinations of thread and critical modules integration testing are often preferred for larger subsystems
Quality Assurance Strategy
• General guidelines for quality assurance within an organization • Based on lessons learned within the organization • Not specific to a particular project (Unlike a QA plan)
When to Deviate from a Test Plan
• General rule: when you are a novice, err on the side of sticking to the test plan • Deviate when • Participants do not understand task scenarios • During the test, you discover additional areas/topics that need to be investigated • Questionnaires do not hit issues identified during test sessions • The expected participant does not show up • The timing does not work well
Usability Test plan - test objectives
• High-level description of goals of the usability test • The reasons for performing the test at the given time • Ties usability testing to the business goals of an organization
An Assertion Mechanism
• High-level language constructs for • Logical expressions (typically Boolean-valued expressions) for characterizing desirable/undesirable program execution states • Predefined (and usually limited) user-defined runtime response that is invoked if the logical expression is violated • Automatic translation of the logical expressions into executable statements that evaluate the expressions on the appropriate states (scope) of the associated program
Debugging Tests
• If a test case fails, what does that mean? • Need to apply quality assurance techniques to test cases • Debugging a test case often requires same engineering and analytical skills needed to debug production code
Tips on Data Collection
• If unsure how to begin, start simply • Common types of information collected during usability testing: • Whether each task was completed successfully • Whether prompting or assistance was required • Major problems/obstacles associated with each task • Time required to perform each task •Observations/comments concerning each participant's actions
Barriers to Engineering Software
• Industry's short term focus • Shortage of skilled personnel • Inadequate investment in R&D • Poor technology transfer models • Lack of "good" standards • Lack of experimental basis for standards
Error Seeding
• Insert "typical" faults into a system • Determine how many of the inserted faults are found • If K of the N seeded faults are found, assume that K/N of the actual faults have been found as well • Motivates developers/testers • Know there is something to find • Not looking for their own faults, so more motivated • Drawback • The assumption about the percentage of remaining faults is not valid unless the seeded faults are "representative"
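The estimate in the third bullet is simple arithmetic: if K of N seeded faults were found and R real (non-seeded) faults were found, the total number of real faults is estimated as R × N / K. A minimal sketch with hypothetical numbers; the estimate is only as good as the "representative" assumption noted above.

```java
/** Hypothetical error-seeding estimator. */
final class ErrorSeeding {
    private ErrorSeeding() {}

    /** Estimated total real faults, assuming real faults are found at the same rate K/N as seeded ones. */
    static double estimatedTotalRealFaults(int seeded, int seededFound, int realFound) {
        if (seededFound == 0) {
            throw new IllegalArgumentException("no seeded faults found: detection rate unknown");
        }
        return (double) realFound * seeded / seededFound;
    }

    /** Estimated real faults still undetected after testing. */
    static double estimatedRemainingRealFaults(int seeded, int seededFound, int realFound) {
        return estimatedTotalRealFaults(seeded, seededFound, realFound) - realFound;
    }
}
```

For example, with 20 seeded faults, 15 of them found, and 30 real faults found, the estimate is 30 × 20 / 15 = 40 real faults, so about 10 remain undetected.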
What do assertions do?
• Insert specifications about the intent of a system • violation means there is a fault in the system • During execution, monitor if the assertion is violated • If violated then report the violation
Integration Plan + Test Plan
• Integration test plan drives and is driven by the project "build plan" • A key feature of the system architecture and project plan
Components of an Orientation Script
• Introductions • Explanation of the product to be tested • Expression of appreciation • Description of testing set-up • High-level explanation of the structure of the usability test • Assurance that participants are not being tested • Mention that it is OK to ask questions, ask for questions
UTP - Bad research questions
• Is the current product usable? • Is the product ready for release or does it need more work?
A software product is said to have good quality if:
• It has few failures when used by the customer • It is reliable • It satisfies the majority of users
Be Aware of Effects of Body Language
• It is easy to unintentionally influence participants with body language • Moving closer to someone ≈ acceptance of what someone is saying; moving away ≈ rejection • Raising pitch of voice ≈ agreement; lowering pitch ≈ disagreement • To improve, review video tapes of test sessions
Why are Software Inspections effective?
• Knowing the product will be scrutinized causes developers to produce a better product • Hawthorne effect • Having others scrutinize a product increases the probability that faults will be found • Walkthroughs and peer reviews are not as formal as inspections, but appear to also be effective • Hard to get empirical results
Test Case Prioritization need
• Large portion of a test suite may still need to be executed even after applying regression test case selection techniques
Common Formats of Post-Test Questionnaire
• Likert scale (strongly agree, agree, ...) • Semantic differentials • Modern: 3, 2, 1, 0, 1, 2, 3 • Traditional • Fill-in questions • Checkbox • Branching
Tips for Probing and Interacting (continued)
• Limit interruptions to short discussions • Save longer discussions for debriefing session • Probe in response to both verbal and non-verbal cues from participants • Handle one issue at a time • Don't problem solve
Scope of an assertion
• Local assertion • Checked at the definition site • E.g., ASSERT X > 10 • Global assertion • Defined over a specific scope, usually using the scoping rules of the programming language • Must determine the locations that need to be checked • E.g., Global ASSERT X > 10 • Compiler/preprocessor must determine all the locations where X is defined/assigned and check that X is greater than 10 • Loop assertion (loop invariant) • Checked at each iteration at the designated point in a loop • E.g., \loop_invariant (I < Max) • Pre- (and post-) conditions • Checked at the start (and end) of a method each time it is invoked • E.g., precondition keywords: pre, assumes, requires • E.g., postcondition keywords: Post, ensures, provides • Other postcondition keywords: returns (returned value of a function), promise (impact on all other variables) • Class assertion (class invariant) • Checked at the return of each method in a class • E.g., class_invariant, \invariant • All of the above are syntactic sugar • Could write the code to get the same results • But an assertion mechanism greatly simplifies writing assertions
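Since these constructs are syntactic sugar, they can be written by hand. A minimal sketch in Java: the hypothetical `BoundedCounter` class checks a precondition, a postcondition that uses an old value, a loop invariant, and a class invariant at method return. The explicit `check` helper is used instead of Java's `assert` statement because `assert` is disabled unless the JVM runs with `-ea`.

```java
/** Hypothetical counter that checks its assertions by hand. */
final class BoundedCounter {
    private final int max;
    private int value;

    BoundedCounter(int max) {
        this.max = max;
        this.value = 0;
        checkInvariant();
    }

    /** Stands in for an assertion mechanism: report a violation by throwing. */
    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    /** Class invariant: checked at the return of each operation. */
    private void checkInvariant() {
        check(0 <= value && value <= max, "class invariant: 0 <= value <= max");
    }

    /** Adds 1 to the counter {@code times} times. */
    void increment(int times) {
        check(times >= 0 && value + times <= max, "precondition: times >= 0 and value + times <= max");
        int old = value;                        // old(value), saved for the postcondition
        for (int i = 0; i < times; i++) {
            check(value == old + i, "loop invariant: value == old + i");
            value++;
        }
        check(value == old + times, "postcondition: value == old(value) + times");
        checkInvariant();
    }

    int value() {
        return value;
    }
}
```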
Software inspections participants
• MODERATOR - responsible for organizing, scheduling, distributing materials, and leading the session • AUTHOR - responsible for explaining the product • SCRIBE - responsible for recording bugs found • PLANNER or DESIGNER - author from a previous step in the software lifecycle • USER REPRESENTATIVE - to relate the product to what the user wants • PEERS OF THE AUTHOR - perhaps more experienced, perhaps less • APPRENTICE - an observer who is there mostly to learn
If You Make a Mistake, Continue On
• Make a note, but continue on as if nothing happened • At worst, part of the session will be invalidated
Mutation testing summary
• Mutation testing takes error seeding to the absurd, but it did stimulate some useful research and insight • Optimization approaches need to be used • Mutation testing very useful for generating test beds of buggy programs • Only one (known) bug per program • May or may not be typical bugs • Now widely used to evaluate different testing approaches
What QA techniques should be applied during development?
• No single QA technique can serve all purposes
Control Flow Graph
• Nodes may correspond to single statements, parts of statements, or several statements • Execution of a node means that the instructions associated with the node are executed in order from the first instruction to the last • Nodes are 1-in, 1-out (except decision nodes) • A subpath through a control flow graph is a sequence of nodes (n1, n2, ..., nt) where, for each nk with 1 ≤ k < t, (nk, nk+1) is an edge in the graph, e.g., 2, 3, 2, 3, 2, 4 • A complete path starts at the start node and ends at the final node, e.g., 1, 2, 3, 2, 4 • Every executable sequence of instructions in the represented component (program) corresponds to a path in its CFG • Not all paths correspond to executable sequences • Telling them apart requires additional semantic information • "Infeasible paths" are not an indication of a fault • The CFG usually over-estimates the executable behavior
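The subpath and complete-path definitions can be checked mechanically. A minimal sketch, assuming the CFG implied by the examples (edges 1 -> 2, 2 -> 3, 3 -> 2, 2 -> 4, with 1 as the start node and 4 as the final node); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical path checks for the example CFG. */
final class CfgPaths {
    private CfgPaths() {}

    /** Edge set of the example CFG: 1 -> 2, 2 -> 3, 3 -> 2, 2 -> 4. */
    static final Map<Integer, Set<Integer>> EDGES = Map.of(
            1, Set.of(2),
            2, Set.of(3, 4),
            3, Set.of(2));
    static final int START = 1;
    static final int FINAL = 4;

    /** (n1, ..., nt) is a subpath if every consecutive pair (nk, nk+1) is an edge. */
    static boolean isSubpath(List<Integer> nodes) {
        if (nodes.isEmpty()) {
            return false;
        }
        for (int k = 0; k + 1 < nodes.size(); k++) {
            Set<Integer> successors = EDGES.getOrDefault(nodes.get(k), Set.of());
            if (!successors.contains(nodes.get(k + 1))) {
                return false;
            }
        }
        return true;
    }

    /** A complete path is a subpath that starts at the start node and ends at the final node. */
    static boolean isCompletePath(List<Integer> nodes) {
        return isSubpath(nodes)
                && nodes.get(0) == START
                && nodes.get(nodes.size() - 1) == FINAL;
    }
}
```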
Legal Documents
• Non-disclosure agreement • Informed consent form
Examples of Performance Data
• Number and percentage of tasks completed correctly with and without prompts or assistance • Number and type of prompts given • Number and percentage of tasks completed incorrectly • Count of all incorrect selections (errors) • Count of errors of omission • Count of incorrect menu choices • Count of incorrect icons selected
Example Questions for a Screen or a Web Site
• Organization of screen matches real-world tasks? • Amount of information adequate? • Appropriate use of color? • Similar information consistently placed? • Problems with navigation? • Problems with losing your place in the system? • Computer jargon? • Too much or too little information? • Ease of reading?
Common Usability Test Materials
• Orientation script • Background questionnaire • Legal documents • Pre-test questionnaire • Data collection instruments • Task scenarios • Post-test questionnaire • Debriefing guide
Examples of Preference Data
• Participant comments and opinions • Preference of Version A vs. Version B in a comparative study • Suggestions for improving the product • Number of negative references to the product • Rationales for performance (what the participant says about why he or she did what he or she did) • Ratings or rankings of the product
Loop Coverage
• Path 1, 2, 1, 2, 3 executes all branches (and all statements) but does not execute the loop well.
Hardware versus Software
• Percentage wise, hardware costs are decreasing and software costs are increasing Is hardware development done better than software development? • Yes, but... • s/w systems tend to be more complex • tend to do new applications in s/w and well-understood applications in h/w • despite the use of more rigorous and systematic processes, hardware systems fail too
Categories of Data Collected During a Usability Test
• Performance data: measures of participant behavior, includes error rates, number of accesses of the help by task, time to perform a task, and so on. • Preference data: measures of participant opinion or thought process, includes participant rankings, answers to questions, and so forth.
Software Inspection Process
• Planning • Preparation • Inspection • Rework • Follow-up
Quality Assurance Planning
• Planning is integral to the QA process • Includes identifying an overall QA strategy and creating a more detailed QA/test plan
UTP - Research questions
• Precise, measurable statements about what should be learned from the usability test • Help focus the planning, designing, and conducting of a usability test
Moderate the Session Impartially
• Present the product neutrally • React the same way to different outcomes • Show no vested interest in the results one way or another
Benefits of CFGs
• Probably the most commonly used representation • Numerous variants • Basis for inter-component analysis • Collections of CFGs • Basis for various transformations • Compiler optimizations • S/W analysis • Basis for automated analysis • Graphical representations of interesting programs are too complex for direct human understanding
General Data Flow Analysis Approach
• Propagation rule • Forward or backward • IN value for the initial node • Since IN depends on OUT, need to initialize OUT for each node • Initial value depends on the problem • Merge operator determined by whether it is an all-path or an any-path problem • Final result rule • Usually based on IN, OUT, GEN, and KILL for each node • Sometimes only need to look at the final node
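To contrast with a forward problem, live variables is a backward, any-path problem: IN(n) = USE(n) ∪ (OUT(n) - DEF(n)), OUT(n) is the union of IN over n's successors, and every OUT starts empty. Its final result rule also supports one of the optimizations mentioned earlier: a definition is dead if the defined variable is not in OUT of its node. A minimal sketch on a hypothetical four-node CFG (1: a = 1, 2: c = 5, 3: b = a + 1, 4: return b); the names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical backward solver for live variables. */
final class LiveVariables {
    private LiveVariables() {}

    /** Returns node -> OUT set (variables live on exit from the node). */
    static Map<Integer, Set<String>> liveOut(List<Integer> nodes,
                                             Map<Integer, Set<String>> use,
                                             Map<Integer, Set<String>> def,
                                             Map<Integer, List<Integer>> succ) {
        Map<Integer, Set<String>> in = new HashMap<>();
        Map<Integer, Set<String>> out = new HashMap<>();
        for (int n : nodes) {
            in.put(n, new TreeSet<>());
            out.put(n, new TreeSet<>());        // initial value: nothing is live
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            // Backward problem: visiting nodes in reverse order usually converges faster.
            for (int i = nodes.size() - 1; i >= 0; i--) {
                int n = nodes.get(i);
                Set<String> newOut = new TreeSet<>();
                for (int s : succ.getOrDefault(n, List.of())) {
                    newOut.addAll(in.get(s));   // merge operator: union (any path)
                }
                Set<String> newIn = new TreeSet<>(newOut);
                newIn.removeAll(def.getOrDefault(n, Set.of()));
                newIn.addAll(use.getOrDefault(n, Set.of()));   // IN = USE ∪ (OUT - DEF)
                if (!newOut.equals(out.get(n)) || !newIn.equals(in.get(n))) {
                    out.put(n, newOut);
                    in.put(n, newIn);
                    changed = true;
                }
            }
        }
        return out;
    }

    /** Final result rule: a definition at node n is dead if its variable is not live on exit from n. */
    static List<Integer> deadDefinitions(List<Integer> nodes, Map<Integer, Set<String>> def,
                                         Map<Integer, Set<String>> out) {
        List<Integer> dead = new ArrayList<>();
        for (int n : nodes) {
            for (String v : def.getOrDefault(n, Set.of())) {
                if (!out.get(n).contains(v)) {
                    dead.add(n);
                    break;
                }
            }
        }
        return dead;
    }
}
```

On that CFG, only a is live on exit from node 2, so the definition of c at node 2 is reported as dead.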
Guidelines for Task Scenarios
• Provide realistic scenarios and motivation • Place scenarios in the order they are most likely to occur in reality • Avoid using jargon • Avoid giving cues • Do not guide participants through scenario piecemeal
UTP - Test Report Contents and Presentation
• Provides a summary of the main sections of the test report • Describes how the results will be communicated to the development team
Data Collection Instruments
• Purpose: expedite the collection of all data pertinent to the test objectives • Questions to answer before choosing the data collection tools: • What data will address the problem statement(s) in the usability test plan? • How will you collect the data? • How will you record the data? • How do you plan to analyze the data? • How and to whom will you report the data? • What resources are available to help with the entire process?
Debriefing guide
• Purpose: provide the structure from which to conduct the debriefing session • Contains a list of general topics to be discussed with the participants • Similar to moderator's guide used in focus groups • Can be augmented with additional topics based on observations made during test session
How can we assess the readiness of a product? When is the product "good enough"?
• QA activities during development aim at revealing faults • We cannot reveal and remove all faults • QA cannot last indefinitely: we want to know if products meet the quality requirements • We must specify the required level of quality and determine when that level has been attained
Data Flow Regression Testing Techniques
• Re-execute test cases that, when executed on the original program, exercise definition-use pairs that were deleted or modified in the new version of the program • Re-execute test cases that exercise a conditional statement whose predicate has been modified
Design by Contract
• Recognizes the widespread use of library components • Precondition clearly states what a component expects to be true • Obligations of the client using the component • Post condition clearly states what is expected to be true after the component is used • Obligations of the component, given that the client has fulfilled their obligations
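A minimal Design-by-Contract sketch in plain Java: a hypothetical integer square root whose precondition is the client's obligation and whose postcondition is the component's obligation. Java has no built-in contract syntax, so both are written as explicit checks; reporting a broken precondition with IllegalArgumentException and a broken postcondition with AssertionError is one common convention, assumed here.

```java
/** Hypothetical component with an explicit contract. */
final class IntMath {
    private IntMath() {}

    /**
     * Precondition (client's obligation): n >= 0.
     * Postcondition (component's obligation): r * r <= n < (r + 1) * (r + 1).
     */
    static int isqrt(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("precondition violated: n >= 0");
        }
        long r = 0;                             // long avoids overflow in (r + 1) * (r + 1)
        while ((r + 1) * (r + 1) <= n) {        // simple linear search; fine for a sketch
            r++;
        }
        if (!(r * r <= n && n < (r + 1) * (r + 1))) {
            throw new AssertionError("postcondition violated for n = " + n);
        }
        return (int) r;
    }
}
```

If the client calls `isqrt(-1)`, the client has broken the contract and the call is rejected; for any n >= 0 the component guarantees the postcondition.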
Systems testing - Reporting test results
• Regular reports • test status reports • bug status reports • Final report
Test Case Objective Examples
• Requirement 2.2.1 of Test Management Toolkit (TMT): The application shall provide a means for creating, modifying, viewing, storing, and retrieving test plan documents. • Test case 1 objective: Verify that the TMT provides a means for a qualified user to create all valid types of test plan documents. • Test case 2 objective: Verify that the TMT provides a means for a qualified user to modify all test plan documents that have been stored. • Test case 3 objective: ... to view any test plan that has been stored. • Test case 4 objective: ... to store any test plan that has been created. • Test case 5 objective: ... to retrieve any test plan ...
Statement Coverage
• Requires that each statement in a program be executed at least once
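Coverage tools such as JaCoCo insert probes into the code automatically. This sketch does the same thing by hand for a hypothetical `abs` method, so that it is visible which statements a test input executes. The class, method and probe array are illustrative assumptions.

```java
import java.util.Arrays;

final class StatementCoverageDemo {
    // covered[i] is set once statement S_i has executed (hand-written instrumentation).
    static final boolean[] covered = new boolean[3];

    /** Hypothetical unit under test: absolute value, with a probe after each statement. */
    static int abs(int x) {
        int result = x;              // S0
        covered[0] = true;
        if (x < 0) {
            result = -x;             // S1
            covered[1] = true;
        }
        covered[2] = true;           // S2 (the return) is about to execute
        return result;
    }

    static boolean allStatementsCovered() {
        for (boolean c : covered) {
            if (!c) return false;
        }
        return true;
    }

    static void reset() {
        Arrays.fill(covered, false);
    }

    public static void main(String[] args) {
        reset();
        abs(5);
        System.out.println(allStatementsCovered()); // false: S1 never executed
        reset();
        abs(-3);
        System.out.println(allStatementsCovered()); // true: S0, S1 and S2 all executed
    }
}
```

The single input -3 achieves statement coverage, yet it never exercises the false outcome of `x < 0`. This is why branch coverage is the stronger criterion.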
Path Coverage
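• Requires that every executable path in the program be executed at least once • In most programs, path coverage is infeasible: loops can produce an unbounded number of paths, and even in loop-free code the number of paths grows exponentially with the number of decisions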
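The cost of path coverage can be sketched by counting entry-to-exit paths in a loop-free control flow graph with dynamic programming. This is a minimal sketch under two assumptions: the graph is acyclic, and its nodes are numbered in topological order. The class and the `sequentialIfs` helper are hypothetical. Once the graph has a loop, the count is unbounded and this method no longer applies.

```java
import java.util.ArrayList;
import java.util.List;

final class PathCounter {
    /**
     * succ.get(v) lists the successors of node v. Nodes are numbered in topological
     * order (every edge goes to a higher number); node 0 is the entry, the last is the exit.
     */
    static long countPaths(List<List<Integer>> succ) {
        int n = succ.size();
        long[] paths = new long[n];   // paths[v] = number of paths from v to the exit
        paths[n - 1] = 1;
        for (int v = n - 2; v >= 0; v--) {
            for (int w : succ.get(v)) paths[v] += paths[w];
        }
        return paths[0];
    }

    /** CFG of k if-else statements in sequence: each decision branches and rejoins. */
    static List<List<Integer>> sequentialIfs(int k) {
        List<List<Integer>> succ = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            int d = 3 * i;
            succ.add(List.of(d + 1, d + 2)); // decision node
            succ.add(List.of(d + 3));        // "then" block
            succ.add(List.of(d + 3));        // "else" block
        }
        succ.add(List.of());                 // exit node, index 3k
        return succ;
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 10, 30}) {
            System.out.println(k + " decisions -> " + countPaths(sequentialIfs(k)) + " paths");
        }
        // 1 -> 2, 10 -> 1024, 30 -> 1073741824
    }
}
```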
System Testing Exit Criteria
• Run out of time • Completed all planned test cycles • Bug profile meets exit criteria
UTP - Participant Characteristics
• Sample size • In general, large samples needed for statistically valid results • In practice, often 4-5 participants per participant class expose about 80% of usability deficiencies
Why Create a Usability Test Plan?
• Serves as a blueprint for usability testing • Serves as a communication vehicle • Defines/implies required resources
Experimental results of Software Inspections
• Software inspections have repeatedly been shown to be cost effective • Increase front-end costs • ~15% increase to pre-code cost • Decrease overall cost • Doubled number of lines of code produced per person • some of this due to inspection process • Reduced faults by 2/3 • Found 60-90% of the faults • Found faults close to when they were introduced • The sooner a fault is found the less costly it is to fix
How can we control the quality of successive releases?
• Software quality assurance does not stop at the first release • Software products operate for many years, and undergo many changes: • Quality tasks after delivery
Constant Propagation
• Some variables at a point in a program may only take on one value • If we know this, can optimize the code when it is compiled • Constant propagation: the process of substituting the values of known constants in expressions at compile time
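Constant propagation can be illustrated on a toy language of straight-line assignments. This is a minimal sketch that assumes every statement has the form `target = op1 + op2 + ...` and that each operand is either an integer literal or a variable. The record and method names are hypothetical, and records require Java 16 or later. A real analysis also has to handle branches: at a join point, a variable keeps its constant only if every incoming path agrees on the value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConstantPropagation {
    /** target = operands[0] + operands[1] + ...; each operand is an int literal or a variable. */
    record Assign(String target, List<String> operands) {}

    static boolean isLiteral(String s) {
        return s.matches("-?\\d+");
    }

    /** Substitutes known constants into each statement and folds fully constant sums. */
    static List<String> propagate(List<Assign> program) {
        Map<String, Integer> known = new HashMap<>(); // variable -> constant value at this point
        List<String> out = new ArrayList<>();
        for (Assign a : program) {
            List<String> ops = new ArrayList<>();
            boolean allConstant = true;
            for (String op : a.operands()) {
                String s = (!isLiteral(op) && known.containsKey(op))
                        ? String.valueOf(known.get(op)) : op;
                ops.add(s);
                allConstant &= isLiteral(s);
            }
            if (allConstant) {
                int sum = 0;
                for (String s : ops) sum += Integer.parseInt(s); // evaluated "at compile time"
                known.put(a.target(), sum);
                out.add(a.target() + " = " + sum);
            } else {
                known.remove(a.target()); // target no longer holds a known constant
                out.add(a.target() + " = " + String.join(" + ", ops));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Assign> program = List.of(
            new Assign("x", List.of("3")),
            new Assign("y", List.of("x", "4")),
            new Assign("z", List.of("y", "n")));   // n is an unknown input
        propagate(program).forEach(System.out::println);
        // x = 3
        // y = 7
        // z = 7 + n
    }
}
```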
Quality Assurance Plan
• Sometimes referred to as "test plan" or "analysis and test plan" • Details the QA steps to be taken in a project • Specific to a particular project • Incrementally developed • Elaborated and revised through the software development lifecycle
Control-Flow-Graph-Based Coverage Criteria (for a Test Suite)
• Statement Coverage • Branch Coverage • Path Coverage • Loop Coverage Guidelines
Test Design Strategies
• Strategies for assigning tasks to participants • Independent Groups Design (Between-Subjects Design) • Within-Subjects Design
Critical Modules
• Strategy: Start with riskiest modules • Risk assessment is necessary first step • May include technical risks (is X feasible?), process risks (is schedule for X realistic?), other risks • May resemble thread or sandwich process in tactics for flexible build order • E.g., constructing parts of one module to test functionality in another • Key point is risk-oriented process • Integration testing as a risk-reduction activity, designed to deliver any bad news as early as possible
Black Box Test Data Selection
• Stress testing • large amounts of data • worst-case operating conditions • Performance testing • Combinations of events • select those cases that appear to be more error-prone • Select 1-way, 2-way, ..., n-way combinations
Proposed Silver Bullets
• Structured programming • Modularity • Data Abstraction • Software Verification • Object-oriented • Agile or Extreme Programming • Aspect oriented programming
Mutation Testing
• Systematic method of error seeding • Approach: considers all simple (atomic) faults that could occur • introduces single faults one at a time to create "mutants" of original program • apply test set to each mutant program • "test adequacy" is measured by % "mutants killed"
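Mutation adequacy can be measured by running every mutant against the test set and counting how many mutants are killed. This is a minimal sketch under three assumptions: the program under test is the expression `x + 5` from the operator-mutation example, the mutants are written by hand as lambdas, and the class and record names are hypothetical. Tools such as PIT generate mutants automatically for Java bytecode. Note that a mutant that behaves identically to the original (an equivalent mutant) can never be killed.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

final class MutationScore {
    /** A test case: an input and the output the ORIGINAL program produces for it. */
    record TestCase(int input, int expected) {}

    /** A mutant is killed if at least one test case produces an unexpected output. */
    static boolean killed(IntUnaryOperator mutant, List<TestCase> tests) {
        for (TestCase t : tests) {
            if (mutant.applyAsInt(t.input()) != t.expected()) return true;
        }
        return false;
    }

    /** Percentage of mutants killed by the test set. */
    static double mutationScore(List<IntUnaryOperator> mutants, List<TestCase> tests) {
        long killedCount = mutants.stream().filter(m -> killed(m, tests)).count();
        return 100.0 * killedCount / mutants.size();
    }

    public static void main(String[] args) {
        // Original program: x + 5. Arithmetic operator replacement gives x - 5, x * 5, x / 5.
        List<IntUnaryOperator> mutants = List.of(x -> x - 5, x -> x * 5, x -> x / 5);
        List<TestCase> weak = List.of(new TestCase(-6, -1));
        List<TestCase> stronger = List.of(new TestCase(-6, -1), new TestCase(0, 5));
        System.out.println(mutationScore(mutants, weak));     // about 66.7: x / 5 survives, since -6 / 5 == -1
        System.out.println(mutationScore(mutants, stronger)); // 100.0
    }
}
```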
What are the deficiencies with Software Inspections?
• Tend to focus on fault detection • what about the other "ilities": maintainability, portability, etc.? • Not applied consistently/rigorously • inspection shows statistical improvement • Human-intensive, and often make ineffective use of human resources • e.g., a skilled software engineer reviewing coding standards, spelling, etc. • Lucent study: adding 0.5M LOC to a 5M LOC system required ~1,500 inspections, with ~5 people per inspection • No automated support
Automated Support for Regression Testing
• Test environment or infrastructure support • Specification of test cases and expected results • E.g., JUnit • Capture and replay tools • Especially for GUI components
Components of Usability Test Plan
• Test objectives • Research questions • Participant characteristics • Method (test design) • Task list • Test environment, equipment, logistics • Description of test moderator's role • Data to be collected and evaluation measures • Test report contents and presentation
System Testing Entry Criteria
• Test plan written and reviewed • Test cases written and reviewed • Unit testing and integration testing finished by development team • A build of the entire software system is available
Fixed point
• The data flow analysis algorithm will eventually terminate • If there are only a finite number of possible sets that can be associated with a node • If the function that determines the sets that can be associated with a node is monotonic
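Reaching definitions can be used to show both termination conditions at work. The sets range over a finite set of definitions, and the transfer function OUT = GEN ∪ (IN − KILL) is monotonic. Starting from empty sets, the sets can therefore only grow, so the iteration must reach a fixed point. This is a minimal sketch that assumes each definition is numbered and stored as a bit in a `java.util.BitSet`. The class and helper names are hypothetical.

```java
import java.util.BitSet;
import java.util.List;

final class ReachingDefinitions {
    /**
     * pred.get(v): predecessors of node v; gen[v] / kill[v]: definitions generated / killed at v.
     * Iterates until no OUT set changes (a fixed point) and returns the IN sets.
     */
    static BitSet[] solve(List<List<Integer>> pred, BitSet[] gen, BitSet[] kill) {
        int n = pred.size();
        BitSet[] in = new BitSet[n];
        BitSet[] out = new BitSet[n];
        for (int v = 0; v < n; v++) { in[v] = new BitSet(); out[v] = new BitSet(); }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int v = 0; v < n; v++) {
                BitSet newIn = new BitSet();
                for (int p : pred.get(v)) newIn.or(out[p]);  // IN = union of predecessors' OUT ("might" reach)
                BitSet newOut = (BitSet) newIn.clone();
                newOut.andNot(kill[v]);                      // IN minus KILL ...
                newOut.or(gen[v]);                           // ... union GEN
                if (!newOut.equals(out[v])) changed = true;
                in[v] = newIn;
                out[v] = newOut;
            }
        }
        return in;
    }

    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) b.set(i);
        return b;
    }

    public static void main(String[] args) {
        // Node 0: x = 1 (definition 0); node 1: loop test; node 2: x = x + 1 (definition 1); node 3: use of x.
        List<List<Integer>> pred = List.of(List.of(), List.of(0, 2), List.of(1), List.of(1));
        BitSet[] gen  = { bits(0), bits(), bits(1), bits() };
        BitSet[] kill = { bits(1), bits(), bits(0), bits() };
        System.out.println(solve(pred, gen, kill)[3]); // {0, 1}: both definitions of x may reach the use
    }
}
```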
Test Case Environment
• The environment in which the test case will be executed • E.g., hardware configuration, operating system, version of software under test • Need to select one or more test case environments that will be used to accomplish the test case objective • Same test case might need to be run in different environments (e.g., run the test on Windows 10 and on Mac OS X 10.11.6)
Test Case Preconditions (Test Case Set-Up)
• The initial state of the system under test • E.g., What entries need to be in a database (test plan document 1, test plan document 2, ...) • Helps eliminate confounding factors during debugging • Helps with test case reproducibility
If a test case fails, what does that mean?
• The program under test is wrong • Or, the test case is wrong
Test Case Objective
• The purpose of the test case • What the test will accomplish, not how • Related to a requirement
Expected Results
• The result that the system is expected to produce at the end of the test case (i.e., after the test procedure is executed) • Based on the requirements
Pass/Fail Criterion
• The standard by which we decide whether a test case passes or fails • Usually tightly coupled with expected results • i.e., if observed results after executing the test case procedure match expected results, then a test case passes; otherwise it fails
Software products undergo many changes after release, such as:
• They adapt to environment changes • They evolve to serve new and changing user requirements.
Use "Thinking Aloud" Protocol, If Appropriate
• Thinking aloud: participants verbalize their thought process while performing a task • Have participants express their confusion, frustration, delight • Especially effective for conducting early exploration usability tests
Guidelines for Developing Orientation Scripts
• Tone: friendly, but professional • Keep it short • Read script to each participant verbatim
Stay objective, but keep the tone relaxed
• Too much solemnity can inhibit other people • Humor can help participants relax • Laugh with, not at, participants
Black box/Functional Test Data Selection
• Typical cases • Boundary conditions/values • Exceptional conditions • Illegal conditions (if robust) • Fault-revealing cases • based on intuition about what is likely to break the system • Other special cases
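Boundary values for a numeric range can be generated mechanically. This is a minimal sketch that assumes a hypothetical requirement ("ages from 18 to 65 inclusive are eligible") and a hypothetical `isEligible` method under test. The inputs sit just below, on, and just above each boundary, with one typical interior value added.

```java
import java.util.List;

final class BoundaryValues {
    /** Hypothetical unit under test: accepts ages in the closed range [18, 65]. */
    static boolean isEligible(int age) {
        return age >= 18 && age <= 65;
    }

    /** Values just below, on and just above each boundary, plus one typical interior value. */
    static List<Integer> boundaryInputs(int min, int max) {
        return List.of(min - 1, min, min + 1, min + (max - min) / 2, max - 1, max, max + 1);
    }

    public static void main(String[] args) {
        for (int age : boundaryInputs(18, 65)) {
            System.out.println(age + " -> " + isEligible(age));
        }
        // 17 and 66 are illegal values; 18, 19, 64 and 65 are on or next to a boundary; 41 is a typical case.
    }
}
```

An off-by-one fault such as writing `age > 18` is revealed only by the input 18. Typical values alone would miss it.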
Reachability Graph
• Typically, each edge represents progress in a single task • Multiple concurrent events may be possible, but allowing only single events captures all states and simplifies the graph structure (interleaved execution model) • Only have multiple tasks progress when required by the semantics of the programming construct • E.g., rendezvous • Only contains states that are potentially reachable from the start state
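A reachability graph can be built by breadth-first search over global states. This is a minimal sketch under the following assumptions. There are two tasks, A and B, and each one simply advances a program counter. At one designated step each task must rendezvous with the other, and at that step both advance together. All names are hypothetical. Every other edge advances a single task, following the interleaved execution model above. Only states reachable from the start state are generated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ReachabilityGraph {
    /** A global state: the program counters of task A and task B. */
    record State(int a, int b) {}

    /** A runs steps 0..lastA and B runs 0..lastB; at (rvA, rvB) they must rendezvous. */
    static List<State> successors(State s, int lastA, int lastB, int rvA, int rvB) {
        List<State> next = new ArrayList<>();
        if (s.a() < lastA && s.a() != rvA) next.add(new State(s.a() + 1, s.b()));      // only A moves
        if (s.b() < lastB && s.b() != rvB) next.add(new State(s.a(), s.b() + 1));      // only B moves
        if (s.a() == rvA && s.b() == rvB) next.add(new State(s.a() + 1, s.b() + 1));  // rendezvous: both move
        return next;
    }

    /** Breadth-first exploration from the start state (0, 0); returns every reachable state. */
    static Set<State> reachable(int lastA, int lastB, int rvA, int rvB) {
        Set<State> seen = new LinkedHashSet<>();
        Deque<State> work = new ArrayDeque<>();
        State start = new State(0, 0);
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            State s = work.remove();
            for (State t : successors(s, lastA, lastB, rvA, rvB)) {
                if (seen.add(t)) work.add(t); // enqueue only states not seen before
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<State> states = reachable(2, 2, 1, 1);
        states.forEach(System.out::println);
        System.out.println(states.size() + " of 9 possible states are reachable"); // 5
    }
}
```

States such as (2, 0) never appear. A cannot pass its rendezvous point until B arrives at its own.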
Integration versus Unit Testing
• Unit (module) testing is a necessary foundation • Unit level has maximum controllability and visibility • Integration testing can never compensate for inadequate unit testing • Integration testing may serve as a process check • If module faults are revealed in integration testing, they signal inadequate unit testing • If integration faults occur in interfaces between correctly implemented modules, the errors can be traced to module breakdown and interface specifications
JUnit Testing Framework
• Unit under test: usually a class or a small number of classes • Enables creation of individual test cases and combining test cases into test suites • Runs a set of test cases or test suites • Checks the results of the test cases and produces a test report • passed test cases • failed test cases • summary statistics • Automatically maintains the test environment for many test cases
Examples of Preference Data
• Usefulness of the product • How well product matched expectations • Appropriateness of product functions to user's tasks • Ease of use overall • Ease of learning overall • One prototype vs. another prototype • This product vs. a competitor's product
Language for representing logical expressions
• Usually use a notation that can be easily translated into the programming language • Boolean expressions • Use variables and operators defined in the program • Must adhere to the programming language's scoping rules • ASSERT X < Y + Z; where X, Y, and Z are variables in the program defined in the scope where the assert statement appears • Quantification • ForAll or \forall or \all • ThereExists or \exists or \some • Often want to reference the original value and the current value of a variable • Example notations • Pre(X) • Old(X) • X • E.g., ASSERT (x == old(x) + 1) • Sometimes can only reference previous values in postconditions (postconditions explained shortly)
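In Java, quantified assertions translate naturally into stream operations. ForAll becomes `allMatch` and ThereExists becomes `anyMatch`. This is a minimal sketch: the class, the helper names and the `sortedCopy` method whose postcondition uses both quantifiers are hypothetical. An explicit check is used because Java's `assert` statement is disabled unless the JVM runs with `-ea`.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class QuantifiedAssertions {
    /** ForAll i in [0, a.length - 1): a[i] <= a[i + 1], i.e. the array is sorted. */
    static boolean isSorted(int[] a) {
        return IntStream.range(0, a.length - 1).allMatch(i -> a[i] <= a[i + 1]);
    }

    /** ThereExists i in [0, a.length): a[i] == key. */
    static boolean contains(int[] a, int key) {
        return Arrays.stream(a).anyMatch(x -> x == key);
    }

    /** Hypothetical method whose postcondition combines both quantifiers. */
    static int[] sortedCopy(int[] a) {
        int[] result = a.clone();
        Arrays.sort(result);
        // ASSERT ForAll i: result[i] <= result[i+1]  AND  ForAll x in a: ThereExists j: result[j] == x
        if (!isSorted(result) || !Arrays.stream(a).allMatch(x -> contains(result, x))) {
            throw new AssertionError("postcondition violated");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy(new int[] {3, 1, 2}))); // [1, 2, 3]
    }
}
```

The second conjunct checks only that every input value appears in the result. It does not check that duplicates are preserved, so it is weaker than "result is a permutation of a".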
Issues with Centralized version control systems
• VCS server is a single point of failure • Clients have only a snapshot (a single configuration) • Cannot commit when server is offline • Inspecting history, viewing diffs requires the server • There could be latency • No distinction between saving a change and making it available
Testing design example problem
• We want to compare two different versions of a product: version A and version B • We also want to determine whether the performance of two user groups (supervisors and technicians) varies • How many participants do we need if we use an independent groups (between-subjects) design with 4 participants per group? • How many participants do we need if we use a within-subjects design? What would this design look like?
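A worked count follows. It assumes that "4 participants per group" means 4 participants in each cell of the design. In a between-subjects design, each participant sees only one version, so every (version, user group) combination needs its own participants. In a within-subjects design, each participant uses both versions, so only user group splits the sample. User group cannot vary within one person, since a participant is either a supervisor or a technician. Typically, half of each group would see A first and half B first, to counterbalance learning effects.

```latex
\begin{align*}
N_{\text{between}} &= \underbrace{2}_{\text{versions}} \times \underbrace{2}_{\text{user groups}} \times 4 = 16 \\
N_{\text{within}}  &= \underbrace{2}_{\text{user groups}} \times 4 = 8
\end{align*}
```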
Error seeding - Basic Assumptions
• We'd like to estimate the effectiveness of a test suite in finding real faults, by measuring how well it finds seeded fake faults. • Estimates are valid to the extent that the seeded bugs are representative of real bugs • The seeded bugs are "like the real bugs" • The seeded bugs should be "as difficult/easy to find" as the real bugs • E.g., if we mix metal ball bearings into the white marbles, and pull the bearings out with a magnet, we don't learn anything about how many marbles were in the bowl
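The assumption that seeded bugs are "as difficult/easy to find" as real bugs leads to the classic seeding estimate. If the test suite finds the same fraction of seeded and real faults, then seededFound / seededTotal ≈ realFound / realTotal. This is a minimal sketch of that estimate, with a hypothetical class name and made-up example numbers. The estimate is only as good as the representativeness assumption described above.

```java
final class ErrorSeeding {
    /** Fraction of seeded faults the test suite detected: its estimated effectiveness. */
    static double detectionRate(int seededTotal, int seededFound) {
        return (double) seededFound / seededTotal;
    }

    /**
     * Assumes seeded and real faults are found at the same rate:
     *   seededFound / seededTotal ≈ realFound / realTotal
     * so realTotal ≈ realFound * seededTotal / seededFound.
     */
    static double estimateTotalRealFaults(int seededTotal, int seededFound, int realFound) {
        if (seededFound == 0) {
            throw new IllegalArgumentException("no seeded faults found; estimate undefined");
        }
        return (double) realFound * seededTotal / seededFound;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 20 faults seeded, 15 of them found, plus 30 real faults found.
        System.out.println(detectionRate(20, 15));               // 0.75
        System.out.println(estimateTotalRealFaults(20, 15, 30)); // 40.0, so about 10 real faults remain
    }
}
```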
Quality Assurance Plan Addresses the following questions...
• What quality assurance activities will be carried out? • What are the dependencies among the QA activities themselves and between the QA activities and other development activities? • What resources are needed and how will they be allocated? • How will both the process and the product be monitored?
Debriefing Guidelines
• While participant fills out post-test questionnaire • Decide which issues remain unresolved • Review debriefing topics guide, prioritize points to discuss • Review post-test questionnaire • Let participant take a break • Look for unexpected answers that might need further exploration • Begin by letting the participant say whatever is on their mind • Begin questions from high-level issues • Move to specific issues • Give participant chance to contemplate their answers • Stay with one point until you feel confident that you clearly understand the basis for the problem, difficulty, and so forth • Focus on understanding problems and difficulties, not on problem solving. • Finish your entire line of questioning before opening up the floor to discussion by observers
Good Reasons to Conduct a Usability Test
• You want to understand whether both of your major types of users can use the product equally well. • You want to know whether or not the documentation is able to compensate for some acknowledged problems with the interface. • You have received numerous complaints associated with using the product. You are interested in determining the exact nature of the problem and how you will fix it within your development budget for this year.
QA Activities During Maintenance
• analysis of changes and extensions • generation of new test suites for the added functionalities • re-execution of tests to check for non-regression of software functionalities after changes and extensions • fault tracking and analysis
Operator mutations
• arithmetic operator replacement • e.g., x := x + 5; • would replace + with -, *, /, and ** • relational operator replacement • e.g., a > b; • would replace > with >=, <, <=, =, and /=
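Relational operator replacement on `a > b` yields five mutants, written here in Java syntax (`==` and `!=` stand in for `=` and `/=`). This is a minimal, hand-written sketch with a hypothetical class name. It shows that a test set without an `a == b` input cannot kill the `>=` mutant.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

final class RelationalMutants {
    /** Mutants of the predicate a > b produced by relational operator replacement. */
    static Map<String, BiPredicate<Integer, Integer>> mutantsOfGreaterThan() {
        Map<String, BiPredicate<Integer, Integer>> m = new LinkedHashMap<>();
        m.put(">=", (a, b) -> a >= b);
        m.put("<", (a, b) -> a < b);
        m.put("<=", (a, b) -> a <= b);
        m.put("==", (a, b) -> a.equals(b)); // equals, not ==, to compare boxed Integers by value
        m.put("!=", (a, b) -> !a.equals(b));
        return m;
    }

    /** Killed if some input pair makes the mutant disagree with the original a > b. */
    static boolean killedBy(BiPredicate<Integer, Integer> mutant, int[][] inputs) {
        for (int[] in : inputs) {
            if (mutant.test(in[0], in[1]) != (in[0] > in[1])) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] weak = { {5, 3}, {1, 4} };
        int[][] withBoundary = { {5, 3}, {1, 4}, {2, 2} };
        mutantsOfGreaterThan().forEach((op, mutant) -> System.out.println(
            "a " + op + " b: killed by weak set = " + killedBy(mutant, weak)
            + ", with a == b added = " + killedBy(mutant, withBoundary)));
    }
}
```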
Boundary - Interior Criteria
• boundary test • interior test
Typical Guidelines for loop coverage
• fall through case • minimum number of iterations • minimum + 1 number of iterations • maximum number of iterations (if practical)
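The loop guidelines above translate into one test input per case. This is a minimal sketch around a hypothetical method that sums at most `MAX_ITEMS` values. The limit of 5 is an assumption, introduced so that a "maximum number of iterations" exists.

```java
final class LoopCoverage {
    static final int MAX_ITEMS = 5; // hypothetical upper bound on loop iterations

    /** Hypothetical unit under test: sums at most MAX_ITEMS values from the array. */
    static int sumUpToMax(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length && i < MAX_ITEMS; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] inputs = {
            {},              // fall through: the body never executes
            {7},             // minimum number of iterations that enters the body (1)
            {7, 8},          // minimum + 1 iterations
            {1, 2, 3, 4, 5}  // maximum number of iterations (MAX_ITEMS)
        };
        for (int[] in : inputs) {
            System.out.println(in.length + " values -> sum " + sumUpToMax(in));
        }
    }
}
```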
Control flow graph for regression testing - Possible changes
• inserted node(s) • deleted node(s) • inserted edge(s) • deleted edge(s) • changed annotation of a node (i.e., corresponding statement has changed)
More operator mutations
• logical connector replacement • absolute value insertion • unary operator insertion • statement deletion • return statement replacement • GOTO label replacement • DO statement end replacement
Issues with manual reviews
• often the first thing dropped when time is tight • labor-intensive • often done informally, no data/history, not repeatable
Why is it difficult to create high-quality software?
• product is unprecedentedly complex • application horizons expand very fast--with human demands/imagination • construction is human-intensive • solutions require unusual rigor • extremely malleable--can modify the product all too easily
More operand mutations
• scalar variable for constant replacement • constant for scalar variable replacement • array reference for constant replacement • array reference for scalar variable replacement • constant for array reference replacement • scalar variable for array reference replacement • array reference for array reference replacement • array index replacement for array index replacement • data statement alteration
What are some quality tasks performed after delivery?
• test and analysis of new and modified code • re-execution of existing test cases • extensive record-keeping
Why are control flow graphs so tentative?
• The edges E represent the potential transfer of control
If you don't use Configuration Management then...
• you are not keeping track of changes • you won't know when features were added • it is more difficult to determine when bugs were introduced or fixed • you won't be able to go back to old versions of your software