CS409 exam time homie
why do we need both static and dynamic analysis
"An object-oriented program's run-time structure often bears little resemblance to its code structure. The code structure is frozen at compile-time; it consists of classes in fixed inheritance relationships. A program's run-time structure consists of rapidly changing networks of communicating objects. In fact, the two structures are largely independent. Trying to understand one from the other is like trying to understand the dynamism of living ecosystems from the static taxonomy of plants and animals, and vice-versa" - some book
JavaParser - Structure
'Small, lightweight...'?!? Most of what we are interested in is concentrated in a few packages within the ast package: body, expr, stmt, type and visitor. You will need to read the source code to figure out what to do in some cases, so make sure you have access to it (either linked or in a separate project).
JavaParser Example - Printing out all method names
- Need to create a CompilationUnit which corresponds to the object structure - a visitable AST representation of a program
- Override just those visitors of interest to include the desired functionality
- Start the process by calling visit on the visitor and passing it the compilation unit
...
    FileInputStream in = new FileInputStream("SomeFile.java");
    CompilationUnit cu;
    try {
        cu = JavaParser.parse(in);          // parse the file into an AST
    } finally {
        in.close();
    }
    new MethodVisitor().visit(cu, null);    // start the traversal
...
    private static class MethodVisitor extends VoidVisitorAdapter<Object> {
        @Override
        public void visit(MethodDeclaration n, Object arg) {
            System.out.println("Method: " + n.getName());
        }
    }
reminder: Frameworks that use Visitor Pattern
-A client that uses the visitor pattern must create a ConcreteVisitor object and then traverse the object structure, visiting each element with the visitor
-When an element is visited, it calls the visitor operation that corresponds to its class. The element supplies itself as an argument to this operation to let the visitor access its state, if necessary.
-Frameworks that use the visitor pattern do most of this for you!
> Generate the AST and associated classes for each node type
> Provide generic visitors for traversing the AST
-Include your own functionality by subclassing specific visitors of interest
backwards slicing
-A backwards slice identifies those statements that affect the computation of a particular statement.
-Backward slice from System.out.println(i);
-Also have:
> Forward Slices
> Interprocedural Slices
> OO Slicing
-Creating program slices automatically is far more difficult than it initially looks.
-In the example above it may superficially appear that it is just a case of including all lines that contain the variable in question - in this case i. But it is more complex than that and involves identifying all statements that could possibly have an impact - either through control or data dependencies. For example, if you were to slice on the variable sum at the println statement then every statement above it would need to be included. Try and work out why this is the case.
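The slide example is not reproduced in these notes, so here is a minimal sketch on a hypothetical summing loop (the program, its line numbers and the class name BackwardSlice are all assumptions for illustration). The dependence edges are written out by hand; a real slicer would have to compute them from the code, which is the hard part. The slice is then just everything reachable backwards over data and control dependencies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class BackwardSlice {
    // Hypothetical program being sliced (statement numbers on the left):
    //   1: int n = read();
    //   2: int i = 1;
    //   3: int sum = 0;
    //   4: while (i <= n) {
    //   5:     sum = sum + i;
    //   6:     i = i + 1;
    //      }
    //   7: System.out.println(sum);
    //   8: System.out.println(i);

    // Hand-written dependence edges: statement -> statements it depends on.
    static Map<Integer, int[]> exampleDependencies() {
        Map<Integer, int[]> deps = new HashMap<>();
        deps.put(4, new int[] {1, 2, 6});        // loop test reads n (1) and i (2, 6)
        deps.put(5, new int[] {3, 5, 2, 6, 4});  // reads sum (3, 5) and i (2, 6); controlled by 4
        deps.put(6, new int[] {2, 6, 4});        // reads i (2, 6); controlled by 4
        deps.put(7, new int[] {3, 5});           // reads sum
        deps.put(8, new int[] {2, 6});           // reads i
        return deps;
    }

    // Backward slice = every statement reachable backwards from the criterion.
    static SortedSet<Integer> slice(Map<Integer, int[]> dependsOn, int criterion) {
        SortedSet<Integer> result = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (!result.add(s)) continue;        // already in the slice
            for (int d : dependsOn.getOrDefault(s, new int[0])) work.push(d);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(slice(exampleDependencies(), 8)); // slice on i
        System.out.println(slice(exampleDependencies(), 7)); // slice on sum
    }
}
```

In this sketch the slice on i at statement 8 is [1, 2, 4, 6, 8], while the slice on sum at statement 7 pulls in every statement above it: sum depends on i, and both updates are controlled by the loop test, which in turn depends on n.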
visitor pattern - collaborations
-A client that uses the Visitor pattern must create a ConcreteVisitor object and then traverse the object structure, visiting each element with the visitor -When an element is visited, it calls the Visitor operation that corresponds to its class. The element supplies itself as an argument to this operation to let the visitor access its state, if necessary -There is a further aspect of the dynamic behaviour of the Visitor pattern which is important to understand and is not obvious from the structure diagram, which is the traversal of the object structure. As this tree is navigated each element is visited with the concrete Visitor object by calling the accept() method and passing the concrete visitor object as an argument. The accept() method then makes a call back to the concrete visitor and passes the containing object itself (a concrete element) back as an argument so that the visitor can access its state to execute the operation.
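The accept()/visit() call-back described above can be sketched in plain Java. This is a minimal illustration only: the tiny expression tree and the names Node, NumberNode, AddNode, NodeVisitor and PrintVisitor are hypothetical and are not JavaParser classes.

```java
import java.util.ArrayList;
import java.util.List;

// Visitor interface: one visit method per concrete element type.
interface NodeVisitor {
    void visit(NumberNode n);
    void visit(AddNode n);
}

// Element interface: every node can accept a visitor.
interface Node {
    void accept(NodeVisitor v);
}

class NumberNode implements Node {
    final int value;
    NumberNode(int value) { this.value = value; }
    // The element calls back the visit method for its own class, passing itself.
    public void accept(NodeVisitor v) { v.visit(this); }
}

class AddNode implements Node {
    final Node left, right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    public void accept(NodeVisitor v) { v.visit(this); }
}

// Concrete visitor: a new operation added without touching the node classes.
class PrintVisitor implements NodeVisitor {
    final List<String> out = new ArrayList<>();
    public void visit(NumberNode n) { out.add("Number " + n.value); }
    public void visit(AddNode n) {
        out.add("Add");
        n.left.accept(this);   // keep the visitor moving through the structure
        n.right.accept(this);
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        Node tree = new AddNode(new NumberNode(1),
                                new AddNode(new NumberNode(2), new NumberNode(3)));
        PrintVisitor printer = new PrintVisitor();
        tree.accept(printer);            // the client starts the traversal
        System.out.println(printer.out); // [Add, Number 1, Add, Number 2, Number 3]
    }
}
```

In this sketch the visitor itself drives the traversal by calling accept on the children, which is also how the default JavaParser adapters shown elsewhere in these notes behave.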
static analysis pros and cons
-Being complete and valid for all executions makes it a powerful verification tool
-Main limitation - much information regarding paths followed or bindings is not available until runtime
> Dependence on input data
> Dynamic memory allocation
> Polymorphism
-Consequently, static analysis is often "conservative" - it errs on the safe side so may tend to include false positives
> i.e. things that it considers to be true which aren't
-The real appeal of using static analysis is that the results will always hold - no matter what inputs are provided or what the execution path is. Of course this is very hard to establish and makes the development of static analysis tools very challenging, so this analysis usually comes at the price of caution. Also, as the results are always guaranteed to hold, they may not be that strong, as the analyser cannot speculate and needs to play safe (hence the "conservative" point).
-To take the example from the previous slide, it may be that the call to the Foo constructor always returns a non-null result, as does fooService.read(id) except for values of id that are negative. All possible points from which acquireFoo may be called have to be identified, and if it can't be established that id is non-negative for all of these then static analysis will be unable to state that the assertion holds.
Example: Improving Trace Efficiency (2)
-Compute maximal spanning tree using the heuristic that each loop iterates 10 times and that each decision is assigned an equally likely outcome
-Just monitor those edges that are not included in the spanning tree
-Can then recreate the full trace from only these values
-Lesson is to think carefully about what data you need to collect and how to go about collecting it
For this particular problem that detailed level of instrumentation is not necessary and the points at which the program needs to be instrumented can be computed by the following steps:
1. Create a control-flow graph (CFG) of the program. This is basically a flowchart which captures the execution paths which may be taken (branches, loops etc.). For details on CFGs see: https://www.geeksforgeeks.org/software-engineering-control-flow-graph-cfg/ or https://en.wikipedia.org/wiki/Control-flow_graph
2. Convert the CFG into a maximal spanning tree. If you are unsure about the difference between a tree and a graph then take a look at https://pediaa.com/what-is-the-difference-between-tree-and-graph/ for example. To build a maximal spanning tree (similar to a minimum spanning tree https://en.wikipedia.org/wiki/Minimum_spanning_tree except we are trying to maximise the sum of the edge weights) it is necessary to add some weights to the edges. This is done by notionally assuming that each loop executes 10 times and each decision branch (if...then...) is taken with equal probability, then using this heuristic to calculate how many times each edge would be traversed and using this value as the weight.
3. Only instrument the points in the program which correspond to the edges which are missing from the maximal spanning tree.
4. After executing the program, the values of the other edges (i.e. the number of times they have been executed) can be calculated from the values observed at the instrumented points.
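Steps 2 to 4 can be sketched on a hypothetical four-node CFG for a single loop. The class EdgeProfile, the node numbering and the weights are assumptions for illustration. The sketch also assumes a virtual exit-to-entry edge (taken once per run) so that the flow into every node equals the flow out of it; that closing edge is a common trick and is not something stated in the notes. Uninstrumented counts are then recovered by repeatedly solving "flow in = flow out" at any node with exactly one unknown edge.

```java
import java.util.Arrays;

class EdgeProfile {
    // Kruskal-style maximal spanning tree over the CFG treated as undirected.
    // inTree[e] == true means edge e needs no probe.
    static boolean[] maxSpanningTree(int nodes, int[][] edges, int[] weight) {
        Integer[] order = new Integer[edges.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(weight[b], weight[a])); // heaviest first
        int[] parent = new int[nodes];
        for (int i = 0; i < nodes; i++) parent[i] = i;
        boolean[] inTree = new boolean[edges.length];
        for (int e : order) {
            int a = find(parent, edges[e][0]);
            int b = find(parent, edges[e][1]);
            if (a != b) { parent[a] = b; inTree[e] = true; } // joins two components
        }
        return inTree;
    }

    static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // counts[e] holds the probe value where known[e] is true; the rest are recovered.
    static long[] recover(int nodes, int[][] edges, long[] counts, boolean[] known) {
        long[] c = counts.clone();
        boolean[] k = known.clone();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int v = 0; v < nodes; v++) {
                int unknown = -1, unknownCount = 0;
                long balance = 0; // known inflow minus known outflow at v
                for (int e = 0; e < edges.length; e++) {
                    boolean in = edges[e][1] == v, out = edges[e][0] == v;
                    if (in == out) continue;      // not incident, or a self loop
                    if (!k[e]) { unknown = e; unknownCount++; }
                    else balance += in ? c[e] : -c[e];
                }
                if (unknownCount == 1) {
                    // inflow == outflow, so the single unknown edge absorbs the balance
                    c[unknown] = (edges[unknown][1] == v) ? -balance : balance;
                    k[unknown] = true;
                    progress = true;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // 0 = entry, 1 = loop head, 2 = loop body, 3 = exit
        int[][] edges = { {0, 1}, {1, 2}, {2, 1}, {1, 3}, {3, 0} }; // last edge is the virtual one
        int[] weight = { 1, 10, 10, 1, 1 };  // heuristic: the loop runs 10 times
        boolean[] inTree = maxSpanningTree(4, edges, weight);
        System.out.println(Arrays.toString(inTree)); // [true, true, false, true, false]

        // Pretend one run with 4 loop iterations: only the non-tree edges carry probes.
        long[] observed = { 0, 0, 4, 0, 1 };
        boolean[] known = { false, false, true, false, true };
        System.out.println(Arrays.toString(recover(4, edges, observed, known))); // [1, 4, 4, 1, 1]
    }
}
```

Only two of the five edges are instrumented here, and one of those (the virtual edge) needs no real probe because its count is simply the number of runs.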
JavaParser - Mapping to Visitor Pattern III
-Concrete Element
> Implementations for every possible element in the AST
> All implement the accept method:
    @Override
    public <R, A> R accept(GenericVisitor<R, A> v, A arg) {
        return v.visit(this, arg);
    }
    @Override
    public <A> void accept(VoidVisitor<A> v, A arg) {
        v.visit(this, arg);
    }
> Also implement methods for accessing and manipulating the data associated with the element (which the visitor can make calls to) along with other methods (e.g. cloning, removing the node etc.)
> e.g. AssignExpr has methods getOperator(), getTarget(), getValue() etc.
JavaParser - Mapping to Visitor Pattern IV
-Concrete Visitor
> Several types of visitor: VoidVisitorAdapter, GenericVisitorAdapter, HashCodeVisitor, EqualsVisitor...
> All contain default implementations for all visit methods defined in the Visitor interface
> e.g. AssignExpr visitor in VoidVisitorAdapter:
    public void visit(final AssignExpr n, final A arg) {
        n.getTarget().accept(this, arg);
        n.getValue().accept(this, arg);
        n.getComment().ifPresent(l -> l.accept(this, arg));
    }
> Doesn't do anything except keep the visitors moving around the AST (object structure).
JavaParser - Mapping to Visitor Pattern II
-Element Interface
> Node class - everything in the AST is a Node
> Implements the Visitable interface which defines the accept method signature:
    public interface Visitable {
        <R, A> R accept(GenericVisitor<R, A> v, A arg);
        <A> void accept(VoidVisitor<A> v, A arg);
    }
-Visitor Interface
> Two of these: VoidVisitor and GenericVisitor (depending on whether or not you want something returned)
> Method signatures for visit methods for every possible element type in the AST (these provide a useful clue to the names of syntax elements)
performing static analysis - use tools or frameworks
-For fairly basic tasks don't underestimate the power of standard Unix tools
> Can get a lot done with the likes of grep, sed, awk etc.
-For anything else make use of a specific tool or framework:
> javacc
> ANTLR
> Eclipse JDT
> Javaparser
> ...
-But first need to understand the visitor design pattern as many of them are built upon this
There are a number of Unix-type tools that exist and can support this kind of problem (and typically handle structures such as regular expressions well), but the strong recommendation is that you use a specific parsing framework such as one of the ones mentioned (there are others).
-Before looking at these, though, we first need to look at and understand the Visitor pattern, as many frameworks make heavy use of it and a good knowledge of the pattern makes using the framework far simpler.
performing static analysis - DIY?
-Hand-crafted (and coded) solution
> Only suitable for the very simplest of problems
-At least use something like Java's Regular Expressions
> E.g. - count the number of decisions in a method
-Generally NOT recommended, as programming languages are hard to parse correctly
When presented with a static analysis problem the temptation might be to write a small custom program for the task. Only consider this for the very simplest of problems (much simpler than we are going to encounter in this class). There are also mechanisms to help you such as regular expressions in Java (the example here demonstrates how you can use them to count the number of decision statements - if, while and for statements - in a piece of code). However, this approach is generally not recommended since you don't only have to consider the problem you are trying to solve but also the language you are trying to parse, and this is where the difficulty usually lies - your program has to be able to deal with any variation in the language syntax that is allowable. You are probably reasonably fluent in Java now but might be surprised by the complexity of the formal language syntax.
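The regular-expression example referred to above is not included in these notes, so the following is a stand-in sketch rather than the original. The class name DecisionCounter and the pattern are assumptions. It counts occurrences of if, while and for followed by an opening bracket, and it also shows why the approach is fragile.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecisionCounter {
    // A keyword on a word boundary, optional whitespace, then "(".
    private static final Pattern DECISION = Pattern.compile("\\b(if|while|for)\\s*\\(");

    static int countDecisions(String source) {
        Matcher m = DECISION.matcher(source);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        String method = "int f(int a) { if (a > 0) { while (a > 1) { a--; } } "
                      + "for (int i = 0; i < 3; i++) { a++; } return a; }";
        System.out.println(countDecisions(method));  // 3

        // The regex knows nothing about Java syntax, so a comment fools it.
        String tricky = "// if (x) skip this\nint notify(int x) { return x; }";
        System.out.println(countDecisions(tricky));  // 1, although there are no decisions
    }
}
```

The second call is the point of the warning: comments, string literals and other syntax all have to be handled by hand, which is exactly what a parsing framework does for you.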
example - program slicing
-Invented by Mark Weiser to assist with the problem of debugging.
-Informally, a program slice is an executable subset of the relevant parts of a program.
-Useful for many purposes: maintenance (identifying impact of change), refactoring (identifying independent subsets of programs) etc.
-A slice of a program with respect to a program point p and a set of variables v consists of all statements and predicates in the program that may affect the values of the variables in v at p.
visitor pattern - reminder of the problem
-Motivation: Consider a compiler performing operations on elements of an abstract syntax tree (AST) -Operations (Typecheck(), PrettyPrint() etc.) are distributed over all node types making the system hard to understand and maintain -Classes become cluttered with operations that may be unrelated This is just a reminder of the problem but you should appreciate now that the AST for even a simple program is a relatively large and complex tree structure consisting of a hierarchy of different node types which capture the details of the program.
example - reverse engineering class diagrams
-Need to identify core classes, their main features and relationships Main components: -Attributes -Methods -Relationships with other classes (inheritance, association etc...) -Use this as an illustration of how to identify and extract key pieces of information using JavaParser
solutions using the visitor pattern
-Package related operations in a separate object - a visitor - and pass it to elements of the abstract syntax tree as it's traversed -When an element "accepts" the visitor, it sends a request to the visitor that encodes the element's class > It also includes the element as an argument -Visitor then has access to the element and will execute the operation -Add new functionality simply by defining new NodeVisitor subclasses
JavaParser
-Small, lightweight parser written in Java
> Little documentation but good examples (and source code)
-Generates the AST
-Provides implementations of classes for each node type
> Initially you will find you have to read the source code for these but you'll soon get the hang of it
> Several example visitors for traversing the AST representation of the source code
-May also be used in a non-visitor way (work your way through the AST)
static analysis introduction
-Static Analysis may be used to extract a wide variety of information from a program > Most appropriate tools and techniques to use will depend upon the information being extracted -Aim to look at some of the ways of performing static analysis -Detailed look at the Visitor design pattern
static analysis example - findbugs
-Static Analysis tool that is based on the notion of bug patterns
> Code idioms that are likely to be errors
-Open source, implemented using BCEL (Byte Code Engineering Library)
-Contains a number of detectors:
> Class Structure and Inheritance Hierarchy (ignores code)
> Linear Code Scan (approximates control flow)
> Control Sensitive
> Dataflow (most complicated)
-Detectors fall into one of the following categories:
> Single-threaded correctness issues (e.g. Suspicious equals comparison)
> Thread/Synchronization correctness issues (e.g. Inconsistent synchronization)
> Performance issues (e.g. Explicit garbage collection)
> Security and Vulnerabilities (e.g. Exposing internal representation)
Findbugs exists as an Eclipse plug-in and is a really useful development tool. Try it!
dependencies: identifying local variables
...
    public void visit(VariableDeclarationExpr n, Object a) {
        System.out.println("Var Type is: " + n.getElementType());
        for (VariableDeclarator v : n.getVariables()) {
            System.out.println("Name: " + v.getName());
        }
    }
...
Again, create a list of these and compare with user-defined types
Dependencies: Method Parameters
-Exercise for this week's lab
This has a lot of similarity with the way we processed Fields. Having visited the VariableDeclarationExpr node we then need to extract all the associated variables. Dependencies also appear as method parameters but that, as they say, is left as an exercise for the reader.
hang on, what's an AST?
-AST is a tree representation of the abstract syntactic structure of source code
-Each node of the tree denotes a syntactic construct in the source code
-Abstract - not every detail appearing in the real syntax is represented
see slides for example
-An AST is a representation of a program that has been parsed and transformed into a tree, where each node in the tree represents a syntactic construct. It is referred to as "abstract" as various parts of the real (concrete) syntax are not represented (e.g. the curly brackets { and }, which serve to group statements but which are captured in the AST as a block node). You don't need to understand ASTs in detail but you do need to appreciate the principles and the way they capture and structure a program. In particular you need to understand the various syntactic constructs which appear (e.g. FieldDef, Modifier, IdentifierExpression in the example above) as these are the types of objects from which the object structure we wish to write the operations for is built. In the case of Javaparser these are very closely related to the terms used in the definition of the Java syntax (see https://docs.oracle.com/javase/specs/jls/se13/html/jls-19.html for details).
Further Analysis consideration - introduction
Aim is to cover a number of points you should bear in mind when performing any static or dynamic analysis
In particular:
-What information needs to be recorded in an execution trace
-How to improve trace efficiency
-Dangers of code instrumentation
-Limitations of static analysis
-What we haven't covered
Hopefully you now understand the mechanisms by which you can perform dynamic analysis by using tools and frameworks such as JavaParser. This lecture looks at the broader aspects you should consider (such as efficiency) when carrying out dynamic analysis for real. It finally wraps up with some of the topics we haven't covered and which you should look at if you want to look deeper into the topics of static and dynamic analysis.
Alternative Ways of collecting Dynamic Data
As well as using a framework like JavaParser there are several alternative approaches including using existing tools/languages such as AspectJ or Daikon, or techniques such as reflection:
-AspectJ (or other aspect-based languages)
>   before() : execution(int Fact.factorial(int)) {
        System.out.println("Entering factorial");
    }
> Can add behaviour without having to recompile but restricted to fairly coarse grained data.
-Daikon - system to instrument a program and analyse the collected data for interesting patterns
-Reflection
-JVM instrumentation
-System level tools such as BTrace and DTrace
As with static analysis the appropriate collection mechanism depends upon the purpose
There are also numerous other ways in which dynamic execution data can be collected. The ones mentioned in this slide are just a selection of those available and range from specific tools such as Daikon and languages like AspectJ, through features of the Java language or the JVM, to system level tools. These all have strengths and limitations and the one you use will depend on the program (and system) you are working on.
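As a rough illustration of the reflection route, the standard library's java.lang.reflect.Proxy can log method entry without editing the target class, in the spirit of the AspectJ before() advice above. This is only an analogy and not AspectJ itself: it works only through an interface, and the names Calculator, FactCalculator and TracingProxy are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: dynamic proxies can only intercept interface methods.
interface Calculator {
    int factorial(int n);
}

class FactCalculator implements Calculator {
    public int factorial(int n) {
        int x = 1;
        while (n > 0) {
            x = x * n;
            n = n - 1;
        }
        return x;
    }
}

class TracingProxy {
    // Collected observations; a real tool would write these to a trace file.
    static final List<String> LOG = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T trace(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("Entering " + method.getName()); // the "probe"
            return method.invoke(target, args);      // then run the real method
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    public static void main(String[] args) {
        Calculator traced = trace(new FactCalculator(), Calculator.class);
        System.out.println(traced.factorial(4)); // 24
        System.out.println(LOG);                 // [Entering factorial]
    }
}
```

Like the AspectJ example, this only sees method-level events, so it is also limited to fairly coarse grained data.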
some examples of static analysers and static analysis applications
-Compilers
-Reverse engineering tools
-Style checkers
-Metrics calculators
-Program Slicers
-'Bad Smell' Detectors
-... (there are many more)
These are just a few examples to illustrate both the range and diversity of static analysis tools. There are many others and you may have encountered some in other contexts. You will also find that a large amount of software engineering research uses static analysis.
Dynamic Analysis
Deriving information/abstractions etc. from executing the system under scrutiny
-Run the program one or more times with various inputs and track data and control events:
> Paths taken, values output, intermediate states etc.
-Need to collect a trace of execution or values held by variables
> For example, insert "probes" (additional method calls) into source or bytecode
> Monitor memory locations
-Aim is to try and generalise about a program's behaviour from this set of observations
-Information collected is valid for only the input data used in the program execution
-The dynamic analysis process is typically a little more complex since the program being analysed often requires some initial (static) analysis to instrument and modify it to collect the data of interest. This modified version is then compiled and run and the outputs of the instrumentation captured.
dynamic analysis example - execution trace profile
Execution trace monitors also take various forms. The illustration shown here is at the application level and even though it looks horribly complicated various discernible patterns emerge that the software engineer developing or maintaining it may find helpful. Other types of execution traces can be useful in identifying performance bottlenecks. For example they might indicate how frequently a method is executed or the amount of time spent in an execution.
Association/Aggregations II
Extracting field details:
    public void visit(FieldDeclaration n, Object a) {
        System.out.println("Field Type is: " + n.getElementType());
        for (VariableDeclarator v : n.getVariables()) {
            System.out.println("Name: " + v.getName());
        }
    }
Build up a list of class names and field types to identify association/aggregation relationships
To find all the field declarations we override the FieldDeclaration visitor. From this we can easily obtain the type, but we then need to find all the variables defined (there may be more than one - e.g. Foo a,b,c;). The individual variables are of type VariableDeclarator which we process in turn to extract the name of each variable. Notice that there are other ways we could have coded this, e.g. by calling accept on the VariableDeclarator and then overriding the visitor for VariableDeclarator to obtain the variables.
JavaParser Code Modification III
Finally, as we are going to be modifying the AST we can write its contents out to a new file after processing:
...
    byte[] modfile = cu.toString().getBytes();
    Path file = Paths.get(**file location**);
    Files.write(file, modfile);
...
Can then compile and run this
Now that every method in the AST has been modified we need to save this new version of the system, so we write the contents of the AST out to a file. This can then be compiled and run and should print out the name of each method being executed in turn.
Generalisation/ Realisation information
Firstly, let's just get the class name
...
    FileInputStream in = new FileInputStream("SomeFile.java");
    CompilationUnit cu;
    try {
        cu = JavaParser.parse(in);
    } finally {
        in.close();
    }
    new ClassDiagramVisitor().visit(cu, null);
...
    private static class ClassDiagramVisitor extends VoidVisitorAdapter<Object> {
        @Override
        public void visit(ClassOrInterfaceDeclaration n, Object arg) {
            System.out.println("Class Name: " + n.getName());
            super.visit(n, arg);   // keep traversing into the class body
        }
    }
NB: The call to super.visit(n, arg); makes sure the rest of the class gets visited - important if you have other visitors (otherwise they get ignored).
This shows how we can extract the class name from the ClassOrInterfaceDeclaration element and is similar in style to the MethodVisitor example which used the MethodDeclaration visitor. We define a class which extends VoidVisitorAdapter (which has visitor implementations for every element type) and then override the ClassOrInterfaceDeclaration visitor to add in the extra functionality we want (in this case extracting the class name). The initial part of the code reads in the file we want to analyse, parses it to create the compilation unit (the AST representation of code, which corresponds to the object structure), creates an instance of the class we have defined, and calls visit(...) to start the visitors moving through the AST. When the class header is encountered the call will be made back to our ClassOrInterfaceDeclaration visitor, passing the AST element object corresponding to the declaration as the parameter n.
A final point to note is the call to super.visit(...) at the end of the visitor. If you remember, all the default visitor implementations do is keep the visitor moving around the AST. This call pushes control back to the basic implementations and ensures that the rest of the AST is traversed. If you fail to include this then you might find that the visitor fails to dig deeper into the class and might miss other visitors you have implemented.
Association/Aggregations
From the Java language specification, "The variables of a class type are introduced by field declarations"
FieldDeclaration:
    FieldModifiers(opt) Type VariableDeclarators ;
VariableDeclarators:
    VariableDeclarator
    VariableDeclarators , VariableDeclarator
VariableDeclarator:
    VariableDeclaratorId
    VariableDeclaratorId = VariableInitializer
VariableDeclaratorId:
    Identifier
    VariableDeclaratorId [ ]
VariableInitializer:
    Expression
    ArrayInitializer
To find any associations or aggregations we need to look for field declarations, which are defined in the Java syntax as above. Java has quite a complex syntax (to say the least), but again this is helpful in guiding us towards the elements of interest in JavaParser.
JavaParser Code Modification
Going to be adding statements into the AST so firstly need to create a new node
Several ways of doing this but it has to be of the right type and has to include the name of the method
    MethodCallExpr call = new MethodCallExpr(new NameExpr("System.out"), "println");
    call.addArgument(new StringLiteralExpr(n.getNameAsString()));
This code is incorporated into a MethodDeclaration visitor
The first thing we need to do is to build the System.out.println(...) statement. This is not just a string but has to be an element of the correct type as it is going to be inserted into the AST. The statement we are inserting is of type MethodCallExpr, which itself is built from a NameExpr. Note that the argument added to this is going to be the name of the method into which the code is inserted, and so the name is retrieved from the argument n in the MethodDeclaration visitor.
Recovering Inter-Class relationships II
How can we extract these using javaparser?
Relationship = javaparser detection
Realisation = visit(ClassOrInterfaceDeclaration n ... getImplements()
Generalisation = visit(ClassOrInterfaceDeclaration n ... getExtends()
Association/Aggregation = Check for user-defined type in fields (need to maintain a list of class (type) names) visit(FieldDeclaration n...
Dependency = Check for user-defined type in method parameter lists and in local variable declarations visit(VariableDeclarationExpr n ...
What Else?
This table shows the particular node types in JavaParser that contain the information we need to identify the constructs identified in the previous slide. This is probably the hardest part of using JavaParser initially - identifying the element type that is of interest. There is no easy way around this - you need to look at the Java syntax, read the JavaParser javadoc, make a few educated guesses and experiment a little. You will quickly become more proficient in using it.
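The bookkeeping behind "maintain a list of class (type) names" can be sketched independently of JavaParser. Here the class names and field types are supplied by hand; in practice they would be collected by the ClassOrInterfaceDeclaration and FieldDeclaration visitors. RelationshipFinder and the sample class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelationshipFinder {
    // Reports "Owner -> Type" for every field whose type is one of the analysed classes.
    static List<String> associations(Set<String> classNames,
                                     Map<String, List<String>> fieldTypesByClass) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : fieldTypesByClass.entrySet()) {
            for (String fieldType : entry.getValue()) {
                if (classNames.contains(fieldType)) {   // user-defined type, so a relationship
                    found.add(entry.getKey() + " -> " + fieldType);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> classNames = new HashSet<>(Arrays.asList("Order", "Customer"));
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("Order", Arrays.asList("Customer", "String", "int"));
        fields.put("Customer", Arrays.asList("String"));
        System.out.println(associations(classNames, fields)); // [Order -> Customer]
    }
}
```

A plain name comparison like this misses a field declared as, say, List<Customer>, so a fuller version would also need to look inside generic and array types.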
The Dynamic Analysis Process
Initially you need to decide what information you want to capture from the program -Depends on what you are trying to do -Often involves inserting "probes" of some nature to record time, values, execution, etc. etc. Then... - Modify the program using static analysis to identify the appropriate points in the program that need code inserting or changing - Save and recompile this modified version - Run the modified version and stand back and admire the results In Summary: Parse program source, insert instrumentation, generate modified source, compile, run, and then analyse collected data In order to capture the information about the running program, the program itself needs to be modified in order to insert the additional code which (for example) records where tests have been or logs the order of execution etc. This initially involves performing static analysis to instrument the program being analysed, which is then saved, compiled and executed.
visitor pattern
Intent:
-Represent an operation to be performed on the elements of an object structure
-Visitor lets you define a new operation without changing the classes of the elements on which it operates
The point above is the important one - don't be misled by the name
Motivation:
-Consider a compiler performing operations on elements of an abstract syntax tree (AST)
-The name of the Visitor pattern is rather misleading. The main purpose of the Visitor pattern is to be able to easily define and apply new operations on the elements of some object structure without making any modifications to the elements themselves.
-The motivational example given is where a compiler might need to perform a range of different operations over the various nodes of an abstract syntax tree (a parsed representation of a program) such as TypeCheck(), GenerateCode() etc. These operations are likely to be different for each type of syntactic structure, so this is going to lead to a very complex and cluttered object structure and a maintenance nightmare.
Example: Improving Trace Efficiency (1)
Is it really necessary to insert probes at every decision outcome? To capture the information mentioned for this detailed execution sequence you might be tempted to think that you would need to instrument the program at every point a different branch might be taken - e.g. after every if..then statement, while loop entry and exit point, and so on.
JavaParser - Mapping to Visitor Pattern
JavaParser provides implementations for:- -Element Interface -Visitor Interface -Concrete Element (all possible entries in the AST) -Concrete Visitors (various default implementations) -Object Structure (generated for the program being parsed)
JavaParser Code Modification Example
JavaParser provides mechanisms to parse, modify and add to the AST
-For any node type (e.g. AssignExpr, Parameter, WhileStmt etc.), there is a corresponding set* operation for every get*
> Allows you to modify existing source, e.g. change method name:
    n.setName(n.getNameAsString().toUpperCase());
-Node types also have constructors and builder methods (add* etc.) to assist in the creation of new AST nodes
> e.g. adding a new parameter to a method (where n is of type MethodDeclaration):
    n.addParameter(PrimitiveType.intType(), "value");
OK, but limited. How do we add in code to collect information?
To modify a program there are some built-in methods in JavaParser which allow you to make simple changes (these vary according to node type, but for each get() method there will also be a set() method). Take a look at the javadoc for the type of element. However, there are limits to what you can do via these methods so we want to look at how to insert any type of code.
Tracing
Key component of dynamic analysis: task of recording information about software executions and storing them for subsequent analysis
-Sequence of observations of control (e.g. method entry point) and data state (instance variable values, parameters etc.).
More formally: Trace consists of n observations: T = <<p_0, D_0>, ..., <p_n, D_n>> where p_i is some (non-unique) point in the code and D_i is the state of various relevant data variables at that point
Two important points to consider:
> What needs to be traced
> Which traces to collect: how do you identify 'typical' usage and avoid biased results? (Beyond the scope of this course)
A trace is one of the key products of dynamic analysis - the result of executing an instrumented (and recompiled) program. Traces may take several forms such as:
• A sequence of all methods called during an execution and the values of parameters passed
• A more detailed profile of every statement executed
• A record of the amount of time spent executing each method
• A history of the values held by every field in every object over the duration of an execution
• and so on...
This gives a way of formally describing all of the above (and any other trace) as they each have in common the fact that they are a sequence of observations made at various points in a program (not necessarily unique, as a particular point may appear several times in an execution trace), associated with which is the state of the program at the time of execution (e.g. values of parameters, fields, system time...)
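The formal definition maps fairly directly onto a data structure: a list of (point, data) pairs filled in by probes. The following is a minimal sketch with hand-inserted probes; the names Trace, Observation, probe and TracedFact, and the point labels, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Trace {
    // One observation <p_i, D_i>: a program point plus the data state seen there.
    static final class Observation {
        final String point;
        final Map<String, Object> data;
        Observation(String point, Map<String, Object> data) {
            this.point = point;
            this.data = data;
        }
        @Override public String toString() { return "<" + point + ", " + data + ">"; }
    }

    final List<Observation> observations = new ArrayList<>();

    // Probe called from instrumented code with alternating name/value pairs.
    void probe(String point, Object... namesAndValues) {
        Map<String, Object> data = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            data.put(String.valueOf(namesAndValues[i]), namesAndValues[i + 1]);
        }
        observations.add(new Observation(point, data));
    }
}

class TracedFact {
    // factorial with probes inserted by hand at method entry and in the loop
    static int factorial(int n, Trace trace) {
        trace.probe("factorial:entry", "n", n);
        int x = 1;
        while (n > 0) {
            x = x * n;
            n = n - 1;
            trace.probe("factorial:loop", "n", n, "x", x);
        }
        return x;
    }

    public static void main(String[] args) {
        Trace trace = new Trace();
        System.out.println(factorial(3, trace)); // 6
        System.out.println(trace.observations);
    }
}
```

Note that the point factorial:loop appears three times in the resulting trace, which is what "non-unique" means in the definition.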
dynamic analysis example - test coverage analysis
Like the illustrations of static analysis this is just a small number of sample application examples. Test coverage analysers provide feedback on how thoroughly your test cases exercise the program. For example, they may report on what proportion of statements or branches within a method have been exercised as a result of running a test suite and provide guidance on which parts of the code have never been executed. There are many more sophisticated levels of test "coverage" (as it is known) that can be measured such as data flow, mutation etc. etc.
example - "bad smell" detection
List of (initially) 22 coding patterns that were argued to be indicative of design problems (code needing refactoring):
-Duplicated Code
-Long Method
-Large Class
-Long Parameter List
-Divergent Change
-Shotgun Surgery
-Feature Envy
-Data Clumps
-Primitive Obsession
-Switch Statements
-Parallel Inheritance Hierarchies
-Lazy Class
-Speculative Generality
-Temporary Field
-Message Chains
-Middle Man
-Inappropriate Intimacy
-Alternative Classes with Different Interfaces
-Incomplete Library Class
-Data Class
-Refused Bequest
-Comments
Many are detectable using static analysis.
Bad smells is the term given to constructs within a piece of software that are considered suspicious - not definitively problematic but indicators of potential problems which may need investigation and addressing (via refactoring). There have been many additional smells added to this list since it was created. They are also behind the refactoring actions built into tools such as Eclipse (another example of static analysis).
Alternative Instrumentation Approaches
Manipulate bytecode instead of the source
    public class Fact {
        public int factorial(int n){
            int x = 1;
            while (n > 0){
                x = x * n;
                n = n - 1;
            }
            return x;
        }
    }
-Simpler (but less familiar?) syntax
-Lots of frameworks to support analysis and instrumentation
> ASM, BCEL etc. - all based around visitor again
    public Fact();
    public int factorial(int);
      Code:
         0: iconst_1
         1: istore_2
         2: iload_1
         3: ifle 17
         6: iload_2
         7: iload_1
         8: imul
         9: istore_2
        10: iload_1
        11: iconst_1
        12: isub
        13: istore_1
        14: goto 2
        17: iload_2
        18: ireturn
The focus in this part of the class has been on carrying out static and dynamic analysis on Java source code but it is also possible to perform very similar tasks on the byte code. This has several advantages, particularly for dynamic analysis, as the syntax is simpler and there is no need to recompile the instrumented code (which also means you need to be especially careful about the instrumentation, as there is no compiler to catch mistakes). There are frameworks such as ASM https://asm.ow2.io/ and BCEL https://commons.apache.org/proper/commons-bcel/ which support this and which follow a very similar design to JavaParser.
recovering inter-class relationships
May be simplified to the following:
-Inter-Class Relationship Identification Rules (Relationship: Code)
-Realisation: directly from keyword
    class A implements B { ... }
-Generalisation: directly from keyword
    class A extends B { ... }
-Association/Aggregation: class attribute
    class A { B b; }
-Dependency: parameter or local variable
    class A { void f(B b) { b.g(); } }
    class A { void f() { B b; ... b.g(); } }
This table illustrates how the various inter-class relationships appear in Java code. We now need to think about how we can spot these constructs using JavaParser.
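To see the rules in action without any extra library, here is a rough sketch that applies them to compiled classes using Java reflection. This is a swapped-in technique, not the JavaParser approach these notes go on to use, and it cannot see local variables, so it only catches parameter dependencies. All class names (Shape, Base, Pen, Canvas, Circle) are made up.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Illustration only: reflection on compiled classes, NOT source-level parsing.
final class RelationshipSketch {
    interface Shape { }
    static class Base { }
    static class Pen { }
    static class Canvas { void clear() { } }

    static class Circle extends Base implements Shape {
        Pen pen;                           // association/aggregation (attribute)
        void draw(Canvas c) { c.clear(); } // dependency (parameter)
    }

    static List<String> relationships(Class<?> c) {
        List<String> found = new ArrayList<>();
        // Realisation: "implements" keyword
        for (Class<?> i : c.getInterfaces()) {
            found.add("realisation:" + i.getSimpleName());
        }
        // Generalisation: "extends" keyword (ignore the implicit Object parent)
        Class<?> parent = c.getSuperclass();
        if (parent != null && parent != Object.class) {
            found.add("generalisation:" + parent.getSimpleName());
        }
        // Association/aggregation: attributes of a class type
        for (Field f : c.getDeclaredFields()) {
            if (!f.isSynthetic() && !f.getType().isPrimitive()) {
                found.add("association:" + f.getType().getSimpleName());
            }
        }
        // Dependency: parameter types (local variables are invisible to reflection)
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue;
            }
            for (Class<?> p : m.getParameterTypes()) {
                if (!p.isPrimitive()) {
                    found.add("dependency:" + p.getSimpleName());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(relationships(Circle.class));
    }
}
```

A source-level tool needs the same four checks, just made against AST nodes instead of Class objects.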
dynamic analysis example - memory monitors
Memory monitors come in various shapes and sizes but the example here shows the system heap allocation of objects from an application - how many instances and how much space they occupy - and can be useful in examining performance issues.
Dependencies
Method parameters and local variables
Local variables - again look at the Java syntax:
    variable_declaration = < modifier > type variable_declarator < "," variable_declarator > ";"
    variable_declarator = identifier < "[" "]" > [ "=" variable_initializer ]
    type = type_specifier < "[" "]" >
    type_specifier = "boolean" / "byte" / "char" / "int" ... class_name / interface_name
    modifier = "public" / "private" / "protected" ... "threadsafe" / "transient"
Dependencies appear as references to classes used as local variables or parameters. To process the local variables we again look at the Java language syntax. Like I said, the syntax of Java is quite complex...
Hazards of Code Instrumentation
Need to be careful that the semantics of the original code are preserved. e.g.
-Original code:
    ...
    if(a<b)
        b = b*a;
    else
        a = a/b;
    ...
-Probes inserted directly after each decision (broken - the else no longer has a matching if):
    ...
    if(a<b)
        probe(B);
        b = b*a;
    else
        probe(C);
        a = a/b;
    ...
-Braces added so the original structure is kept:
    ...
    if(a<b){
        probe(B);
        b = b*a;
    } else {
        probe(C);
        a = a/b;
    }
When actually writing the code to perform the instrumentation you need to be careful that the semantics of the original program are preserved. This illustration shows the consequences of inserting some instrumentation code after every condition. In this case the problem would be flagged up when you tried to recompile the program, but in many cases such errors might go unnoticed (e.g. if the else part was missing), so you need to test your instrumentation code very carefully.
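The "goes unnoticed" case can be shown in a few lines. This is a hand-written sketch, not output from any instrumentation tool; ProbeHazard, probe and the three method names are made up. With no else part, the naively instrumented version still compiles but behaves differently.

```java
// Hypothetical example: an if with no else, instrumented two ways.
final class ProbeHazard {
    static final StringBuilder log = new StringBuilder();

    static void probe(String id) {
        log.append(id);
    }

    // Original code: b is only updated when a < b.
    static int original(int a, int b) {
        if (a < b)
            b = b * a;
        return b;
    }

    // Naive instrumentation: probe inserted straight after the condition.
    // Compiles fine, but the multiplication is now unconditional.
    static int naive(int a, int b) {
        if (a < b)
            probe("B");
        b = b * a;
        return b;
    }

    // Safe instrumentation: braces keep both statements under the condition.
    static int braced(int a, int b) {
        if (a < b) {
            probe("B");
            b = b * a;
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(original(5, 2)); // prints 2
        System.out.println(naive(5, 2));    // prints 10 - semantics changed
        System.out.println(braced(5, 2));   // prints 2
    }
}
```

The two versions only disagree when the condition is false, so a test that only covers the true branch would miss the bug.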
JavaParser Code Modification II
Next step is to add this newly created statement into the method body:
    n.getBody().get().addStatement(0,call);
Could also be written as
    n.getBody().ifPresent(l -> l.addStatement(0,call));
The next step is to insert the line of code we have just created into the body of the method that has been passed to the visitor. To achieve this we need to access the body of the method, and then add the call statement at the start of it. The second form is the safer one: getBody() returns an Optional which is empty for methods without a body (e.g. abstract or interface methods), and calling get() on an empty Optional throws NoSuchElementException.
example - software metrics
Numeric abstractions/summaries of program attributes
-Lines of Code
-McCabe's Cyclomatic Complexity
-Chidamber and Kemerer suite (or C&K metrics). Six metrics intended to capture important aspects of OO systems:
> Weighted Methods Per Class (WMC)
> Depth of Inheritance Tree (DIT)
> Number of Children (NOC)
> Coupling Between Objects (CBO)
> Response for a Class (RFC)
> Lack of Cohesion in Methods (LCOM)
-Several tools available to collect these
There have been literally hundreds of metrics developed over the years but they essentially all follow the same principles of creating numeric summaries of programs based on a variety of criteria (e.g. number of lines/methods in a class, connections to other classes through references or inheritance, the depth of nesting in a method, the number of control statements etc.). The ones mentioned here are some of the more well-known or popular.
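As a tiny illustration of what "numeric summary" means, here is a sketch that computes Depth of Inheritance Tree using reflection. Assumptions: java.lang.Object counts as depth 0 (conventions differ between tools), and reflection on loaded classes stands in for the source-level analysis a real metrics tool would do. The class names are made up.

```java
// Illustration only: DIT via reflection, with Object taken as depth 0.
final class DitSketch {
    static class Vehicle { }
    static class Car extends Vehicle { }
    static class SportsCar extends Car { }

    /** Number of superclass links between c and java.lang.Object. */
    static int depthOfInheritance(Class<?> c) {
        int depth = 0;
        for (Class<?> p = c.getSuperclass(); p != null; p = p.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(depthOfInheritance(Object.class));    // prints 0
        System.out.println(depthOfInheritance(Vehicle.class));   // prints 1
        System.out.println(depthOfInheritance(SportsCar.class)); // prints 3
    }
}
```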
static analysis example - reverse engineering
Process of supporting software engineers in understanding large, complex software systems from just the source or object code by providing useful abstractions and summaries about their structure and behaviour Examples of reverse engineering include tools for creating UML diagrams from source code (e.g. class, object, interaction and state machine diagrams). They are typically used to gain insights into and an understanding of a system, or generate up-to-date documentation.
Tracing Example - Execution Profiling
Record of which paths have been taken -Insert "probes" (method calls of some sort) at key points in the code (e.g. directly after decisions) -Recompile program with probes and run. -Record of probes provides profile information > e.g. ABtCtDFBtCtDFBtCtDFBtCfEFBfG This is an illustration of how trace efficiency can be improved. It uses the example of creating a trace of the detailed execution sequence - which paths have been taken through a program - so that you will have a record of the sequence of statements that have been executed, branches taken, loops iterated etc. etc.
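The program behind the profile string above is not in these notes, so the following is a made-up method with probes placed by hand that happens to produce the same string. The names ProfileSketch, probe, decision and run are assumptions: A marks entry, B the loop condition, C the if condition (each followed by t or f), D and E the two branches, F the end of the loop body and G the exit.

```java
// Hypothetical instrumented method; probes were inserted by hand.
final class ProfileSketch {
    static final StringBuilder profile = new StringBuilder();

    static void probe(String id) {
        profile.append(id);
    }

    // Records a decision point and its outcome, then passes the outcome through
    // so it can wrap a condition without changing it.
    static boolean decision(String id, boolean outcome) {
        profile.append(id).append(outcome ? "t" : "f");
        return outcome;
    }

    // Adds up absolute values; only the probe placement matters here.
    static int run(int[] values) {
        probe("A");
        int sum = 0;
        int i = 0;
        while (decision("B", i < values.length)) {
            if (decision("C", values[i] >= 0)) {
                probe("D");
                sum += values[i];
            } else {
                probe("E");
                sum -= values[i];
            }
            probe("F");
            i++;
        }
        probe("G");
        return sum;
    }

    public static void main(String[] args) {
        run(new int[] {1, 2, 3, -4});
        System.out.println(profile); // prints ABtCtDFBtCtDFBtCtDFBtCfEFBfG
    }
}
```

Note the D, E and F probes are redundant once the decision outcomes are recorded, which is one way probe placement can cut the size of a trace.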
dynamic analysis - example applications
Similarly broad range... -Profilers -Debuggers -Testing frameworks -Memory monitors ... (there are many more) Like the illustrations of static analysis this is just a small number of sample application examples.
JavaParser Code Modification - Summary
Simple illustration but hopefully easy to see how it might be adapted to perform other sorts of instrumentation: -Write information out to a file or record in a logging object -More detailed coverage (branches taken etc...) -Timing information -Recording values of fields at various points -etc etc. ... The general principles here can be applied to a wide variety of problems which involve capturing run-time information from a program
basic class diagrams
Start with key components of individual classes -Attributes -Methods -Type, visibility and scope (class or object) of above -Method parameters Information can be directly obtained by analysing the code using visitors for particular syntactical elements Seen example of this already with MethodVisitor and will look at others later
Dynamic Analysis
Static analysis is a powerful technique but has several limitations: -Hard to capture the run-time configuration and behaviour of the system -How does the system react to different inputs, conditional execution (ifs, loops, recursion...), polymorphism etc. Dynamic analysis can provide a more accurate (but possibly less complete) representation Complements static analysis but also a valid analysis technique in its own right Aim of lecture: -Reminder of the concept of dynamic analysis -Overview of how to collect data to perform dynamic analysis It's important to appreciate the difference between static and dynamic analysis. Static analysis involves no execution of the code being analysed. Dynamic analysis aims to provide a complementary view of the system and capture its dynamics by running the system being analysed
Dynamic Analysis - More Typical Applications
Test coverage or execution traces for example... To capture the information to achieve the above involves modifying the program in some way These are just a couple of examples. Both involve monitoring the programs being analysed in some way but capture different aspects - one looks at test coverage (which parts of the code have been executed by test cases) and the other records execution traces (looking at which parts of the code have been executed and in what order).
Tracing - What needs to be traced
Tracing is expensive:
> Recording data is time-consuming and may also interfere with execution
> Traces can occupy large amounts of storage
> Analysis can be time-consuming
Simple JHD example:
> Trace of method signatures involved in starting the program, drawing a rectangle, and closing the program:
> 2426 items!
-Important trade-off between utility and scale
-Various ways in which trace efficiency can be improved - placing of probes and data collected.
A very important point to remember is the consequences of dynamic analysis and execution tracing. This example shows just how many method calls might be made in an apparently relatively small execution sequence. The message here is to think carefully about what information you need to collect (i.e. what instrumentation you are going to perform) for the problem you are trying to solve, as this can have consequences for the execution time of the program and the volume of data that gets produced and which needs to be subsequently analysed.
Simple Dynamic Analysis Illustration
Want to track method invocations - what's called and in what order
Insert code to print out method name each time it is executed (in practice you would do something more sensible)
-Original class
    public class SomeClass {
        ...
        public int methodA(int n){
            int x = 1;
            while (n > 0){ ... }
            ...
        }
        public void methodB(String m){
            anotherClass.methodC(m);
            ...
        }
        ... etc.
    }
-Modified class
    public class SomeClass {
        ...
        public int methodA(int n){
            System.out.println("methodA");
            int x = 1;
            while (n > 0){ ... }
            ...
        }
        public void methodB(String m){
            System.out.println("methodB");
            anotherClass.methodC(m);
            ...
        }
        ... etc.
    }
This is a simple illustration of the kind of thing we are trying to achieve. To create a basic execution trace of which methods have been executed we are going to print out the name of each one visited, so we want to automatically insert the println() statements. (If you were doing this for real then you would write out the information to a log file or record it in some object for example.) The next slides look at how we can use JavaParser to insert this code into every method in a system.
Generalisation/ realisation information II
What other information can we get from the ClassOrInterfaceDeclaration visitor? Look at the Javadoc and code again
    ...
    private static class ClassDiagramVisitor extends VoidVisitorAdapter<Object> {
        @Override
        public void visit(ClassOrInterfaceDeclaration n, Object arg){
            System.out.println("Class Name: " + n.getName());
            System.out.println("Class Implements: ");
            for (ClassOrInterfaceType coi : n.getImplements()) {
                System.out.println(coi.getName());
            }
            // carry on down the tree so nested elements are visited too
            super.visit(n, arg);
        }
    }
Inheritance information can be found in a similar manner
This illustration extends our ClassOrInterfaceDeclaration visitor to extract details of any interfaces the class implements. You can also find inheritance information in a similar way.
NB: If you implement these you might need to modify them slightly to catch the case where a class does not implement or extend anything and these calls then return null. This depends on the JavaParser version: in the 3.x releases the equivalent calls are getImplementedTypes() and getExtendedTypes(), which return an empty list rather than null, so check the Javadoc for the version you have.
working with JavaParser
What other information might we extract from a MethodDeclaration?
-Take a look at the javadoc
-Take a look at the code associated with the concrete element
Key to using javaparser:
-Need to be able to relate the code element you are interested in to the name used by javaparser
> Reading the official Java syntax can help
> Look at the code - Visitors as well as Concrete Elements - for more details
> Bit of intelligent guesswork and experimentation!
-JavaDoc also useful
> Includes small examples of relevant syntactical element
Summary
What we have covered:-
-Overview of static and dynamic analysis, including representative tools/applications and strengths/weaknesses
-How to collect static analysis data
-JavaParser framework and the Visitor pattern
-How to collect dynamic analysis data
What we haven't covered:-
-Data Flow Analysis
-Program Dependence Graph
-System Dependence Graph
Often need to consider these for some applications
There is lots that we have not had time to cover, in particular collecting the information needed to perform detailed analysis of programs. Doing this requires examination of the data flow in particular - i.e. the way that data is defined and used within a program. The standard way of analysing this is by way of a program dependence graph, which captures both control-flow and data-flow relationships. There is also the system dependence graph, which captures dependencies on other modules (methods, classes etc.). If you are keen to follow this up (by no means a requirement of this class) then also look at the symbol solver feature within JavaParser.
a word of caution with using visitors
What would happen if we tried this? ... public void visit(VariableDeclarator v, Object a){ System.out.println("Name: " + v.getName()); } Need to use the most specific visitors for the element of the program being processed May also need to control flow of execution - e.g. by calling accept(this, arg) on an element you may wish to visit further. Try doing this to see what happens. Controlling the execution of visitors and making sure you visit the appropriate elements can sometimes be a little tricky
frameworks that use Visitor pattern
earlier we said...
-a client that uses the visitor pattern must create a ConcreteVisitor object and then traverse the object structure, visiting each element with the visitor
-when an element is visited, it calls the Visitor operation that corresponds to its class. The element supplies itself as an argument to this operation to let the visitor access its state, if necessary
-Frameworks that use the visitor pattern do most of this for you!
> Generate the AST and associated classes for each node type (this is the object structure)
> Provide generic visitors for traversing the AST
-Add new functionality by subclassing specific visitors of interest
Will look next at how to do this using Javaparser
This might all sound like a lot of work but the good news is that frameworks which make use of the Visitor pattern will have already done a lot of the hard work:
• The AST has been generated, along with the associated classes which all implement the accept() method; together these represent the object structure.
• Generic visitors exist for traversing the object structure
The problem of adding new functionality is then just a matter of sub-classing specific visitors of interest.
what is the visitor pattern?
main design pattern upon which parsing frameworks are usually based • The Visitor Pattern forms the basis for building parsers, and understanding it is an essential prerequisite to understanding and using the parser (but it does have uses other than building parsers).
what is the javaparser framework?
parsing framework which makes building static and dynamic analysers much easier. Javaparser is a lightweight parser for Java which is built using the visitor pattern and which we will be using to build some simple dynamic or static analysis tools. There are other parsers available - most notably the one built into the Java Development Toolkit - and even though they are more complex they follow the same principles as Javaparser
what is static analysis?
process of automatically analysing a system's code for information (entities and relationships) which is used to create abstractions of, and answer questions about, the system
-No execution of the code being analysed takes place
> It is just input data for the analyser
-Used for real - Windows regression tests take two weeks to run. Microsoft uses static analysis to determine which subset to execute prior to a critical release.
-Aims to obtain information that is valid for all possible executions
As you will see there are a number of applications of static analysis but the essential point is that you are writing a program that is analysing and establishing properties about another program. Static analysis may also be applied at different levels - for example you might write something fairly lightweight which just highlights empty exception handlers, or you could go much deeper and write a system which checks that all variables are appropriately defined before use, or adds some sophisticated extension to the type checking system. However, these are all examples of static analysers.
The Windows example is a couple of years old and it probably takes even longer now but the point still stands (and applies to all big systems). If you make a small change it is really not feasible to run an entire regression test suite to check nothing is broken, so static analysis is used to identify the dependencies - parts of the system which use the changed code or may be impacted by it in some way - and then identify and run the regression tests associated with this subset of the system
what are static and dynamic analysis?
programs which operate on other programs to provide useful information • Static and dynamic analysis techniques provide a context and motivation for carrying out the parsing (basically reading in the program and then identifying the components you are interested in) - there is much more to them than is introduced here (read the related articles on myplace).
visitor pattern - generic structure
see slides
The structure of the Visitor pattern involves two separate hierarchies of objects - the one on the left is the object structure that we wish to traverse (the AST of a program in this case), and the one on the right captures the operations that we wish to perform on these nodes (the visitors).
• Every node in the object structure must contain an accept() method which has a parameter of type visitor
• The operations we wish to perform on the object structure are grouped in the visitor hierarchy according to their functionality (e.g. type checking, pretty printing).
• The visitor is passed to the AST elements as it is traversed via the accept() method
• When the accept() method executes it makes a call back to the visitor and passes the object element itself as an argument
• The visitor now has access to the AST object of interest and can query it to carry out the necessary operations
-If we wish to add in more functionality (e.g. for GenerateCode()) then the only thing we need to do is to modify the visitor hierarchy and define new subclasses as appropriate
This reinforces the structure above and presents it in a more generic fashion which may be applied to other problems. The main point is that there are two separate hierarchies - the visitor hierarchy and the element hierarchy - which both have interfaces defining operations that are implemented by concrete subclasses. Every class in the element hierarchy must implement the accept() method. The visitor hierarchy is where we define the new operations we wish to carry out on the elements. The client program holds references to both these hierarchies.
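A minimal sketch of the two hierarchies in plain Java, using a toy expression tree instead of a real AST. All names (Node, Num, Add, Visitor, Printer, Evaluator) are made up for illustration and are not JavaParser classes.

```java
// Toy visitor pattern: element hierarchy on one side, visitor hierarchy on the other.
final class VisitorSketch {

    // Visitor hierarchy: one visit operation per concrete element class.
    interface Visitor {
        void visit(Num n);
        void visit(Add a);
    }

    // Element hierarchy: every node must implement accept().
    interface Node {
        void accept(Visitor v);
    }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        // Call back to the visitor, passing this element as the argument.
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class Add implements Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public void accept(Visitor v) { v.visit(this); }
    }

    // One operation: pretty printing.
    static final class Printer implements Visitor {
        final StringBuilder out = new StringBuilder();
        public void visit(Num n) { out.append(n.value); }
        public void visit(Add a) {
            out.append("(");
            a.left.accept(this);   // the visitor decides how traversal continues
            out.append(" + ");
            a.right.accept(this);
            out.append(")");
        }
    }

    // A second operation added without touching Num or Add.
    static final class Evaluator implements Visitor {
        int result;
        public void visit(Num n) { result = n.value; }
        public void visit(Add a) {
            a.left.accept(this);
            int leftValue = result;
            a.right.accept(this);
            result = leftValue + result;
        }
    }

    public static void main(String[] args) {
        Node tree = new Add(new Num(1), new Add(new Num(2), new Num(3)));
        Printer printer = new Printer();
        tree.accept(printer);
        Evaluator evaluator = new Evaluator();
        tree.accept(evaluator);
        System.out.println(printer.out + " = " + evaluator.result); // prints (1 + (2 + 3)) = 6
    }
}
```

Evaluator was added purely on the visitor side, which is the point of the pattern: new operations mean new visitor subclasses, not changes to the element classes.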