Design of information structures

What are the two issues to keep in mind when designing a CBHT?

1. The number of buckets
2. The hash function

What are the motivations for classifying data into types?

- Often particular operations can only be performed upon specific types
- The representation and use of data are simplified by well-defined types
- Type declarations are strictly enforced in many modern languages

What is the motivation behind abstract data types? (ADTs)

Abstract data types are motivated by a desire for separation:
- Avoiding coupling between application and representation: application code should not impact upon data representation, and good software engineering practice safeguards code quality
- Maintaining a distinction between concept and implementation: concepts must be independent of any specific implementation

What is the difference between ordered and unordered trees?

- An ordered tree is one in which the children of each node have some particular order; the children of a node can be viewed as a list
- An unordered tree is one in which the children of each node have no particular order; the children of a node can be viewed as a set

What are the applications of queues?

- Computer games: queues are used as software buffers to maintain input ordering
- Program scheduling: round-robin schedulers often use queues to enforce fairness
- Resource access control / reservation systems: queues can ensure that access to resources is fairly allocated

What are the disadvantages of arrays?

- Constant-time deletion may yield a sparse array, so element shifting is required
- Insertion may require that elements be shifted, a relatively expensive operation
- Arrays have a fixed size / capacity: array-based ADT implementations are often bounded, and array expansion may involve an expensive copying procedure

What are the different types of compression functions?

- Division: hash(x) = x mod n, where the hash table size n is prime; the basis for selecting a prime n is linked to number theory
- Multiply, Add and Divide (MAD): hash(x) = (ax + b) mod n, where a > 0 and b > 0; we need (a mod n) ≠ 0 to avoid every hash value being b
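The two compression functions above can be sketched as follows. This is a minimal illustration; the class and method names are invented for the example, and the constants a = 3, b = 7, n = 13 are arbitrary choices that satisfy the stated constraints.

```java
// Sketch of the two compression functions: division and MAD.
public class Compression {
    // Division method: the table size n should be prime.
    static int divisionHash(int x, int n) {
        return Math.floorMod(x, n);
    }

    // Multiply, Add and Divide (MAD): a > 0, b > 0, and (a mod n) != 0,
    // otherwise every key would compress to b.
    static int madHash(int x, int a, int b, int n) {
        return Math.floorMod(a * x + b, n);
    }

    public static void main(String[] args) {
        System.out.println(divisionHash(42, 13)); // a bucket index in [0, 12]
        System.out.println(madHash(42, 3, 7, 13));
    }
}
```

`Math.floorMod` is used rather than `%` so that negative keys still map to a non-negative bucket index.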

What is the halting problem?

Given a program P and an input I, determine whether P will eventually halt when run with input I. What if P were to fall into an infinite loop? Would any algorithm that we use to decide whether P eventually halts ever terminate? No termination => no solution => not an algorithm!

How does an open-bucket hash table (OBHT) work?

- In an open-bucket hash table (OBHT) each bucket may contain a maximum of one entry
- Collisions are handled through displacement: when a collision occurs, the entry to be inserted is displaced into an unoccupied bucket
- The approach depends upon a search algorithm which is able to find displaced entries in a consistent / repeatable way
- The insertion and deletion operations rely upon the search algorithm

What are collisions?

- In general it may not be possible to assign each unique key to a separate bucket: the number of distinct keys may exceed the number of buckets, and the hash function may not distribute keys uniformly
- The situation where two distinct keys are hashed to the same bucket is commonly known as a collision: a collision occurs when k1 ≠ k2 but hash(k1) = hash(k2)
- We should select a hash function which minimises the probability of collisions

Why may a small integer assumption for key-based array implementations of maps not be unreasonable?

- In many problem domains this small-integer assumption is not as unreasonable as it might first seem
- Organisations often assign numbers to customers and products
- Integers offer an intuitive ordering that is easy to interpret
- Unique identifiers based upon integers are common

How efficient is a closed bucket hash table?

In order to analyse the complexity of searching a CBHT with n entries we can make simplifying assumptions:
- Hashing a given key is a constant-time operation
- Comparisons are the only form of characteristic operation
- The algorithm employed to search within a bucket is linear
On average n/2 comparisons will be made, though we are really interested in the best- and worst-case time efficiency:
- Best case is O(1): all distinct keys hash to a unique bucket
- Worst case is O(n): all distinct keys hash to the same bucket
In reality we can achieve efficiencies approaching O(1) if we think carefully about specific design choices.

Summary of recursive algorithms?

- Linear recursion: a set of base cases and a single recursive call
- Multiple recursion: a set of base cases and two or more recursive calls
- Head recursion: an algorithm where the first operation is a recursive call; more precisely, it is where a recursive algorithm must wait for prior call evaluation
- Tail recursion: a recursive algorithm where the last operation is a recursive call which does not require that prior recursive calls be evaluated before it can proceed
- Indirect / mutual recursion: a recursive algorithm comprised of two computational components which make interdependent calls during execution

What is the relation between trees and arithmetic expressions?

- Simple arithmetic expressions can be represented in several ways, such as simple strings or sequences of symbols, but the efficiency of evaluating expressions when using these representations can often be extremely disappointing
- Trees provide a simple and efficient representation for the evaluation of arithmetic expressions: leaf nodes contain a numeric value and non-leaf nodes contain operators
- Any correctly formed arithmetic expression can be expressed as an arithmetic expression tree
- Simple recursive algorithms can be devised to evaluate arithmetic expression trees, i.e., a recursive post-order traversal: recursively evaluate the left and right subtrees, then apply the operator held at the root
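The recursive post-order evaluation described above can be sketched as below. This is a minimal illustration assuming binary operators only; the `ExprTree` class name and its layout are invented for the example.

```java
// A minimal arithmetic expression tree: leaves hold a value,
// non-leaf nodes hold a binary operator.
public class ExprTree {
    String op;        // "+", "-", "*" or "/" for non-leaf nodes; null for leaves
    double value;     // used only by leaf nodes
    ExprTree left, right;

    ExprTree(double value) { this.value = value; }
    ExprTree(String op, ExprTree left, ExprTree right) {
        this.op = op; this.left = left; this.right = right;
    }

    // Recursive post-order evaluation: evaluate both subtrees,
    // then apply the operator at the root.
    double evaluate() {
        if (op == null) return value;                 // leaf node
        double l = left.evaluate(), r = right.evaluate();
        switch (op) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            default:  return l / r;
        }
    }

    public static void main(String[] args) {
        // Represents (2 + 3) * 4
        ExprTree e = new ExprTree("*",
                new ExprTree("+", new ExprTree(2), new ExprTree(3)),
                new ExprTree(4));
        System.out.println(e.evaluate()); // 20.0
    }
}
```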

What are the advantages of arrays?

- Random access: potential for constant-time access to arbitrary elements
- Memory can be statically allocated: dynamic memory management can be troublesome and difficult
- Machine architectures can be optimised for arrays, storing array elements in contiguous memory

What are the advantages of using abstract data types?

- Simplicity: abstraction separates essential qualities such as data and operations from inessential qualities such as representation and implementation
- Integrity: limiting the class of operations that can be performed helps to maintain the integrity of data, as users cannot perform operations which destroy the integrity of data
- Implementation independence: provided that a contract / specification remains stable, any underlying implementation can be changed without requiring that client applications be modified
- Encapsulation: the separation of specification and implementation means that client applications can only manipulate data through well-defined interfaces

What is the JCF?

The Java Collections Framework is a set of classes, interfaces and algorithms that implement common collection data structures:
- The framework is based upon 14 collection interfaces
- It facilitates and encourages interoperability among APIs
- Implementations are reusable

What is the difference between java.util.Map and our Map ADT?

- The Java interface we have defined is almost consistent with the java.util.Map interface
- The java.util.Map interface defines several additional methods which are non-essential to the Map ADT
- The java.util.SortedMap interface defines the values and operations associated with an ordered map; keys must implement the java.lang.Comparable interface to allow ordering
- No iterator is defined for the java.util.Map interface, thus getKeys() must be used to traverse instances of any implementing class

What is the relation between the Set ADT and a binary search tree?

- The best-case time complexity for each operation will be achieved when the BST is balanced
- The worst-case time complexity for each operation will be achieved when the BST is unbalanced

What is a balanced binary tree and how does it work?

- The depth of a node is the number of links that must be followed to reach the root from that node; the depth of a tree is the depth of its deepest node
- A binary tree of depth d is balanced if every node at depth 0, 1, 2, ..., d-2 has two children; nodes at depth d-1 may have a maximum of two children; nodes at depth d have no children
- A balanced binary tree of depth d must have at least 2^d nodes and may have as many as 2^(d+1) - 1 nodes
- The depth of a balanced binary tree of size n is floor(log2 n)
- A non-balanced binary tree of depth d could have as few as d+1 nodes; the maximum depth of a non-balanced binary tree of size n is n-1

What are descendants and ancestors of a node?

- The descendants of a node N are the nodes that are reachable when following any direct path from N to any leaf node
- The ancestors of a node N are the nodes that are reachable when following any direct path from N to the root node

Why don't we use real time taken as a measure of efficiency?

The hardware and software upon which the algorithm is running would impact upon any real-time measure, thus we wouldn't really be analysing the algorithm itself

Priority queue terms:

- The length of a priority queue is the number of elements it contains; an empty priority queue has a length of 0
- Stacks and queues model specific types of priority queue
- In a stack the priority of each inserted element is monotonically increasing, i.e., the last element inserted is always retrieved first
- In a queue the priority of each inserted element is monotonically decreasing, i.e., the first element inserted is always retrieved first

How does the number of buckets affect a CBHT?

- The load factor of a hash table at any given point in time is given by n/m, where n is the number of entries and m is the number of buckets
- A hash table with a load factor that is likely to be between 0.5 and 0.75 is generally considered to be good
- A high load factor increases the likely length of each linked list
- A low load factor may indicate that memory space is being wasted

What is the Sieve of Eratosthenes?

- The sieve of Eratosthenes is a simple algorithm for finding all prime numbers below a given integer, and can easily be implemented using a set
- The algorithm is efficient for small primes, e.g., less than ~12 million
- Works by filtering / sieving based on previous results: begin with a set of integers {2, 3, 4, 5, ..., n-1} and continuously remove integers that are found to be multiples of previously encountered integers
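The sieving process above can be sketched as follows. This version uses a boolean array as a compact representation of the "removed" set; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sieve of Eratosthenes: find all primes below n by repeatedly
// removing multiples of each prime encountered.
public class Sieve {
    static List<Integer> primesBelow(int n) {
        boolean[] removed = new boolean[Math.max(n, 2)]; // the "sieved out" set
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i < n; i++) {
            if (removed[i]) continue;      // i was removed as a multiple earlier
            primes.add(i);                 // anything not yet removed is prime
            for (int m = 2 * i; m < n; m += i) {
                removed[m] = true;         // remove all multiples of i
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(20)); // [2, 3, 5, 7, 11, 13, 17, 19]
    }
}
```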

What operations does a set ADT have?

- void add(Object o): adds object o as a member of the set
- void remove(Object o): removes object o as a member of the set
- void union(Set s): makes the set equal to the set union of itself and s
- void intersection(Set s): makes the set equal to the set intersection of itself and s
- void difference(Set s): makes the set equal to the set difference of itself and s
- boolean contains(Object o): returns true if object o is a set member, false otherwise
- boolean containsAll(Set s): returns true if the set subsumes s, false otherwise
- boolean isEqual(Set s): returns true if s is equal to the set, false otherwise
- int size(): returns the cardinality of the set
- boolean isEmpty(): returns true if the set has no members, false otherwise

What are the OBHT's operations?

- search(Object k): starting at bucket hash(k), inspect consecutive buckets until <k,v> is found, an empty bucket is reached or all buckets have been inspected
- insert(Object k, Object v): starting at bucket hash(k), inspect consecutive buckets until an empty bucket is found (and thus <k,v> can be inserted) or all buckets have been inspected
- remove(Object k): perform a search for <k,v> and make the associated bucket empty
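The search and insert operations above can be sketched with linear probing as the consistent bucket-inspection strategy. This is a deliberately minimal illustration with an invented `Obht` class; note that a production implementation would mark removed buckets as "deleted" rather than simply emptying them, so that probe sequences to displaced entries are not broken.

```java
// A minimal open-bucket hash table using linear probing.
public class Obht {
    private final Object[] keys, values;

    Obht(int buckets) {
        keys = new Object[buckets];
        values = new Object[buckets];
    }

    private int hash(Object k) {
        return Math.floorMod(k.hashCode(), keys.length);
    }

    // Inspect consecutive buckets until an empty one (or a matching key) is found.
    void insert(Object k, Object v) {
        for (int i = 0; i < keys.length; i++) {
            int b = (hash(k) + i) % keys.length;
            if (keys[b] == null || keys[b].equals(k)) {
                keys[b] = k; values[b] = v;
                return;
            }
        }
        throw new IllegalStateException("table full"); // an OBHT is bounded
    }

    // Follow the same probe sequence; an empty bucket means k is absent.
    Object search(Object k) {
        for (int i = 0; i < keys.length; i++) {
            int b = (hash(k) + i) % keys.length;
            if (keys[b] == null) return null;
            if (keys[b].equals(k)) return values[b];
        }
        return null;
    }

    public static void main(String[] args) {
        Obht t = new Obht(8);
        t.insert("apple", 1);
        t.insert("pear", 2);
        System.out.println(t.search("apple")); // 1
    }
}
```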

What is a collection?

- A collection consists of zero or more elements and is equipped with operations to add and remove elements
- Elements stored within a collection are usually of the same type; inheritance allows a degree of heterogeneity in Java
- A collection will have its own specific characteristics which are dictated by the operations defined over it
- A collection is merely a specification; it must be implemented before it can be used by a client application

Explain collections:

- A collection groups multiple elements to form a single entity
- A collection facilitates the storage, retrieval, manipulation and communication of aggregated data
- A collection is typically used to represent data items which have some form of natural grouping

What is a collection framework?

A collections framework is a unified architecture that facilitates the representation and manipulation of collections, and contains interfaces, implementations and algorithms

What is lexicographic ordering?

- A d-tuple is a sequence of d keys (k1, k2, ..., kd), where key ki is said to be the i-th dimension of the d-tuple; Cartesian co-ordinates are expressed as 3-tuples
- The lexicographic order of two d-tuples is defined recursively: (x1, x2, ..., xd) < (y1, y2, ..., yd) ⟺ (x1 < y1) ∨ ((x1 = y1) ∧ (x2, ..., xd) < (y2, ..., yd))
- The d-tuples are compared by their first dimension, then by their second dimension, and so on
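The recursive definition above can be sketched as a comparison over integer tuples. This hedged example assumes both tuples have the same dimension d, and expresses the recursion in its natural iterative form; the class name is invented.

```java
// Lexicographic comparison of two d-tuples: compare by the first
// dimension, falling back to later dimensions only on a tie.
public class Lexicographic {
    static int compare(int[] x, int[] y) {
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) return Integer.compare(x[i], y[i]);
        }
        return 0; // all d dimensions are equal
    }

    public static void main(String[] args) {
        // (1, 5, 2) < (1, 6, 0): first dimensions tie, second decides
        System.out.println(compare(new int[]{1, 5, 2}, new int[]{1, 6, 0}) < 0); // true
    }
}
```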

What is a data structure?

A data structure is a systematic way of storing and organising a collection of data

What is a destructive and non-destructive method?

A destructive method is a method that alters the attributes of an object. A non-destructive method is one that does not alter the attributes of an object.

What is a linked list?

A linked list is a concrete data structure composed of a sequence of nodes connected by links. Each node contains:
- An element
- A link to one or both neighbouring nodes

What is a map?

- A map is a collection of entries with distinct keys; the ordering of entries in a map is not significant
- Each entry in a map is a tuple <k,v> with a key field (k) and a value field (v); no two entries in a map can have equal keys
- There are no restrictions on the values that a map can store, so there is no loss in generality incurred by assuming that map entries are key-value pairs
- A key can be of any type provided that the key is distinct, and arbitrary data may be stored (e.g. tuples with multiple fields)
- The cardinality of a map is the number of key-value pairs it contains; an empty map has a cardinality of 0

What is mutual / indirect recursion?

A mutually recursive algorithm is characterised by two computational components which make interdependent calls during execution. It is possible to turn any mutually recursive algorithm into a linearly recursive algorithm. Transformation involves inlining computation components.

Tree terminology:

- A node may have any number of children
- A leaf node is one that has no children
- The size of a tree is the number of nodes it contains; an empty tree has a size of 0
- The subtree rooted at a node N consists of that node and all of its descendants

How can a priority queue be represented by a heap?

- A priority queue can be efficiently represented by a heap
- The heap property ensures that the least element in the priority queue will be stored at the heap's root position
- The add(...) and removeLeast() operations now have a time complexity of O(log n), an improvement over the sorted and unsorted linked list representations
- The idea of maintaining a sorted heap gives rise to the heap sort algorithm, which we will encounter in later lectures

What is a selection sort?

A selection sort repeatedly finds the element with the smallest value among the unsorted elements and swaps it with the first non-considered element. For each position in the input list: find the minimum element of the unsorted remainder, then swap that minimum element with the first unordered element in the list.
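The find-minimum-and-swap loop described above can be sketched as follows; the class name is invented for the example.

```java
import java.util.Arrays;

// Selection sort: repeatedly select the minimum of the unsorted
// suffix and swap it into the first unordered position.
public class SelectionSort {
    static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) min = j;   // find the minimum element
            }
            int tmp = a[i];                   // swap it into position i
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 9]
    }
}
```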

What is an insertion sort?

- A simple algorithm based upon comparison sorting, broadly comparable with the bubble sort algorithm
- A sorted list is built one element at a time
- Performs reasonably well for short lists; sorting algorithms such as quick sort, heap sort and merge sort are more efficient for larger lists

How does a bubble sort work?

A simple sorting algorithm which:
1. Starts at the beginning of a list
2. Compares the first two list elements and exchanges them if the first is greater than the second
3. Continues along the list, comparing and swapping adjacent elements, until the end of the list is reached
4. Begins the process again with the first two elements, repeating until the list is sorted
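The steps above can be sketched as below. The class name is invented, and the `swapped` flag is a common early-exit refinement (stop once a full pass makes no swaps) rather than part of the basic description.

```java
import java.util.Arrays;

// Bubble sort: repeatedly sweep the list, exchanging adjacent
// out-of-order elements, until a sweep makes no exchanges.
public class BubbleSort {
    static void sort(int[] a) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i + 1 < a.length; i++) {
                if (a[i] > a[i + 1]) {        // exchange adjacent elements
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 3, 2};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4]
    }
}
```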

What is a sorting algorithm?

A sorting algorithm is an algorithm that puts a given list of elements in a certain order

Explain spanning trees and forests:

- A spanning tree of a connected graph is a spanning subgraph that is a tree; a spanning tree is not unique unless the graph is a tree
- A spanning forest of a graph is a spanning subgraph that is a forest
- The concept of a spanning tree / forest is central to the design of communication networks

What is a stack?

A stack is a sequence of elements with the property that elements can only be added and removed at one end of the sequence, i.e., at the top

What is a subgraph?

- A subgraph S of a graph G is a graph such that the vertices of S are a subset of the vertices of G and the edges of S are a subset of the edges of G
- A spanning subgraph of G is a subgraph that contains all the vertices of G

What is topological ordering?

- A topological ordering of a directed graph is a numbering v1, ..., vn of the vertices such that for every edge (vi, vj) it is the case that i < j
- A well-known theorem states that a directed graph admits a topological ordering if and only if it is a DAG
- The goal of a topological sort on a graph G = <V,E> is to number vertices such that (u,v) in E implies that u < v
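One standard way to compute such an ordering is Kahn's algorithm, which repeatedly removes vertices that have no remaining incoming edges. The sketch below is a minimal illustration (invented class name, adjacency lists over vertices 0..n-1); it returns null when the graph contains a cycle, consistent with the theorem that only DAGs admit a topological ordering.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Topological sort via Kahn's algorithm.
public class TopoSort {
    static List<Integer> order(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> out : adj) {
            for (int v : out) indegree[v]++;      // count incoming edges
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (indegree[v] == 0) ready.add(v);   // vertices with no predecessors
        }
        List<Integer> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.poll();
            result.add(u);
            for (int v : adj.get(u)) {
                if (--indegree[v] == 0) ready.add(v);
            }
        }
        return result.size() == n ? result : null; // null => a cycle exists
    }

    public static void main(String[] args) {
        // Edges: 0 -> 1, 0 -> 2, 1 -> 2
        List<List<Integer>> adj = List.of(List.of(1, 2), List.of(2), List.of());
        System.out.println(order(adj)); // [0, 1, 2]
    }
}
```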

What is a tree?

A tree is a collection of nodes such that:
- Each node contains an element
- Each node has branches leading to a number of other nodes (its children)
- A tree has a unique root node
- Each node other than the root is the child of precisely one node (its parent)

What is the A* search algorithm?

- A* is a best-first graph search algorithm which can be used to find the least-cost path from a given start node to a goal node
- A* is an informed search because of the heuristic it employs: it uses a distance-plus-cost heuristic to determine the order in which the nodes are explored
- By convention the distance-plus-cost heuristic function is normally denoted as f(x)

What is adaptability?

Adaptability refers to the impact that the degree of pre-existing order in an input list has upon the performance of a sorting algorithm. A sorting algorithm whose running time is affected by the degree to which its input list is already sorted is known as adaptive. Typically we are only interested in whether a sorting algorithm is adaptive or not; the degree of adaptability is not usually considered.

Divide and conquer analysis:

All efficient divide and conquer algorithms divide a problem into subproblems, each of which is some fraction of the overall problem. Establishing complexity often involves solving a recurrence relation. We have seen that merge sort operates on two subproblems, each of which is half the size of the overall problem. We can therefore argue, as we have done in the past, that under appropriate initial conditions the running time of merge sort is equal to the running time of the problem splitting plus O(n) additional work.

What is java.util.ArrayList?

- An extensible array-based implementation
- When the addition of a new element would lead to an overflow, the array is copied into a new, longer array
- The size of the new array is significant! Why? The default scale factor is 1.5
- The implementation is highly optimised and specifically designed to fit the Java Collections Framework

What is an interface (2)?

An interface declares the method signatures that any implementing class must define / implement. A method signature consists of a:
- Method name
- Access modifier
- Return type
- Set of parameters (including types)

Disadvantages of greedy algorithms:

- Applicability: in many cases it can be shown that a greedy algorithm cannot be devised or that it would not yield an acceptable solution; for some specific problems it can be shown that a greedy algorithm will reliably produce the unique worst possible solution
- Optimality: making locally optimal improvements to a locally optimal solution is not guaranteed to yield an optimal global solution

Dynamic programming disadvantages:

- Applicability: it is often not possible to apply a dynamic programming approach; the requirements that a problem exhibit overlapping subproblems that are only slightly smaller and optimal substructure make the approach difficult to apply in some situations
- Difficulty of implementation: despite many dynamic programming algorithms being similar in function and structure, it can be difficult for novice software engineers to produce algorithms which achieve the highest possible efficiency

How can you implement the list interface in java?

- ArrayList: an optimised array-based implementation which offers better general performance than alternative List implementations
- LinkedList: a linked list implementation which can out-perform an ArrayList under specific circumstances
- Vector: a legacy class of Java 1.0 that has been retrospectively modified to implement the List interface

What is the difference between vector and arraylist?

- ArrayList replaced Vector, providing the same functionality within the structure of the JCF
- Methods in Vector are synchronised: safe to call when writing multi-threaded software, but this introduces a performance penalty
- Vector defines several legacy methods that will not be supported in future versions of Java
- Both implementations hold elements in a fixed-size array: if there is space in the array, insert into the next empty component; if there is no space, expand the array before inserting into the next empty component
- Expansion uses the arraycopy(...) method in java.lang.System; expansion increases the size of an ArrayList by a factor of 1.5 and the size of a Vector by a factor of 2
- An ArrayList is generally fast for: insertion, retrieval by index, replacement
- A Vector is generally fast for: checking membership, finding and removing elements

What is Asymptotic Analysis?

Asymptotic analysis is about expressing the running time of an algorithm in Big-Oh notation

Compare open and closed bucket hash tables:

- CBHTs have two significant advantages: they are inherently unbounded and less sensitive to high load
- CBHTs are preferable to OBHTs in almost all situations
- OBHTs are useful in situations where memory must be allocated statically or there is a specific upper bound on the required number of buckets

What are the two common solutions for dealing with collisions?

Closed-bucket hash tables Open-bucket hash tables

What is the definition of a directed graph?

Formally, a directed graph is a pair G = <V,E> where:
- V is a set whose elements are called vertices
- E is a set of ordered pairs of vertices taken from V
A simple directed graph is a directed graph with no multiple edges or self-loops, corresponding to 0's on the diagonal of the adjacency matrix representation. A simple directed graph of n vertices has between 0 and n(n-1) edges.

Explain generics:

Generics:
- Add extra compile-time type-safety to collection classes by restricting the type of objects that a collection can store
- Remove the need for an explicit cast at element retrieval
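Both points can be seen in a small example:

```java
import java.util.ArrayList;
import java.util.List;

// The type parameter <String> restricts what the list can store and
// removes the need for a cast on retrieval.
public class GenericsDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>();   // can only hold Strings
        names.add("Ada");
        // names.add(42);                         // rejected at compile time
        String first = names.get(0);              // no explicit cast needed
        System.out.println(first);
    }
}
```

Before generics, the retrieval line would have required `String first = (String) names.get(0);`, and the erroneous `names.add(42)` would only have failed at runtime via a ClassCastException on retrieval.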

What are the three types of graph algorithms with examples?

- Graph search algorithms: depth-first search, breadth-first search
- Shortest path algorithms: Dijkstra's algorithm, Bellman-Ford algorithm
- Minimum spanning tree algorithms: Prim's algorithm, Kruskal's algorithm

What are the possible ways to implement the map interface in java?

- HashMap: a highly efficient hash table representation that gives no guarantees regarding iterator ordering
- TreeMap: stores elements in a red-black tree and is therefore significantly less efficient than a HashMap for many operations
- LinkedHashMap: a hash table representation that uses a linked list to maintain an ordering amongst elements, thus allowing ordered iterators to be created
- Hashtable: a straightforward hash table implementation, retrofitted to implement the java.util.Map interface; unlike other Map implementations, Hashtable is synchronised

How can you implement a set in java?

- HashSet: a highly efficient hash table representation that gives no guarantee regarding iterator ordering
- TreeSet: stores elements in a red-black tree (ordered by their values) and is therefore significantly less efficient than a HashSet for many operations
- LinkedHashSet: a hash table representation that uses a linked list to maintain an ordering amongst elements, thus allowing ordered iterators to be created

What is hashing?

- Hashing is about taking arbitrary keys and translating them to integers
- After a key has been hashed, the resultant integer can be used to index an array representation
- Hashing can yield an optimal time complexity of O(1) for the required put(...), get(...) and remove(...) operations
- Most importantly, there is no loss of generality

What is head recursion?

Head recursion arises when the recursive call occurs before the other operations. This may seem similar to tail recursion, but the order of operations is different: the recursive call is made first, and the remaining work must wait for it to complete. For example, in "return n * factorial(n - 1)" the multiplication cannot be performed until the recursive call has returned.

What is an implementation?

Implementations - Concrete implementations of interfaces (essentially reusable data structures)

What is an interface?

Interfaces - Abstract data types that represent collections independently of representation

What is a static data structure?

It is a static data structure because its capacity is fixed upon instantiation. This is in contrast to dynamic data structures, which can expand and shrink to account for situations where more / less data must be stored.

What is the ordering of a stack?

- LIFO: last in, first out
- FILO: first in, last out

What are the advantages of a heap sort?

- Low memory consumption: in-place sorting requires a constant amount of extra memory (implementations which sort in-place can be difficult to design)
- Performance: the algorithm has a worst-case running time of O(n log n), easily outperforming quadratic sorting algorithms, despite the fact that it is essentially just a selection sort

What are the possible ways to implement the queue interface in java?

- Most queue implementations in java.util.concurrent are bounded; all implementations in java.util are unbounded
- Generally queue implementations do not permit null elements to be stored, though java.util.LinkedList is an exception
- java.util.LinkedList is another legacy class that has been retrofitted to be consistent with the Java Collections Framework
- The java.util.concurrent.BlockingQueue interface is frequently used in concurrent programming because it has blocking methods

What are the properties of minimum spanning trees?

- Multiplicity: several minimum spanning trees of the same weight may exist; each minimum spanning tree of the same weight is equivalent
- Uniqueness: if each edge has a unique weight then there will be precisely one minimum spanning tree; this property also applies to spanning forests
- Cycles: for any cycle C in a graph G, if the weight of an edge e of C is larger than the weights of the other edges of C, then e cannot belong to an MST
- Min-cost subgraph: if weights are non-negative, a minimum spanning tree is the minimum-cost subgraph connecting all vertices; this property holds since subgraphs containing cycles have a higher total weight

What are lambda expressions?

- Allow blocks of functionality to be passed as arguments
- Support a functional programming style

Binary tree terms:

- Null links are used where there is no child node
- A leaf node is a node with no children
- The links between nodes are known as branches
- The size of a binary tree is the number of nodes it contains (possibly 0)

When are operations sufficient and when are operations necessary for a given ADT?

Operations are necessary if no non-empty set of operations could be removed with the needs of the application still being met. Operations are sufficient if they meet all application needs.

What are the disadvantages of sequential searching?

- Performance: in the worst case the algorithm must compare each and every element in a collection against the search element
- Naivety: when the nature of the structure or data being searched would permit greater efficiency, a sequential search does not take advantage of it

What are the strategies to balance a BST?

- Periodically rebalance
- Modify the insertion and deletion algorithms to maintain balance
- Avoid BSTs in favour of naturally balanced tree structures

What are the disadvantages of a merge sort?

- Relatively high memory consumption: in general, merge sort requires O(n) additional space; when tackling the less general problem of sorting a linked list, the merge sort algorithm can be implemented such that only O(log n) additional space is required
- Not adaptive: the degree to which an input list is already sorted does not impact upon the performance of the merge sort algorithm

Why are packages important?

Sensibly designed packages simplify the engineering of software

What are the advantages of a bubble sort?

- Simple: the algorithm can be easily understood
- Adaptive: a nearly sorted input list can be sorted relatively efficiently
- Low memory consumption: in-place sorting requires a constant amount of extra memory

What are advantages of an insertion sort?

- Simple: the algorithm can be easily understood and implemented
- Adaptive: a nearly sorted input list can be sorted relatively efficiently
- Stable: the relative order of list elements with equal keys is maintained
- Low memory consumption: in-place sorting requires a constant amount of extra memory (a naive implementation could yield O(n) space efficiency)
- Online: a list can be sorted as it is received, useful when a list is to be passed element-by-element

B-tree terminology:

Some of the terminology associated with trees takes on a new meaning in the context of B-trees:
- The size of a B-tree is the number of elements it contains; the empty B-tree has a size of 0, i.e. no nodes or elements
- The size of an individual node is the number of elements it contains
- The arity of a B-tree is the maximum number of children that a node can have, i.e., k for a k-ary B-tree

How does In-order Traversal work?

Sometimes known as a symmetric traversal. Apply the ordered operations shown below recursively from the root of the tree:
1. Traverse the left subtree
2. Visit the root
3. Traverse the right subtree

How does level-order traversal work?

- Sometimes known as a breadth-first traversal
- Visit every node at a tree depth / level before proceeding to visit the nodes at the next lowest tree depth / level
- Rarely used by most application programmers; mostly used in numerical / statistical methods to traverse a tree which has been constructed in a very specific way

How does pre-order traversal work?

Sometimes known as a depth-first traversal. Apply the ordered operations shown below recursively from the root of the tree:
1. Visit the root
2. Traverse the left subtree
3. Traverse the right subtree
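The pre-order and in-order schemes differ only in where the "visit" happens relative to the two subtree traversals. A minimal sketch, with an invented node class that collects visited elements into a string:

```java
// Pre-order vs in-order traversal of a small binary tree.
public class Traversals {
    int element;
    Traversals left, right;

    Traversals(int e, Traversals l, Traversals r) {
        element = e; left = l; right = r;
    }

    // Visit the root, then the left subtree, then the right subtree.
    static String preOrder(Traversals n) {
        if (n == null) return "";
        return n.element + " " + preOrder(n.left) + preOrder(n.right);
    }

    // Traverse the left subtree, visit the root, then the right subtree.
    static String inOrder(Traversals n) {
        if (n == null) return "";
        return inOrder(n.left) + n.element + " " + inOrder(n.right);
    }

    public static void main(String[] args) {
        //     2
        //    / \
        //   1   3
        Traversals root = new Traversals(2,
                new Traversals(1, null, null),
                new Traversals(3, null, null));
        System.out.println(preOrder(root).trim()); // 2 1 3
        System.out.println(inOrder(root).trim());  // 1 2 3
    }
}
```

On a binary search tree, the in-order traversal visits elements in sorted order, which is why it is also called a symmetric traversal.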

What is tail recursion?

Tail recursion is a special case of recursion characterised by a recursive algorithm making a recursive call as a last step. Tail recursive algorithms can be implemented iteratively. Many modern compilers will perform this transformation transparently in order to reduce stack space consumption. When a non-tail recursive algorithm must be transformed into a tail recursive algorithm an accumulator parameter must often be introduced into the recursion.
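The accumulator transformation described above can be sketched with factorial; the class and method names are invented for the example. Note that the JVM does not perform tail-call elimination, so in Java this illustrates the pattern rather than an actual stack-space saving.

```java
// Transforming a non-tail recursive factorial into a tail recursive
// one by introducing an accumulator parameter.
public class TailRec {
    // Not tail recursive: the multiplication waits on the recursive call.
    static long factorial(long n) {
        if (n <= 1) return 1;
        return n * factorial(n - 1);
    }

    // Tail recursive: all pending work is carried in the accumulator,
    // so the recursive call is genuinely the last step.
    static long factorialTail(long n, long acc) {
        if (n <= 1) return acc;
        return factorialTail(n - 1, n * acc);
    }

    public static void main(String[] args) {
        System.out.println(factorial(5));        // 120
        System.out.println(factorialTail(5, 1)); // 120
    }
}
```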

What is the set interface in java?

- The Set interface defines a Collection that cannot contain duplicate elements, accurately modelling the mathematical set abstraction
- It contains only methods inherited from Collection, though there are some differences: the condition that duplicate elements are not allowed, and a stronger contract for the behaviour of equals(...)

What is instanceof used for?

The instanceof keyword can be used to test whether an object is of a particular reference type, e.g., o instanceof Double

How are operations specified?

The set of operations associated with an ADT can be specified by defining the characteristics of each individual operation:
- Name
- Parameters (including types)
- Result (including type)
- Observable behaviour

What are the four tree traversal schemes?

There are four common tree traversal schemes: Pre-order traversal In-order traversal Post-order traversal Level-order traversal

Bellman-ford analysis:

Time efficiency The worst-case performance of the Bellman-Ford algorithm for a graph with v vertices and e edges is O(v*e) Space efficiency The worst-case space efficiency of the Bellman-Ford algorithm for a graph with v vertices is O(v) Can you see why edges are not involved in the dominant term in the case of space efficiency?

What attributes can we use to classify algorithms?

Time efficiency, i.e., computational complexity Space efficiency, i.e., memory usage Recursion, i.e., recursive or non-recursive Stability Adaptability

Explain collection of enums:

Two new collection implementations are specifically designed for storing type-safe enumerations EnumMap is a Map implementation designed for enum keys EnumSet is a Set implementation for storing enums Both implementations are more efficient and compact than their more general equivalents when storing enums

What are the applications of sets?

Typically Sets are used in situations where set membership must be frequently updated and checked User groups Validation of constrained user input Often programmers will use their favourite map implementation in situations where a set implementation would be more appropriate Performance is not degraded Memory consumption may be higher

Under what circumstances does a linear/sequential search outperform a binary search?

When the list length is less than 8 - can you think why? When there is a good reason to believe that most searches will be associated with elements close to the position where the sequential search will begin

What does "f(n) is Ω(g(n))" mean?

When we say that f(n) is Ω ( g(n)) we are really saying that the growth rate of f(n) is never less than the growth rate of g(n) - big omega

What does "f(n) is O(g(n))" mean?

When we say that f(n) is O(g(n)) we are really saying that the growth rate of f(n) never exceeds that of g(n)

What does canonicalise mean?

Canonicalisation (sometimes standardisation or normalisation) is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form.

Understanding Asymptotic Notations:

f(n) is O(g(n)) implies that f(n) is asymptotically less than or equal to g(n) f(n) is Ω(g(n)) implies that f(n) is asymptotically greater than or equal to g(n) f(n) is Θ(g(n)) implies that f(n) is asymptotically equal to g(n)

Advantages of greedy algorithms:

Simplicity The concept of a greedy algorithm is intuitive and well known Complex problems can be solved by making relatively shortsighted yet simple decisions regarding local state Ease of implementation Implementation of greedy algorithms is often more straightforward than alternative approaches Many greedy algorithms share a common structure, meaning that a new greedy algorithm can often be implemented according to a known template

What operations does a map ADT contain?

void put(Object k, Object v) adds entry <k,v> to the map Object get(Object k) returns the value associated with key object k in the map void remove(Object k) removes the entry associated with key object k in the map Boolean contains(Object k) return True if key object k is a key in the map, false otherwise void overlay(Map m) makes the map equal to the overlay of itself and m Set<Object> getKeys() returns a set of references to all keys stored in the map Set<Object> getValues() returns a set of references to all values in the map Integer size() returns the cardinality of the map Boolean isEqual(Map m) returns True if the map is equal to m, false otherwise Boolean isEmpty() returns True if the map contains no entries, false otherwise
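These operations can be sketched as a Java interface with a simple HashMap-backed implementation (the names MapADT and SimpleMap are illustrative, not part of the JCF):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// The map ADT operations listed above, written as an interface.
interface MapADT {
    void put(Object k, Object v);    // add entry <k,v>
    Object get(Object k);            // value associated with key k
    void remove(Object k);           // delete the entry for key k
    Boolean contains(Object k);      // is k a key in the map?
    void overlay(MapADT m);          // overlay this map with m
    Set<Object> getKeys();
    Set<Object> getValues();
    Integer size();                  // cardinality of the map
    Boolean isEqual(MapADT m);
    Boolean isEmpty();
}

// A minimal implementation delegating to java.util.HashMap.
class SimpleMap implements MapADT {
    private final HashMap<Object, Object> entries = new HashMap<>();
    public void put(Object k, Object v) { entries.put(k, v); }
    public Object get(Object k)         { return entries.get(k); }
    public void remove(Object k)        { entries.remove(k); }
    public Boolean contains(Object k)   { return entries.containsKey(k); }
    public void overlay(MapADT m)       { for (Object k : m.getKeys()) put(k, m.get(k)); }
    public Set<Object> getKeys()        { return new HashSet<>(entries.keySet()); }
    public Set<Object> getValues()      { return new HashSet<>(entries.values()); }
    public Integer size()               { return entries.size(); }
    public Boolean isEmpty()            { return entries.isEmpty(); }
    public Boolean isEqual(MapADT m) {
        if (!size().equals(m.size())) return false;
        for (Object k : getKeys())
            if (!m.contains(k) || !Objects.equals(get(k), m.get(k))) return false;
        return true;
    }
}
```

Note how the application sees only the MapADT operations, keeping the HashMap representation hidden behind the interface, which is exactly the ADT separation described earlier.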

How can you analyse recursive algorithms?

1. Define a piecewise function to represent the running time of the algorithm 2. Determine the value of the defined function for base cases and recursive calls 3. Build up a sequence from small parameters to large parameters 4. Generalise the pattern and evaluate (see book for example of fib seq)

Explain path and cycles:

A path in a graph is a list of nodes such that each node and its successor in the list are connected by an edge in the graph In a directed graph a path must observe the directions of edges A cycle in a graph is a path with the same first and last nodes A cyclic graph is a graph which contains at least one cycle An acyclic graph is one which contains no cycles A simple cycle is a cycle containing unique vertices and edges

How does an edge set representation work for a graph?

A graph can be represented by a node set containing n nodes and an edge set containing e edges This representation is intuitive given the formal definition of a graph To facilitate node deletion we assume that the node set and edge set are both represented by a doubly linked list The memory requirements for the representation are O(n+e) As for all graph representations we will study, the size of the graph can be explicitly maintained, so the size operation runs in O(1)

What is a graph?

A graph is a pair (V,E) such that: V is a finite set of nodes, called vertices E is a collection of pairs of vertices taken from V, called edges Each node contains an element Each edge connects two nodes An edge may or may not contain an edge attribute

What is connectivity in relation to graphs?

A graph is connected if there is a path between every pair of vertices A connected component of a graph G is a maximal connected subgraph of G

What is a heap?

A heap is a form of binary tree known as a complete binary tree, arranged so that each node's element has a priority no lower than those of its children (for a max-heap; no higher for a min-heap) A priority queue can be efficiently represented as a heap A complete binary tree is balanced

Graph terms:

A node may be connected by any number of edges to other nodes If two nodes are directly connected by a single edge then these nodes are said to be neighbours The degree of a node is its number of connecting edges The size of a graph is the number of nodes it contains An empty graph has a size of 0

What is a package?

A package is a group of related types, providing access protection and namespace management "Types" refers to classes and interfaces in this context The types that are part of Java are members of specific packages Classes and interfaces for I/O can be found within the java.io package JCF classes and interfaces can be found within the java.util package

What is java.util.Vector?

An alternative to java.util.ArrayList Another array-based implementation, a legacy of Java 1 Subsequently adapted to fit the JCF The default scale factor is 2 Similar to, but less favourable than, java.util.ArrayList

What is a B-tree?

A B-tree is a restricted form of a k-ary tree The associated insertion and deletion operations ensure that the tree remains shallow and well balanced Operations account for both growth and shrinkage of the tree In general, every non-leaf node in a B-tree which contains e elements will have links to e+1 children We usually visualise links and elements of a node as being arranged as link0, element0, ..., elemente-1, linke A leaf node of a B-tree has no children This is true of any tree structure, but is crucial for B-tree operations Each non-leaf node of a k-ary B-tree may have up to k subtrees Each element of a non-leaf node has a left subtree and a right subtree A B-tree is sorted Elements in any node and the elements of that node's subtrees are arranged in ascending order A k-ary B-tree is a tree where: The root node contains at least 1 and at most k-1 elements, whilst other nodes must contain a minimum of (k-1)/2 elements and a maximum of k-1 elements A non-leaf node that contains e elements, such that 1≤e≤k-1, also has links to precisely e+1 child nodes Each element of a non-leaf node has separate links for its children All leaf nodes have the same depth The elements in each node are arranged in ascending order For each element x in a non-leaf node, all elements in x's left subtree are less than x, and all elements in x's right subtree are greater than x

Divide and conquer algorithms:

A common algorithm design technique with two components Divide - Subproblems of the overall problem are solved recursively Conquer - The solution to the overall problem is formed based upon the solutions to many subproblems Traditionally algorithms which make at least two recursive calls are thought of as being divide and conquer algorithms It is generally accepted that subproblems should be disjoint We have seen two classic examples of the divide and conquer approach to algorithm design Quick sort Merge sort We have also seen algorithms that seem to adopt a divide and conquer approach but suffer because it is inappropriate to do so A naive computation of Fibonacci numbers can use two recursive calls on disjoint problem parts, but does it really subdivide the problem?

What are directed acyclic graphs?

A directed acyclic graph (DAG) is a directed graph that has no directed cycles The reachability relation for a DAG forms a partial order Any partial order may be represented by a DAG DAGs may be used to model processes and situations where information flows in a consistent direction through a network of processors

What is a directed graph?

A directed graph is one in which each edge has a direction If e is an edge from node u to node v then: u is the source of e and v is the destination of e e is an in-edge of v and an out-edge of u u is the predecessor of v and v is the successor of u The in-degree of a node is its number of in-edges and the out-degree of a node is its number of out-edges

What is a pre-condition?

A pre-condition is an assertion that must be true in order for a sequence of statements to execute correctly. The key thing to remember about pre-conditions is that they reflect what must be true for correct execution, not what is actually true

What is the queue abstract data type?

A queue is a sequence of elements Queues are characterised by the property that elements can only be added at one end, i.e., the rear, and removed from the other end, i.e., the front The length of a queue is the number of elements it contains An empty queue has a length of 0 The length of a queue may change during execution

Randomised algorithms:

A randomised algorithm is an algorithm which uses a random number to make a decision at least once during its execution The running time of a randomised algorithm depends on both the input to the algorithm and the random numbers occurring The aim of introducing a degree of randomness is to reduce the average-case running time over all possible random numbers The worst-case running time of a well designed randomised algorithm is usually the same as the worst-case running time of an equivalent non-randomised algorithm The main benefit of randomised algorithms is that they have no "bad" inputs, only "bad" random numbers The concept of "bad" random numbers is taken with respect to algorithm input The fact that a randomised algorithm has no "bad" inputs can be useful in situations where the performance of an algorithm is consistently impaired by a class of degenerate input It is common for classes of degenerate input to be frequently occurring, e.g. a sorted list is input to a sorting algorithm Malicious users may attempt to attack a system using degenerate inputs
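As an illustration, a randomised quickselect sketch (finding the k-th smallest element): the random pivot choice means a sorted input is no worse than any other input, only unlucky random numbers hurt. The fixed seed is purely for repeatability here.

```java
import java.util.Random;

class RandomisedSelect {
    static final Random RNG = new Random(42); // fixed seed, just for repeatable demos

    // Return the k-th smallest element of a (0-based rank).
    static int select(int[] a, int k) { return select(a, 0, a.length - 1, k); }

    static int select(int[] a, int lo, int hi, int k) {
        if (lo == hi) return a[lo];
        // Random pivot: no particular input ordering is degenerate.
        int p = partition(a, lo, hi, lo + RNG.nextInt(hi - lo + 1));
        if (k == p) return a[k];
        return k < p ? select(a, lo, p - 1, k) : select(a, p + 1, hi, k);
    }

    // Lomuto-style partition around the element at pivotIndex;
    // returns the pivot's final position.
    static int partition(int[] a, int lo, int hi, int pivotIndex) {
        int pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);       // park the pivot at the end
        int store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot) swap(a, i, store++);
        swap(a, store, hi);            // move the pivot into place
        return store;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```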

What is a recursive algorithm?

A recursive algorithm is an algorithm that calls itself and has a terminating condition. It consists of two parts: base cases and recursive calls. Base cases are normally checked before the recursive calls are made. All possible chains of recursion should converge to a base case. Base cases should not be recursively defined. All recursive calls should progress towards a base case.

What is a search algorithm?

A search algorithm is an algorithm that, given some collection of elements, locates a particular element with specified properties

What are trees and forests?

A tree is an undirected graph T such that T is connected and has no cycles A forest is an undirected graph without cycles The connected components of a forest are trees

What is the Floyd-Warshall Algorithm?

A very versatile algorithm that can be adapted to solve many problems defined over directed graphs Transitive closure Shortest path Inversion of real matrices Given a graph G = <V,E>, the Floyd-Warshall algorithm can be used to compute transitive closure The general idea is that we: Number the vertices 1, 2, 3, ..., N Consider only paths that use vertices 1, 2, 3, ..., k as intermediate vertices The Floyd-Warshall algorithm numbers the vertices of G as v1,..., vn and computes a series of directed graphs G0, G1,..., Gn G0 = G Gn = G* Gk has a directed edge (vi,vj) if G has a directed path from vi to vj with intermediate vertices in the set {v1,..., vk} In phase k of the algorithm, the directed graph Gk is computed using Gk-1

What are search sentinels?

A way to reduce sequential search overhead is to eliminate the need to check the loop index This can be achieved by inserting the search element as a sentinel at the end (or beginning, if you're iterating backwards) of a list Using a sentinel it is not necessary to check the value of the loop counter against the list length Even if the search element was not in the list to begin with, the loop will terminate when the loop counter reaches n + 1 This strategy is only possible if the list has a currently unused "slot" available at the end and the calling method is aware of the sentinel value
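A sketch of the strategy, assuming the caller supplies an array with one spare slot at index n:

```java
class SentinelSearch {
    // Sequential search with a sentinel: the target is planted in the spare
    // slot at index n, so the loop body needs no bounds check at all.
    static int search(int[] a, int n, int target) {
        a[n] = target;               // plant the sentinel
        int i = 0;
        while (a[i] != target) i++;  // no "i < n" test needed
        return i < n ? i : -1;       // stopping at index n means "not found"
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 0};      // logical length 3, spare slot at index 3
        System.out.println(search(a, 3, 9));  // 2
        System.out.println(search(a, 3, 5));  // -1
    }
}
```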

What is single processor scheduling?

Almost all scheduling problems are either NP-complete (not covered in CS126) or can be solved by a greedy algorithm We are given jobs j1, j2, ..., jN with known running times t1, t2, ..., tN and a single processor What is the best way to schedule these jobs in order to minimise the average completion time? Assume that we are using a non-preemptive scheduler, i.e. once a job has begun it will run to completion The optimal schedule is arranged shortest job first Can you see how a greedy algorithm would ensure this? We can show that shortest-job-first will always give an optimal solution Let our jobs, in scheduled order, be ji1, ji2, ..., jiN with times ti1, ti2, ..., tiN The first job finishes in ti1 The second job finishes in ti1 + ti2 Observing this we can calculate the cost C of a given schedule

What is reverse polish notation?

Also known as postfix notation Mathematical notation where operators follow their operands "3 - 4 + 5" becomes "3 4 - 5 +" Given fixed operator arities the notation is parenthesis free Operator precedence is not required for disambiguation Expressions can be evaluated using a stack-based algorithm
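The stack-based evaluation can be sketched as follows (a minimal version assuming integer operands and space-separated tokens):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PostfixEval {
    // Operands are pushed; an operator pops its two operands and pushes
    // the result. The final stack element is the value of the expression.
    static int eval(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expr.split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    int right = stack.pop();   // pop order matters for - and /
                    int left = stack.pop();
                    switch (token) {
                        case "+": stack.push(left + right); break;
                        case "-": stack.push(left - right); break;
                        case "*": stack.push(left * right); break;
                        default:  stack.push(left / right); break;
                    }
                    break;
                }
                default:
                    stack.push(Integer.parseInt(token)); // an operand
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("3 4 - 5 +")); // 4, i.e. (3 - 4) + 5
    }
}
```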

What is sequential search?

Also known as a linear search A simple algorithm that steps through a list by considering each element in turn Compares the target element with the element being considered at each step In the best case the target element is equal to the first element to be considered by the search algorithm In the worst case the target element is not in the list, thus the target element must be compared to each list element

What is the quicksort algorithm?

Also known as a partition-exchange sort, it was originally developed by Tony Hoare in 1962 Employs a simple divide and conquer strategy Divides lists into sub-lists in order to sort Well aligned with modern computer architectures Can exploit memory hierarchies involving small caches
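A minimal sketch of the divide and conquer strategy, using a simple Lomuto-style partition with the last element as pivot (real implementations choose pivots more carefully, e.g. at random or by median-of-three):

```java
class QuickSortDemo {
    // Divide: partition around a pivot. Conquer: recursively sort the
    // sub-lists on either side of the pivot's final position.
    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;            // a 0- or 1-element range is sorted
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    // Lomuto partition: elements less than the pivot are moved left,
    // then the pivot is swapped into its final position.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot) { int t = a[i]; a[i] = a[store]; a[store++] = t; }
        int t = a[store]; a[store] = a[hi]; a[hi] = t;
        return store;
    }
}
```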

What is an AVL tree?

An AVL tree is a height-balanced BST AVL trees were first proposed by G.M. Adelson-Velskii and E.M. Landis in 1962 Every strictly balanced BST is an AVL tree Not every AVL tree is strictly balanced AVL trees are always sufficiently balanced to allow the search, insertion and deletion algorithms to run in O(log(n)) The height-balanced property guarantees that an AVL tree search algorithm will run in O(log(n)) Effectively guarantees that the best-case search efficiency associated with BSTs is always achieved

What is an algorithm?

An algorithm is a step-by-step procedure for solving a stated problem in finite time.

What are the advantages of linked lists?

An arbitrary number of elements can be stored Linked list-based ADT implementations are usually unbounded Many operations are performed by modifying the links between nodes No explicit element shifting

What is an array along with its properties?

An array is a concrete data structure composed of a sequence of indexed components with four properties 1. The length of the array is fixed when the array is created 2. The index of each component is fixed and unique 3. Component indices are sequentially ordered and range from a lower index bound to an upper index bound 4. Components of an array can be accessed in constant time using their indexes

What is an elementary operation?

An elementary operation is one whose execution time is bounded above by a constant depending only upon the implementation used

What properties do arrays have in java?

Arrays in Java have several specific properties 1. The lower and upper index bounds for an array of length n are 0 and (n-1) respectively 2. Inserted elements must be homogeneous (though Java does permit heterogeneity by allowing sub-classes to be stored) 3. Arrays are objects that must be instantiated before use 4. An element of array A with index i can be accessed using A[i]

What are the differences between the list ADT and the java.util.List interface?

1. As a quirk of being derived from the Object class, the equality comparison method accepts arbitrary objects as arguments 2. To facilitate interoperability, the concatenation method can accept an instance of any collections class as an argument 3. A number of supplementary methods are provided in java.util.List 4. A variety of iterator types can be created against an instance of java.util.List

Floyd-Warshall Analysis:

As with almost all graph algorithms, the efficiency of the Floyd-Warshall algorithm is dependent upon the adopted graph representation The algorithm has a running time of O(n³) This assumes that the areAdjacent(...) method is O(1) It is possible to achieve a time efficiency of O(1) for the areAdjacent(...) method by using an adjacency matrix

How does adjacency-matrix representation work for graphs?

Assume that each node is arbitrarily assigned a distinct position number p between 0 and n-1 Represent the node set using an array A of length n, such that the node with position number p is given by A[p] Represent adjacency sets using an array of length n The structure is essentially an n×n matrix If edge e connects nodes u and v, which have position numbers p and q respectively, then matrix element A[p][q] corresponds to e The memory requirements for the representation are O(n²) Directed graphs A matrix representation is inherently directed If a source and destination have position numbers p and q respectively then put edge e in A[p][q] Undirected graphs If nodes p and q are connected by edge e then put e in A[p][q] and A[q][p] We can lower the memory requirements of the representation by storing edges in the array location where p > q (i.e., using a triangular matrix)

What is an auxiliary array?

Auxiliary storage is all addressable data storage that is not currently in a computer's main storage or memory. Synonyms are external storage and secondary storage.

B-tree deletion analysis:

B-tree deletion initially focuses upon finding an appropriate location for element deletion The analysis of the B-tree search algorithm is valid here Approximately log2(n) comparisons to search for the deletion location The algorithm then focuses upon performing any node restocking that is required after the deletion (maximum of one per tree depth) Maximum nodes to restock = log2(n+1) / log2(k) In terms of comparisons or restocking, element deletion is O(log(n))

B-tree applications:

B-trees are commonly used to index relational databases A database may consist of a very large number of tuples, which are usually stored in secondary memory Locating a tuple with a particular key can be very expensive Retrieval times can be improved by indexing The index is a map of <keyT,addressT> pairs, where keyT is the key of tuple T and addressT is the disk address of tuple T We can represent the index for this situation using a high arity B-tree Elements are <keyT,addressT> pairs A B-tree guarantees O(log(n)) search times, regardless of how many elements must be stored Since the index will typically be stored on disk, the tuple search time will, in a practical sense, be dominated by disk accesses If we let each B-tree node occupy a single disk block then the number of disk accesses is equal to the number of nodes visited, thus a B-tree can outperform a BST in this situation

Backtracking algorithms:

Backtracking is an approach for solving problems by incrementally constructing partial solution candidates, whilst abandoning partial solution candidates as soon as it can be determined that they are not part of a correct solution Often a backtracking algorithm amounts to an advanced implementation of an exhaustive search Implementations can be expensive, but the improvement over a naive implementation can allow otherwise unsolvable problems to be tackled A backtracking approach can only be used to solve problems exhibiting the partial candidate solution property The test to determine whether a candidate solution could possibly lead to a correct solution must be relatively inexpensive with respect to the overall problem The approach enumerates partial candidates that could be completed in many possible ways to form a correct solution To build a correct solution a chosen candidate solution is incrementally improved through a candidate extension step Partial candidates are viewed as the nodes of a search tree, where each candidate is the parent of candidates that differ by a single extension step Backtracking traverses the search tree in depth-first order and, at each node n, decides whether n could be extended towards a solution If it cannot be extended towards a solution then the sub-tree rooted at n is pruned Otherwise, proceed by checking for a solution and enumerating sub-trees of n

What are the applications of backtracking?

Backtracking is often used to solve constraint satisfaction problems Crosswords Sudoku Many combinatorial optimisation problems can also be solved or approximately solved using backtracking Travelling salesman problem Minimum spanning tree calculation

What is Big-OH notation?

Big-Oh notation is concerned with what happens for large input sizes, hence we select the dominant term and ignore constant factors when expressing the time complexity of an algorithm

What is binary recursion?

Binary recursion is a term that can be used to describe a special case of multiple recursion. Binary recursive algorithms make two recursive calls for every non-base case.

How does parenthesis matching work using a stack?

Brackets are often used to structure expressions and statements An expression is said to be well-bracketed if each opening bracket is correctly matched by a closing bracket We can check for well-balanced expressions using an algorithm The algorithm scans the expression, pushing values onto a stack when an opening bracket is encountered and popping them off the stack when a corresponding right bracket is encountered A well-balanced expression leaves an empty stack at termination
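A sketch of the algorithm for round, square and curly brackets (non-bracket characters are simply ignored):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BracketCheck {
    // Push every opening bracket; on a closing bracket, pop and check that
    // it matches. Well-bracketed iff every close matched and the stack is
    // empty at termination.
    static boolean isWellBracketed(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c); break;
                case ')':
                    if (stack.isEmpty() || stack.pop() != '(') return false; break;
                case ']':
                    if (stack.isEmpty() || stack.pop() != '[') return false; break;
                case '}':
                    if (stack.isEmpty() || stack.pop() != '{') return false; break;
                default:
                    break; // ignore non-bracket characters
            }
        }
        return stack.isEmpty(); // leftover opens mean not well-bracketed
    }

    public static void main(String[] args) {
        System.out.println(isWellBracketed("(a[b]{c})")); // true
        System.out.println(isWellBracketed("(]"));        // false
    }
}
```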

What is breadth-first searching?

Breadth-first search (BFS) is a general technique for traversing a graph A BFS traversal of a graph G: Visits all the vertices and edges of G Determines whether G is connected Computes the connected components of G Computes a spanning forest of G BFS is an uninformed search that expands all neighbours of the current node before proceeding any deeper into the search tree When a node is found to have no more children the neighbours of the previously encountered children are expanded In an iterative implementation a queue is typically used to store all nodes when they are encountered, thus the first encountered children of a current node will be expanded

How does java allow for the definitions of new data types?

Class declarations define new data types. Class constructors and member methods specify the operations of a data type. Instance variables of a class dictate the values and representations of a data type

How does Post-order Traversal work?

Commonly used to evaluate postfix expressions that have been represented using binary trees Apply the ordered operations shown below recursively from the root of the tree: 1. Traverse the left subtree 2. Traverse the right subtree 3. Visit the root

B-tree searching analysis:

Consider a full k-ary B-tree that stores n elements and has a depth of d If we know n we can determine d d = logk(n+1)-1 An element search begins at the root node and visits a child of the current node at each iteration Maximum number of nodes visited = d+1 d+1 = logk(n+1) logk(n+1) = log2(n+1) / log2(k) Assuming binary search of the k-1 elements in each visited node, the number of comparisons at each node is log2(k-1) Approximate maximum comparisons = log2(k-1) * logk(n+1) = log2(k-1) * log2(n+1) / log2(k) ≈ log2(n) By repeating the analysis upon a half full k-ary B-tree, the maximum number of nodes visited = log2(n+1) / log2((k+1)/2) log2(n+1) / log2((k+1)/2) = log2(n+1) / (log2(k+1)-1) Approximate maximum comparisons ≈ log2(n) Based upon the analysis performed, we can conclude that the time efficiency of searching a B-tree is O(log(n)) The actual maximum number of comparisons for searching a B-tree is roughly equal to that for searching a balanced BST In a B-tree the maximum number of comparisons does not depend upon the B-tree arity or how full the B-tree is, which means that a B-tree search is guaranteed to be at least as fast as searching a balanced BST

What is a minimum spanning tree?

Consider a weighted graph where the weight associated with each edge reflects how unfavourable that edge is A weight can be assigned to a spanning tree based upon the sum of the weight associated with the edges it contains A minimum spanning tree (MST) is a spanning tree with a weight that is less than or equal to the weight of every other spanning tree Minimum spanning trees are also known as minimum weight trees Any undirected graph has a minimum spanning forest, which is the union of minimum spanning trees of connected components

Explain Dijkstra's algorithm: Edge relaxation

Consider an edge e = (u,z), where u is the vertex most recently added to the search graph and z is not in the search graph The relaxation of edge e updates distance d(z) such that d(z) = minimum(d(z), d(u) + weight(e))
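The update itself is a one-liner; a hedged sketch with distances held in an array indexed by vertex number (an assumed representation, not necessarily the notes' one):

```java
class Relaxation {
    // d[z] = min(d[z], d[u] + weight(u,z)) — the single update step that
    // Dijkstra's algorithm applies to each edge leaving the vertex u most
    // recently added to the search graph.
    static void relax(double[] d, int u, int z, double weight) {
        d[z] = Math.min(d[z], d[u] + weight);
    }

    public static void main(String[] args) {
        // d[0] = 0 (source); d[2] unknown so far.
        double[] d = {0, 10, Double.POSITIVE_INFINITY};
        relax(d, 0, 2, 3);   // edge (0,2) of weight 3 improves d[2] to 3
        relax(d, 0, 1, 20);  // edge (0,1) of weight 20 does NOT improve d[1]
        System.out.println(d[1] + " " + d[2]); // 10.0 3.0
    }
}
```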

What is depth first search and how does it work?

Depth-first search (DFS) is a general technique for traversing a graph A DFS traversal of a graph G: Visits all the vertices and edges of G Determines whether G is connected Computes the connected components of G Computes a spanning forest of G DFS is an uninformed search that expands the first child node of the search tree that appears, thus progressing deeper and deeper until a target node is found or a node is found to have no children When a node is found to have no children the algorithm backtracks, returning to the most recent node that it has not fully explored In an iterative implementation a stack is typically used to track all nodes whose expansion is currently being explored

What is the merge sort?

Developed by John von Neumann in 1945 Another example of a divide and conquer approach Repeatedly splits an input list into halves until each half contains just a single element, then merges lists to sort Provides very predictable performance A precise bound on the number of comparisons and swaps Does not require random access to data
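A compact sketch of the split-then-merge scheme (this version allocates new arrays for clarity; in-place and iterative variants exist):

```java
import java.util.Arrays;

class MergeSortDemo {
    // Split into halves, sort each half recursively, then merge the
    // two sorted halves.
    static int[] sort(int[] a) {
        if (a.length <= 1) return a;                 // a 1-element list is sorted
        int mid = a.length / 2;
        int[] left  = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    // Merging two SORTED lists is cheap: repeatedly take the smaller head.
    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = left[i] <= right[j] ? left[i++] : right[j++];
        while (i < left.length)  out[k++] = left[i++];   // drain leftovers
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```

The merge step is exactly the second fact the sort relies on: building a sorted list from two sorted lists takes at most one comparison per output element.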

What is the Bellman-Ford Algorithm?

Developed by Richard Bellman and Lester Ford Jr. The Bellman-Ford algorithm is a shortest path algorithm It permits negatively weighted edges, thus it does not suffer from the main disadvantage of Dijkstra's algorithm The price paid to account for negatively weighted edges is an increased time complexity The Bellman-Ford algorithm is only useful when edge weighting could be negative, as Dijkstra's algorithm is more efficient Instead of greedily choosing the vertex with the minimum distance and relaxing its edges, the algorithm relaxes all edges Edges are relaxed v - 1 times, where v is the number of vertices in the graph The repeated relaxation of edges allows minimum distance values to propagate throughout the graph Iteration i finds all the shortest paths that use i edges

What is prim's algorithm?

Developed independently by Vojtech Jarnik (1930), Robert Prim (1957) and Edsger Dijkstra (1959) Prim's algorithm finds a minimum spanning tree for a connected weighted graph It finds a subset of edges that forms a tree including every vertex, where the total weight of edges in the tree is minimised Prim's algorithm is yet another example of a greedy algorithm Similar in nature to Dijkstra's algorithm The algorithm starts at an arbitrary vertex The size of a search tree is repeatedly increased until it spans all vertices At each iteration an edge that is currently known to have a minimum weight is added to the minimum spanning tree Can you see why this is a greedy algorithm?

Why does Dijkstra's algorithm work?

Dijkstra's algorithm is a greedy algorithm Suppose that the algorithm didn't find all shortest distances, thus let F be the first wrong vertex the algorithm processed When the previous node, D, on the true shortest path was considered, its distance was correct The edge (D,F) must have been relaxed Thus, as long as d(F) >= d(D), F's distance can't be wrong, thus there is no wrong vertex!

What are the terms in directed graphs?

Directed graphs: source, destination, in-edge, out-edge, predecessor, successor, indegree, out-degree

Dynamic programming algorithms:

Dynamic programming is an algorithm design technique that solves complex problems by breaking them down into simpler problems The approach is applicable to problems that exhibit overlapping subproblems that are only slightly smaller, and optimal substructure To differentiate divide and conquer from dynamic programming it is important to observe the "only slightly smaller" caveat When overlapping subproblems are not slightly smaller, e.g., half the size as in merge sort, then we are really talking about a divide and conquer approach Dynamic programming is often said to be "programming with a table instead of recursion" This is quite coarse and ignores the elegance of solving problems based on known solutions to slightly smaller overlapping problems The statement is motivated by dynamic programming algorithms that systematically record computed results in a "table" so they can solve larger problems in the future It is possible to adopt either a top-down or bottom-up approach to the design of a dynamic programming algorithm The difference between these approaches can seem subtle
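As a small top-down illustration, computing Fibonacci numbers with a memo table: each overlapping subproblem is solved once and recorded, turning the naive exponential binary recursion into linear time (class and field names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class FibMemo {
    // The "table": results of already-solved subproblems.
    static final Map<Integer, Long> memo = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;                  // base cases fib(0)=0, fib(1)=1
        Long cached = memo.get(n);
        if (cached != null) return cached;     // subproblem already solved
        long result = fib(n - 1) + fib(n - 2); // fib(n-1) overlaps fib(n-2)'s work
        memo.put(n, result);                   // record it for future callers
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
    }
}
```

The bottom-up alternative would fill the table from fib(0) upwards with a loop; both styles record solutions to slightly smaller overlapping subproblems.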

Divide and conquer advantages:

Ease of solution Complex problems can be tackled by solving and combining simpler cases Natural parallelism Divide and conquer algorithms are often well suited to multiple processor machines Memory access patterns Divide and conquer algorithms often make efficient use of caches because, once a problem has been sufficiently reduced in size, it and its subproblems can be solved within cache

What are the three most common ways to represent graphs?

Edge-set Adjacency-set Adjacency-matrix

What is the difference between a queue and priority queue?

Elements in a priority queue are prioritised

Why may periodic rebalancing not be a good idea?

Even the most efficient algorithms to rebalance an arbitrary BST have a running time of O(n*log(n)) Rebalancing can incur noticeable pauses, which makes it an inappropriate strategy for interactive programs

What kind of ordering do queues have?

FIFO - first in first out LILO - last in last out

What facts does the merge sort rely on?

Fewer steps are required to sort a small list compared to a large list Fewer steps are required to build a sorted list from two sorted lists compared to building a sorted list from two unsorted lists

What is Kruskal's algorithm?

First published by Joseph Kruskal in 1956 Kruskal's algorithm finds a minimum spanning tree for a connected weighted graph It finds a subset of edges that form a tree that includes every vertex, where the total weight of edges in the tree is minimised If a graph is not connected then Kruskal's algorithm finds a minimum spanning forest (a minimum spanning tree for each connected component) Given a graph G, the algorithm first creates a forest F where each tree is a single vertex and a set S containing all edges of G The smallest edge in S that connects two distinct trees is repeatedly added to F This step inherently avoids cycles! Why? The algorithm terminates when F spans G or when all edges in S have been considered
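An edge joining two *distinct* trees cannot close a cycle, which answers the "Why?" above. A sketch in Java, using a union-find structure to track which tree each vertex belongs to (union-find is one common way to do this; it is not prescribed by the description above, and all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KruskalSketch {
    // An edge (u, v) with weight w; vertices are numbered 0..n-1.
    record Edge(int u, int v, int w) {}

    static int[] parent;

    // Union-find "find" with path compression: returns the root of x's tree.
    static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

    // Returns the edges of a minimum spanning forest of the given graph.
    static List<Edge> kruskal(int n, List<Edge> edges) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;       // forest F: one tree per vertex
        edges.sort(Comparator.comparingInt(Edge::w));    // consider edges smallest first
        List<Edge> forest = new ArrayList<>();
        for (Edge e : edges) {
            int ru = find(e.u()), rv = find(e.v());
            if (ru != rv) {        // connects two distinct trees, so no cycle is formed
                parent[ru] = rv;   // union the two trees
                forest.add(e);
            }
        }
        return forest;
    }
}
```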

What is a spanning tree?

For a connected, undirected graph G, a spanning tree of G is a spanning subgraph that is a tree A spanning tree is not unique unless the graph is a tree A single graph may have many spanning trees A spanning forest of a graph is a spanning subgraph that is also a forest

Properties of Dijkstra's algorithm?

For efficiency, a priority queue is typically used to store vertices outside the current search graph The key is the distance value The element is the vertex Locator-based methods insert(k,e) returns a locator replaceKey(l,k) changes the key of an item The distance and locator are stored for each vertex

What is the transitive closure?

Given a directed graph G, the transitive closure of G is the directed graph G* such that: G* has the same vertices as G If G has a directed path from u to v, where u and v are not equal, G* has a directed edge from u to v The transitive closure property can be used to provide reachability information for a directed graph We can compute transitive closure using several different approaches We could perform a DFS starting at each vertex Running time of O(n*(n+m)) We could use a dynamic programming algorithm The Floyd-Warshall algorithm
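The dynamic-programming approach mentioned above can be sketched on a boolean adjacency matrix. This is the Floyd-Warshall idea: after considering each vertex k as an allowed intermediate, reach[i][j] records whether a directed path i → j exists (names are illustrative):

```java
public class TransitiveClosure {
    // Computes the transitive closure of a graph given as a boolean
    // adjacency matrix. Runs in O(n^3) time, independent of the edge count.
    static boolean[][] closure(boolean[][] adj) {
        int n = adj.length;
        boolean[][] reach = new boolean[n][n];
        for (int i = 0; i < n; i++) reach[i] = adj[i].clone();
        // After iteration k, reach[i][j] is true iff there is a path i -> j
        // whose intermediate vertices are all drawn from {0, ..., k}.
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (reach[i][k] && reach[k][j]) reach[i][j] = true;
        return reach;
    }
}
```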

Kruskal's algorithm analysis:

Given a graph with n vertices and m edges: Kruskal's algorithm can be shown to have worst case performance of O(m*log(m)) Due to the nature of Kruskal's algorithm the stated worst case running time can be equivalently expressed as O(m*log(n)) Why are these running times equivalent?

What is the shortest path problem and what are its applications?

Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v The length of a path is the sum of the weights on its edges The problem has many practical applications Packet routing Flight plans Electronic circuit design

What do greedy algorithms have in common?

Greedy algorithms run in phases In each phase a decision is made that appears to be the best choice available, i.e. a local optimum is selected in each phase Disregards the possible consequences of selecting a local optimum When a greedy algorithm terminates it is hoped that the current local optimum is equal to a global optimum If this is the case then the algorithm is considered correct Otherwise, the algorithm has produced a suboptimal solution In situations where obtaining an optimal solution is not a major concern, greedy algorithms can often be used to provide approximate solutions Greedy algorithms are usually simple to design and implement Algorithms which produce optimal solutions to difficult problems can be complex, tricky to implement and computationally expensive Often greedy algorithms do actually provide optimal solutions Consider the algorithm that you implicitly use when giving change

Which classes provide inefficient positional access?

Some implementations of the List interface (e.g., ArrayList) provide fast positional access Many other classes which implement Collection either do not provide positional access or provide inefficient positional access Map does not provide positional access LinkedList provides inefficient positional access Classes which implement the Collection interface must define the iterator() method The iterator() method returns an object of type Iterator

How do closed-bucket hash tables work?

In a closed-bucket hash table (CBHT) each bucket is isolated from every other bucket Entries occupy the bucket dictated by their hash Collisions mean that some entries with different keys may occupy the same bucket Entries in each bucket are organised using a linked list CBHT is essentially an array of linked lists and a hash function
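A minimal CBHT sketch — an array of linked lists plus a hash (compression) function. The int keys, the int[]-pair entry representation and the class name are all illustrative assumptions:

```java
import java.util.LinkedList;

public class CBHT {
    // One linked list ("bucket") per array slot; colliding keys share a bucket.
    private final LinkedList<int[]>[] buckets;   // each entry is {key, value}

    @SuppressWarnings("unchecked")
    CBHT(int n) {
        buckets = new LinkedList[n];
        for (int i = 0; i < n; i++) buckets[i] = new LinkedList<>();
    }

    // Division compression function: bucket index = key mod n.
    private int hash(int key) { return Math.floorMod(key, buckets.length); }

    void put(int key, int value) {
        for (int[] e : buckets[hash(key)])
            if (e[0] == key) { e[1] = value; return; }   // overwrite existing key
        buckets[hash(key)].add(new int[]{key, value});
    }

    Integer get(int key) {
        for (int[] e : buckets[hash(key)])
            if (e[0] == key) return e[1];
        return null;   // key absent
    }
}
```

Note how keys 1 and 8 collide when there are 7 buckets, yet both remain retrievable because they occupy different nodes of the same bucket's list.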

What are the properties of breadth first search?

In a similar way to a depth-first search: BFS(G,s) visits all the vertices and edges in the connected component of s The edges labelled as DISCOVERED by BFS(G,s) form a spanning tree Ts of the connected component of s For each vertex v in Li: The path of Ts from s to v has i edges Every path from s to v in Gs has at least i edges

How does a heap sort work?

Heap sort improves upon selection sort by using an implicit heap to allow the element with the lowest or highest value to be accessed in O(log(n)) instead of O(n) A heap must be constructed from the initial data set Usually represented implicitly using an array The element with the largest value (max-heap) or the smallest value (min-heap) is removed until no elements remain, thus elements are removed in sorted order Implicit heap structure reduces overall runtime to O(n*log(n)) The only space that is required for element removal is that needed to store the heap structure We can achieve O(1) storage overhead by storing the heap in a part of the array that is yet to be sorted We must store some reference to the region of the array that has not been sorted Heap sort can be efficiently implemented using only those operations which allow us to insert elements into a heap and remove the root of a heap
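A compact in-place sketch of the scheme described above: the max-heap lives in the unsorted region of the array, and the root is repeatedly swapped into the last unsorted slot (names are illustrative):

```java
public class HeapSortDemo {
    // In-place heap sort: build a max-heap, then repeatedly move the root
    // (largest element) into the end of the shrinking unsorted region.
    static void sort(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) sift(a, i, a.length); // heapify
        for (int end = a.length - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;  // root -> sorted region
            sift(a, 0, end);                          // restore heap in a[0..end)
        }
    }

    // Sift a[i] down until the max-heap property holds in a[0..n).
    private static void sift(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int c = 2 * i + 1;                        // left child
            if (c + 1 < n && a[c + 1] > a[c]) c++;    // pick the larger child
            if (a[i] >= a[c]) return;
            int t = a[i]; a[i] = a[c]; a[c] = t;
            i = c;
        }
    }
}
```

The O(1) storage overhead comes from `end`, the single reference to the boundary of the unsorted region.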

What is top down dynamic programming?

If a solution can be formulated recursively using the solutions to overlapping subproblems then systematically store the solutions to subproblems Given a problem instance, check the store to see if it is already solved, otherwise we solve it and store the new solution
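Fibonacci is the classic illustration of this (it is not from the source): the recursion mirrors the mathematical definition, and a store (memo) guarantees each subproblem is solved at most once.

```java
import java.util.HashMap;
import java.util.Map;

public class TopDownFib {
    private static final Map<Integer, Long> memo = new HashMap<>();

    // Top-down DP: recurse as the maths suggests, but check the store first
    // so every fib(k) is computed at most once.
    static long fib(int n) {
        if (n < 2) return n;
        Long cached = memo.get(n);
        if (cached != null) return cached;    // subproblem already solved
        long result = fib(n - 1) + fib(n - 2);
        memo.put(n, result);                  // store the new solution
        return result;
    }
}
```

Without the store this recursion takes exponential time; with it, fib(50) is immediate.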

What is the OBHT displacement strategy?

If an entry cannot be inserted because a bucket is occupied then displace the entry to the next available bucket Treat the array underlying the hash table as being circular Generally a bucket in an OBHT will be in one of three states Never occupied, occupied or previously occupied The previously occupied state may be unnecessary if empty buckets are initialised to a known value, e.g. null in Java

What is the duality rule?

If f(n) is O(g(n)) then g(n) is Ω(f(n)) The duality rule stated above is relatively intuitive when you consider the definitions of O and Ω

What are the two broad approaches to dynamic programming?

It is generally applicable to problems that exhibit overlapping subproblems and optimal substructure Every subproblem is a part of some larger overall problem Two broad approaches to dynamic programming Top-down: Employing a recursive series of simple calculations, storing each result Bottom-up: Reusing the results of sub-calculations in successively larger problems

B-tree deletion:

In order to delete an element e from a B-tree we must first find the location where the element to be deleted is being stored There are strict constraints regarding the number of elements that any node in a B-tree can store A node in a k-ary B-tree should not be empty (and may even have a specified minimum number of elements that must be stored) Deleting an element may violate these storage constraints We must "restock" nodes which would violate these constraints To restock a node n we must consider three cases Case 1: Underflowed root node No elements are contained, thus the node can be discarded Case 2: Underflowed node that has a sibling with a spare element Interchange elements to meet storage constraints Case 3: Underflowed node that must be fused with its nearest sibling The parent node contracts (restocking may need to be recursively applied)

How does B-tree insertion work?

In order to insert an element e into a B-tree we must first find the location where the element must be added There are strict constraints regarding the number of elements that any node in a B-tree can store A node in a k-ary B-tree may store a maximum of k-1 elements Inserting an element may violate these storage constraints We must "split" nodes which would violate these constraints To split a node n: Determine n's median element Split the node into two siblings, where l_sib and r_sib take the elements less than and greater than the median respectively (along with their children) Move the median itself, and l_sib and r_sib, up to the parent node. If no parent node exists then create one. If a parent node already exists then insert the median into the parent node (split again if required)

What is the heuristic estimate function?

It is important that h(x) is admissible In the context of an A* search, an admissible heuristic is one that does not overestimate the distance to the goal A commonly adopted h(x) is the "straight-line" distance The straight-line distance is the shortest possible distance between any two points, so it can never overestimate the distance to a goal Straight-line distance is appropriate in many problem domains, including maze-solving and network routing

The binary search algorithm can be implemented iteratively or recursively. What is the difference?

Iterative implementations usually maintain explicit references to the start and end of the current list When these references match the search algorithm terminates Recursive implementations make recursive calls which divide the current list into the required sublists No need to maintain explicit references to the start and end of any list
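Both styles side by side, on a sorted int array (names are illustrative):

```java
public class BinarySearchDemo {
    // Iterative: explicit lo/hi references to the current list; the loop
    // terminates when they cross.
    static int iterative(int[] a, int t) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi)
            if (a[mid] == t) return mid;
            if (a[mid] < t) lo = mid + 1; else hi = mid - 1;
        }
        return -1;   // not found
    }

    // Recursive: each call narrows to the relevant sublist instead of
    // updating shared references.
    static int recursive(int[] a, int t, int lo, int hi) {
        if (lo > hi) return -1;
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == t) return mid;
        return a[mid] < t ? recursive(a, t, mid + 1, hi)
                          : recursive(a, t, lo, mid - 1);
    }
}
```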

What should a specification specify?

Its set of values Its applicable operations A specification does not describe a data representation or any specific implementation details, e.g., the algorithms used by operations

Explain type-safe enums:

Java now provides a (relatively) new type called an enum Ensures named values cannot be directly compared to other values Forces programmers to write robust code

What are the properties of a bucket sort?

Keys are used to index an array It is natural that they should be integers Arbitrary objects cannot be directly used as keys Possible to translate objects to integers Typical that string keys are translated to integer keys Important to ensure that any translation does not impact upon the time efficiency of the bucket sort algorithm

What is a lexicographic sort?

Let Ci be the comparator that compares two d-tuples by their i-th dimension Let stableSort(S,C) be a stable sorting algorithm that uses comparator C Lexicographic sort sorts a sequence of d-tuples in lexicographic order by executing the stableSort algorithm for each tuple dimension, i.e. one execution for each of the d dimensions Lexicographic sort has a worst case performance of O(d*T(n)), where T(n) is the running time of the stable sorting algorithm

How does a bucket sort work?

Let S be a sequence of n key-element pairs, where all keys are in the range [0,N-1] Bucket sort uses the keys as indices in an auxiliary array Stage 1: Empty sequence S by moving each entry <k,v> into bucket B[k] Stage 2: For i = 0, 1, ..., N-1, move the entries of bucket B[i] to the end of S
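The two stages above can be sketched directly, here with plain int keys in [0, N-1] (names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BucketSortDemo {
    // Keys must lie in [0, N-1]; each key indexes its bucket directly,
    // so no comparisons between keys are ever made.
    static int[] sort(int[] keys, int N) {
        @SuppressWarnings("unchecked")
        Deque<Integer>[] b = new ArrayDeque[N];
        for (int i = 0; i < N; i++) b[i] = new ArrayDeque<>();
        for (int k : keys) b[k].addLast(k);          // stage 1: scatter into buckets
        int[] out = new int[keys.length];
        int i = 0;
        for (int k = 0; k < N; k++)                  // stage 2: gather in key order
            while (!b[k].isEmpty()) out[i++] = b[k].removeFirst();
        return out;
    }
}
```

Appending to and draining each bucket in FIFO order is what makes the sort stable.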

How does an adjacency set representation for a graph work?

Maintaining a single large edge set is inefficient An adjacency set contains edges connecting a given node A graph can be represented by a set of nodes and an adjacency set for each of these nodes The node and edge sets can be represented by a doubly and singly linked list respectively (adjacency sets are small) The memory requirements for the representation are O(n+e) Directed graphs The adjacency set of each node contains only out-edges Account for the fact that only out-edges are contained in node / edge addition and removal algorithms Undirected graphs Include each edge in the adjacency sets of the nodes that it connects

What is function call tracking and how does it work? (stack)

Many programming languages use stacks within their runtimes or virtual machines to track function execution A function call results in the creation of a stack frame Stack frames contain local variables, return addresses etc. Each stack frame is pushed onto a stack upon creation The currently executing function can always be found at the top of the stack Stack frames are pushed onto the stack when functions are called When a function terminates, its stack frame is popped and control is passed to the function whose stack frame is currently at the top of the stack A simple recursive algorithm can be devised to provide the desired functionality

What are the different types of hash functions?

Memory Address Interpreting the memory address of a key as an integer Can be extremely effective, but is weak for numeric keys Integer Cast Interpreting the bits of a key as an integer Good for keys whose length is less than or equal to the number of bits associated with the integer data type, i.e., byte, short etc. Component Sum Partition the bits of a key into fixed length parts and then sum the parts Good for keys whose length is greater than the number of bits associated with the integer data type Polynomial Accumulation Partition the bits of a key into fixed length parts a0, a1, ..., a(n-1) and evaluate the polynomial p(z) = a0 + a1*z + a2*z^2 + ... + a(n-1)*z^(n-1) for a fixed z Extremely effective for hashing strings
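Polynomial accumulation is usually evaluated with Horner's rule, which needs only one multiply and one add per part. With z = 31 over a string's characters this is exactly the scheme `java.lang.String.hashCode()` specifies:

```java
public class PolyHash {
    // Polynomial accumulation via Horner's rule with z = 31:
    // h = a0*z^(n-1) + a1*z^(n-2) + ... + a(n-1), where ai = key.charAt(i).
    static int hash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++)
            h = 31 * h + key.charAt(i);   // one Horner step per character
        return h;
    }
}
```

Because the algorithm matches the String.hashCode specification, the two agree on every input.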

What are the disadvantages of a heap sort?

Not adaptive A nearly sorted input list will have its order destroyed Inserting a completely reversed input list into a heap will have the same runtime as any other input list Unstable The relative order of list elements with equal keys is not inherently maintained by the algorithm

What is java.util.Collections?

Not to be confused with java.util.Collection Defines a variety of generic methods that operate upon classes which implement Collection All methods in java.util.Collections are static Several methods in java.util.Collections are destructive Take extreme care and always read API documentation

What are the disadvantages of a selection sort?

Performance Best and worst case time complexities are O(n^2) Unstable The algorithm is not inherently stable, though it is possible to implement a stable variant using appropriate data structures Not adaptive The degree to which the input list is already sorted is not significant

What are the advantages of a quick sort?

Performance The algorithm has an average case running time of O(n*log(n)) The worst case running time of O(n^2) is unlikely to be realised Well aligned with modern computer architectures The excellent utilisation of memory hierarchies can yield significant speedup High adaptability through extension A 3-way partition can be used to design a quick sort algorithm that is highly adaptive Easily parallelised Possible to achieve near-linear speedup Parallelised quick sort implementations have been shown to outperform many other parallelised sorting algorithms

What are the advantages of the merge sort?

Performance The algorithm has a worst case runtime of O(n*log(n)) Predictable Merge sort makes between 0.5*log(n) and log(n) comparisons per element Merge sort makes between log(n) and 1.5*log(n) swaps per element Stable The relative order of list elements with equal keys is maintained Does not require random access Useful in situations where random access is much more expensive than sequential access A natural choice for when dealing with slow-to-access media Easily parallelised Possible to achieve near-linear speedup Parallelised merge sort algorithms have been shown to outperform other parallelised sorting algorithms (except a good parallelised quick sort)

What are the disadvantages of a bubble sort?

Performance The algorithm has a worst case running time of O(n^2) The average case time efficiency is no better than the worst case time efficiency Not well aligned with modern hardware The algorithm yields frequent cache misses and requires more memory accesses than other sorting algorithms

What are the advantages of a binary search?

Performance The binary search algorithm has a worst and average case time efficiency of O(log(n)) The sequential search algorithm has a worst and average case time efficiency of O(n) The logarithmic nature of the binary search algorithm makes it an ideal choice in problem domains where large volumes of data must be efficiently searched

What are the disadvantages of an insertion sort?

Performance The insertion sort algorithm has a worst case running time of O(n^2) The average case time efficiency is no better than the worst case time efficiency, though some specialised variants can offer average case improvements Despite these issues, the algorithm is generally more efficient than other quadratic sorting algorithms

What are the disadvantages of a bucket sort?

Performance can be difficult to quantify The worst case performance of the bucket sort algorithm can be appealing Difficult to specify "in practice" performance, as the nature of the input list can have a severe impact Specialised The bucket sort algorithm is less generally applicable than some of the other sorting algorithms that we have considered

What balancing strategies exist for BSTs?

Periodically rebalance Modify the insertion and deletion algorithms to maintain balance Avoid BSTs in favour of naturally balanced tree structures, such as those that we will encounter in later lectures

Primality testing:

Primality testing was one of the earliest problems that randomised algorithms were used to solve. The well-known randomised algorithm runs in polynomial time. If the algorithm decides that the given number is not prime then we can be certain that it is indeed not prime. If the algorithm decides that the given number is prime then there is a very high probability that the number is prime. The algorithm can make mistakes, but through some careful thought we can ensure that the margin for error is negligibly small. The algorithm is based on Fermat's Little Theorem: if P is prime and 0 < A < P then A^(P-1) ≡ 1 (mod P). For example, since 67 is prime, 2^66 ≡ 1 (mod 67). This theorem allows us to check whether 2^(N-1) ≡ 1 (mod N). If 2^(N-1) ≢ 1 (mod N) then N is definitely not prime. If 2^(N-1) ≡ 1 (mod N) then N is likely to be prime. The check can make mistakes, but it always makes the same mistakes: there is a fixed set of N for which it does not work, e.g., the number 341. We want to reduce the margin for error here! Reduce the margin of error by picking 1 < A < N-1 at random. However, even then there are specific sets of numbers that can fool the A^(N-1) ≡ 1 (mod N) check. We can improve the algorithm using a theorem about square roots of 1: if P is prime and 0 < X < P then the only solutions to X^2 ≡ 1 (mod P) are X = 1 and X = P-1. It has been shown that this improved approach makes an error with probability at most 0.25 per trial. If we run 50 independent random trials, the probability of an error is at most 2^(-100)
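A sketch of the random-base Fermat check described above (without the square-root refinement, so Carmichael numbers such as 561 can still fool it; all names are illustrative). A "composite" verdict is certain because a witness A with A^(N-1) ≢ 1 (mod N) has been found; a "prime" verdict is only probable.

```java
import java.math.BigInteger;
import java.util.Random;

public class FermatTest {
    // Randomised Fermat check: pick random 1 < A < N-1 and test
    // A^(N-1) ≡ 1 (mod N). Returning false is certain; true is probable.
    static boolean probablyPrime(BigInteger n, int trials) {
        if (n.compareTo(BigInteger.valueOf(4)) < 0)
            return n.compareTo(BigInteger.ONE) > 0;   // 2 and 3 prime; 0, 1 not
        Random rnd = new Random();
        BigInteger nMinus1 = n.subtract(BigInteger.ONE);
        for (int t = 0; t < trials; t++) {
            BigInteger a;
            do {   // draw a uniformly until 2 <= a <= n-2
                a = new BigInteger(n.bitLength(), rnd);
            } while (a.compareTo(BigInteger.TWO) < 0 || a.compareTo(nMinus1) >= 0);
            if (!a.modPow(nMinus1, n).equals(BigInteger.ONE))
                return false;   // witness found: n is definitely composite
        }
        return true;            // probably prime
    }
}
```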

Dijkstra analysis:

Priority queue operations Every vertex is inserted once into and removed once from the priority queue Each insertion or removal takes O(log(n)) time The key of a vertex in the priority queue is modified at most degree(w) times Each key change takes O(log(n)) time Combining the complexities we have observed for each constituent operation we can easily find the overall running time of the algorithm For a graph with n vertices and m edges Dijkstra's algorithm runs in O((n+m)*log(n)) Since the graph is assumed to be connected the complexity can be expressed as O(m*log(n))

Dynamic programming advantages:

Problem solving power Dynamic programming algorithms can be used to solve problems that would otherwise be too computationally complex to solve within a reasonable time period A pre-computation phase can be used to ensure the operation efficiency of applications that make use of dynamic programming Affinity with recursion If it is possible to formulate a problem as a recursive mathematical expression then it is often straightforward to see how a dynamic programming algorithm could be employed

What are the applications of a stack?

Program Execution Function calls must be tracked to maintain system state Source Code Parsing Structured languages often require that brackets be matched Web Browsers Recording browsing history to enable efficient backtracking

What are the shortest path properties?

Property 1 A subpath of a shortest path is a shortest path itself Property 2 There is a tree of shortest paths from a start vertex to all other vertices in the graph

What is the queue interface?

Queues typically order elements in a first-in-first-out fashion A priority queue is an exception, as it orders elements by priority In addition to the basic Collection operations, the Queue interface provides specialised insertion, deletion and inspection operations

What is a radix sort?

Radix sort is a specialisation of lexicographic sort that uses bucket-sort as the stable sorting algorithm in each dimension Radix sort is applicable to d-tuples where the keys in each dimension are integers in the range [0,N-1] Radix sort has a worst case performance of O(d*(n+N))

What are sorting algorithms frequently used for?

Reducing the running time of other algorithms, e.g., search algorithms Canonicalising data Generating human-readable output

What are the benefits of using the java collection framework?

Reduces programming effort Provides standard data structures and algorithms so that programmers can focus upon the core functionality of their applications Increases program speed and quality Provides high-performance, well-engineered data structures and algorithms which have been thoroughly tested and widely adopted Facilitates interoperability Well defined interfaces ensure that distinct applications can operate upon structures in a consistent fashion Reduces effort required to use new APIs The consistency of the interfaces provided across the framework reduces the learning required to make immediate use of any collection Reduces effort required to design new APIs API designers and software engineers do not have to reinvent existing ideas or rewrite existing software involving collections Promotes software reuse All data structures and algorithms contained within the collections framework are designed to be reused

What are the terms in undirected graphs?

Undirected graphs: node, edge, degree, size, neighbours, adjacent

What are search algorithms used for?

Search algorithms can be used to guide explorations of logical spaces and structures Solve problems that require prohibitive amounts of computation Fundamental to artificial intelligence, data mining and many other fields

B-tree searching:

Searching a B-Tree for an element e is a very similar process to searching a balanced BST for an element e The interesting consideration that we must make when searching a k-ary B-tree is associated with the k-1 elements stored at each node Possible to use efficient search algorithms within nodes The ordering among elements is explicitly maintained, thus efficient searching is facilitated by the B-tree representation There is a significant advantage of a B-tree when it comes to searching for a given element, though it is somewhat offset by one major disadvantage Advantage The average search path is shorter, as the tree is shallower Disadvantage Each element contained within each node must be searched in order to locate a given element... how efficiently can we do this?

What are the disadvantages of linked lists?

Sequential access to stored elements Linear algorithms to access an arbitrary element Additional memory required for references Negligible for many modern systems, though may be relevant for memory constrained platforms (e.g., mobile devices, embedded systems)

Depth-First Search Analysis

Setting / getting a vertex / edge label takes O(1) time Each vertex is labelled twice (UNEXPLORED and EXPLORED) Each edge is labelled twice (UNEXPLORED and DISCOVERED or BACK) The incidentEdges() method is called once for each vertex Worst case time efficiency of O(n+m) provided that the graph has an adjacency list representation

Analysis of breadth first searching?

Setting / getting a vertex / edge label takes O(1) time Each vertex is labelled twice (UNEXPLORED and EXPLORED) Each edge is labelled twice (UNEXPLORED and DISCOVERED or CROSS) Each vertex is inserted once into a sequence Li The incidentEdges() method is called once for each vertex Worst case time efficiency of O(n+m) provided that the graph has an adjacency list representation

What are the advantages of a selection sort?

Simple The algorithm is intuitive and easily implemented Minimises element swaps The algorithm performs fewer swaps than other quadratic algorithms, so it is useful when there is a cost associated with element swapping Low memory consumption In-place sorting requires a constant amount of extra memory

What are the advantages of sequential searching?

Simple The sequential search algorithm can be easily understood and implemented The algorithm can be easily extended / optimised Widely applicable The algorithm makes no assumptions about the nature of the structure or data being searched

What is bottom up dynamic programming?

Solve the subproblems first and use their solutions to arrive at the solutions to increasingly large subproblems Generally implemented in a tabular form, where an overall problem is solved by iteratively finding the solution to successively bigger subproblems

What is the disadvantage of a binary search?

Sorted data are required The binary search algorithm can only be employed in situations where data / elements are in a sorted order The ordering must be known to the algorithm, as the decision regarding problem subdivision must be correct Possible to sort prior to search, but the combined time efficiency of both operations must be considered

What is a bucket sort?

Sorting algorithm that partitions an auxiliary array into a number of buckets and populates it using the input list Following population of the auxiliary array, each bucket is sorted individually and copied back to the original array Possible to sort individual buckets using a different sorting algorithm or by recursively applying bucket sort Bucket sort is not a comparison sort The traditional n*log(n) lower bound is not relevant

What are the advantages of a bucket sort?

Stable The relative ordering of elements with the same key is inherently maintained by the bucket sort algorithm Adaptive Implementations of the bucket sort algorithm can sort a list that is partially sorted very quickly It is interesting to consider how the bucket sort performs when given a list where very few elements are out of place

What is stability?

Stable sorting algorithms maintain the relative ordering of elements with equal value Consider a list containing elements e1 and e2, where e1 and e2 have the same value and e1 occurs before e2 A stable sort guarantees that e1 will still be before e2 after sorting If elements are guaranteed to have different values, or equal elements are indistinguishable, then stability is not a concern

Divide and conquer limitations:

Stack size limitation Divide and conquer approaches are usually recursive in nature, thus the maximum size of a solvable problem instance may be limited by the maximum possible size of the recursion stack Modern machines and efficient divide and conquer algorithms have overcome this limitation, but it remains an issue in some contexts Shared subproblems Often problems will require the same / overlapping subproblems to be solved, which is seemingly wasteful of computation

Describe the A* search

Starting with an initial node, maintain a priority queue of nodes to be explored (also known as the open set) The lower f(x) is for a node x the higher the priority of x At each step the node with the lowest f(x) is removed from the queue, with the f and g values of its neighbours being updated to account for the removal The algorithm continues executing until a goal node has a lower f value than any node in the queue or the queue is empty The f value of the goal can be used to determine the shortest path length

What are strongly connected components?

Strongly connected components are maximal subgraphs such that each vertex can reach all other vertices in the subgraph For a graph with n vertices and m edges, strongly connected components can be computed in O(n+m) time

What properties does A* search have?

The A* search first explores routes which appear to be promising "Promising" is defined with respect to goal nodes This is a general property of all informed search algorithms A* is not just a "greedy" algorithm; the use of g(x) means that the overall distance travelled is taken into account at each step of the search algorithm

How does an AVL tree deletion work?

The AVL tree deletion algorithm consists of the regular BST deletion algorithm, followed by a local restructuring (restores height-balance wherever necessary) The AVL tree deletion algorithm is similar in principle to the AVL tree insertion algorithm, though there are two key differences: There may be a choice of rotations which would work equally well Multiple rotations may be required in order to restore the height-balanced property for tree nodes The key choice that we must make is how we select the nodes to rotate To pick the nodes to rotate, start with the deleted node's parent v. If v and its ancestors remain height-balanced, there is nothing to do Otherwise: The height-unbalanced node closest to v is known as g, i.e., grandparent The child of g with greatest height is known as p, i.e., parent The child of p with greatest height is known as c, i.e., child Apply the rotation algorithm defined for AVL insertion to g, p and c

How does the quicksort algorithm work?

The base case for the recursion relies upon the fact that a list of length 0 or 1 will always be sorted The divide and conquer nature of the approach is clear when the algorithm is expressed in natural language Pick an element p, known as the pivot. Reorder the list such that elements less than p come before p and elements greater than p come after p. Recursively sort the sub-list containing lesser elements and the sub-list containing greater elements
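The natural-language steps above translate almost directly into code. A sketch using an in-place Hoare-style partition (pivot choice and names are illustrative):

```java
public class QuickSortDemo {
    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                 // base case: length 0 or 1 is sorted
        int p = a[lo + (hi - lo) / 2];        // pick a pivot (middle element here)
        int i = lo, j = hi;
        while (i <= j) {                      // reorder around the pivot
            while (a[i] < p) i++;             // find element >= p on the left
            while (a[j] > p) j--;             // find element <= p on the right
            if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
        }
        sort(a, lo, j);                       // recursively sort the lesser sub-list
        sort(a, i, hi);                       // recursively sort the greater sub-list
    }
}
```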

How does an AVL tree insertion work?

The AVL tree insertion algorithm consists of the regular BST insertion algorithm, followed by a local restructuring Restructuring restores height-balance wherever necessary When the regular BST insertion algorithm is performed upon a given AVL tree, a null link is replaced by a new leaf node A number of the new node's ancestors may be height-unbalanced The local restructuring focuses on three nodes The newly inserted node, its parent and its grandparent If we take a newly added node, its parent and grandparent, where A is the node containing the least element, B is the node containing the median element and C is the node containing the greatest element, there are four cases to consider 1. A is the left child of B, which is the left child of C Move C down to become B's right child and move B's right subtree to become C's left subtree 2. B is the right child of A, which is the left child of C Move B upward such that A is its left child and C is its right child, and move B's left and right subtrees to become the subtrees of A and C respectively 3. C is the right child of B, which is the right child of A The mirror image of case 1 4. B is the left child of C, which is the right child of A The mirror image of case 2 After reviewing case 2 and the example associated with case 3, try to deduce what would happen in this case To make things a little more concrete we will now look at worked examples which demonstrate two insertion cases Case 1: A is the left child of B, which is the left child of C Case 3: C is the right child of B, which is the right child of A You should try to get a feel for the concept rather than focusing upon implementation details Restructuring is quite intuitive but difficult to implement

B-tree insertion analysis

The B-tree insertion algorithm initially focuses upon finding an appropriate location for element insertion The analysis of the B-tree search algorithm is valid here Approximately log2(n) comparisons to search for the insertion location The algorithm then focuses upon performing any node splitting that is required after the insertion (maximum of one per tree depth) Maximum nodes to split = log2(n+1) / log2(k) In terms of comparisons or splits, element insertion is O(log(n))

Bellman-Ford applications:

The Bellman-Ford algorithm has facilitated tremendous advances in situations where analysing graphs with negative weights is unavoidable A distributed form of Bellman-Ford is famously used in distance-vector protocols for packet-switched networks The Routing Information Protocol (RIP) is a good example Simple assumptions and optimisations are often made to improve the scalability and performance of the algorithm

Explain sorting using the JCF:

The JCF provides two different algorithms for sorting: Sorting arrays of primitive types using java.util.Arrays uses a modified quick sort algorithm Sorting arrays of objects using java.util.Arrays uses a modified merge sort Sorting collections uses a modified merge sort Implementations of the List interface can also be sorted using the static sort(...) method in java.util.Collections Uses a modified merge sort algorithm Elements stored in a List implementation must implement the java.lang.Comparable interface in order to be sorted Elements must be "mutually comparable" The compareTo(...) method must support all types that could possibly be stored in the List implementation
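The static sort(...) method mentioned above can be used as follows; String already implements java.lang.Comparable, so the elements are mutually comparable. The list contents are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sorting a List implementation with java.util.Collections.sort(...).
// String implements java.lang.Comparable, so the elements are
// mutually comparable as the text requires.
public class JcfSortExample {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        Collections.addAll(names, "carol", "alice", "bob");
        Collections.sort(names); // natural, non-decreasing order
        System.out.println(names); // [alice, bob, carol]
    }
}
```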

What is the List interface in Java?

The List interface defines a Collection that is ordered Duplicate elements are permitted In addition to methods inherited from Collection, the List interface contains methods for: Positional access Searching Iteration Range operations

What is a binary search?

The binary search algorithm locates the position of an element in a sorted list by exploiting the ordering between elements We have already studied the algorithm in the context of binary search trees When given an input list A and a search element t, the binary search algorithm compares the middle element, m, of the input list with t If m matches t then the position of m is returned If m is greater than t, repeat the binary search on the lower half of the list If m is less than t, repeat the binary search on the upper half of the list
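The comparison steps above can be sketched iteratively over a sorted array; this is a minimal illustration returning the index of t, or -1 when t is absent (the return convention is an assumption).

```java
// An iterative sketch of the binary search described above, over a
// sorted int array; returns the index of t or -1 if t is absent.
public class BinarySearchSketch {
    public static int search(int[] a, int t) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;           // the middle element m
            if (a[mid] == t) return mid;       // m matches t
            else if (a[mid] > t) hi = mid - 1; // repeat on the lower half
            else lo = mid + 1;                 // repeat on the upper half
        }
        return -1; // t is not present
    }

    public static void main(String[] args) {
        int[] a = {2, 4, 7, 11, 16};
        System.out.println(search(a, 11)); // 3
        System.out.println(search(a, 5));  // -1
    }
}
```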

What assumptions does the Bellman-Ford algorithm rely on?

The correctness of a greedy shortest path algorithm depends upon the optimal substructure of the search graph during execution Dependent upon assumptions about non-negative weights The Bellman-Ford algorithm has no reliance upon structural assumptions, but can only be used to find the shortest path in graphs with no negative cycles Bellman-Ford can be used to detect negative cycles We have already stated that the Bellman-Ford algorithm cannot be used to find a shortest path in any graph containing a negative cycle An implication of this property is that the algorithm must assume directed edges If the algorithm did not assume directed edges then any graph containing a negative-weight edge would contain a negative cycle and hence render the algorithm useless

What is Dijkstra's algorithm?

The distance of a vertex v from a vertex s is the length of a shortest path between s and v Dijkstra's algorithm computes the distances of all vertices from a given start vertex s For each vertex v we store a label d(v) representing the distance of v from s in the subgraph Repeatedly try to improve upon initially stored distances You can think of the algorithm as building a search graph of vertices beginning with the start vertex and eventually covering all vertices At each step: Add vertex u to the search graph, where u is the vertex outside the search graph with the smallest distance label After adding u, update the distance labels of the vertices adjacent to u
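The label-improvement steps above can be sketched as follows. The adjacency-matrix representation, the use of -1 to mark an absent edge, and the example graph are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// A sketch of Dijkstra's algorithm on an adjacency matrix
// (the matrix representation is an assumption for illustration;
// -1 marks an absent edge).
public class DijkstraSketch {
    public static int[] distances(int[][] w, int s) {
        int n = w.length;
        int[] d = new int[n];            // distance labels d(v)
        Arrays.fill(d, Integer.MAX_VALUE);
        d[s] = 0;
        boolean[] done = new boolean[n]; // vertices in the search graph
        // queue of {vertex, label}; smallest distance label first
        PriorityQueue<int[]> pq = new PriorityQueue<>((x, y) -> x[1] - y[1]);
        pq.add(new int[]{s, 0});
        while (!pq.isEmpty()) {
            int u = pq.poll()[0];
            if (done[u]) continue;
            done[u] = true; // add u to the search graph
            for (int v = 0; v < n; v++) {
                if (w[u][v] >= 0 && d[u] + w[u][v] < d[v]) {
                    d[v] = d[u] + w[u][v]; // improve v's distance label
                    pq.add(new int[]{v, d[v]});
                }
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] w = {
            {-1,  4,  1, -1},
            { 4, -1,  2,  5},
            { 1,  2, -1,  8},
            {-1,  5,  8, -1},
        };
        System.out.println(Arrays.toString(distances(w, 0))); // [0, 3, 1, 8]
    }
}
```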

What is the A* heuristic function?

The distance-plus-cost heuristic function can be viewed as the sum of two functions Path-cost function: The cost of exploration from the starting node to the current node (conventionally denoted g(x)) Heuristic estimate function: The estimated distance to the goal (conventionally denoted h(x)). The heuristic estimate function must be "admissible"

What is the difference between a directed and undirected graph?

The edges of an undirected graph have no direction A directed graph is one in which each edge has a direction

What are the assumptions in Dijkstra's algorithm?

The graph is connected All edges are undirected All edge weights are non-negative

Heap properties:

The heap property states that an element at a given position is less than or equal to the elements at its child positions
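The property can be checked mechanically. The sketch below assumes the common array-based layout in which position i has children at positions 2i+1 and 2i+2; this layout is an assumption, not something the definition above requires.

```java
// A sketch checking the heap property over an array-based heap,
// assuming position i has children at positions 2i+1 and 2i+2.
public class HeapPropertySketch {
    public static boolean isHeap(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int left = 2 * i + 1, right = 2 * i + 2;
            // each element must be <= the elements at its child positions
            if (left < a.length && a[i] > a[left]) return false;
            if (right < a.length && a[i] > a[right]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isHeap(new int[]{1, 3, 2, 7, 4})); // true
        System.out.println(isHeap(new int[]{5, 3, 2}));       // false
    }
}
```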

What assumptions are in place to make a BST always balanced?

The height of a node in a tree is the number of linked nodes that must be followed from that node in order to reach its most remote descendant The height of a tree or subtree is the height of its topmost node A node is height-balanced if the heights of its subtrees differ by one at most If a node were to have only one subtree then that subtree should have only one node A tree is height-balanced if all its nodes are height-balanced We can quantify the notion of height-balance by the balance factor The difference between the height of its left and right subtree A node is height-balanced if and only if its balance factor is -1, 0 or 1

How can we improve the bubble sort?

The improved bubble sort algorithm can detect when a list is already sorted After any single pass of the bubble sort algorithm it is only necessary to sort from just below the first exchange to just after the last exchange Everything that was not exchanged must be correctly ordered By recording the highest and lowest locations where there was an exchange we can reduce computation
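A simplified sketch of the improvement: the text records both the highest and lowest exchange locations, but for brevity this version tracks only the highest, which is enough to detect an already sorted list (no exchanges means the pass records nothing and the loop terminates).

```java
import java.util.Arrays;

// A sketch of the improved bubble sort: each pass records the highest
// location where an exchange occurred, so the next pass stops there;
// a pass with no exchanges means the list is already sorted.
public class BubbleSortSketch {
    public static void sort(int[] a) {
        int bound = a.length - 1;
        while (bound > 0) {
            int lastExchange = 0;
            for (int i = 0; i < bound; i++) {
                if (a[i] > a[i + 1]) {
                    int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                    lastExchange = i; // record the exchange location
                }
            }
            bound = lastExchange; // everything above was not exchanged,
                                  // so it must be correctly ordered
        }
    }

    public static void main(String[] args) {
        int[] xs = {4, 1, 3, 2};
        sort(xs);
        System.out.println(Arrays.toString(xs)); // [1, 2, 3, 4]
    }
}
```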

What should the input and output of a sorting algorithm be?

The input to a sorting algorithm should be a list The output of a sorting algorithm should be a list which satisfies two conditions: The output is in non-decreasing order ("order" may refer to any total ordering among elements) The output is a permutation of the input

What is the Map interface in Java?

The Map interface defines three "collection views" that can be used to view the contents of a map Set of keys Collection of values (note that this is not a set of values, as two distinct keys may map to the same value) Set of key-value pairs The "order" of a map is the order in which the iterators defined over the map's collection views return elements

Randomised algorithm types:

The randomised quick sort algorithm is guaranteed to yield a correct output, though its running time is a random variable Algorithms of this type are known as Las Vegas algorithms The randomised primality testing algorithm will complete in a fixed time with respect to input size, but has a small probability of error Algorithms of this type are known as Monte Carlo algorithms Any Las Vegas algorithm can be converted to a Monte Carlo algorithm by truncating its execution after a fixed time, since Markov's inequality bounds the probability that the running time exceeds that limit Any Monte Carlo algorithm can be converted to a Las Vegas algorithm by running the Monte Carlo algorithm until a verifiably correct output is found

What is dynamic programming?

The term was first used by Richard Bellman to describe problems which required successive decisions Bellman later refined this notion to refer to problems where smaller decision problems are nested inside larger decision problems Dynamic programming seeks to solve complex problems by breaking them down into a series of simpler steps The idea is to "build" a result based upon simpler steps

Prim's algorithm analysis:

The time efficiency of Prim's algorithm on a graph with n vertices and m edges depends upon the adopted representation Searching for an edge with minimum weight is usually the most computationally expensive operation Adjacency matrix The graph has an adjacency matrix representation This representation is inefficient in this context, as the algorithm has an overall running time of O(n^2) Binary heap and adjacency list The graph has an adjacency list representation and uses a binary heap in order to access an edge with minimum weight It has been shown that this representation has an overall running time of O(m*log(n)) The two suggested representations are extremely common There are more efficient alternatives, e.g., Fibonacci heap

What are the disadvantages of the quicksort?

Unstable The algorithm is not stable It is possible to augment the quick sort algorithm to impart stability, but this usually incurs a significant performance penalty Not inherently adaptive The basic quick sort algorithm does not benefit significantly from the degree to which an input list is already sorted Several extensions to the basic algorithm have addressed this issue

ADT priority queue operations:

Void add(Object o) adds object o as an element of the priority queue Object removeLeast() removes and returns the least element of the priority queue Object getLeast() returns the least element of the priority queue without removal Integer size() returns the length of the priority queue Boolean isEmpty() returns True if priority queue stores no elements, False otherwise

What operators does a graph ADT have?

Void addNode(Object o) adds a new node containing object o to the graph Void addEdge(Node n1, Node n2) adds an edge connecting nodes n1 and n2 to the graph Void addEdge(Node n1, Node n2, Object a) adds an edge with attribute object a and connects nodes n1 and n2 to the graph Void removeNode(Node n) removes node n and all of its connecting edges from the graph Void removeEdge(Node n1, Node n2) removes the edge connecting nodes n1 and n2 from the graph Boolean containsEdge(Node n1, Node n2) returns True iff there is an edge connecting node n1 and node n2 in the graph Set<Node> getNodes() returns a set of references to all graph nodes Set<Edge> getEdges() returns a set of references to all graph edges Set<Node> getNeighbours(Node n) returns a set of references to all neighbours of node n in the graph Set<Edge> getConnectingEdges(Node n) returns a set of references to all connecting edges of node n in the graph Integer size() returns the size of the graph Integer degree(Node n) returns the degree of node n

How can we use depth first search for path finding?

We can specialise the depth-first search algorithm to find a path connecting two given vertices u and z Call DFS(G,u) with u as the start vertex Use a stack S to keep track of the path between the start vertex and the current vertex When the destination vertex z is encountered, return the path as the contents of the stack
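The specialisation above can be sketched recursively. The adjacency-list representation (vertices numbered 0 to n-1) and the example graph are assumptions made for illustration; the stack holds the path from the start vertex to the current vertex.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// A sketch of DFS path finding on an adjacency-list graph
// (vertices 0..n-1; the representation is an assumption).
// A stack holds the path from the start vertex to the current vertex.
public class DfsPathSketch {
    public static Deque<Integer> findPath(List<List<Integer>> adj, int u, int z) {
        boolean[] visited = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        return dfs(adj, u, z, visited, stack) ? stack : null;
    }

    private static boolean dfs(List<List<Integer>> adj, int u, int z,
                               boolean[] visited, Deque<Integer> stack) {
        visited[u] = true;
        stack.addLast(u);        // push the current vertex onto the path
        if (u == z) return true; // destination reached: stack is the path
        for (int v : adj.get(u)) {
            if (!visited[v] && dfs(adj, v, z, visited, stack)) return true;
        }
        stack.removeLast();      // backtrack: u is not on a path to z
        return false;
    }

    public static void main(String[] args) {
        // Undirected edges 0-1, 1-2, 0-3 as adjacency lists
        List<List<Integer>> adj = java.util.Arrays.asList(
            java.util.Arrays.asList(1, 3),
            java.util.Arrays.asList(0, 2),
            java.util.Arrays.asList(1),
            java.util.Arrays.asList(0));
        System.out.println(findPath(adj, 0, 2)); // [0, 1, 2]
    }
}
```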

How can we use depth first search for cycle finding?

We can specialise the depth-first search algorithm to find a simple cycle Use a stack S to keep track of the path between the start vertex and the current vertex As soon as a back edge (v, u) is encountered, return the cycle as the portion of the stack from the top down to vertex u

What are the benefits of a tree being balanced?

We have seen that a balanced search tree is an efficient representation for sets and maps Depth increases slowly with tree size The size of a binary tree grows with tree depth in a very predictable way The maximum size of a binary tree of depth d is 2^(d+1) - 1 For a fixed depth, the maximum size of a tree is greater if we allow each node to have more children In a k-ary tree, each node contains up to k-1 elements and has up to k children The maximum size of a k-ary tree of depth d is k^(d+1) - 1 We have seen that a binary tree is essentially a 2-ary tree A 32-ary tree of a small depth could still have a large size

What does "f(n) is Θ(g(n))" mean?

When we say that f(n) is Θ(g(n)) we are really saying that the growth rate of f(n) is the same as the growth rate of g(n)

How does a binary search work in a binary tree?

A binary search can locate an element in a sorted array or any other appropriate sorted structure in O(log2(n)) At each step the element that is currently being inspected is compared with the target element If the current element equals the target then return True Otherwise take the half of the list that is known to contain the target A binary search tree inherently facilitates a binary search We will study the binary search in greater depth in later lectures

What is a binary search tree?

A binary search tree (BST) is a binary tree containing nodes which contain elements such that: The left subtree of a node n contains only nodes with elements whose value is less than that of the element contained within n The right subtree of a node n contains only nodes with elements whose value is greater than that of the element contained within n A subtree may be empty This definition does not account for duplicate elements Must consistently place "equal" elements in one subtree or the other
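The definition can be sketched as a minimal insert/search pair. The node type is an assumption, and as a duplicate-handling policy this sketch consistently places "equal" elements in the right subtree, as the text requires.

```java
// A minimal BST sketch matching the definition above; as a consistent
// policy, equal elements are placed in the right subtree.
public class BstSketch {
    static class Node {
        int element;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    static Node insert(Node root, int e) {
        if (root == null) return new Node(e); // empty subtree: new leaf
        if (e < root.element) root.left = insert(root.left, e);
        else root.right = insert(root.right, e); // equal elements go right
        return root;
    }

    static boolean contains(Node root, int e) {
        if (root == null) return false;
        if (e == root.element) return true;
        return e < root.element ? contains(root.left, e)
                                : contains(root.right, e);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int e : new int[]{8, 3, 10, 1, 6}) root = insert(root, e);
        System.out.println(contains(root, 6)); // true
        System.out.println(contains(root, 7)); // false
    }
}
```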

What is a binary tree?

A binary tree is a data structure consisting of a group of nodes such that: Each node contains an element Each node contains links to a maximum of two other nodes The tree has a header containing a link to the root node Every node, excluding the root node, is the left or right child of exactly one node

What are the three characteristics of data types?

A domain of values A common data representation for the domain of values A set of applicable operations defined over the domain of values

How does the hash function affect a CBHT?

A hash function should be a constant time operation Typically a combination of inexpensive arithmetic operations A hash function should seek to maximise the probability that distinct keys will be evenly distributed amongst buckets Minimise the number of collisions An uneven distribution can lead to a search complexity of O(n)

What is a hash table?

A hash table H is an array of buckets and a hash function hash(...) that can translate keys to array / bucket indices To insert entry <k,v> into H we assign <k,v> to the bucket at the index given by hash(k) To search for entry <k,v> in H we search the bucket with the index given by hash(k) To remove entry <k,v> from H we remove from the bucket with the index given by hash(k)
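The insert/search operations above can be sketched with chained buckets. The bucket count, the int-key/String-value entry type, and the division compression function are all assumptions made for illustration.

```java
import java.util.LinkedList;

// A sketch of a hash table with chained buckets for <int, String>
// entries; the bucket count and hash function are assumptions.
public class ChainedHashTableSketch {
    static class Entry {
        final int key;
        String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTableSketch(int n) {
        buckets = new LinkedList[n];
        for (int i = 0; i < n; i++) buckets[i] = new LinkedList<>();
    }

    // Division compression: translate a key to a bucket index.
    private int hash(int k) { return Math.floorMod(k, buckets.length); }

    public void put(int k, String v) {
        for (Entry e : buckets[hash(k)]) {
            if (e.key == k) { e.value = v; return; } // overwrite existing key
        }
        buckets[hash(k)].add(new Entry(k, v));
    }

    public String get(int k) {
        for (Entry e : buckets[hash(k)]) {
            if (e.key == k) return e.value;
        }
        return null; // key absent
    }

    public static void main(String[] args) {
        ChainedHashTableSketch h = new ChainedHashTableSketch(7);
        h.put(10, "ten");
        h.put(17, "seventeen"); // collides with key 10 (both hash to 3)
        System.out.println(h.get(17)); // seventeen
        System.out.println(h.get(3));  // null
    }
}
```

Note how the two colliding keys coexist in the same bucket: searching the bucket's chain resolves the collision.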

Explain OBHT's design process:

A high load factor can devastate performance The worst case O(n) search complexity can very easily be reached The fixed capacity of arrays makes selecting an appropriate number of buckets absolutely essential Expanding an array / hash table is an expensive operation

What is a post-condition?

A post-condition is an assertion that is guaranteed to be true after a sequence of statements has been executed A post-condition can be used to describe the effects of an operation >Computed values >Side-effects of computation In the context of ADT operations, a post-condition can express conditions that will be true immediately following the execution of an operation

What is a priority queue?

A priority queue is a sequence of elements with the property that elements are removed in least-first order If a priority queue contains more than one least element then a removal policy must be adopted Remove the least element that was added first Remove a least element at random A priority queue is often said to have a least-first-out property

What are the characteristics of abstract data types? (ADTs)

Abstract data types (ADTs) are defined as having: A domain of values A set of applicable operations defined over the domain of values ADTs are not concerned with data representation ADTs may have a data representation but it is private Operations may inspect and modify the private data representation Application code may process ADT values through calls to operations

What operations does a tree have?

Node root() returns the root of the tree or null if the tree is empty Node parent(Node n) returns the parent of node n or null if node n is the root Set<Node> children(Node n) returns a set of references to children of node n Int countChildren(Node n) returns the number of children of node n Void makeRoot(Object o) makes the tree consist only of a root node containing object o Void addChild(Node n, Object o) adds a new node containing object o as child of node n Void remove (Node n) removes node n and its descendants from the tree

What operations do lists have?

Void add(Object o) adds object o as the element after the last element of the list Void add(Object o, Integer i) adds object o as the element at index i of the list Void set(Object o, Integer i) overwrites element at index i of the list with object o Object get(Integer i) returns the element at index i of the list Void remove(Integer i) removes the element at index i of the list Void concat(List l) adds all elements of list l after the last element of the list Boolean isEmpty() returns True if the list stores no elements, False otherwise Boolean isEqual(List l) returns True if element-wise equality can be established between list l and the list, False otherwise Integer size() returns the number of elements currently stored in the list

What operations does a queue have?

Void enqueue(Object o) adds object o as the element at the rear of the queue Object dequeue() removes and returns the element at the front of the queue Object front() returns the element at the front of the queue without removing it Integer size() returns the length of the queue Boolean isEmpty() returns True if the queue stores no elements, False otherwise

What operations does a stack have?

Void push(Object o) adds object o as the top element of the stack Object pop() removes and returns the last element added to the stack Object top() returns the last element added to the stack without removing it Integer size() returns the number of elements currently stored on the stack Boolean isEmpty() returns True if the stack stores no elements, False otherwise

Why is hashing used?

We have seen that a map which uses integers from 0 to n-1 as keys can be efficiently represented by an array A of length n Uses keys to index into A, i.e., A[k] = v for <k,v> Allows many common operations to be performed in O(1) time The requirement that all keys used in the map be integers is too restrictive The generality of the Map ADT depends upon arbitrary keys

