COSC241

Inorder traversal

(Inorder) traverse the left subtree of the root, then visit the root, then (inorder) traverse its right subtree. For the example tree (nodes A, B, E, C, D, F, K, G, H, I, J), the inorder traversal is C, B, D, A, G, F, I, H, J, E, K.

Postorder traversal

(Postorder) traverse the left subtree of the root, then (postorder) traverse its right subtree, then visit the root. For the same example tree (nodes A, B, E, C, D, F, K, G, H, I, J), the postorder traversal is C, D, B, G, I, J, H, F, K, E, A.

Functions growth rate

1, log n, √n, n, n log n, n^2, n^3, 2^n, 3^n, etc.

Preorder traversal

Visit the root, then (preorder) traverse its left subtree, then (preorder) traverse its right subtree. For the same example tree (nodes A, B, E, C, D, F, K, G, H, I, J), the preorder traversal is A, B, C, D, E, F, G, H, I, J, K.
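
A minimal sketch of the three recursive traversals, assuming a simple linked Node class with left and right fields (the names here are illustrative, not the course's own code):

    class Node<T> {
        T value;
        Node<T> left, right;
        Node(T value) { this.value = value; }
    }

    class Traversals {
        // Preorder: visit the root, then the left subtree, then the right subtree.
        static <T> void preorder(Node<T> n) {
            if (n == null) return;
            System.out.print(n.value + " ");
            preorder(n.left);
            preorder(n.right);
        }

        // Inorder: left subtree, then the root, then the right subtree.
        static <T> void inorder(Node<T> n) {
            if (n == null) return;
            inorder(n.left);
            System.out.print(n.value + " ");
            inorder(n.right);
        }

        // Postorder: left subtree, then the right subtree, then the root.
        static <T> void postorder(Node<T> n) {
            if (n == null) return;
            postorder(n.left);
            postorder(n.right);
            System.out.print(n.value + " ");
        }
    }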

Priority Queue Implementation

A heap is the ideal backing for a priority queue. We just need to do a bit of bundling together of items and priorities. We suppose the priorities are supplied as integers. Priority Queue: http://gyazo.com/3add26cc6c28e2c55ae8ad5610caf7d2 Priority Queue Node: http://gyazo.com/520d24ea9fa1c50a12e790572586f36d
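
In case the linked screenshots are unavailable, here is a hedged sketch of the bundling idea: a node type that pairs an item with its integer priority and orders nodes so that higher priority (and, on ties, earlier arrival) compares as larger, ready to be stored in a max-heap. The class and field names are assumptions, not the lecture code:

    // Bundle an item with its priority so a max-heap of nodes behaves as a priority queue.
    // The arrival counter breaks ties so that equal priorities come out in FIFO order.
    class PriorityQueueNode<T> implements Comparable<PriorityQueueNode<T>> {
        private static long nextArrival = 0;

        private final T item;
        private final int priority;
        private final long arrival;

        PriorityQueueNode(T item, int priority) {
            this.item = item;
            this.priority = priority;
            this.arrival = nextArrival++;
        }

        public T getItem() { return item; }

        @Override
        public int compareTo(PriorityQueueNode<T> other) {
            if (priority != other.priority) {
                return Integer.compare(priority, other.priority); // higher priority is "larger"
            }
            return Long.compare(other.arrival, arrival); // earlier arrival is "larger" on ties
        }
    }

The priority queue itself is then a thin wrapper around a heap of these nodes: add wraps the item and priority in a node, and remove returns getItem() of the node the heap removes.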

Heapsort (1964)

An in-place, comparison-based, array sorting algorithm that has guaranteed worst-case O(n log n) behaviour. The basic idea: organise the elements of the array into a heap structure; exchange the first (largest) and last elements; restore the heap structure (excluding the final element); repeat the last two steps until finished.

'True' Random numbers (bits, integers)

Are generated by observation of some unpredictable physical process. This is a slow and computationally expensive process that, until recently, required special-purpose hardware.

Generics

Because type checking occurs at compile time in Java, a key question about collections is: collections of what? In early versions of Java any sort of collection was fine provided it was an object, but this caused casting or silly wrapper classes. As of Java 5.0 generic types are specified with collections. It means we can specify the type of object the collection will hold - access to the collection will return objects of that type, and only objects of that type can be added to the collection.
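
For example (an illustrative snippet, not from the course notes):

    import java.util.ArrayList;
    import java.util.List;

    public class GenericsExample {
        public static void main(String[] args) {
            // With generics the compiler checks element types: no casts, no wrapper classes.
            List<String> titles = new ArrayList<>();
            titles.add("COSC241");        // only Strings may be added
            String first = titles.get(0); // no cast needed on the way out
            // titles.add(42);            // would not compile: incompatible types
            System.out.println(first);
        }
    }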

Generating graphs to test algorithms

Direct creation is painful: one line of code per edge, which is not practical for graphs with many vertices or edges. You CAN read a graph from a file, or generate one randomly (a fixed edge probability between any two vertices, or a certain number of edges added at each vertex).

Collections in Java

In an object-oriented context, it's an object that gathers and organizes other objects, and defines the way in which those elements can be accessed and managed. It specifies and limits the ways in which the user may interact with the collection.

Rotations

One approach to maintaining balance is to allow the tree structure to be modified while preserving its BST characteristics. One family of such modifications is called rotations. A right rotation helps to balance things out when there are long paths in the left subtree of the left child of the root. We can deal similarly with long paths in the right subtree, but the mixed case is more difficult as it requires two rotations.
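
A hedged sketch of the two rotations on a linked-node representation (the node type and method names are assumptions): in a right rotation the left child B becomes the new subtree root, the old root A becomes B's right child, and B's old right subtree becomes A's new left subtree.

    class BstNode<T> {
        T value;
        BstNode<T> left, right;
    }

    class Rotations {
        // Right rotation at node a; returns the new root of the subtree.
        static <T> BstNode<T> rotateRight(BstNode<T> a) {
            BstNode<T> b = a.left;   // left child becomes the new root
            a.left = b.right;        // B's right subtree moves under A
            b.right = a;             // old root A becomes B's right child
            return b;                // caller re-links b in place of a
        }

        // Left rotation: the mirror image, for long paths on the right.
        static <T> BstNode<T> rotateLeft(BstNode<T> a) {
            BstNode<T> b = a.right;
            a.right = b.left;
            b.left = a;
            return b;
        }
    }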

Why there are so many sorting algorithms

Simplicity vs speed: simple algorithms can win on small amounts of data. Worst-case vs average-case performance: knowing when each matters.

f(n) = O(g(n))

The Big-Oh notation, used to describe the worst-case behaviour of an algorithm in terms of the time or space taken to execute. Formally, f(n) = O(g(n)) if there are constants c > 0 and n0 such that f(n) ≤ c·g(n) for all n ≥ n0.

What is an algorithm?

The description of some basic steps to achieve a specified result.

Height or Depth of a Tree

The length of the longest path in the tree (which necessarily goes from the root to a leaf). A full and balanced k-ary tree with n nodes has depth O(log_k n).

Tree Classifications!

The order of a tree is the maximum number of children any node has. If the order of a tree is at most k, it is called a k-ary tree. If every non-leaf node has the same number of children, the tree is called full.

Representing graphs

There are many possible data structures to represent a graph; one option is to store it as an array of vertex objects, where each vertex object includes an ArrayList<Vertex> data field holding its neighbours. http://gyazo.com/05ba1424ca81b5cc2452ae817334cb47 Methods in Vertex: http://gyazo.com/4e1b945767d37dc305bba84a574e7ba1 Some methods in Graph: http://gyazo.com/b033209e38d88598318920123449949c
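
In case the linked images are unavailable, here is a hedged sketch of that representation with assumed names (an array of Vertex objects, each holding an ArrayList of its neighbours):

    import java.util.ArrayList;
    import java.util.List;

    class Vertex {
        final int id;
        final List<Vertex> neighbours = new ArrayList<>();

        Vertex(int id) { this.id = id; }

        void addNeighbour(Vertex w) { neighbours.add(w); }
        List<Vertex> getNeighbours() { return neighbours; }
    }

    class Graph {
        final Vertex[] vertices;

        Graph(int n) {
            vertices = new Vertex[n];
            for (int i = 0; i < n; i++) vertices[i] = new Vertex(i);
        }

        // Edges are symmetric, so record each endpoint in the other's neighbour list.
        void addEdge(int i, int j) {
            vertices[i].addNeighbour(vertices[j]);
            vertices[j].addNeighbour(vertices[i]);
        }
    }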

Operations on a BST

There are three fundamental operations on BSTs: search, add and delete (remove). The interface: public interface BinarySearchTree<T extends Comparable<T>> { public boolean search(T element); public void add(T element); public void remove(T element); } Search: http://gyazo.com/c643599aa33dae0c64ad2e381845752d Add: http://gyazo.com/85adb999360e0e5565ed85cdbc888ce9 Remove: http://gyazo.com/fc029c16f61aad55ee6e27a707d8b544
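
In case the linked screenshots are unavailable, a minimal sketch of the recursive search, assuming the usual linked-node structure (names are illustrative):

    class BstSearch {
        static class Node<T extends Comparable<T>> {
            T value;
            Node<T> left, right;
        }

        // Recursive search: smaller values live in the left subtree, larger in the right.
        static <T extends Comparable<T>> boolean search(Node<T> node, T element) {
            if (node == null) return false;             // fell off the tree: not present
            int cmp = element.compareTo(node.value);
            if (cmp == 0) return true;                  // found it
            return cmp < 0 ? search(node.left, element)
                           : search(node.right, element);
        }
    }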

Heap Algorithms

We add an item by placing it at the first vacant leaf position and letting it float up the branch towards the root so long as it is larger than its parent. Returning the max value is trivial, as that is always the root. Removing the max value is interesting.

Comparable interface

We can implement the Comparable interface to define an order for user-defined types, which is useful for sorting etc. To implement the Comparable interface you must include a compareTo method that returns a negative value if a is less than b, a positive value if a is greater than b, and 0 if they are equal.
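
For example, an illustrative user-defined type ordered by ID:

    public class Student implements Comparable<Student> {
        private final int id;

        public Student(int id) { this.id = id; }

        @Override
        public int compareTo(Student other) {
            // negative if this < other, positive if this > other, 0 if equal
            return Integer.compare(this.id, other.id);
        }
    }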

Better Insertion Sort

When we insert the element from position i, the previous elements are already sorted and those greater than a[i] need to be moved forward. So read backwards from position i, moving elements forward until the necessary hole opens up in which to place a[i], which we need to store in advance since its slot will be overwritten in the first step. If a is nearly sorted this works really well and performance is more like O(n) than O(n^2).
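
A sketch of that version on an int array (a minimal illustration, not the course's exact code):

    class InsertionSortSketch {
        static void insertionSort(int[] a) {
            for (int i = 1; i < a.length; i++) {
                int value = a[i];          // stash a[i]; its slot is overwritten below
                int j = i - 1;
                while (j >= 0 && a[j] > value) {
                    a[j + 1] = a[j];       // move larger elements one place forward
                    j--;
                }
                a[j + 1] = value;          // drop the stashed value into the hole
            }
        }
    }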

Binary Search

Check the midpoint of a sorted range, which either finds the value or gives us a new range to look through that is half the size; repeat recursively. The complexity is O(log n).
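
An iterative sketch (illustrative; the recursive version follows the same idea):

    class BinarySearchSketch {
        // Returns the index of target in the sorted array a, or -1 if it is not present.
        static int binarySearch(int[] a, int target) {
            int low = 0, high = a.length - 1;
            while (low <= high) {
                int mid = (low + high) >>> 1;        // midpoint without overflow
                if (a[mid] == target) return mid;    // found it
                if (a[mid] < target) low = mid + 1;  // look in the upper half
                else high = mid - 1;                 // look in the lower half
            }
            return -1;
        }
    }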

Linear Search

When you know nothing about the order in which the values are stored, you have to use linear search: inspect the values one by one from left to right, and if one matches the value you're looking for, return its index. In the worst case (not found) you search through every element, so it is O(n).

Heaps implementations

You could use a linked binary tree to implement a heap. To facilitate both upwards and downwards navigation, we would probably want to maintain links to parent nodes as well as children. To determine where the next item is to be added, and where the last item is for removal, we would probably want a data field for the last item. OR we could use an array, with the index trick (the children of an element at index i are at 2i + 1 and 2i + 2 respectively). Navigation is then trivial if we keep track of the current size, and the position of the last item (and the next insertion point) is also known. The only drawback is the need to resize if the heap gets too large. If using an array, you would need a method to expand the capacity if/when needed, a method to swap the values at two positions, and a method to find the index of the larger child or tell us there isn't one.

APIs

Application programming interfaces, into which much of the Java library is organized. An API is typically a collection of ADTs (interfaces) together with some specific data structures to implement them. One such is the Java Collections API, which represents specific types of collections. We should bother to learn beyond the API though, as the collection type we want might not be there, or we may have a special implementation in mind because of special conditions. We need to understand the issues involved.

Array Insertion

Done by copying all elements, from the index position you want to insert at onwards, one place to the right, and then inserting the value into the now empty index. In the worst case, if the index is at the beginning of the array, you are doing n operations, so this is O(n).

Singly Linked List toString

http://gyazo.com/70113a081d3a2679e226e0c680108b23 uses the idea of list traversal: starting at the first node and then, while that node is not null, doing something and setting the node to the next node. We can add an iterator that allows us to do this generally.
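
In case the linked screenshot is unavailable, a sketch of the traversal idea with assumed field names:

    class SinglyLinkedList<T> {
        private static class Node<T> {
            T element;
            Node<T> next;
            Node(T element) { this.element = element; }
        }

        private Node<T> first;

        // Traverse from the first node until we run off the end of the list.
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("[");
            for (Node<T> current = first; current != null; current = current.next) {
                sb.append(current.element);
                if (current.next != null) sb.append(", ");
            }
            return sb.append("]").toString();
        }
    }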

Queue as Singly Linked list

http://gyazo.com/d6b2e17001127965f4dab8440ca084ba A singly linked list seems quite an obvious match for representing the queue ADT as a data structure. By adding a reference to the last element you can make sure that both the enqueue and the dequeue operations are O(1).
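
A hedged sketch of the O(1) enqueue and dequeue, assuming references to both the first and last nodes (names are illustrative):

    class LinkedQueue<T> {
        private static class Node<T> {
            T element;
            Node<T> next;
            Node(T element) { this.element = element; }
        }

        private Node<T> first, last;
        private int size;

        public void enqueue(T element) {
            Node<T> node = new Node<>(element);
            if (last == null) first = node;   // empty queue: node is both first and last
            else last.next = node;
            last = node;
            size++;
        }

        public T dequeue() {
            if (first == null) throw new IllegalStateException("empty queue");
            T element = first.element;
            first = first.next;
            if (first == null) last = null;   // queue became empty
            size--;
            return element;
        }

        public int size() { return size; }
    }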

The Heap ADT

public interface Heap<T extends Comparable<T>> { public void add(T element); public T get(); public T remove(); }

Stack Interface:

public void push(T element); public T pop(); public T peek(); public boolean isEmpty(); public int size(); public String toString();

Abstract Data Types (ADTs)

An ADT is a data type whose intended use is specified by its interface. A data structure is the collection of programming structures (methods, data fields) used to implement an ADT (or collection). The data structure that implements an ADT can be changed without changing the interface, and therefore in a way that does not affect any client programs. Concerns of efficiency come about in evaluating which data structures to use when implementing an ADT, and will generally not be one-sided, as there are costs and benefits associated with a particular choice of implementation.

Recursion in trees

A descendant of a node is either one of its children or a descendant of one of its children. An ancestor of a node is either its parent or an ancestor of its parent. A tree is either a single node, or a root node together with a collection of trees whose roots are its children.

Binary Search Tree

A BST is a binary tree whose nodes contain elements of some ordered type. If the value stored at a node is V, then all nodes in its left subtree store values smaller than V and all nodes in its right subtree store values larger than V. So, if we follow a path, each time we pass to a left child the value will go down, and each time we pass to a right child it will go up.

Heaps (ADT)

A binary tree is complete if it is full and balanced and all of its leaves are as far left as possible. A heap is a binary tree in which every element is greater than or equal to both of its children. The operations that a heap should support are: adding an item, returning the maximum value in the heap and removing the maximum value from the heap.

Graphs

A graph consists of a set of vertices connected by edges. There is at most one edge between any two vertices, and the two endpoints of an edge are distinct (no loops). Edges are symmetric (i.e. two-way streets), and two vertices are neighbours if there is an edge between them. http://gyazo.com/c56228aeb319d1a0e42a5a9dba0ac14d

Stack ADT

A last in, first out collection. An analogy is a stack of papers to process, or a stack of dishes to wash. The basic operations are push: which adds an item to the stack, and pop: which removes and returns an item from the stack. These are also usually extended by allowing Peek, which examines the top item of the stack and convenience methods to return the size and an emptiness test.

Simple linked list

A linked list consists of a sequence of nodes connected via links. Each node except the last has a successor, and each node except the first has a predecessor. Each node contains a single element, and links (i.e. references) to its successor and/or predecessor. The key difference from an array is that by manipulating the links we can change the structure of the list - not just the values it stores. A linked list is a dynamic data structure.

Abstraction

A method of hiding certain details at certain times. As a result, the user can focus on more important issues and not be concerned with the messy details. Often, abstractions focus on the interface to a process or structure rather than the structure itself. For collections, abstractions provide a powerful method of ensuring consistent and clean access and manipulation without having to worry about irrelevant details.

Tree Paths

A path in a tree is a sequence of nodes where each node in the sequence is a child of the preceding node. The length of a path is the number of edges in it (so one less than the number of nodes). For every node in the tree there is a unique path from the root to that node, and the nodes lying on this path are called its ancestors. The level of a node is the length of the path from the root to the node, so the root is at level 0. Nodes that can be reached from a node by following a path starting from it are called its descendants. A subtree consists of a node together with all of its descendants and the edges that connect them.

Queue ADT

A queue is an abstract data type that represents item processing in a first in, first out manner. The basic operations are like those of a stack, except that the "remove" and "add" methods (pop and push for stacks) operate on opposite ends of the data instead of the same end: in a stack you remove AND add things from/to the start, whereas in a queue you remove from the beginning and add to the end. This is a useful abstraction for breadth-first search, in which the order you visit is the order you add, as well as for event or order processing and for process scheduling and simulation. The priority queue is a common extension. Event or order processing is when you want to process things in the order they arrive; similarly for process scheduling, you have a bunch of processes that need to be done in order, so you add them to a queue to make sure the necessary pre-processes are done first.

Common things to look for when assessing growth rate

A simple loop from 0 to n (with no internal loops) constitutes O(n) complexity, i.e. linear complexity. A nested loop of the same kind constitutes O(n^2) complexity, i.e. quadratic complexity. A loop in which the controlling parameter is divided by two at each step (and which terminates when it reaches 1) gives O(log n) (logarithmic complexity). The divide and conquer paradigm (later!), which breaks the problem into two instances of size n/2 that must be combined in linear time, gives O(n log n).
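
For example (illustrative loop shapes only):

    class GrowthRateExamples {
        static void examples(int n) {
            for (int i = 0; i < n; i++) { }          // O(n): a simple loop

            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) { }      // O(n^2): a nested loop of the same kind

            for (int i = n; i >= 1; i /= 2) { }      // O(log n): control variable halves each step
        }
    }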

Dynamic vs Static Data Structure

A static data structure is one whose memory is fixed at the time it is created and cannot be changed; for instance, with arrays we cannot change the structure, only its contents. A dynamic data structure is one whose structure CAN change; for instance, with linked lists, by changing the links we can change the structure of the list. We sometimes use static structures, because they tend to be more efficient, in the background of apparently dynamic structures (for example using an array to model a stack). Generally, this requires finesse to cope with the clash between the static and dynamic requirements (e.g. when the stack grows beyond the capacity of its backing array).

Tree traversals

A traversal of a data structure is visiting all of its elements in some order, or more generally a "visit" to each element in order to take some action. There are three recursive tree traversals: preorder, inorder and postorder. There is also level-order (non-recursive), in which the nodes at each level are accessed from left to right.

A balanced tree

A tree is called balanced when all of its leaves are within one level of one another.

Implementation of the Stack ADT

Can use an array to keep track of the stack contents, and a variable count to keep track of the size of the stack. Problems include that you have to declare the array size and it cannot change; this can be combated by using an ArrayList or by copying the contents of the array over to a new array of double the size once it gets full.
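
A hedged sketch of the array-backed approach with doubling (field and method names are assumptions):

    import java.util.Arrays;

    class ArrayStack<T> {
        private T[] contents;
        private int count;                 // number of items currently on the stack

        @SuppressWarnings("unchecked")
        public ArrayStack() {
            contents = (T[]) new Object[10];          // default capacity
        }

        public void push(T element) {
            if (count == contents.length) {
                contents = Arrays.copyOf(contents, contents.length * 2); // double when full
            }
            contents[count++] = element;
        }

        public T pop() {
            if (count == 0) throw new IllegalStateException("empty stack");
            T top = contents[--count];
            contents[count] = null;        // let the garbage collector reclaim the slot
            return top;
        }
    }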

Heap code

Constructors and data fields: http://gyazo.com/9d85873f1f4415cbf3ffe48fc3856b7d Get, expand, swap: http://gyazo.com/fa3cb1e40725a64c837ba6273f7070ce Larger Child Index: http://gyazo.com/db1423069c4fcead32794ff4ee294a1f Add: http://gyazo.com/da0be22e0809633d64fe35e604249f38 Remove: http://gyazo.com/4450f96d1711a134928a6f8ae2b4a27c
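
In case the linked screenshots are unavailable, a hedged sketch of the array-based max-heap described above; the field and method names are assumptions, not the lecture code:

    import java.util.Arrays;

    class ArrayHeap<T extends Comparable<T>> {
        private T[] heap;
        private int count;

        @SuppressWarnings("unchecked")
        public ArrayHeap() { heap = (T[]) new Comparable[10]; }

        public T get() { return heap[0]; }            // the maximum is always at the root

        public void add(T element) {
            if (count == heap.length) heap = Arrays.copyOf(heap, heap.length * 2); // expand
            heap[count] = element;                    // place at the first vacant leaf
            int i = count++;
            while (i > 0 && heap[i].compareTo(heap[(i - 1) / 2]) > 0) {
                swap(i, (i - 1) / 2);                 // float up while larger than the parent
                i = (i - 1) / 2;
            }
        }

        public T remove() {
            if (count == 0) throw new IllegalStateException("empty heap");
            T max = heap[0];
            heap[0] = heap[--count];                  // move the last leaf to the root
            heap[count] = null;
            int i = 0, child;
            while ((child = largerChildIndex(i)) != -1 && heap[child].compareTo(heap[i]) > 0) {
                swap(i, child);                       // sink down towards the larger child
                i = child;
            }
            return max;
        }

        private int largerChildIndex(int i) {
            int left = 2 * i + 1, right = 2 * i + 2;
            if (left >= count) return -1;             // no children
            if (right >= count) return left;
            return heap[left].compareTo(heap[right]) >= 0 ? left : right;
        }

        private void swap(int i, int j) {
            T tmp = heap[i]; heap[i] = heap[j]; heap[j] = tmp;
        }
    }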

Enlargement problems

If we forget to enlarge the array storing a circular queue when the capacity is exceeded, we mess up completely. There are two possibilities of recovery: impose a hard limit on the size of the queue and throw an exception when we overrun (want a constructor that sets size to something other than default), or bite the bullet and write the enlarge method for the array.

Selection Sort

Find the smallest item in the list and swap it with the first item, then find the smallest item in the remainder of the list and swap it with the first item of the remainder, and so on - essentially using recursion on ever-smaller subarrays. Time complexity is O(n^2).
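
An iterative sketch over an int array (the recursive "remainder of the list" view and this loop are equivalent):

    class SelectionSortSketch {
        static void selectionSort(int[] a) {
            for (int i = 0; i < a.length - 1; i++) {
                int min = i;
                for (int j = i + 1; j < a.length; j++) {
                    if (a[j] < a[min]) min = j;       // smallest item in the remainder
                }
                int tmp = a[i]; a[i] = a[min]; a[min] = tmp;  // swap it into position i
            }
        }
    }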

Performance analysis of the Stack implementation

For space complexity, we initially allocate O(1) space because of the default capacity; later, when we expand capacity, we never have more than twice as much available as needed, so that's O(n), where n is the maximum number of items we ever include in the stack. Time for pop, peek and isEmpty is clearly O(1), and the push operation is also O(1), except when we need to expand the stack. Although it seems unfair to attribute the cost of stack expansion to the single item that caused it, we do have to say that the push operation can be O(n). However, we can practically think of the cost of expansion as being spread over all elements present when it happens. Since there are n elements present, and the expansion is O(n), it means that in an amortized sense the push operation is still O(1). The peek and pop operations need to do something on empty stacks, so one solution would be to have them return null in these cases, which is actually not bad. But in the spirit of ADTs these operations should produce exceptions, and the user should use a try-catch. Each ADT tends to come with its own subclasses of exception to describe the exceptions it might generate.

References as links

In Java, a reference refers to a memory location in which the complete data for an object is stored. In other languages references are called pointers, and a clear distinction is drawn between objects and references to objects. A reference to an object of the same or closely related type is often called a link, and a collection of linked objects is called a linked data structure.

Priority Queues

In a priority queue, each element is added with an associated priority. When an element is removed, it's the element with the highest priority, and if more than one shares the highest priority it should be the earliest arrival that is removed. So if all elements have the same priority then it behaves like a normal queue, while if elements are added in strictly an increasing order of priority then it behaves like a stack.

Doubly linked lists

In a singly linked list we can access the next item directly via its reference (O(1)) and its preceding item by sneaking up on it via list traversal (O(n)). In a doubly linked list you aim to make both O(1) by keeping a second reference. It is really just two intertwined singly linked lists, so most of the code is reusable and modifiable. It is a bit tricky to get working, however, as modifying the prev reference of one node means that the next reference of the node it points to must be changed as well.

Types of Collection

In Java, there are linear and non-linear collections. In a linear collection, the organization is in a straight line and we have natural notions of precedence and indexing (such as arrays). In a non-linear collection the organization might be more complex (like in a tree or network) or not present. The way in which elements are organized is usually determined by the sequence in which they are added to the collection, as well as some other inherent relationship such as ordering.

The difference between top-down and bottom-up

In the top-down version, where elements float up the heap, the elements from the larger part of the heap float farthest. In particular, each element at the bottom level, which makes up half the heap, might need to float to the top (log n away), requiring O(n log n) steps. In the bottom-up version, the elements in the larger levels are sinking down, and have a shorter distance to travel. In fact, at most n/2^i elements need to sink a distance of i, so summing over i the total number of steps is O(n).

Singly Linked List

One in which each node stores only one link, to the next element in the list, with the actual list object containing a reference to the first node in the list, or a null reference if the list is empty.

Queue as Circular Array

It would be sensible to implement a queue as an array if we knew our queue would never grow beyond a fixed size. One approach is to dequeue by returning the element at position 0 and moving everything else down one space, but that makes dequeue O(n). Can we arrange for both dequeue and enqueue to be O(1)? If we think of the array elements as being arranged on a circle, then we need only keep track of 'first' and 'last' indices. Enqueue involves storing at the last index and incrementing it, whereas dequeue involves returning the element at the first index and incrementing it. We need to remember to wrap around at the end. http://gyazo.com/d48cf44416aa755a2a55190327f10de1
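
In case the linked image is unavailable, a hedged sketch of the wrap-around indexing (names assumed; fixed capacity for simplicity, see the next card for enlargement):

    class CircularArrayQueue<T> {
        private final T[] items;
        private int first;   // index of the front element
        private int count;   // number of elements currently stored

        @SuppressWarnings("unchecked")
        public CircularArrayQueue(int capacity) {
            items = (T[]) new Object[capacity];
        }

        public void enqueue(T element) {
            if (count == items.length) throw new IllegalStateException("queue full");
            int last = (first + count) % items.length;   // next free slot, wrapping around
            items[last] = element;
            count++;
        }

        public T dequeue() {
            if (count == 0) throw new IllegalStateException("empty queue");
            T element = items[first];
            items[first] = null;
            first = (first + 1) % items.length;          // advance the front, wrapping around
            count--;
            return element;
        }
    }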

Binary Tree

One in which every node has at most two children. A full and balanced binary tree of depth d has at least 2^d and at most 2^(d+1) - 1 nodes. A non-full or poorly balanced binary tree could have as few as d + 1 nodes (where every node has a single child). Keeping them as full and balanced as possible will be important for efficiency, as we search them and follow paths.

Implementation of a Tree

Many possibilities! Could store nodes in an array with pre-computed indices. For instance, the root could go at index 0 and then the children of the node at index k at indices 2k + 1 and 2k + 2 (this just works, although it wastes a lot of space). Alternative array storage simply puts the nodes in an array as needed but includes in their structure the indices of their children (if any) and possibly the parent's index as well. The obvious, recursive representation is that each node contains references, i.e. links, to its children and possibly its parent. These links could be to the left and right subtrees rooted at the children.

Barabasi-Albert

Models a network growing in time by the addition of vertices. Vertices exhibit preferential attachment - i.e. they are more likely to become neighbours of vertices that already have relatively high degree. Formally, each new node connects to an existing node with probability proportional to the current degree of that node divided by the sum of the degrees of all nodes to this point. This gives a power-law distribution of the degrees (the number of vertices of degree k is proportional to k^-3 in the most basic model) and a small-world model. However, the clustering coefficient is relatively low.

Pseudo-random numbers

Numbers generated as a sequence by a specific deterministic mathematical algorithm. Although they appear unpredictable when observed, if the initial seed is known as well as the algorithm, you can predict them 100% accurately. So very fast, but not actually random at all. In Java there are three ways to access pseudo-randomness: Math.random(), which returns a double value greater than or equal to 0.0 and less than 1.0; the java.util.Random class, which is used to generate a stream of pseudorandom numbers using a 48-bit seed; and the java.security.SecureRandom class, which provides a cryptographically strong random number generator.
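
For example (illustrative use of all three):

    import java.security.SecureRandom;
    import java.util.Random;

    public class RandomExamples {
        public static void main(String[] args) {
            double d = Math.random();              // uniform double in [0.0, 1.0)

            Random r = new Random(241L);           // 48-bit seed; the same seed gives the same stream
            int dieRoll = r.nextInt(6) + 1;        // pseudo-random integer in 1..6

            SecureRandom sr = new SecureRandom();  // cryptographically strong generator
            byte[] key = new byte[16];
            sr.nextBytes(key);

            System.out.println(d + " " + dieRoll);
        }
    }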

Iterators

One of the useful features of many classes in the collections framework is that they can be the targets of 'foreach' statements such as for (String s : titles), where titles is an ArrayDeque<String>. Any class can do this provided it implements the Iterable interface, which in turn requires an iterator() method that returns an object implementing the Iterator interface. Iterators must support three methods: hasNext(), which returns a boolean indicating whether there is anything more to return; next(), which returns an object from the iterator; and remove(), which removes the last item returned (although it frequently throws an UnsupportedOperationException). These are often defined anonymously, i.e. the code that defines the behaviour of the iterator is given directly where the iterator is constructed (not as a separate, named type). Iterators over dynamic data structures are generally allowed, and even expected, to behave unpredictably if the structure is modified while the iterator is active.
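
An illustrative sketch of an Iterable class whose iterator is defined anonymously (the Range class is a made-up example, not from the course):

    import java.util.Iterator;
    import java.util.NoSuchElementException;

    class Range implements Iterable<Integer> {
        private final int limit;

        Range(int limit) { this.limit = limit; }

        @Override
        public Iterator<Integer> iterator() {
            // The iterator's behaviour is defined right here, as an anonymous class.
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() { return next < limit; }

                @Override
                public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return next++;
                }

                @Override
                public void remove() {
                    throw new UnsupportedOperationException(); // optional operation
                }
            };
        }
    }

Because Range implements Iterable, for (int i : new Range(5)) { ... } works and visits 0 through 4.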

Why random graphs?

The original motivation is theoretical: if, for some model of a random graph, the probability of X being true is strictly positive, then there must be graphs for which X is true. There is a surprisingly large collection of interesting properties X for which this is still the only known way to prove existence. The CURRENT motivation is pragmatic - an enormous number of situations have an underlying graph or network. Collecting real and complete data about these graphs is very expensive, so random networks can be used as a simulation. In that context it's important that the characteristics of the simulation match those of the real data.

Insertion sort

Process the array one element at a time, inserting each new element into its correct position among the previously sorted elements. Time required is O(n^2), as we always search up to the position we are currently at and then insert, shifting elements from there to the end of the current subarray.

Erdos-Renyi (1959)

A graph is built from two parameters: the number of vertices (n) and the probability of two vertices being neighbours (p). Each possible edge is considered independently and included with probability p. A very useful model for theory, although not so applicable in practice. A slight variation is to take the second parameter to be the number of edges E and to take the set of edges to be a random subset of size E from among the n(n-1)/2 possible edges.
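
A hedged sketch of the fixed-edge-probability construction, returning a simple adjacency matrix (the names and the representation are assumptions):

    import java.util.Random;

    class ErdosRenyi {
        // G(n, p): each of the n(n-1)/2 possible edges is included independently with probability p.
        static boolean[][] generate(int n, double p, Random rand) {
            boolean[][] adjacent = new boolean[n][n];
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {     // consider each unordered pair once
                    if (rand.nextDouble() < p) {
                        adjacent[i][j] = true;        // edges are symmetric
                        adjacent[j][i] = true;
                    }
                }
            }
            return adjacent;
        }
    }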

The complexity of the choose-a-card method

Since we might need to look at every card in the deck to pick the last one, the complexity is at worst O(n).

Organizing the heap for a heapsort algorithm

There are two choices: top-down or bottom-up. The first mimics the algorithm from the previous lecture, effectively treating a growing initial segment of the array as a heap and adding one element at a time, letting it float as high as necessary. The second imagines the tree structure already in place over the whole array, and fixes violations of the heap property beginning from the lowest non-leaf nodes and moving upwards. The first is easier to conceptualize but is O(n log n); the second is actually O(n).

Doing the sort!

Sort: http://gyazo.com/b4169cad5e683bc1b256db078e321f57 Heapifying: http://gyazo.com/7ff04d1ae6762c962088bf1ce37fc797 Sift down: http://gyazo.com/00db22525e5333d29ac012d11abf2fa2
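
In case the linked screenshots are unavailable, a hedged sketch of the whole thing on an int array (bottom-up heapify, then repeated swap-and-sift; names are assumptions):

    class HeapSort {
        static void sort(int[] a) {
            // Heapify: fix heap-property violations from the lowest non-leaf nodes upwards.
            for (int i = a.length / 2 - 1; i >= 0; i--) {
                siftDown(a, i, a.length);
            }
            // Repeatedly move the current maximum to the end of the unsorted region.
            for (int end = a.length - 1; end > 0; end--) {
                swap(a, 0, end);
                siftDown(a, 0, end);   // restore the heap, excluding the sorted tail
            }
        }

        // Let the value at index i sink until neither child (within 'size') is larger.
        private static void siftDown(int[] a, int i, int size) {
            while (2 * i + 1 < size) {
                int child = 2 * i + 1;
                if (child + 1 < size && a[child + 1] > a[child]) child++;  // the larger child
                if (a[i] >= a[child]) break;
                swap(a, i, child);
                i = child;
            }
        }

        private static void swap(int[] a, int i, int j) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }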

Using a stack to sort

Stacks can be used to (partially) sort an incoming stream of objects, the algorithm being as follows: compare the next incoming item with the top of the stack; if it is smaller, push it onto the stack; otherwise pop the stack until the top is larger than the input (or the stack is empty) and then push the input onto the stack. When all the input has been processed, pop each remaining item off the stack.

Distance!

The distance between two vertices in a graph is the smallest number of edges required to get from one to the other; if it's impossible, just say the distance is -1 or something along those lines. How can we efficiently compute the distance in a graph between some fixed vertex (home) and every other vertex? Distance: http://gyazo.com/cfcc5ea129f421061924660e7a211fa1 Algorithm:
d[home] <- 0
enqueue(home)
while queue is not empty do
    v <- dequeue()
    for w from neighbours(v) do
        if d[w] = "not seen" then
            d[w] <- d[v] + 1
            enqueue(w)
        end if
    end for
end while
http://gyazo.com/76d85562cce1a70c1cb9f8b39c28cc3f - actual code
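
A hedged sketch of that algorithm in Java, assuming the graph is given as an adjacency list (a list of neighbour lists indexed by vertex number):

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Queue;

    class Distances {
        // Breadth-first search: d[w] is the distance from home to w, or -1 if unreachable.
        static int[] distancesFrom(List<List<Integer>> neighbours, int home) {
            int n = neighbours.size();
            int[] d = new int[n];
            Arrays.fill(d, -1);                    // -1 means "not seen"
            d[home] = 0;

            Queue<Integer> queue = new ArrayDeque<>();
            queue.add(home);
            while (!queue.isEmpty()) {
                int v = queue.remove();
                for (int w : neighbours.get(v)) {
                    if (d[w] == -1) {              // first time we reach w
                        d[w] = d[v] + 1;
                        queue.add(w);
                    }
                }
            }
            return d;
        }
    }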

Performance issues

The main factor in BST performance is its balance, or lack thereof (the range of depths of its nodes). The worst case is a single branch (average depth n/2) and the best case is a full, balanced tree (maximum depth approximately log_2 n).

Removing the Maximum in a heap

The maximum is always the root of the tree, so finding it is never an issue. The issue is reconstructing the tree and maintaining the heap property after the root element is removed. The key idea is, in some sense, the reverse of addition: replace the root value with the value of the last leaf, then re-establish the heap property by exchanging the root with the larger of its children, and do this recursively downwards.

Usefulness of a Heap

There are two main uses for a heap: They can be used for priority queues, a data structure in which the item with the highest priority is always the next one processed. Or as part of the heap sort algorithm which sorts data by adding it item by item to a heap and then simply removing from the heap until nothing is left.

Watts-Strogatz (1998)

Three parameters: N, the number of vertices; K, the degree of each vertex (assumed even); and B, a magic number with 0 < B < 1. Start with a circular graph in which every vertex is a neighbour of the immediately preceding K/2 vertices and the immediately following K/2 vertices. For each edge (i, j) with i < j, replace it with an edge (i, k) with probability B, subject to ensuring there are no loops or duplicated edges. This produces graphs with a much higher clustering coefficient than in the ER model, for the same average degree, while a relatively large choice of B still ensures that the distance between vertices tends to be small (this is the small-world phenomenon, frequently observed in complex networks). The degree distribution is sharply peaked at K, though, which is not common in complex networks.

Recursion

To say that a method uses recursion is to say that it is defined in terms of itself, and that it calls itself until it hits a base case. One benefit of recursion is that it is a simpler and more elegant way of implementing naturally recursive data structures such as trees. However, a disadvantage is that it is less efficient than the iterative approach in terms of memory and runtime, as the same method is on the call stack multiple times.

Trees

While stacks, queues and lists are all linear data structures, with a left-to-right ordering of elements, a tree is a more complex, non-linear data structure in which elements are arranged in a hierarchy. A tree consists of a set of nodes and edges that connect nodes to one another. Each node has a particular level, and there is a single node at the top level called the root of the tree. Each node except the root has a single parent, which lies one level higher in the tree. A node may have multiple children; a node without any children is called a leaf. If two nodes have a common parent, they are called siblings, and a node that is neither a leaf nor the root is called an internal node.

