Data Structures & Algorithms


Understanding the specifics of a data structure:

1. The conceptual problem it targets / its key property.
2. The common operations it provides.
3. How the data is organized and manipulated in memory.
4. How efficient its common operations are (asymptotic analysis).

Traversing a linked list might look something like this:

1. Create a pointer to the head.
2. While we have not reached the end:
3. Visit the data.
4. Advance the pointer to the next node.
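A minimal C++ sketch of these steps, assuming a simple singly-linked node whose names (Node, data, next) are illustrative rather than taken from the original:

#include <iostream>

// Hypothetical singly-linked list node; the field names are assumptions.
struct Node
{
    int data;
    Node* next;
};

// Visits every node exactly once, following the steps above.
void traverse(const Node* head)
{
    const Node* current = head;              // 1. create a pointer to the head

    while (current != nullptr)               // 2. while we have not reached the end
    {
        std::cout << current->data << '\n';  // 3. visit the data
        current = current->next;             // 4. advance to the next node
    }
}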

Priority Queue Implementation: Binary min heaps

A min heap is a tree where:
• Each node's key is less than or equal to its children's keys.
• Every subtree is a min heap.

Priority Queue Implementation: Definition

A priority queue is a data structure that stores a sequence of elements. It is a kind of queue fundamentally designed around the idea that every object has an associated priority that determines its importance.

Binary Search Trees Removing keys: Algorithm:

Algorithm:
• Look up the node "r" with the key to remove.
• If "r" is a leaf => remove it and we are done.
• If "r" has a unique child that can take its position => switch pointers around.
• Otherwise, "r" is in an inconvenient place => rearrange the tree (see the sketch below).
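A hedged C++ sketch of this algorithm, assuming an int-keyed node with raw pointers (all names are illustrative); the "inconvenient place" case is handled here by copying the inorder successor's key into "r" and then removing the successor:

// Hypothetical BST node; names are assumptions, not the course's exact code.
struct Node
{
    int key;
    Node* left;
    Node* right;
};

// Removes `key` from the subtree rooted at `root`; returns the new subtree root.
Node* remove(Node* root, int key)
{
    if (root == nullptr)
        return nullptr;                         // key not found

    if (key < root->key)
        root->left = remove(root->left, key);
    else if (key > root->key)
        root->right = remove(root->right, key);
    else                                        // this is the node "r"
    {
        if (root->left == nullptr || root->right == nullptr)
        {
            // Leaf or single child: switch pointers around and delete "r".
            Node* child = (root->left != nullptr) ? root->left : root->right;
            delete root;
            return child;
        }

        // Two children ("inconvenient place"): copy the inorder successor's
        // key into "r", then remove the successor from the right subtree.
        Node* successor = root->right;
        while (successor->left != nullptr)
            successor = successor->left;

        root->key = successor->key;
        root->right = remove(root->right, successor->key);
    }

    return root;
}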

Two approaches to handle collisions:

• Allow more than one key to reside in the same place => separate chaining.
• Store keys in an alternative location => open addressing.

N-ary Trees: Definition

An N-ary tree of order N is either: • empty • a root node, along with exactly N disjoint subtrees, each of which is an N-ary tree of order N

How can we actually implement a tree?

Approach: Each node is an object, and they are linked by pointers. Two main approaches:
1. The parent pointer implementation
2. The list of children implementation

Binary Search Trees Inserting keys: Asymptotic Analysis of Insertion

Insertion takes a lookup operation + a node insertion:
• Degenerate tree: [O(n) time] + [O(1) time] = O(n) time
• Perfect binary tree: [O(log n) time] + [O(1) time] = O(log n) time
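A minimal C++ sketch of insertion using an illustrative int-keyed node: the descent is the O(h) lookup, and creating and linking the new node is the O(1) part:

// Hypothetical BST node; names are assumptions.
struct Node
{
    int key;
    Node* left;
    Node* right;
};

// Inserts `key` into the subtree rooted at `root`; returns the new subtree root.
Node* insert(Node* root, int key)
{
    if (root == nullptr)
        return new Node{key, nullptr, nullptr};   // found the empty spot: O(1) link

    if (key < root->key)
        root->left = insert(root->left, key);
    else if (key > root->key)
        root->right = insert(root->right, key);
    // equal keys: already present, nothing to do (keys are unique)

    return root;
}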

Binary Search Trees Removing keys: Asymptotic Analysis

Removal takes a lookup operation + a node removal:
• Degenerate tree: [O(n) time] + [O(1) time] = O(n) time
• Perfect binary tree: [O(log n) time] + [O(1) time] = O(log n) time

The definition of binary search tree does not restrict shape

Example: many differently shaped binary search trees can all contain the keys 1, 2, 3, 4, 5, 6, and 7.

Search structures

Data structures oriented toward searching:
• Array-based structures (arrays and std::vector)
• Linked lists (any variant)
• Binary search trees, AVL trees, and skip lists

Binary Search Trees

Definition: A binary search tree is a binary tree in which every (internal) node stores a unique key. For every node n containing a key k:
• All of the nodes in n's left subtree have keys smaller than k.
• All of the nodes in n's right subtree have keys larger than k.
Challenge: We are responsible for keeping this internal order in the tree.
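A short C++ sketch of how this ordering property drives a lookup, using the same kind of illustrative int-keyed node:

// Hypothetical BST node; names are assumptions.
struct Node
{
    int key;
    Node* left;
    Node* right;
};

// Returns true if `key` is stored somewhere in the subtree rooted at `root`.
bool contains(const Node* root, int key)
{
    const Node* current = root;

    while (current != nullptr)
    {
        if (key == current->key)
            return true;
        else if (key < current->key)
            current = current->left;    // smaller keys live in the left subtree
        else
            current = current->right;   // larger keys live in the right subtree
    }

    return false;                       // fell off the tree: the key is absent
}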

Binary trees

Definition: A binary tree is an N-ary tree of order 2, in which one of the two subtrees of each internal node is the left subtree and the other is the right subtree

Handling *Collisions* The problem

Definition: There is a collision involving the keys k1 and k2 when both k1 and k2 are determined to belong at the same index

A closer look at Degenerate Trees

Degenerate shapes are naturally constructed from sorted sequences. This is a common scenario in real-world programs.

Priority Queue Implementation: std::vector: Structure

Each element is a pair (element, priority)

Priority Queue Implementation: Linked List: Structure

Each node is a pair (element, priority).

Priority Queue Implementation: If linked list unsorted

Enqueue = Θ(1) time. We could simply add a new node to the head of the list, which takes Θ(1) time.
findMin and dequeueMin = Θ(n) time. We would have to search the list, looking for the element with the smallest priority. Unless we knew a bound on the lowest possible priority, we would always have to look at every element, because no matter how small a priority value we find, we might yet find a smaller one if we keep looking; so, in that case, this takes Θ(n) time.

Implementing General trees 2. The list of children implementation.

Every node keeps track of where its children are. Each node consists of both:
• a data element
• a "collection of pointers" to its children (an array, a vector, or a linked list)
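A minimal sketch of this node layout in C++, assuming std::vector is used as the "collection of pointers" (the struct and field names are illustrative):

#include <string>
#include <vector>

// Hypothetical general-tree node using the "list of children" layout.
template <typename T>
struct TreeNode
{
    T data;                              // the data element
    std::vector<TreeNode*> children;     // collection of pointers to its children
};

int main()
{
    TreeNode<std::string> root{"root", {}};
    TreeNode<std::string> child{"child", {}};

    root.children.push_back(&child);     // link a child to its parent
}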

Hash Function Design Goals: We look for two qualities:

• Fast to compute: it is called in every common operation, and its cost should not depend on the number of keys in the table (n).
• It spreads the keys evenly in the hash table, maintaining a roughly even number of elements in each linked list.
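A small illustrative example (not the course's hash function) with both qualities: its cost depends only on the key's length, never on n, and it mixes the characters so keys spread across the table. The multiplier 31 is a common but arbitrary choice.

#include <cstddef>
#include <string>

// Illustrative polynomial string hash.
std::size_t hashKey(const std::string& key)
{
    std::size_t h = 0;

    for (char c : key)
        h = h * 31 + static_cast<unsigned char>(c);   // mix in each character

    return h;
}

// The index is the hash reduced modulo the table's capacity.
std::size_t indexFor(const std::string& key, std::size_t capacity)
{
    return hashKey(key) % capacity;
}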

Handling Collisions: Open Addressing: Approach

Find an alternative index to store the key

Implementing General trees 1. The parent pointer implementation: Is not efficient for:

• Finding a node's children [Θ(n) time].
• In general, moving downward.

The parent pointer implementation: Efficient for:

• Finding a node's parent [Θ(1) time].
• Determining if two nodes are siblings [Θ(1) time].
• Finding the root of the tree [Θ(h) time, where h is the height of the tree].

Binary Search Tree: Asymptotic analysis of lookup:

For a binary search tree of height h, we would say that this takes *O(h) time*, because we might have to follow the longest path in the tree — the length of which is, by definition, the tree's height — and because we might not have to get all the way to a leaf (or follow the path that's the longest)

Binary trees: Background: Goal

Given their structural properties, use binary trees to simplify the implementation of solutions.

Why do we need some kind of priority in a queue?

Good solutions may not treat all objects equally:
• Not all objects are equally important.
• Not all objects need the same effort/time to process.

Handling Collisions Separate Chaining: Analysis:

The hash function runs in Θ(1) time. Bottleneck => the linked lists. Approaches:
• Efficiency of hashing/indexing.
• Capacity of the hash table (re-hashing).

Aiming for a Perfect Binary Search Tree: Goal

Have a perfect binary search tree. Reason: performance.
• Lookup operation = Θ(log n) time.
• Inserting a node = Θ(log n) time.

The main goal is to visit each node in the tree exactly once, in some kind of order.
• Challenge: a parent may have multiple children, or none at all.
Questions to ask:

• How do we know which way to go when traversing?
• What do we do when we hit a dead end?
• How do we avoid visiting the same node twice?

Priority Queue Implementation: std::vector: Asymptotic analysis

If the vector is sorted by priority:
• findMin and dequeueMin = Θ(1) time.
• Enqueue = O(n) time.
If the vector is unsorted:
• Enqueue = Θ(1) time.
• findMin and dequeueMin = Θ(n) time.
A sketch of the unsorted case appears below.
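A hedged C++ sketch of the unsorted case, which makes the trade-off concrete: enqueue is a push_back, while findMin and dequeueMin scan the whole vector. The class and member names are illustrative, and the sketch assumes the queue is non-empty when findMin/dequeueMin are called.

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical priority queue backed by an unsorted std::vector of
// (element, priority) pairs; a smaller priority value is more important.
class UnsortedVectorPQ
{
public:
    void enqueue(const std::string& element, int priority)
    {
        items_.push_back({element, priority});          // Θ(1) (amortized)
    }

    const std::string& findMin() const
    {
        return minIt()->first;                          // Θ(n): scan everything
    }

    void dequeueMin()
    {
        items_.erase(minIt());                          // Θ(n)
    }

private:
    std::vector<std::pair<std::string, int>> items_;

    std::vector<std::pair<std::string, int>>::const_iterator minIt() const
    {
        return std::min_element(items_.begin(), items_.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
    }
};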

Why would you need to restrict the shape of a tree?

Improving performance of searches. Improving performance of rearrangements.

Handling Collisions Open Addressing: Linear probing

Insertion algorithm:
• Hash the key => get the index i with %.
• If index i is empty => insert the key into i.
• Otherwise, probe backward to i-1, i-2, and so on.
• If we reach index 0 => wrap around, starting from the last index of the array.
A sketch appears below.
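A hedged C++ sketch of this insertion, keeping the card's backward probing direction (many presentations probe forward instead); the table layout and names are assumptions, and the sketch assumes the table is not full and the key is not already present.

#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical open-addressing table: an empty cell is an empty std::optional.
class LinearProbingTable
{
public:
    explicit LinearProbingTable(std::size_t capacity)
        : cells_(capacity)
    {
    }

    void insert(const std::string& key)
    {
        std::size_t i = std::hash<std::string>{}(key) % cells_.size();   // hash => index i

        // Probe backward to i-1, i-2, ... until an empty cell is found,
        // wrapping around to the last index after reaching index 0.
        while (cells_[i].has_value())
            i = (i == 0) ? cells_.size() - 1 : i - 1;

        cells_[i] = key;
    }

private:
    std::vector<std::optional<std::string>> cells_;
};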

Handling Collisions: Open Addressing

It allows keys to be stored somewhere other than where the hash function says they belong

Handling Collisions: Separate Chaining

It allows more than one key to be stored at the same index in the array. Approach:
• Each cell in the array is a singly-linked list.
• The list has no size limit.

Handling Collisions: Open Addressing: Advantages over Separate Chaining:

It does not use dynamic memory allocation. It actually uses less memory, as we don't need linked lists.

Copy Elision

It is a compiler technique that eliminates some unnecessary copy operations in the lower-level translation of the code.

Implementing General trees 2. The list of children implementation: Which data structure would you choose to store pointers to children?: A linked list.

A linked list would be one of the best options in some ways: it is the least wasteful in terms of memory, since only the actual child pointers would ever be stored. However, it would be more expensive to iterate than arrays or vectors, not from an asymptotic perspective but from a practical one, because accessing memory near other memory that was recently accessed can usually be much faster than accessing memory spread all over the place, due to the effects of caching.

We can restrict the shape of our tree by

• Limiting the number of each node's children (N-ary trees).
• Limiting the height of subtrees.

Performance of binary search trees varies based on the shape

Lookups take O(h) time, where h is the height of the tree.

Handling Collisions: Separate Chaining: Common Operations: The insertion algorithm

• Lookup: hash the key to get its index and check whether the key is already in the linked list there.
• Add a new node to that linked list (at the front or the tail).
A sketch of both operations appears below.
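A minimal C++ sketch of separate chaining, assuming std::forward_list as the singly-linked list in each cell (class and member names are illustrative):

#include <algorithm>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <vector>

// Hypothetical separate-chaining hash table that stores only keys.
class ChainedHashTable
{
public:
    explicit ChainedHashTable(std::size_t capacity)
        : buckets_(capacity)
    {
    }

    bool contains(const std::string& key) const
    {
        const auto& chain = buckets_[indexFor(key)];      // hash => index
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    void insert(const std::string& key)
    {
        if (!contains(key))                               // lookup first
            buckets_[indexFor(key)].push_front(key);      // add at the front of the list
    }

private:
    std::vector<std::forward_list<std::string>> buckets_;

    std::size_t indexFor(const std::string& key) const
    {
        return std::hash<std::string>{}(key) % buckets_.size();
    }
};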

Traversing a linear data structure

The main goal is to visit each node once, in some order. "Visiting" can mean:
• storing elements in a file
• sending elements across a network
• finding characteristics of the elements/sequence

Breadth First Search: A key property of this algorithm

No node on level i is enqueued until its parent is dequeued. In fact, we can observe that no node on level i is enqueued until after all nodes on the level i − 1 are enqueued already. Further, we can observe that no node on level i is enqueued until after all nodes on the level i − 2 have been dequeued (because only then will all nodes on level i − 1 have been enqueued).

Breadth-first tree traversals: Asymptotic Analysis: How long does it take to traverse the tree?

Number of nodes queued = n.
• Θ(1) time needed to queue a node.
• Θ(n) time spent after all iterations.
Number of nodes dequeued = n.
• Θ(n) time spent after all iterations.
The data in all n nodes is visited.
• Θ(n) time is spent.
*Θ(n) time to run the complete traversal*
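A hedged C++ sketch of a breadth-first traversal with a queue; the general-tree node is illustrative, and each node is enqueued and dequeued exactly once, matching the Θ(n) analysis above.

#include <iostream>
#include <queue>
#include <vector>

// Hypothetical general-tree node; names are assumptions.
struct TreeNode
{
    int data;
    std::vector<TreeNode*> children;
};

// Visits the nodes level by level.
void breadthFirst(TreeNode* root)
{
    if (root == nullptr)
        return;

    std::queue<TreeNode*> pending;
    pending.push(root);

    while (!pending.empty())
    {
        TreeNode* node = pending.front();
        pending.pop();

        std::cout << node->data << '\n';      // "visit" the data

        for (TreeNode* child : node->children)
            pending.push(child);              // children wait on the queue
    }
}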

AVL trees guarantee common operations in

O(log n)

Binary trees: Background: Important decisions to make:

Organizing data into the tree. Selecting the data to store. Defining other potential shape restrictions (height?)

Depth-first tree traversals: Two possibilities to traverse a tree:

• Preorder: Visit the data in a node first, then traverse its subtrees.
• Postorder: Traverse a node's subtrees first, then visit its data.
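A short C++ sketch of both orders on an illustrative general-tree node; only the position of the "visit" differs.

#include <iostream>
#include <vector>

// Hypothetical general-tree node; names are assumptions.
struct TreeNode
{
    int data;
    std::vector<TreeNode*> children;
};

void preorder(const TreeNode* node)
{
    if (node == nullptr)
        return;

    std::cout << node->data << '\n';      // visit the data first...
    for (const TreeNode* child : node->children)
        preorder(child);                  // ...then traverse its subtrees
}

void postorder(const TreeNode* node)
{
    if (node == nullptr)
        return;

    for (const TreeNode* child : node->children)
        postorder(child);                 // traverse its subtrees first...
    std::cout << node->data << '\n';      // ...then visit the data
}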

Handling Collisions Open Addressing: Analysis of Linear probing

Primary clustering: clusters grow and performance degrades quickly. We consider the length of the array's clusters (sequences of contiguous cells in use).
• Larger clusters => higher chance of hitting one during insertion.
• Hitting a cluster makes the cluster larger.
• Growing clusters can merge into even larger ones.

Problems with Swap

• Set temp's data to be a's data. This is simply a pointer copy, which is quite inexpensive.
• Set temp's size and capacity to be a's size and capacity. Again, this would be quite inexpensive.
• Set a's data to nullptr, since it no longer has any contents.
• Set a's size and capacity to 0.
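A hedged C++ sketch of these steps written as the move constructor of a hypothetical array-backed container (the class is illustrative, not the course's exact code):

#include <string>

// Hypothetical vector-like container, shown only to illustrate the steps above.
class ArrayList
{
public:
    ArrayList() = default;

    // Move constructor: steal the source's pointer instead of copying elements.
    ArrayList(ArrayList&& a) noexcept
        : data_{a.data_},                // pointer copy: quite inexpensive
          size_{a.size_},                // copying size and capacity is cheap too
          capacity_{a.capacity_}
    {
        a.data_ = nullptr;               // the source no longer has any contents
        a.size_ = 0;
        a.capacity_ = 0;
    }

    ~ArrayList() { delete[] data_; }

private:
    std::string* data_ = nullptr;
    unsigned int size_ = 0;
    unsigned int capacity_ = 0;
};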

Skip lists are simpler to implement than AVL trees, but are their operations guaranteed to be O(log n)?

Skip lists do not guarantee O(log n) time; the number of nodes and the random number generator are important factors.

Implementing General trees 1. The parent pointer implementation.

Store nodes in an array-based data structure. Each node consists of both:
• a data element
• a "pointer" to its parent
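A minimal C++ sketch of the parent pointer layout, assuming nodes live in parallel std::vectors and the "pointer" is the parent's index (all names are illustrative):

#include <string>
#include <vector>

// Hypothetical parent-pointer tree: node i stores its data and its parent's
// index; the root uses -1 to mean "no parent".
struct ParentPointerTree
{
    std::vector<std::string> data;
    std::vector<int> parent;
};

// Finding the root walks parent pointers upward: Θ(h) time.
int findRoot(const ParentPointerTree& tree, int node)
{
    while (tree.parent[node] != -1)
        node = tree.parent[node];
    return node;
}

// Two distinct nodes are siblings exactly when they share a parent: Θ(1) time.
bool areSiblings(const ParentPointerTree& tree, int a, int b)
{
    return a != b && tree.parent[a] == tree.parent[b];
}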

A min heap is a tree with the following properties:

• The key in the root of the tree is less than or equal to the key stored in the root of every subtree.
• Every subtree is a min heap.

Handling Collisions: Separate Chaining: Common Operations: The lookup algorithm

• Get the key's hash => use % to get the index.
• Search the linked list at that index.

Priority Queue Implementation: Concept Problem / Key property

The most important element is the one with the smallest priority value. In the event of a tie between elements with the same priority, neither is considered more important than the other; an implementation can prefer whichever one it wants.

Priority Queue Implementation: It must be a complete binary tree. We can number each node consecutively in level-order starting from 1; if we have the number of some node, we can always find the number of its children or its parent using a simple formula

• The number of the left child is 2i.
• The number of the right child is 2i + 1.
• The number of the parent is ⌊i / 2⌋ (i.e., the "floor" of i / 2).
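The same formulas in C++, assuming the heap is stored in an array or vector with 1-based level-order numbering (index 0 unused), which keeps the arithmetic exactly as stated:

// 1-based level-order numbering of a complete binary tree stored in an array.
int leftChild(int i)  { return 2 * i; }
int rightChild(int i) { return 2 * i + 1; }
int parent(int i)     { return i / 2; }   // integer division is the floor for i >= 1

// For example, node 5's children are nodes 10 and 11, and its parent is node 2.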

Priority Queue Implementation: Binary min heaps: Flaws

There's no restriction on the shape the tree might have. It might be a single root node with a very large number of one-node subtrees; it might also be "degenerate" in the way that binary search trees can be degenerate, with every node having a single subtree

A closer look at Degenerate Trees: From a more analytical perspective: Time to run a lookup? Time to build a degenerate tree?

• Time to run a lookup = O(n).
• Time to build a degenerate tree = 1 + 2 + 3 + ... + n = n(n + 1) / 2 = Θ(n²).

Depth-first tree traversals: Definition

Traverse each entire subtree before starting others.

What are trees?

A tree is a data structure that captures hierarchy. A tree is either:
• empty, or
• a root node, along with zero or more disjoint subtrees, each of which is a tree.

Breadth-first tree traversals: Asymptotic Analysis: How much memory do we need?

Tree properties: n nodes, a height of h, and a width of w (width: the maximum number of nodes on any level).
The maximum amount of memory is proportional to Θ(w); O(w) would be a valid answer as well.

Depth-first tree traversals: How much memory do we need?

Tree properties: n nodes, a height of h, and a width of w (the height determines how deep the recursion can go).
The maximum amount of memory is proportional to Θ(h); O(h) would be a valid answer as well.

Hash tables do not order stored keys

We would need to extract all of them and then sort them

Breadth First Search: Diagram

[Diagram: a breadth-first traversal of an example tree containing the nodes X, Q, F, D, C, N, R, S, M, H, L]

Priority Queue Implementation: enqueue

adds a new element to the back, with its priority value.

The length of a path is measured

It is measured as the number of "links" you follow to get from the first node in the path to the last. For example, the length of the path {X, Q, C} is 2, because you follow two links (one from X to Q, another from Q to C) to get from X to C.

Implementing General trees 2. The list of children implementation: A vector

A vector would mitigate the wastefulness of the array, at the cost of making some insertions more expensive than others, because of the reallocations done by vectors when their size reaches their capacity. (This would amortize well, though there would be a higher worst-case time than you might be able to get away with.)

rvalue

does not persist beyond the expression that uses it; it is one that does not have storage allocated to it, such as the value of an expression that was calculated temporarily.
Example: in X = 3; the 3 is an rvalue (the value on the right).
Rvalue references (&&) can be used for "move" constructors and assignment operators.

Priority Queue Implementation: Asymptotic analysis using a binary min heap.

• findMin = Θ(1) time.
• enqueue = O(log n) time.
• dequeueMin = Θ(log n) time.

Priority Queue Implementation: If linked list sorted by priority (lowest priority in the head).

• findMin and dequeueMin = Θ(1) time, because accessing or removing the first node in a linked list is always a constant-time operation; no searching or iterating is required.
• Enqueue = O(n) time. We'd need to search for the appropriate place in which to insert the element, based on its priority. This would involve iterating through the list until we found a node with a larger priority value; we would then insert a new node just before it.

hash table

is a data structure that stores a collection of unique search keys and associated values (or pointers) directly in the cells of the array

A path in a tree

is a sequence parent-child-grandchild-...

breadth-first

is the idea of working our way across before working our way downward

The level or depth of a node

is the length of the (unique) path from the root • The level or depth of C is 2.

The height of a tree

is the length of the longest path from the root to a leaf. (Or, alternatively, it is the height of the root node.) The height of the tree above is 3, since the longest path is {X, F, M, L}, whose length is 3. This is often a key measurement, because it tells us the worst-case time it would take to follow a path in the tree, the central action around which many tree algorithms are based.

The height of a node

is the length of the longest path from that node to a leaf.
• The height of F is 2, since the longest path from F to a leaf is {F, M, L}, which has length 2.

std::move

marks its parameter as something that can be moved from (an rvalue), so that its value can be moved into the object being constructed or assigned, leaving the parameter essentially valueless (or, at the very least, in some valid state) afterward

lvalue

refers to an object that persists beyond a single expression; it is one that has storage allocated to it. For example, when you evaluate the name of an existing variable in an expression, that's an lvalue, because variables are stored in memory.
Example: in X = 3; the X is an lvalue (the value on the left).
Variable references (&) are always lvalues.
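A small illustrative C++ example tying the lvalue, rvalue, and std::move cards together (the variable names are arbitrary):

#include <string>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::string> a{"data", "structures"};

    // `a` is an lvalue: it names storage and persists beyond any one expression.
    std::vector<std::string>& ref = a;              // an lvalue reference (&)

    // `a.size() + 1` is an rvalue: a temporary value with no storage of its own.
    auto n = a.size() + 1;

    // std::move lets `a` be treated as an rvalue, so the move constructor can
    // steal its contents, leaving `a` in a valid but essentially empty state.
    std::vector<std::string> b = std::move(a);

    (void)ref; (void)n; (void)b;
}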

Priority Queue Implementation: dequeueMin

removes the element with the smallest priority value (similar to dequeue)

Priority Queue Implementation: findMin

returns the element that has the smallest priority value (similar to front)

depth-first

the idea of working our way downward as far as that takes us before working our way across

The degree of a node

the number of subtrees/children it has

Breadth First Search:

the root first, then all nodes that are children of the root (i.e., at depth 1), then all nodes that are children of those nodes (i.e., at depth 2), and so on

There are lots of ways to traverse a tree; they fall broadly into two categories:

Breadth-first and depth-first.

Hash tables can lead to...(operation time)

Θ(1) time for common operations. The quality of the hash function and the load factor of the hash table are important.

AVL trees and skip lists allow traversal in ascending order in...

Θ(n) time. We can use an inorder traversal in AVL trees. We can iterate through level 0 in a skip list.

Implementing General trees 2. The list of children implementation: Which data structure would you choose to store pointers to children: An array.

An array could be statically allocated directly within each Node object, though (a) it would be necessary to know its maximum size at compile time, and (b) every node would store the same number of pointers regardless of how many children it might actually have, which could be quite wasteful if the maximum was large but most nodes, in practice, had few children.

