Trees and Binary Search Trees
Expression Trees
--Binary tree to represent arithmetic expressions --The leaves of an expression tree are operands, such as constants and variable names, and the other nodes contain operators. InFix: A*B+C A*(B+C) Postfix(RPN): AB*C+ ABC+* Prefix: +*ABC *A+BC *To go from infix to postfix, put the infix into an expression tree, and then traverse it in post order to come out with infix expression.*
Binary Trees
--a tree in which no node can have more than 2 children --Useful in modeling processes where there are comparisons or an experiment has exactly two possible outcomes; is also useful when a test is performed repeatedly (coin toss, decision trees that are often used in AI, encoding/decoding messages in dots/dashes like morse code) *Tree height*: the height of a node is the length of the longest downward path to a leaf from that node (height of leaf nodes is zero, and height of the root node is the height of the tree) *Complete trees*: trees are complete when each level is completely filled except the bottom level. The leftmost positions are filled at the bottom level. Nice for array storage! However, array storage only for complete trees. *Balanced Trees* are binary trees, and this is true for all nodes in the tree: *|left_subTree height - right_subTree height| <= 1*
Complete trees
--each level is completely filled except the bottom level --the leftmost positions are filled at the bottom level --array storage is perfect for them, however if tree is not complete, you need to account for missing nodes, which may be very inefficient
How are trees useful?
--used to implement the file system of several popular operating systems --useful to evaluate arithmetic expressions --support searching operations in O(log N) average time and how to refine these ideas to obtain O(log N) worst case bounds --useful in modelling processes that have (2) outcomes (experiments or comparisons...example, coin toss experiment) --decision trees (expert systems in AI) --encoding/decoding messages in dots/dashes (morse code)
Linked Representation of binary trees
--uses space more efficiently --provides additional flexibility --each node has two links: one to the left child of the node, and one to the right child of the node
M Node Tree: 2-3-4 Tree
-Each node stores at most 3 data values (keys) -each non-leaf node is a 2 node, a 3 node, or a 4 node -All the leaves are on the same level
Insertion 2-3-4
-For a 2-3-4 tree, a new node is added to the top of the tree when the tree is full If tree is empty -create a 2 node containing new item-inital root of the tree else --find leaf node where item should be inserted by repeatedly comparing item with values in node and following appropriate link to child --if room in leaf node, add item, otherwise... 1) split 4 node into 2 nodes, one storing items less than median value, other storing items greater than median. Median value moved to a parent having these 2 nodes as children. 2) if parent has no room for median, split that 4node in the same manner, continuing until either a parent is found with room or reach full root 3) if full root 4 node split into two nodes, create new parent 2-node, the new root
Balanced trees
-binary trees -for all nodes: |left_subTree height - right_subTree height | <= 1
2-3-4 Operations:
-construction, empty, search -insert/delete must maintain as a 2-3-4 tree
BST vs 2-3-4
-insertion in a BST might change the height of a tree if its a leaf node, whereas in a 2-3-4 tree, all leaf nodes are on the same level
M Node Tree
-stores m-1 data values -has links to m subtrees
Non-recursive Inorder
1) Create an empty stack S 2) Initialize current node as root 3) push the current node to S and set current = current -> left until current is NULL 4) if current is NULL and stack is not empty, then a) pop the top item from stack b) print the popped item, set current = popped_item -> right c) go to step 3 5) if current is null and stack is empty, then we are done
AVL basic operations:
1) constructor, search traversal, empty 2) insert: keep balanced! 3) delete: keep balanced! 4) similar to BST
Three possibilities for inductive step:
1) inorder Traverse(root -> left) print (root -> data) traverse(root -> right) ^^ this will result in a print of the data in ascending order 2) Preorder print (root -> data) traverse(root -> left) traverse (root -> right) 3) Postorder traverse(root -> left) traverse (root -> right) print (root -> data)
AVL trees must be balanced after every insertion and deletion
4 cases of imbalance: 1) insertion was in left subtree of left child of N 2) insertion was in right subtree of right child of N 3) Insertion was in right subtree of left child of N 4) insertion was in left subtree of right child of N Case 1 and 2 require a single rotation for balancing Case 3 and 4 require double rotation for balancing
Trees
A data structure! --a tree is a collection of nodes. Unless empty, it begins at the root node --terms, root, siblings, grandparent/grandchild, parent/child, ancestor
What date structure should we use to implement an AVL tree?
An AVL tree has a balance factor, data, and two pointers pointing to the left and right.
What is T(n) of insert algorithm in a BST?
Average complexity O(logN)
What is T(n) of search algorithm in a BST? Why?
Average complexity O(logN)
Run time of Binary Search Trees
Average run time of most operations is O(log N).
BST vs AVL
BST: -no guarantee of balance -potential for lopsided +less complex code AVL: +tree is balanced +guarantee O(logn) -more complicated code -frequent rotations
Balanced Trees: AVL Trees
Balance factor: a node's BF is the difference between the heights of its left and right subtrees AVL Tree: A binary search tree in which, for every node the balance factor is either 0 or +1 or -1 (height of an empty tree is defined as -1)
Binary Search vs Linear Search
Binary search... Requires that data items should be in ascending order, and is usually implemented with an array because it needs direct access. advantages: -usually out performs linear search....O(log n) vs. O(n) disadvantages of binary search: --needs to be sorted --needs direct access of storage structures, NOT good for linked lists HOWEVER it is possible to use a linked structure which can be searched in a binary like manner...aka binary search trees If we do a binary search of a linked list, connecting an "index" to each node, traversing is still O(n) whereas it may still be better than linear search b/c comparisons go down to log(n) In order to do log(n) traversal and comparisons, we need a binary search tree!
Recursion and Binary Trees
Binary trees are recursive in nature Tree Traversal is recursive: if the binary tree is empty, then do nothing else inductive steps: N: visit the root, process data L: traverse the left subtree R: traverse the right subtree
Insertion
Build your BST by repeatedly calling a function to insert elements into a BST. Similar to search, can be done recursively and non recursively.
Binary Search Tree
In a BST, nodes are ordered in the following way: --each node has at most two children --each node contains one key (data) --all keys in the left subtree of a node are less than the key on the node --all the keys in the right subtree of a node are greater than the key in the node
What is the complexity of traversal?
O(n)
Inorder, preorder, and postorder
PreOrder: print node; traverse (left); traverse(right); InOrder: traverse(left); print node; traverse(right); PostOrder: traverse(left); traverse(right); print node;
How to deal with unbalanced trees?
Rebalance binary search tree when a new insertion makes the tree "too unbalanced" 1) AVL trees 2) Red Black Trees OR Allow more than one key per node of a search tree 1) 2-3 trees 2) 2-3-4 trees 3) B trees
Analysis of AVL Trees
Search and insertion are O(log n) Deletion is more complicated but is also O(log N) (deletion, replace node with largest node in left subtree) Disadvantages of AVL? frequent rotation, complexity Height of AVL = 1.01 log2n + .1 for large n
Simple Insertion Explanation:
So in each node, you're allowed at most 3 values. Each node is allowed to have up to 4 children. You'll always have childrenAmt = dataAmount+1. If when inserting, there are already 3 values in a node, you break that node apart by bringing the median node up.
Non Binary Trees
Some applications require more than two children per node aka game trees, genealogical trees
Lopsidedness
The order in which items are inserted into a BST determines the shape of the BST. This will result in balanced or unbalanced trees. *The complexity of a lopsided tree? If balanced, search is T(n) = O(logN) if unbalanced, T(n) = O(n)* Key issue? Attractiveness of binary search tree is marred by the bad (linear) worst case efficiency --e need to keep a BST balanced to maximize its benefits
Remove a node
There are 3 cases: You're removing.... 1) a leaf node (just delete and set to nullptr) 2) a node with one child (bypass deleted node by reconnecting) 3) a node with two children ----find the inorder successor (smallest node in right subtree), copy it into the node to be deleted, and then recursively delete
Non recursive traversal
Use a stack!!
Must a complete tree also be a balanced tree?
Yes
Basic Binary Search Tree Operations
construct an empty BST, isEmpty(), search(int item), insert(int data) while maintaining BST properties, delete(int data) that maintains BST properties, Traverse() with inorder, preorder, and postorder both recursively and non recursively
Given a node with position i in a complete tree, what are the positions of its child nodes?
left child = 2i+1, right child = 2i+2
Search
recursive, non recursive
Tree height
the height of a node is the length of the longest downward path to a leaf from that node you can find the height of any node