Exam 1 Data Structures
2. Explain and compare selection, insertion and bubble sorts. Assume that your application (say a database application) deals with large records where comparisons are performed by means of integer keys. Assume also that your application requires a stable sorting method, i.e., a method that does not change the relative ordering of elements that are equal. Which of these sorting methods would you choose, and why? (Hint: consider sorting based on the following integer keys 19, 14(1), 7, 12, 14(2), 10)
First, the three sorting methods.

1) Selection sort: a simple in-place comparison-based algorithm in which the list is divided into two parts, the sorted part at the left end and the unsorted part at the right end. Initially the sorted part is empty and the unsorted part is the entire list. The smallest element is selected from the unsorted part and swapped with its leftmost element, which then becomes part of the sorted part. This process continues, moving the boundary of the unsorted part one element to the right. The algorithm is not suitable for large data sets, as its average and worst case complexities are O(n^2), where n is the number of items.

2) Insertion sort: an in-place comparison-based algorithm that maintains a sub-list (the lower part of the array) which is always sorted. Each element taken from the unsorted part has to find its appropriate place in the sorted sub-list and is inserted there; hence the name insertion sort. The array is scanned sequentially and unsorted items are moved into the sorted sub-list within the same array. Average and worst case complexity are again O(n^2).

3) Bubble sort: a simple comparison-based algorithm in which each pair of adjacent elements is compared and the elements are swapped if they are out of order. Average and worst case complexity are O(n^2) as well, so it too is unsuitable for large data sets.

Now let us compare the three, pair by pair.

4) Insertion sort vs. selection sort: insertion sort performs insertions into a sorted sub-list, while selection sort selects the extreme element of the unsorted part and puts it into a position that is known beforehand (the boundary of the sorted part). Insertion sort is stable, while selection sort is not. Insertion sort is also an online method: arriving elements can be inserted into place immediately, whereas selection sort needs the whole input before it can start. Insertion sort runs in O(n) time in the best case (an already sorted input), while the best case of selection sort is still O(n^2) comparisons, although it performs only O(n) swaps.

5) Bubble sort vs. selection sort: in bubble sort each element is compared with its adjacent element and swapped if required; selection sort instead selects one element (the smallest or largest, depending on ascending or descending order) and swaps it into its final place. The worst case complexity is the same in both algorithms, O(n^2), but the best case differs: bubble sort with an early-exit check takes O(n) on sorted input, whereas selection sort always takes O(n^2). Bubble sort is stable; selection sort is not. In practice selection sort is usually faster than bubble sort, which performs many more swaps.

6) Bubble sort vs. insertion sort: even though both have average case time complexity O(n^2), bubble sort is almost always outperformed by insertion sort, again because of the number of swaps needed (bubble sort needs more). The simplicity of bubble sort keeps its code size very small, but that is its main advantage.

Which to choose: for large records compared via integer keys, with stability required, insertion sort is the right choice. It is stable, unlike selection sort: on the hint keys 19, 14(1), 7, 12, 14(2), 10, selection sort's first pass swaps 19 with 7, and its second pass swaps 14(1) with 10, leaving 14(1) after 14(2). Among the stable candidates, insertion sort outperforms bubble sort.
There is also a variant of insertion sort called Shell sort, with a time complexity of O(n^(3/2)) for a good increment sequence, which allows it to be used practically on larger lists. Furthermore, insertion sort is very efficient for sorting "nearly sorted" lists, compared with bubble sort. The sketch below illustrates the stability of insertion sort on the hint keys.
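The following is a minimal sketch in Java (class and field names are my own, not from the exam); it runs insertion sort over records sorted by an integer key, with a tag field whose only purpose is to show that equal keys keep their relative order:

    import java.util.Arrays;

    public class StableSortDemo {
        // A record with an integer key and a tag so we can observe stability.
        static final class Rec {
            final int key; final String tag;
            Rec(int key, String tag) { this.key = key; this.tag = tag; }
            @Override public String toString() { return key + tag; }
        }
        // Insertion sort by integer key. It is stable because the comparison
        // below is strict: equal keys are never moved past each other.
        static void insertionSort(Rec[] a) {
            for (int i = 1; i < a.length; i++) {
                Rec cur = a[i];
                int j = i - 1;
                while (j >= 0 && a[j].key > cur.key) { a[j + 1] = a[j]; j--; }
                a[j + 1] = cur;
            }
        }
        public static void main(String[] args) {
            Rec[] recs = { new Rec(19, ""), new Rec(14, "(1)"), new Rec(7, ""),
                           new Rec(12, ""), new Rec(14, "(2)"), new Rec(10, "") };
            insertionSort(recs);
            System.out.println(Arrays.toString(recs));
            // [7, 10, 12, 14(1), 14(2), 19] -- 14(1) stays before 14(2)
        }
    }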
10. Describe the List ADT (give a definition, set of operations). Describe different versions of the List ADT (ordered, unordered) and their implementations (array implementation, linked list implementation and doubly linked list implementation). Define the Double-Ended Queue ADT.
An abstract data type (ADT) is a type (or class) of objects whose behaviour is defined by a set of values and a set of operations.

Definition of the List ADT: the data is generally stored in key sequence in a list which has a head structure consisting of a count, pointers, and the address of a compare function needed to compare the data in the list. Each data node contains a pointer to a data structure and a self-referential pointer which points to the next node in the list.

Set of operations:
get() - return the element at a given position of the list.
insert() - insert an element at any position of the list.
remove() - remove the first occurrence of an element from a non-empty list.
removeAt() - remove the element at a specified location from a non-empty list.
replace() - replace the element at a given position by another element.
size() - return the number of elements in the list.
isEmpty() - return true if the list is empty, otherwise false.
isFull() - return true if the list is full, otherwise false.

Ordered List ADT:
OrderedList() creates a new ordered list that is empty. It needs no parameters and returns an empty list.
add(item) adds a new item to the list, making sure that the order is preserved. It needs the item and returns nothing. Assume the item is not already in the list.
remove(item) removes the item from the list. It needs the item and modifies the list. Assume the item is present in the list.
search(item) searches for the item in the list. It needs the item and returns a boolean value.

Unordered List ADT:
List() creates a new list that is empty. It needs no parameters and returns an empty list.
add(item) adds a new item to the list. It needs the item and returns nothing. Assume the item is not already in the list.
remove(item) removes the item from the list. It needs the item and modifies the list. Assume the item is present in the list.
search(item) searches for the item in the list. It needs the item and returns a boolean value.

Array implementation: the elements are stored in consecutive cells of an array. Access by position is O(1), but inserting or removing an element requires shifting the elements behind it, which is O(n) in the worst case, and the capacity is fixed unless the array is resized.

Linked list implementation: a linked list is a sequence of data structures (nodes) connected together via links; it is the second most-used data structure after the array. The important terms are:
Link - each link of a linked list can store a data item called an element.
Next - each link of a linked list contains a link to the next link, called Next.
First - a linked list contains the connection link to the first link, called First.
Insertion and removal at a known node take O(1) time, but reaching a given position requires an O(n) traversal from the head.

Doubly linked list implementation: a doubly linked list is made up of nodes created using self-referential structures. Each node contains three parts: the data, a reference to the next node, and a reference to the previous node. This allows traversal in both directions and O(1) removal of a node given a reference to it.

Double-Ended Queue ADT: a double-ended queue (deque) is a homogeneous list in which elements can be added or removed at both ends, i.e., at both the front and the rear. ADT for the double-ended queue:
i. insertFirst(e): insert e at the beginning of the deque
ii. insertLast(e): insert e at the end of the deque
iii. removeFirst(): remove and return the first element
iv. removeLast(): remove and return the last element
v. first(): return the first element
vi. last(): return the last element
vii. isEmpty(): return true if the deque is empty, false otherwise
viii. size(): return the number of objects in the deque
A small deque sketch follows below.
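Here is a minimal doubly linked deque sketch in Java; the method names follow the operation list above, while the class and field names are my own:

    public class LinkedDeque<E> {
        private static final class Node<E> {
            E elem; Node<E> prev, next;
            Node(E e) { elem = e; }
        }
        private Node<E> head, tail;
        private int size = 0;
        public int size() { return size; }
        public boolean isEmpty() { return size == 0; }
        public E first() { return isEmpty() ? null : head.elem; }
        public E last()  { return isEmpty() ? null : tail.elem; }
        public void insertFirst(E e) {            // O(1): relink at the head
            Node<E> n = new Node<>(e);
            n.next = head;
            if (isEmpty()) tail = n; else head.prev = n;
            head = n; size++;
        }
        public void insertLast(E e) {             // O(1): relink at the tail
            Node<E> n = new Node<>(e);
            n.prev = tail;
            if (isEmpty()) head = n; else tail.next = n;
            tail = n; size++;
        }
        public E removeFirst() {
            if (isEmpty()) throw new IllegalStateException("empty deque");
            E e = head.elem;
            head = head.next;
            if (head == null) tail = null; else head.prev = null;
            size--; return e;
        }
        public E removeLast() {
            if (isEmpty()) throw new IllegalStateException("empty deque");
            E e = tail.elem;
            tail = tail.prev;
            if (tail == null) head = null; else tail.next = null;
            size--; return e;
        }
        public static void main(String[] args) {
            LinkedDeque<Integer> d = new LinkedDeque<>();
            d.insertLast(2); d.insertFirst(1); d.insertLast(3);
            System.out.println(d.removeFirst() + " " + d.removeLast() + " " + d.size()); // 1 3 1
        }
    }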
7. What is an Abstract Data Type and what is a data structure? Explain the levels of abstraction in the data specification. Consider the Matrix ADT example to illustrate your explanation.
An abstract data type, sometimes abbreviated ADT, is a logical description of how we view the data and the operations that are allowed, without regard to how they will be implemented. This means that we are concerned only with what the data represents and not with how it will eventually be constructed. By providing this level of abstraction, we are creating an encapsulation around the data: by encapsulating the details of the implementation, we hide them from the user's view. This is called information hiding. The implementation of an abstract data type, often referred to as a data structure, requires that we provide a physical view of the data using some collection of programming constructs and primitive data types.

Levels of abstraction (the database system is the classic illustration): three levels can be distinguished.
1) Physical level: the lowest level of abstraction describes how the data are actually stored; it describes complex low-level data structures in detail.
2) Logical level: the next higher level of abstraction describes what data are stored in the database and what relationships exist among those data; the logical level thus describes the entire database in terms of a small number of relatively simple structures.
3) View level: the highest level of abstraction describes only part of the entire database. A large database stores a great variety of information, and many users of the database system do not need all of it; they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system.

The Matrix ADT illustrates the same separation: at the logical level a matrix is specified by operations such as get(i, j) and set(i, j, value), while at the physical level the entries may be stored in a two-dimensional array, in a one-dimensional array in row-major order, or in a sparse structure; users of the ADT see only the operations, never the storage.
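To make the separation concrete, here is a small sketch in Java (the interface and class names are my own): the Matrix interface is the logical view, and DenseMatrix is one possible physical view, which happens to store the entries row-major in a flat array. A sparse implementation could replace it without changing any code written against the interface:

    // Logical level: what a matrix can do, with no mention of storage.
    interface Matrix {
        int rows();
        int cols();
        double get(int i, int j);
        void set(int i, int j, double value);
    }

    // Physical level: row-major storage in a single 1-D array.
    class DenseMatrix implements Matrix {
        private final int rows, cols;
        private final double[] data;            // data[i * cols + j] holds entry (i, j)
        DenseMatrix(int rows, int cols) {
            this.rows = rows; this.cols = cols;
            this.data = new double[rows * cols];
        }
        public int rows() { return rows; }
        public int cols() { return cols; }
        public double get(int i, int j) { return data[i * cols + j]; }
        public void set(int i, int j, double v) { data[i * cols + j] = v; }
        public static void main(String[] args) {
            Matrix m = new DenseMatrix(2, 2);   // client code sees only the ADT
            m.set(0, 1, 5.0);
            System.out.println(m.get(0, 1));    // 5.0
        }
    }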
3. Describe the divide-and-conquer search algorithm and explain its efficiency. Consider two different Split functions: Split = Lo, and Split = (Lo + Hi) / 2. Draw the trees of recursive calls. Assume that we are searching a large database with 4 million items represented as an ordered array. How many probes into the list will the binary search require before finding the target or concluding that it cannot be found? Assume that it takes 1 microsecond to access an item, and estimate the execution time of the binary search.
In computer science, divide and conquer (D&C) is an algorithm design paradigm based on multi-branched recursion. A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same (or related) type (divide), until these become simple enough to be solved directly (conquer). The solutions to the sub-problems are then combined to give a solution to the original problem.

This divide and conquer technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g., Karatsuba), syntactic analysis (e.g., top-down parsers), and computing the discrete Fourier transform (FFTs).

Understanding and designing D&C algorithms is a complex skill that requires a good understanding of the nature of the underlying problem to be solved. As when proving a theorem by induction, it is often necessary to replace the original problem with a more general or complicated problem in order to initialize the recursion, and there is no systematic method for finding the proper generalization. These D&C complications are seen when optimizing the calculation of a Fibonacci number with efficient double recursion.

The name "divide and conquer" is sometimes applied also to algorithms that reduce each problem to only one sub-problem, such as the binary search algorithm for finding a record in a sorted list (or its analog in numerical computing, the bisection algorithm for root finding).[1] These algorithms can be implemented more efficiently than general divide-and-conquer algorithms; in particular, if they use tail recursion, they can be converted into simple loops. Under this broad definition, however, every algorithm that uses recursion or loops could be regarded as a "divide and conquer algorithm". Therefore, some authors consider that the name "divide and conquer" should be used only when each problem may generate two or more subproblems.[2] The name decrease and conquer has been proposed instead for the single-subproblem class.[3]

Algorithm efficiency: the divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It was the key, for example, to Karatsuba's fast multiplication method, the quicksort and mergesort algorithms, the Strassen algorithm for matrix multiplication, and fast Fourier transforms. In all these examples, the D&C approach led to an improvement in the asymptotic cost of the solution. For example, if the base cases have constant-bounded size, the work of splitting the problem and combining the partial solutions is proportional to the problem's size n, and there are a bounded number p of subproblems of size ~n/p at each stage, then the cost of the divide-and-conquer algorithm will be O(n log n).

The binary search algorithm begins by comparing the target value to the value of the middle element of the sorted array. If the target value is equal to the middle element's value, then the position is returned and the search is finished. If the target value is less than the middle element's value, then the search continues on the lower half of the array; if the target value is greater than the middle element's value, then the search continues on the upper half of the array.
This process continues, eliminating half of the remaining elements at each step and comparing the target value to the value of the middle element, until the target value is either found (and its associated element position is returned) or the search interval becomes empty (and "not found" is returned).

The choice of the Split function determines the shape of the tree of recursive calls. With Split = Lo, the "divide" step removes only a single element, so the tree of calls degenerates into a chain of depth n and the search is just sequential search, running in O(n). With Split = (Lo + Hi) / 2, the list is halved at every call; the tree of calls is a path of depth log2(n), since only one of the two halves is ever searched, and the running time is O(log n).

For a database of 4 million items represented as an ordered array, binary search needs at most ceil(log2(4,000,000)) = 22 probes before finding the target or concluding that it cannot be found, since 2^22 = 4,194,304 > 4,000,000. At 1 microsecond per item access, the execution time is therefore about 22 microseconds.
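A minimal recursive binary search sketch in Java with Split = (Lo + Hi) / 2 (class and method names are my own); the main method mimics the 4-million-item scenario:

    import java.util.Arrays;

    public class BinarySearchDemo {
        // Recursive binary search on a sorted array; returns the index or -1.
        static int search(int[] a, int target, int lo, int hi) {
            if (lo > hi) return -1;                  // interval exhausted: not found
            int mid = lo + (hi - lo) / 2;            // Split = (Lo + Hi) / 2, overflow-safe
            if (a[mid] == target) return mid;
            if (target < a[mid]) return search(a, target, lo, mid - 1);
            return search(a, target, mid + 1, hi);
        }
        public static void main(String[] args) {
            int[] a = new int[4_000_000];
            for (int i = 0; i < a.length; i++) a[i] = 2 * i;       // sorted data
            System.out.println(search(a, 1_000_000, 0, a.length - 1)); // 500000
            System.out.println(search(a, 7, 0, a.length - 1));         // -1, ~22 probes
        }
    }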
1. Explain why it is important to know the run time efficiency of algorithms, and how this knowledge helps us decide which algorithm is the best for a given application. Define the big-O notation and explain how well it allows us to classify algorithms from a real-world perspective. Give some peculiar examples to show the potential problems.
In computer science, the analysis of algorithms is the determination of the amount of resources (such as time and storage) necessary to execute them. Most algorithms are designed to work with inputs of arbitrary length, so the efficiency or running time of an algorithm is stated as a function relating the input length to the number of steps (time complexity) or storage locations (space complexity). Knowing these functions lets us predict how an algorithm scales and choose the best one for a given application before committing to an implementation.

Algorithm analysis is an important part of the broader computational complexity theory, which provides theoretical estimates for the resources needed by any algorithm that solves a given computational problem. These estimates provide insight into reasonable directions of search for efficient algorithms. In the theoretical analysis of algorithms it is common to estimate complexity in the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input.

Big-O notation: we write f(n) = O(g(n)) if there exist constants c > 0 and n0 such that f(n) <= c * g(n) for all n >= n0; Big-omega and Big-theta notation give the corresponding lower and tight bounds. For instance, binary search is said to run in a number of steps proportional to the logarithm of the length of the list being searched, or in O(log n), colloquially "in logarithmic time". Asymptotic estimates are used because different implementations of the same algorithm may differ in efficiency; however, the efficiencies of any two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor called a hidden constant.

The hidden constant is also the source of peculiar real-world effects: an asymptotically better algorithm can lose on realistic input sizes. For example, insertion sort (O(n^2)) beats merge sort (O(n log n)) on small arrays, which is why practical library sorts switch to insertion sort for short runs; conversely, an algorithm with a huge constant factor can be slower than a "worse" one for every input size that ever occurs in practice.
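As a rough illustration (a sketch, not a benchmark; all names are my own), the following program counts the comparisons performed by linear and binary search on sorted arrays of growing size, making the O(n) versus O(log n) growth visible:

    public class GrowthDemo {
        // Count comparisons used by linear and binary search on a sorted
        // array, searching for an absent key (worst case for both).
        static long linear(int[] a, int key) {
            long c = 0;
            for (int x : a) { c++; if (x == key) break; }
            return c;
        }
        static long binary(int[] a, int key) {
            long c = 0; int lo = 0, hi = a.length - 1;
            while (lo <= hi) {
                c++;
                int mid = (lo + hi) >>> 1;
                if (a[mid] < key) lo = mid + 1;
                else if (a[mid] > key) hi = mid - 1;
                else break;
            }
            return c;
        }
        public static void main(String[] args) {
            for (int n = 1_000; n <= 1_000_000; n *= 10) {
                int[] a = new int[n];
                for (int i = 0; i < n; i++) a[i] = 2 * i;    // sorted, all even
                System.out.printf("n=%,d linear=%,d binary=%d%n",
                                  n, linear(a, -1), binary(a, -1));
            }
        }
    }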
8. Describe the Queue ADT (give a definition, set of operations). Explain and compare array and linked list implementations of the Queue ADT. Explain how queues are used in the Radix sort, and explain the Radix sort itself. Discuss the efficiency of the Radix sort.
A queue is an ordered collection of items into which items are inserted at one end, called the rear of the queue, and removed from the other end, called the front of the queue. The Queue ADT follows first-in, first-out (FIFO) order. Set of operations performed on a queue:
enqueue(e) - insert an element at the rear of the queue.
dequeue() - delete and return the element at the front of the queue.
traverse - visit the elements from the front to the rear.
Note that sorting within a queue is not possible: only the front and rear elements are accessible.

Array vs. linked list implementation: a circular array implementation keeps front and rear indices that wrap around the array, so enqueue and dequeue are O(1), but the capacity is fixed unless the array is resized, which is itself an O(n) step. A linked list implementation with head and tail pointers also gives O(1) enqueue (at the tail) and dequeue (at the head) and grows and shrinks dynamically, at the cost of one link field per node.

Radix sort is built directly on queues: the keys are distributed into ten FIFO queues (buckets) according to their least significant digit, collected back into the list in bucket order, and the process is repeated digit by digit up to the most significant digit. Because a queue preserves arrival order, every pass is stable, and the stability of the passes is exactly what makes the final result sorted. For n keys with at most d digits, radix sort runs in O(d * n) time, i.e., linear in n when d is a small constant. A sketch follows below.
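A minimal sketch of least-significant-digit radix sort in Java using ten FIFO queues (the class name and the use of ArrayDeque are my choices, not prescribed by the question):

    import java.util.*;

    public class RadixSortDemo {
        // LSD radix sort for non-negative integers using ten FIFO queues.
        static void radixSort(int[] a) {
            List<ArrayDeque<Integer>> buckets = new ArrayList<>();
            for (int i = 0; i < 10; i++) buckets.add(new ArrayDeque<>());
            int max = Arrays.stream(a).max().orElse(0);
            for (int exp = 1; max / exp > 0; exp *= 10) {    // one pass per digit
                for (int x : a)
                    buckets.get((x / exp) % 10).addLast(x);  // enqueue by digit
                int k = 0;
                for (ArrayDeque<Integer> q : buckets)
                    while (!q.isEmpty()) a[k++] = q.removeFirst(); // collect in order
            }
        }
        public static void main(String[] args) {
            int[] data = {170, 45, 75, 90, 802, 24, 2, 66};
            radixSort(data);
            System.out.println(Arrays.toString(data)); // [2, 24, 45, 66, 75, 90, 170, 802]
        }
    }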
4. Explain how the efficiency of recursive algorithms is defined. Consider two examples: the recursive binary search, and the tower of Hanoi problem. Draw the tree of recursive calls for both problems, and compare their efficiencies.
Recursion is a process by which a function calls itself directly or indirectly; such a function is called a recursive function, and the approach is called the recursive approach. The efficiency of a recursive algorithm is determined by the number of recursive calls made to solve the problem, which can be read off the tree of recursive calls: the total work is the number of nodes in the tree times the work done per call.

Recursive binary search makes at most one recursive call per invocation, on a half-size subproblem, so its tree of calls is a single path of depth log2(n) and its running time is O(log n). The Tower of Hanoi makes two recursive calls on problems of size n - 1, so its tree of calls is a full binary tree of depth n with 2^n - 1 nodes, and its running time is O(2^n): moving n disks requires exactly 2^n - 1 moves. The comparison is therefore between a tree that is a chain of logarithmic depth and a tree that branches at every node, and binary search is dramatically more efficient.
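A small Java sketch (names mine) that counts Tower of Hanoi moves and confirms the 2^n - 1 count; each call with n > 0 performs one move, so the move count equals the number of such calls:

    public class Hanoi {
        static long moves = 0;
        // Move n disks from peg 'from' to peg 'to', using 'via' as the spare.
        static void hanoi(int n, char from, char to, char via) {
            if (n == 0) return;          // base case: nothing to move
            hanoi(n - 1, from, via, to); // first recursive call
            moves++;                     // move the largest remaining disk
            hanoi(n - 1, via, to, from); // second recursive call
        }
        public static void main(String[] args) {
            hanoi(10, 'A', 'C', 'B');
            System.out.println(moves);   // 1023 = 2^10 - 1
        }
    }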
5. Describe Shell sort (example is always helpful). What advantage do the relatively prime values of the increments have over other values? Provide examples of best case and worst case data sets for Shell sort, and compare its efficiency in these cases to the best case and worst case efficiency of Quick sort.
Shell sort, also known as the diminishing increment sort, is one of the oldest sorting algorithms. It sorts the array in a series of passes, each pass being an insertion sort of the subsequences of elements that lie a fixed increment (gap) apart, with the increments diminishing down to 1. It is fast, easy to understand and easy to implement, though its complexity analysis is a little more sophisticated. Shell sort is efficient for medium-sized lists; for very large lists the algorithm is not the best choice.

The running time of Shell sort depends on the choice of the increment sequence. The problem with Shell's original increments (n/2, n/4, ..., 1) is that pairs of increments are not necessarily relatively prime, so smaller increments can have little effect: elements already compared in one pass keep meeting each other in later passes. With relatively prime increments (e.g., Hibbard's 1, 3, 7, 15, ...), each pass mixes previously disjoint subsequences and does new useful work. For Shell's sequence the worst case complexity is O(n^2); for Hibbard's sequence it improves to O(n^(3/2)).

Best case: the array is already sorted in the right order. Every pass then makes one comparison per element and no moves, so Shell sort runs in O(n log n) (about n comparisons for each of the log n increments). Worst case data sets are constructed per increment sequence: for Shell's powers-of-two increments, a classic bad input places the n/2 smallest keys in the odd positions and the n/2 largest in the even positions (e.g., 1, 9, 2, 10, 3, 11, ...); all passes with even gaps leave it unchanged, and the final gap-1 pass degenerates into a Theta(n^2) insertion sort.

Quick sort, as the name suggests, sorts a list very quickly. It is not a stable sort, but it is very fast and requires little additional space. Worst case time complexity: O(n^2), which occurs on already sorted (or reverse sorted) data with a naive pivot choice. Best case time complexity: O(n log n), when the pivot always splits the array evenly. Average time complexity: O(n log n). Space complexity: O(log n) on average, for the recursion stack. So Shell sort's best case matches Quick sort's best case up to constant factors, while in the worst case both degrade to O(n^2) under naive choices (Shell's increments, a fixed pivot); Shell sort recovers O(n^(3/2)) with good increments, and Quick sort avoids its worst case with a good pivot rule.
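A minimal Shell sort sketch in Java using Hibbard's increments 1, 3, 7, 15, ... (consecutive increments in this sequence are relatively prime); the class name is my own:

    import java.util.Arrays;

    public class ShellSortDemo {
        // Shell sort with Hibbard's increments 2^k - 1, worst case O(n^(3/2)).
        static void shellSort(int[] a) {
            int gap = 1;
            while (gap < a.length / 2) gap = 2 * gap + 1;  // largest 2^k - 1 below n
            for (; gap > 0; gap /= 2) {                    // (2^k - 1) / 2 = 2^(k-1) - 1
                for (int i = gap; i < a.length; i++) {     // gapped insertion sort
                    int tmp = a[i], j = i;
                    while (j >= gap && a[j - gap] > tmp) { a[j] = a[j - gap]; j -= gap; }
                    a[j] = tmp;
                }
            }
        }
        public static void main(String[] args) {
            int[] data = {19, 14, 7, 12, 14, 10};
            shellSort(data);
            System.out.println(Arrays.toString(data)); // [7, 10, 12, 14, 14, 19]
        }
    }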
9. Describe the Stack ADT (give a definition, set of operations). Explain and compare array and linked list implementations of the Stack ADT. Describe one stack application -- your choice. Suggestions: converting expressions from infix to postfix form, evaluation of arithmetic or logical expressions, using stacks in Java Virtual Machine.
A stack is an ordered collection of items in which insertions and deletions happen at one end only, called the top of the stack; it follows last-in, first-out (LIFO) order. Stack operations:
push(x) - push x onto the top of the stack.
pop() - delete and return the element at the top of the stack.
peek() - return the topmost element of the stack without removing it.
isFull() - check whether the stack is full (meaningful for array implementations).
isEmpty() - check whether the stack is empty.
size() - return the number of elements present in the stack.

Array vs. linked list implementation: an array implementation keeps a top index and performs push and pop in O(1) at the end of the array, but it has a fixed capacity (hence isFull) unless the array is resized; a linked list implementation pushes and pops at the head in O(1), never fills up, and pays one link field per node in extra memory.

One classic stack application is the evaluation of arithmetic expressions in postfix form: operands are pushed, and each operator pops its two operands, applies itself, and pushes the result, so the value of the whole expression is the single item left on the stack at the end. A sketch follows below.
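A minimal sketch in Java of the postfix-evaluation application (the class name and the space-separated token format are my own assumptions):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class PostfixEval {
        // Evaluate a space-separated postfix expression, e.g. "3 4 + 2 *" -> 14.
        static int eval(String expr) {
            Deque<Integer> stack = new ArrayDeque<>();
            for (String tok : expr.trim().split("\\s+")) {
                if (tok.matches("-?\\d+")) {
                    stack.push(Integer.parseInt(tok));     // operand: push it
                } else {
                    int b = stack.pop(), a = stack.pop();  // note the operand order
                    int r;
                    if (tok.equals("+")) r = a + b;
                    else if (tok.equals("-")) r = a - b;
                    else if (tok.equals("*")) r = a * b;
                    else if (tok.equals("/")) r = a / b;
                    else throw new IllegalArgumentException("unknown token: " + tok);
                    stack.push(r);                         // push the partial result
                }
            }
            return stack.pop();                            // the expression's value
        }
        public static void main(String[] args) {
            System.out.println(eval("3 4 + 2 *")); // 14
        }
    }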
12. Describe the Positional Sequence ADT (give a definition, set of operations). Explain the notion of a position. Explain how Positional Sequence ADT is different from Ranked Sequence ADT.
When working with array-based sequences, integer indices provide an excellent means for describing the location of an element, or the location at which an insertion or deletion should take place. However, numeric indices are not a good choice for describing positions within a linked list, because, knowing only an element's index, the only way to reach it is to traverse the list incrementally from its beginning or end, counting elements along the way. An example of a positional structure is a text document, which can be viewed as a long sequence of characters: a word processor uses the abstraction of a cursor to describe a position within the document without explicit use of an integer index, allowing operations such as "delete the character at the cursor" or "insert a new character just after the cursor."

Notion of a position: to provide a general abstraction for the location of an element within a structure, we define a simple position abstract data type. A position supports the following single method:
getElement(): returns the element stored at this position.
A position acts as a marker or token within the broader positional list. A position p associated with some element e in a list L does not change, even if the index of e changes in L due to insertions or deletions elsewhere in the list; nor does position p change if we replace the element e stored at p with another element.

The Positional List abstract data type:
first(): returns the position of the first element of L (or null if L is empty).
last(): returns the position of the last element of L (or null if L is empty).
before(p): returns the position of L immediately before position p (or null if p is the first position).
after(p): returns the position of L immediately after position p (or null if p is the last position).
isEmpty(): returns true if list L does not contain any elements.
size(): returns the number of elements in list L.
A small interface sketch is given below.

How the Positional Sequence ADT differs from the Ranked Sequence ADT:
1. A ranked sequence is a collection of items arranged in a linear order, where each item has a rank defining its relative position (the number of items before it).
2. A positional sequence is a collection of items arranged in a linear order, where each item has a position defined in terms of the neighboring nodes. Positions, contrary to ranks, are defined relative to each other and are not tied to integer indices: a position remains valid while its element stays in the list, whereas ranks shift whenever items are inserted or removed in front of them.
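A small interface sketch in Java, modeled on the operations listed above; the update methods at the end are common extensions found in textbook versions of this ADT (e.g., Goodrich and Tamassia), not part of the minimal list given here:

    interface Position<E> {
        E getElement();                        // the element stored at this position
    }

    interface PositionalList<E> {
        int size();
        boolean isEmpty();
        Position<E> first();                   // null if the list is empty
        Position<E> last();                    // null if the list is empty
        Position<E> before(Position<E> p);     // null if p is the first position
        Position<E> after(Position<E> p);      // null if p is the last position
        // Common extensions for updating the list via positions:
        Position<E> addBefore(Position<E> p, E e);
        Position<E> addAfter(Position<E> p, E e);
        E set(Position<E> p, E e);             // replace the element at p, return the old one
        E remove(Position<E> p);               // remove and return the element at p
    }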
11. Describe the Ranked Sequence ADT (give a definition, set of operations). Compare the array-based and the doubly linked list implementation of the Ranked Sequence ADT (in details for each operation).
Let us start with the definition: a vector, or rank-based sequence, is a dynamic sequential list of elements in which each element e is assigned a rank. The rank of an element e in a sequence S is the number of elements before e in S; it can also be viewed as a current address or index. A linear sequence that supports access to its elements by their ranks is called a vector. The vector ADT extends the notion of an array by storing a sequence of arbitrary objects: an element can be accessed, inserted or removed by specifying its rank (the number of elements preceding it), and an exception is thrown if an incorrect rank is specified (e.g., a negative rank).

A ranked sequence S with n elements supports the following methods:
elemAtRank(int r): return the element at rank r without removing it; an error occurs if r < 0 or r > n - 1.
replaceAtRank(int r, Object o): replace the element at rank r with o; an error occurs if r < 0 or r > n - 1.
insertAtRank(int r, Object o): insert a new element o so that it has rank r; an error occurs if r < 0 or r > n (inserting at rank n appends at the end).
removeAtRank(int r): remove and return the element at rank r; an error occurs if r < 0 or r > n - 1.
Additional operations: size() and isEmpty().

Array-based implementation: elemAtRank and replaceAtRank take O(1) time, since the array cell is addressed directly from the rank. insertAtRank(r, o) must shift the n - r elements behind rank r one cell to the right, and removeAtRank(r) must shift them one cell to the left, so both are O(n - r), i.e., O(n) in the worst case (r = 0) and O(1) at the end of the vector; an insertion may also trigger a resize of the underlying array.

Doubly linked list implementation: every rank-based operation must first locate the node of rank r by walking from the header or the trailer (whichever is closer) while keeping track of ranks, which takes O(min(r, n - r)) = O(n) time; once the node is found, the insert, remove or replace step itself is O(1) by relinking the neighbors. Hence the linked implementation wins only for ranks near the two ends, while the array implementation wins for random access by rank.
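An array-based sketch in Java (class name mine) showing why access by rank is O(1) while insertion and removal are O(n) in the worst case:

    import java.util.Arrays;

    // An array-based Ranked Sequence (vector). elemAtRank is O(1);
    // insertAtRank/removeAtRank shift the trailing elements, O(n) worst case.
    public class ArrayVector<E> {
        private Object[] data = new Object[16];
        private int n = 0;                           // number of elements
        public int size() { return n; }
        public boolean isEmpty() { return n == 0; }
        @SuppressWarnings("unchecked")
        public E elemAtRank(int r) { check(r, n - 1); return (E) data[r]; }
        public void insertAtRank(int r, E e) {
            check(r, n);                             // rank n appends at the end
            if (n == data.length) data = Arrays.copyOf(data, 2 * n);
            for (int i = n; i > r; i--) data[i] = data[i - 1];     // shift right
            data[r] = e; n++;
        }
        @SuppressWarnings("unchecked")
        public E removeAtRank(int r) {
            check(r, n - 1);
            E e = (E) data[r];
            for (int i = r; i < n - 1; i++) data[i] = data[i + 1]; // shift left
            data[--n] = null;
            return e;
        }
        private void check(int r, int hi) {          // throws on an invalid rank
            if (r < 0 || r > hi) throw new IndexOutOfBoundsException("rank " + r);
        }
        public static void main(String[] args) {
            ArrayVector<String> v = new ArrayVector<>();
            v.insertAtRank(0, "b"); v.insertAtRank(0, "a"); v.insertAtRank(2, "c");
            System.out.println(v.elemAtRank(0) + v.elemAtRank(1) + v.elemAtRank(2)); // abc
        }
    }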
6. Describe and compare Quick sort and Merge sort. How important is the choice of the pivot in Quick sort? Using trees of recursive calls, analyze and compare best and worst case efficiencies of Quick and Merge sorts.
Quick sort: a divide-and-conquer recursive algorithm and one of the fastest sorting algorithms in practice, with average running time O(N log N) and worst case running time O(N^2). The basic idea: pick one element of the array to be the pivot, then make one pass through the array, called a partition step, rearranging the entries so that the pivot is in its proper place, entries smaller than the pivot are to its left, and entries larger than the pivot are to its right. Then recursively apply quicksort to the part of the array to the left of the pivot and to the part to its right.

Choice of pivot in quick sort: choosing the pivot is an essential step; depending on the pivot the algorithm may run very fast or in quadratic time.
Some fixed element, e.g., the first, the last, or the middle one: taking the first or last element is a bad choice, because on sorted or nearly sorted input the pivot turns out to be the smallest or the largest element and one of the partitions is empty.
A randomly chosen pivot (by a random number generator) avoids consistently bad behavior on any fixed input pattern, making the worst case unlikely rather than impossible, at the cost of a random-number call per partition.
The exact median of the array (for N numbers, the element of rank N/2) gives a perfect split, but it is expensive to compute and increases the complexity.
The median-of-three choice: take the first, the last and the middle element, and choose the median of these three. This is cheap and guards against the sorted-input worst case.

Complexity of quicksort. Worst case O(N^2): this happens when the pivot is always the smallest (or the largest) element; one of the partitions is empty, and we repeat the procedure recursively for N - 1 elements, so the tree of recursive calls degenerates into a chain of depth N doing N + (N - 1) + ... + 1 = O(N^2) work. Best case O(N log N): the pivot is the median of the array, so the left and the right parts have the same size; the tree of recursive calls is a balanced binary tree of depth log N, and each of the log N levels of partitioning does at most N comparisons (and no more than N/2 swaps) in total, hence O(N log N).

Merge sort is also based on divide and conquer: it takes the list to be sorted and divides it in half to create two unsorted lists, sorts them by recursive calls to merge sort (a list of size 1 is already sorted), and then merges the two sorted halves. Its tree of recursive calls is always a balanced binary tree of depth log n, regardless of the data, so its complexity is the same in every case: best case (the array is already sorted) O(n log n); worst case (e.g., the array sorted in reverse order) O(n log n); average case O(n log n). Extra space is required for merging, so the space complexity is O(n) for arrays and O(log n) for linked lists.

Quick sort and merge sort compared: both are well-known divide-and-conquer sorting algorithms with an average cost of O(n log n), where n is the number of items to be sorted, but they are quite different. Merge sort is sometimes called an external sort: it is the method of choice when the set to be sorted is too large to hold in internal memory, since it repeatedly merges sorted subsets into increasingly larger sorted subsets until one set is left, so the actual sorting ultimately deals with only portions of the complete set at a time. Quick sort operates quite differently, and in many cases a well-implemented quick sort is faster in practice than the other O(n log n) sorting algorithms, because it sorts in place with small constant factors.
Quick sort operates by selecting a single element from the set and labeling it the pivot. The set is then reordered so that all elements of lesser value than the pivot come before it and all elements of greater value come after it. This operation is applied recursively to the subsets on both sides of the pivot until the entire set is sorted; a median-of-three sketch follows below.
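A minimal quicksort sketch in Java with the median-of-three pivot rule described above (class and method names are my own):

    import java.util.Arrays;

    public class QuickSortDemo {
        static void quickSort(int[] a, int lo, int hi) {
            if (lo >= hi) return;                 // zero or one element: sorted
            int p = partition(a, lo, hi);
            quickSort(a, lo, p - 1);              // left of the pivot
            quickSort(a, p + 1, hi);              // right of the pivot
        }
        // Median-of-three pivot selection, then a Lomuto-style partition.
        static int partition(int[] a, int lo, int hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] < a[lo]) swap(a, mid, lo); // sort a[lo], a[mid], a[hi]
            if (a[hi]  < a[lo]) swap(a, hi, lo);
            if (a[hi]  < a[mid]) swap(a, hi, mid);
            swap(a, mid, hi);                     // park the median pivot at the end
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) swap(a, i++, j);
            swap(a, i, hi);                       // pivot into its proper place
            return i;
        }
        static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
        public static void main(String[] args) {
            int[] data = {19, 14, 7, 12, 14, 10};
            quickSort(data, 0, data.length - 1);
            System.out.println(Arrays.toString(data)); // [7, 10, 12, 14, 14, 19]
        }
    }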