GOOG tech


Selection Sort

A sort algorithm that repeatedly scans for the smallest item in the list and swaps it with the element at the current index. The index is then incremented, and the process repeats until the last two elements are sorted. Divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list.
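
A minimal sketch of the scan-and-swap loop described above. The function name `selection_sort` and the choice to return a sorted copy are illustrative assumptions, not part of the card.

```python
def selection_sort(items):
    """Return a new list sorted in ascending order using selection sort."""
    result = list(items)  # copy so the caller's list is left untouched
    for i in range(len(result) - 1):
        # Scan the unsorted tail for the index of the smallest item.
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        # Swap it into position i; the sorted prefix grows by one.
        result[i], result[smallest] = result[smallest], result[i]
    return result
```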

External sort

A sort using external storage (such as disk) in addition to main memory.

class

In object-oriented programming, a category of objects. For example, there might be a class called shape that contains objects which are circles, rectangles, and triangles. The class defines all the common properties of the different objects that belong to it.

Collisions

Since a hash function maps a big key to a small number, two keys can result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called __________________ and must be handled using some technique, e.g. chaining or open addressing.

Space complexity

The memory required by an algorithm to execute a program and produce output. The amount of memory space required to solve an instance of the computational problem as a function of the size of the input.

Vertices

The nodes in a graph, which are connected by edges.

Recursion

The process of a method calling itself in order to solve a problem.

Mutable

changeable

object-oriented programming

Designing a program by discovering objects, their properties, and their relationships. This modular approach to coding facilitates larger, more complex software products that are delivered more quickly and reliably.

system architecture

the arrangement of software, machinery, and tasks in an information system needed to achieve a specific functionality

Immutable

unchangeable

breadth-first or level order traversal

(Gotcha) traversing the nodes level by level; use a queue to do that; put the elements in 1 level in a queue and keep pulling elements off the queue until it's done

Basic operations performed in a stack

* Push: Adds an item in the stack. If the stack is full, then it is said to be an Overflow condition. * Pop: Removes an item from the stack. The items are popped in the reversed order in which they are pushed. If the stack is empty, then it is said to be an Underflow condition. * Peek or Top: Returns top element of stack. * isEmpty: Returns true if stack is empty, else false. (All take O(1) time)
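
The operations above could be sketched as follows, assuming a fixed-capacity stack backed by a Python list. The class name `Stack`, the `capacity` parameter, and the choice of exceptions for overflow and underflow are assumptions made for illustration.

```python
class Stack:
    """Fixed-capacity LIFO stack; every operation runs in O(1) time."""

    def __init__(self, capacity):
        self._items = []
        self._capacity = capacity

    def push(self, item):
        if len(self._items) >= self._capacity:
            raise OverflowError("stack overflow")  # stack is full
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("stack underflow")  # stack is empty
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("stack is empty")
        return self._items[-1]  # top element, not removed

    def is_empty(self):
        return not self._items
```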

Goals of using managed trees

1. A data structure that has some version of a sorted list of numbers where the left and right subtrees are relatively equal size 2. Goal: O(lg n) insert and lookup times 3. Pay a bit of a penalty on insert time (from constant time to lg n time) but get a HUGE savings on lookup times (from O(n) linear time down to lg n time); that's why these trees are good

Operations on a queue

1. Enqueue: Adds an item to the queue. If the queue is full, then it is said to be an Overflow condition. 2. Dequeue: Removes an item from the queue. The items are popped in the same order in which they are pushed. If the queue is empty, then it is said to be an Underflow condition. 3. Front: Get the front item from queue. 4. Rear: Get the last item from queue. 5. peek() − Gets the element at the front of the queue without removing it. 6. isfull() − Checks if the queue is full. 7. isempty() − Checks if the queue is empty.
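
A rough sketch of these operations, assuming a bounded queue built on `collections.deque` from the Python standard library. The class name `BoundedQueue` and the exceptions used for overflow and underflow are illustrative choices.

```python
from collections import deque


class BoundedQueue:
    """FIFO queue with a fixed capacity."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue overflow")
        self._items.append(item)  # insert at the rear

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue underflow")
        return self._items.popleft()  # remove from the front

    def front(self):
        return self._items[0]  # same role as peek(): look, do not remove

    def rear(self):
        return self._items[-1]

    def is_full(self):
        return len(self._items) >= self._capacity

    def is_empty(self):
        return not self._items
```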

Red Black Tree: properties

1. Every node is either red or black 2. The root is black 3. Every leaf (NIL) is black 4. If a node is red, then both its children are black 5. All simple paths from a node to its descendant leaves contain the same number of black nodes

Examples of stack applications

1. Redo-undo features at many places like editors, photoshop. 2. Forward and backward feature in web browsers 3. Used in many algorithms like Tower of Hanoi, tree traversals, stock span problem, histogram problem.

Benefits of tree data structures

1. Trees store information that naturally forms a hierarchy. For example, the file system on a computer. 2. Trees (with some ordering e.g., BST) provide moderate access/search (quicker than Linked List and slower than arrays). 3. Trees provide moderate insertion/deletion (quicker than Arrays and slower than Unordered Linked Lists). 4. Like Linked Lists and unlike Arrays, Trees don't have an upper limit on number of nodes as nodes are linked using pointers.

2 ways to implement a stack

1. Using array 2. Using linked list

Merge sort

A "divide and conquer" algorithm. Divides the input array into two halves, calls itself for the two halves, and then merges the two sorted halves. Most implementations produce a stable sort.
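
A minimal sketch of the split, recurse, and merge steps, assuming a version that returns a new list rather than sorting in place. The name `merge_sort` is illustrative.

```python
def merge_sort(items):
    """Return a new sorted list; equal keys keep their input order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= takes from the left first, so the sort is stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```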

Quicksort

A "divide and conquer" sorting technique that picks an element as the pivot and partitions the given array around it, so that smaller elements end up to the left of the pivot and larger elements to the right, then recursively sorts the elements to the left and the right of the pivot. Internal sort.
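
One possible in-place sketch, assuming the last element of each range is chosen as the pivot (the Lomuto partition scheme). The card does not say how the pivot is picked, so that choice and the name `quicksort` are assumptions.

```python
def quicksort(items, lo=0, hi=None):
    """Sort items[lo..hi] in place."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    pivot = items[hi]  # assumption: last element is the pivot
    boundary = lo      # everything left of boundary is smaller than the pivot
    for k in range(lo, hi):
        if items[k] < pivot:
            items[k], items[boundary] = items[boundary], items[k]
            boundary += 1
    # Put the pivot between the smaller and the larger elements.
    items[hi], items[boundary] = items[boundary], items[hi]
    quicksort(items, lo, boundary - 1)
    quicksort(items, boundary + 1, hi)
```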

Heapsort

A comparison-based sorting technique based on the Binary Heap data structure. It is similar to selection sort: we first find the maximum element and place it at the end, then repeat the same process for the remaining elements.
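
A sketch of that idea, assuming an array-backed max heap where the children of index `i` live at `2*i + 1` and `2*i + 2`. The names `heapsort` and `sift_down` are illustrative.

```python
def heapsort(items):
    """Return a new list sorted in ascending order using a max heap."""
    data = list(items)
    n = len(data)

    def sift_down(root, end):
        # Push data[root] down until both children (below index end) are smaller.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and data[child + 1] > data[child]:
                child += 1  # pick the larger child
            if data[root] >= data[child]:
                return
            data[root], data[child] = data[child], data[root]
            root = child

    # Build the max heap from the last parent up to the root.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    # Repeatedly move the maximum to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        data[0], data[end] = data[end], data[0]
        sift_down(0, end)
    return data
```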

Hash Table

A data structure where a value calculated from the key (its hash) is used to mark the position in the table where the data item should be stored, enabling it to be accessed directly, rather than forcing a sequential search. The most important data structure to know deeply for technical interviews.

Labeled graph

A graph where vertices and/or edges are identified with a unique number, letter, or name.

Components of a hash table

A hash function and an array. The hash function's job is simply to take a key of some input and map that key to an index in the array. Usually "do something to turn the key into a number and then mod it by the size of the array".
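
The "turn the key into a number, then mod by the array size" recipe might look like this for string keys. The polynomial scheme with multiplier 31 and the name `bucket_index` are assumptions chosen for illustration; real hash functions vary.

```python
def bucket_index(key, table_size):
    """Map a string key to an index in an array of table_size slots."""
    number = 0
    for ch in key:
        # Fold each character into a running number.
        number = number * 31 + ord(ch)
    # Mod by the array size so the result is a valid index.
    return number % table_size
```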

Edge

A line connecting two nodes in a graph. Can be directed or undirected.

Graphs

A non-linear data structure consisting of nodes and edges. The nodes are sometimes also referred to as vertices and the edges are lines or arcs that connect any two nodes in the _________. Formal definition: consists of a finite set of vertices (or nodes) and set of edges which connect a pair of nodes. Good for modeling real-world networks. Example: - On Facebook, the people are considered nodes of the ___________ and the edges are friendship links. - Networks e.g. paths in a city or telephone network or circuit network

Map Reduce

A programming model that allows for massive scaling of data often using many (hundreds, thousands) of servers in Hadoop clusters. It involves a map job, where data is converted into paired data where each pair has a key and a value. In a common example, the key could be city and the value could be a temperature high for a particular day. The reduce job takes the data after it has been "mapped" and takes the data tuples (pairs) created by the map process and combines it into a smaller set of tuples.

Depth First Search (Tree)

A search in which children of a node are considered (recursively) before siblings are considered. AKA - each node in one branch is visited before exploring the next branch. AKA - Starts at the root node and explores as far as possible along each branch before backtracking. Implemented with a stack.
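
A minimal sketch using an explicit stack instead of recursion. The `Node` class with a `children` list is a hypothetical representation chosen for the example.

```python
class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def depth_first(root):
    """Return node values in the order a depth-first search visits them."""
    visited = []
    stack = [root]
    while stack:
        node = stack.pop()
        visited.append(node.value)
        # Push children in reverse so the leftmost child is explored first.
        stack.extend(reversed(node.children))
    return visited
```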

Red-Black Tree

A self-balancing binary search tree (BST) in which nodes are "colored" red or black. The balancing of the tree is not perfect, but it is good enough to allow it to guarantee searching in O(log n) time, where n is the total number of elements in the tree. The insertion and deletion operations, along with the tree rearrangement and recoloring, are also performed in O(log n) time. The longest path from the root to a leaf is no more than twice the length of the shortest path.

AVL Tree

A self-balancing binary search tree (BST), in which the heights of the two child subtrees of any node differ by at most 1. (Named after its Soviet inventors, Adelson-Velsky and Landis; the first self-balancing BST.) ___________ are often compared with red-black trees because both support the same set of operations and take O(log n) time for the basic operations. ____________ are beneficial when you are designing a database where insertions and deletions are infrequent but lookups are frequent, because they are more strictly balanced. E.g. looking up trains in a rail system.

Splay Tree

A self-balancing binary tree that places recently accessed elements near the top of the tree for fast access. Kept balanced with rotations. Not strictly balanced (unlike AVL trees), so faster. Easy to implement, popular. Ex: Caches

B-tree

A self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. Unlike self-balancing binary search trees, it is optimized for systems that read and write large blocks of data. It is most commonly used in database and file systems. _____________ are designed to be stored on disks in which the cost of reading or writing a disk page is significantly higher than the cost of performing simple computations. If you're planning on storing a huge volume of data that can't fit into main memory, _____________ are an excellent choice for a data structure. On the other hand, in main memory, _____________ with a very large branching factor will be slower than BSTs or 2-3 trees because each _____________ insertion or deletion can require a large number of pointer reassignments.

Insertion sort

A simple sorting algorithm that builds the final sorted array one item at a time. Much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. Internal sort
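
A minimal sketch of building the sorted result one item at a time. The name `insertion_sort` and returning a copy are illustrative choices.

```python
def insertion_sort(items):
    """Return a new list sorted in ascending order using insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger items one slot right to open a gap for current.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result
```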

Stable sorting

A sorting algorithm is said to be _____________ if two objects with equal keys appear in the same order in sorted output as they appear in the input unsorted array. Ex: Some sorting algorithms are __________ by nature, like Insertion Sort, Merge Sort, and Bubble Sort.

Binary Search Tree

A tree in which nodes are inserted systematically in natural order, with the final property of each left child being less than or equal to its parent, and each right child being greater than its parent. (Example: We observe that the root node key (27) has all less-valued keys on the left sub-tree and the higher valued keys on the right sub-tree.)
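
A small sketch of insertion and lookup under that ordering rule (keys less than or equal to the parent go left, greater keys go right). The names `BSTNode`, `insert`, and `contains` are hypothetical, and the tree is not rebalanced.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root of the subtree."""
    if root is None:
        return BSTNode(key)
    if key <= root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def contains(root, key):
    """Walk down from the root, going left or right at each node."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```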

Interior node (aka non-leaf node)

A tree node of a tree that has children.

Leaf node

A tree node that has no children

Comparison sort

A type of sorting technique that arranges data by comparing two pieces at a time (e.g. bubble and merge sorting techniques)

O (1)

Aka constant time. The most efficient Big O level: the upper bound on run time is constant and does not depend on the size of the input. Examples: accessing a single element in an array by index; reading the root value (the minimum or maximum) of a min or max heap.

O (x^N), e.g. O(2^N)

Aka exponential time. Computable in principle but intractable: for O(2^N) the running time doubles with each additional input element, so such algorithms become impractical even for moderate N. Example: enumerating all subsets of a set of N items.

O (N!)

Aka factorial time. Computable in principle but intractable: grows even faster than exponential time, so such algorithms become impractical for all but very small N. Example: generating all permutations of N items.

O (N)

Aka linear time The running time increases at most linearly with the size of the input. Example: Procedure that adds up all elements of a list requires time proportional to the length of the list, if the adding time is constant.

O (log N)

Aka logarithmic time Big O level that is highly efficient; the ratio of the number of operations to the size of the input decreases and tends to zero when n increases. Examples: operations on managed binary trees or when using binary search; finding a word in a dictionary
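
Binary search, mentioned above, halves the remaining range on every step, which is where the logarithm comes from. A minimal sketch, assuming the input list is already sorted; the name `binary_search` and the `-1` "not found" value are illustrative.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # discard the lower half
        else:
            hi = mid - 1  # discard the upper half
    return -1
```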

O (N^x)

Aka polynomial time. Superset of quadratic time. Running time is upper bounded by a polynomial expression in the size of the input for the algorithm. Example: common comparison sorts.

O (N^2)

Aka quadratic time (One type of polynomial time) Big O level that is inefficient: Run-time is a simple polynomial function of the size of the input. Example: common with algorithms that involve nested iterations over the data set; simple sorting algorithms, such as bubble sort, selection sort and insertion sort

O (N log N)

Aka quasilinear time. The best achievable worst-case running time for a comparison sort. Examples: heapsort and merge sort.

Bubble sort

Aka sinking sort; Simple sorting algorithm that repeatedly steps through the list, compares adjacent pairs and swaps them if they are in the wrong order. The pass through the list is repeated until the list is sorted. Internal sort
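
A minimal sketch of the repeated passes, with the common early exit when a pass makes no swaps. The name `bubble_sort` and returning a copy are illustrative choices.

```python
def bubble_sort(items):
    """Return a new list sorted in ascending order using bubble sort."""
    result = list(items)
    n = len(result)
    while True:
        swapped = False
        for i in range(1, n):
            # Compare each adjacent pair and swap if out of order.
            if result[i - 1] > result[i]:
                result[i - 1], result[i] = result[i], result[i - 1]
                swapped = True
        if not swapped:
            return result  # a pass with no swaps means the list is sorted
        n -= 1  # the largest remaining item has sunk to the end
```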

sorting algorithms

Algorithm that puts elements of a list in a certain order. The most frequently used orders are numerical order and lexicographical order. Important for optimizing the efficiency of other algorithms (such as search and merge algorithms), which require input data to be in sorted lists.

Internal sort

An internal sort is any data sorting process that takes place entirely within the main memory of a computer. This is possible whenever the data to be sorted is small enough to all be held in the main memory.

Dijkstra's algorithm

An optimal greedy algorithm to find the minimum distance and shortest path in a weighted graph from a given start node. Requires non-negative edge weights.
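
A compact sketch that computes minimum distances only, assuming non-negative weights and a graph stored as a dict mapping each node to a list of `(neighbor, weight)` pairs. That representation and the name `dijkstra` are assumptions; it uses `heapq` from the standard library as the priority queue.

```python
import heapq


def dijkstra(graph, start):
    """Return a dict of minimum distances from start to each reachable node."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)  # greedy step: closest unsettled node
        if d > dist.get(node, float("inf")):
            continue  # stale entry; a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```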

Hash function

Any function that can be used to map data of arbitrary size onto data of a fixed size. The values returned by a(n) _________________ are called hash values, hash codes, digests, or simply hashes.

Breadth First Search (Tree), aka Level Order Tree Traversal

Begins at a root node and inspects all the neighboring nodes. Then for each of those neighbor nodes in turn, it inspects their neighbor nodes which were unvisited, and so on. AKA - search through one level of children nodes, then through the level of grandchildren nodes, etc. Implemented with a queue.
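
A sketch of the queue-based approach for a binary tree, returning the values grouped by level. The `Node` class and the name `level_order` are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def level_order(root):
    """Return a list of levels, each a list of node values left to right."""
    levels = []
    queue = deque([root]) if root is not None else deque()
    while queue:
        level = []
        # Everything in the queue right now belongs to the same level.
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.value)
            for child in (node.left, node.right):
                if child is not None:
                    queue.append(child)
        levels.append(level)
    return levels
```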

Heaps

Data structure where the highest (or lowest) priority element is always stored at the root. Two flavors: Min and max However, not a sorted structure; it can be regarded as being partially ordered. Useful when it is necessary to repeatedly remove the object with the highest (or lowest) priority. Big O - Constant time access to the smallest or largest element - O (lg n) for insert or delete

Linear Sorting Algorithms

Describes the group of sorting algorithms that iterate over the data in place, working on the collection as a whole. E.g. bubble, insertion, selection sorts.

Polymorphism

Generally, the ability to appear in many forms. In OOP, refers to a programming language's ability to process objects differently depending on their data type or class. More specifically: ability to redefine methods for derived classes. For example, given a base class shape, polymorphism enables the programmer to define different area methods for any number of derived classes, such as circles, rectangles and triangles. No matter what shape an object is, applying the area method to it will return the correct results. In this picture, all objects have a method Speak() but each has a different implementation. _____________ allows you to do this, you can declare an action for a class and its subclasses.

Binary tree

Hierarchical data structure that consists of nodes, with one root node at the top of the tree and up to two nodes (left child and right child) extending from the root and from each child node. Most basic type. *Has no pre-defined structure* Can be "gotcha" questions: how do you do this with __________? Have to remember that there's no predefined structure, so don't assume it's balanced or a binary search tree or anything like that. "Useless" Pro: Constant time insert because it doesn't matter where you put the element: O(1). Con: Linear lookup because you generally have to look at all the elements: O(n). (Photo example: A labeled binary tree of size 9 and height 3, with a root node whose value is 2. The above tree is unbalanced and not sorted.)

Adaptive sorting

If order of the elements to be sorted of an input array matters (or) affects the time complexity of a sorting algorithm, then that algorithm is called _______________ Ex: Bubble Sort, Insertion Sort, Quick Sort

Undirected Graph

In a(n) ______________ ______________, the order of the vertices in the pairs in the Edge set doesn't matter. Edges are "two-way streets" Example: In Facebook friend is a bidirectional relationship (A is B's Friend => B is A's friend) so the graph is a(n) ____________ ________________.

Pointer

In computer science, a(n) ___________ is a programming language object that stores the memory address of another value located in computer memory. A(n) ___________ references a location in memory, and obtaining the value stored at that location is known as dereferencing the ___________. As an analogy, a page number in a book's index could be considered a(n) ___________ to the corresponding page; dereferencing such a(n) ___________ would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a(n) ___________ variable is dependent on the underlying computer architecture.

Method

In object-oriented programming (OOP), a procedure associated with an object class. An object is made up of behavior and data. Data is represented as properties of the object and behavior as ___________. ____________ are also the interface an object presents to the outside world.

Machine Learning

Leverages massive amounts of data so that computers can act and improve on their own without additional programming.

Array

Linear data structure In computer science, a(n) ___________ is a data structure consisting of a collection of elements (values or variables), each identified by at least one ____________ index or key. A(n) _________ is stored such that the position of each element can be computed from its index tuple by a mathematical formula. The simplest type of data structure is a linear array, also called one-dimensional _____________. _____________ are useful mostly because the element indices can be computed at run time.

Queue

Linear data structure that operates on a First In First Out (FIFO) basis, i.e. the data item stored first will be accessed first. Open at both ends; one end (the rear) is always used to insert data (enqueue) and the other end (the front, or head) is used to remove data (dequeue). Real life example: a single-lane one-way road, where the vehicle that enters first exits first. More real-world examples are the queues at ticket windows and bus stops.

Stack

Linear data structure which only allows access to top element, and follows a particular order in which the operations are performed = Last In First Out (LIFO) Real life example: plates stacked over one another. Plate on top is the first one removed, plate at bottom remains in stack for longest time.

Linked lists

Linear data structure; collection of data elements, called nodes, each pointing to the next node by means of a pointer. Each node comprised of two items - the data and a reference to the next node. Can be used to implement other data structures A __________ is a linear data structure, in which the elements are not stored at contiguous memory locations. The elements in a __________ are linked using pointers as shown in attached image. In simple words, a __________ consists of nodes where each node contains a data field and a reference(link) to the next node in the list. So __________ provides the following two advantages over arrays 1) Dynamic size 2) Ease of insertion/deletion __________ have following drawbacks: 1) Random access is not allowed. We have to access elements sequentially starting from the first node. So we cannot do a binary search with linked lists. 2) Extra memory space for a pointer is required with each element of the list. 3) Arrays have better cache locality that can make a pretty big difference in performance.

Advantages of using pointers

Major advantages of pointers are: (i) It allows management of structures which are allocated memory dynamically. (ii) It allows passing of arrays and strings to functions more efficiently. (iii) It makes possible to pass address of structure instead of entire structure to the functions. (iv) It makes possible to return more than one value from the function

Greedy algorithm

Mathematical process that looks for simple, easy-to-implement solutions to complex, multi-step problems by deciding which next step will provide the most obvious benefit

Trie (aka Prefix Tree)

An efficient data structure for text, prefix lookups in particular. Interior nodes (e.g. I -> In -> Inn) can hold valuable information, not just the leaf nodes, so data is not repeated. Space: common prefixes are stored only once. Lookup: you walk the tree one character at a time instead of comparing whole strings as you do in a hash table. Typical usage is predictive text (start typing into the Google search bar and see all the typeahead suggestions): all of that is stored in a(n) _______________ because you can dump out everything below a given node.
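
A small sketch of the typeahead idea: walk down to the node for the prefix, then collect every word below it. The class name `Trie`, the method names, and the dict-of-children layout are assumptions made for illustration.

```python
class Trie:
    def __init__(self):
        self.children = {}    # one child node per next character
        self.is_word = False  # True if the path to this node spells a word

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def with_prefix(self, prefix):
        """Return all stored words that start with prefix, alphabetically."""
        node = self
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []

        def collect(current, suffix):
            if current.is_word:
                results.append(prefix + suffix)
            for ch in sorted(current.children):
                collect(current.children[ch], suffix + ch)

        collect(node, "")
        return results
```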

Primary big O notation levels

O (1) == constant time; O (log N) == logarithmic time; O (N) == linear time; O (N log N) == quasilinear time; O (N^2) == quadratic time (one type of polynomial time)

Clustering

Occurs after a hash collision causes two of the records in the hash table to hash to the same position, and causes one of the records to be moved to the next location in its probe sequence (in the case of linear probing). Once this happens, the _____________ formed by this pair of records is more likely to grow by the addition of even more colliding records, regardless of whether the new records hash to the same location as the first two. This phenomenon causes searches for keys within the _____________________ to be longer.

Pros/cons of chaining

Pros: - Easy to understand & implement (no complicated algorithms for deletion and collision handling) - The hash table can be of theoretically infinite size - The hash table will not get polluted - When searching, it will never search irrelevant information Cons: Requires additional memory outside the hash table

Pros/cons of open addressing

Pros: More efficient because you don't have to dereference pointers and jump around memory locations; all you're doing is walking through a single array's memory, which is very fast. Cons: the number of stored entries cannot exceed the number of slots in the bucket array. Performance degrades when the load factor grows beyond ~0.7. Then, need dynamic resizing with its attendant costs. ________________ is better used for hash tables with small records that can be stored within the table (internal storage) and fit in a cache line.

Pros/cons of double hashing

Pros: Virtually no clustering Cons: Poor cache performance; requires more computation time as two hash functions need to be computed.

Pros/cons of quadratic probing

Pros: better cache performance than double hashing, less susceptible to clustering than linear probing. Cons: worse cache performance than linear probing, more susceptible to clustering than double hashing.

Pros/cons of linear probing

Pros: has the best cache performance and is easiest to compute Cons: Most susceptible to clustering

Weighted graph

Special type of labeled graph, where there is a number associated with each edge [its weight].

Max heap

Specialized tree-based data structure which is essentially an almost complete tree. Largest value is at the root and accessible in constant time. Children are smaller than the parent.

Min heap

Specialized tree-based data structure which is essentially an almost complete tree. Smallest value is at the root and accessible in constant time. Children are larger than the parent.

Tradeoffs of a hash table

Suck up a whole bunch of memory but give you constant time (on average) for inserts, lookups and deletes; "Magic" Speed/space tradeoff: if memory isn't a concern, ___________________ will likely help you speed up a solution. If memory is limited, you probably won't be able to use a(n) _____________________.

Chaining

Technique for dealing with collisions in hash tables. Make each cell of hash table point to a linked list of records that have same hash function value. In other words, get to a bucket in an array and if there's already some stuff in there you build a linked list off of it and start putting elements inside of that. Technically could also do this by putting a HT within a HT, which is really fast but uses a lot of memory
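
A rough sketch of chaining. For brevity each bucket is a Python list of `(key, value)` pairs rather than a hand-built linked list, and the index comes from Python's built-in `hash()`; the class name `ChainedHashTable` and both of those choices are assumptions.

```python
class ChainedHashTable:
    def __init__(self, size=8):
        # One chain per slot; colliding keys share a chain.
        self._buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # otherwise extend the chain

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default
```

Constructing the table with `size=1` forces every key into the same chain, which is a simple way to see collisions being handled.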

Open Addressing

Technique for dealing with collisions in hash tables. Places all the data in the hash table rather than relying on some way of storing some of the data outside the table. Uses probes to find an open location to store data. Types: Linear Probing, Quadratic Probing, Double Hashing
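
The three probing types differ only in how far each successive probe lands from the key's home slot. A sketch of the slot sequences, where `home` stands for the first hash modulo the table size and `step` stands for the value of a second hash function; the function name and both parameters are hypothetical.

```python
def probe_sequence(home, table_size, strategy, step=1):
    """Yield the slots examined, in order, for a key whose home slot is home."""
    for i in range(table_size):
        if strategy == "linear":
            offset = i          # fixed interval of 1
        elif strategy == "quadratic":
            offset = i * i      # interval grows quadratically
        elif strategy == "double":
            offset = i * step   # interval set by the second hash
        else:
            raise ValueError("unknown strategy: " + strategy)
        yield (home + offset) % table_size
```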

Depth First Traversal (Graph)

Technique for traversing a graph. Sticks with one path, following that path down a graph structure until it ends. Optimized not to tell us if a path is the shortest or not, but rather to tell us if the path even exists!

Breadth First Traversal (Graph)

Technique for traversing a graph. Visits all of a vertex's neighbors before moving on to the neighbors' neighbors. Evaluates all the possible paths from a given node equally, one level at a time. Helps us determine one (of sometimes many) shortest paths between two nodes in an unweighted graph. Usually implemented with a queue.
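
A sketch of using this traversal to find one shortest path, assuming an unweighted graph stored as a dict that maps each node to a list of neighbors. The representation and the name `shortest_path` are assumptions.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Follow parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None
```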

Non-adaptive sorting

The order of the elements in the input array doesn't matter Ex: Selection Sort, Merge Sort, Heap Sort

Directed Graph

This is a graph where an edge has a direction associated with it, for example, a plane flight that takes off in one location and arrives in another. The return flight would be considered a separate edge. Edges are "one-way streets" Example: A network like Google+ or Twitter would be considered a(n) _____________ ___________ since the direction of the relationship has meaning here.

inorder traversal

Type of depth-first tree traversal that visits the left subtree, then the root, then the right subtree. On a binary search tree it returns the values in non-decreasing order. (Left, Root, Right) : 4 2 5 1 3

preorder traversal

Type of depth-first tree traversal used to create a copy of the tree. Use if you want a node to be processed before its children. If you're searching for a node, for instance, it's a waste of time checking children before you check the node you're currently at. (Root, Left, Right) : 1 2 4 5 3

postorder traversal

Type of depth-first tree traversal used to delete the tree; if you don't use ___________ _______________ during deletion, then you lose the references you need for deleting the child trees (Left, Right, Root) : 4 5 2 3 1
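
The sample sequences on the inorder, preorder, and postorder cards are consistent with a tree whose root is 1, with children 2 and 3, where 2 has children 4 and 5. That tree is inferred from the sequences rather than given in the cards, and the tuple layout `(value, left, right)` is an assumption. A sketch of all three orders:

```python
# Inferred example tree: 1 has children 2 and 3; 2 has children 4 and 5.
TREE = (1, (2, (4, None, None), (5, None, None)), (3, None, None))


def inorder(node):
    if node is None:
        return []
    value, left, right = node
    return inorder(left) + [value] + inorder(right)    # Left, Root, Right


def preorder(node):
    if node is None:
        return []
    value, left, right = node
    return [value] + preorder(left) + preorder(right)  # Root, Left, Right


def postorder(node):
    if node is None:
        return []
    value, left, right = node
    return postorder(left) + postorder(right) + [value]  # Left, Right, Root
```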

Quadratic probing

Type of open addressing, as a way to deal with hash table collisions. The interval between probes increases quadratically: look in the home slot, then 1, 4, 9, 16, etc. slots past it (wrapping around the table).

Linear probing

Type of open addressing, as a way to deal with hash table collisions. The interval between probes is fixed — often set to 1

Double Hashing

Type of open addressing, as a way to deal with hash table collisions. The process of using two hash functions to determine where to store the data. Probing interval decided using a second, independent hash function.

Non-comparison sorts

Usually faster than comparison sorts because they do not compare elements against each other. A comparison-based sorting algorithm cannot do better than O(N log N) in the worst case, while some non-comparison sorts (such as counting sort or radix sort on bounded keys) run in O(N), i.e. linear time.

Rehashing

Way to deal with hash table collisions. Expanding the table: double table size, find closest prime number. Rehash each element for the new table size.

Big O notation

Way to measure how well a computer algorithm scales as the amount of data involved increases Expresses worst-case run-time of an algorithm, i.e. the "upper bound" of complexity Useful for comparing the speed of two algorithms. AKA time complexity

Service-oriented architecture (SOA)

A business-driven enterprise architecture that supports integrating a business as linked, repeatable activities, tasks, or services; a way of designing computer programs so they can be flexibly combined for cloud processing.

Inheritance

a feature that represents the "is a" relationship between different classes. allows a class to have the same behavior as another class and extend or tailor that behavior to provide special action for specific needs.

Load Factor

in a hash table, the fraction of the table's capacity that is filled. in other words, the proportion of the slots in the array that are used.

Three-tier architecture

user tier, server tier, database tier

