Data Structures
R-tree*
space partitioning or binary space partitioning data structure. tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984[1] and has found significant use in both theoretical and applied contexts.[2] A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" (to display them in a navigation system) or "find the nearest gas station" (although not taking roads into account). The R-tree can also accelerate nearest neighbor search[3] for various distance metrics, including great-circle distance. -The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree; the "R" in R-tree is for rectangle. Since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle also cannot intersect any of the contained objects. At the leaf level, each rectangle describes a single object; at higher levels the aggregation of an increasing number of objects. This can also be seen as an increasingly coarse approximation of the data set. -Similar to the B-tree, the R-tree is also a balanced search tree (so all leaf nodes are at the same height), organizes the data in pages, and is designed for storage on disk (as used in databases). Each page can contain a maximum number of entries, often denoted as {\displaystyle M} M. It also guarantees a minimum fill (except for the root node), however best performance has been experienced with a minimum fill of 30%-40% of the maximum number of entries (B-trees guarantee 50% page fill, and B*-trees even 66%). The reason for this is the more complex balancing required for spatial data as opposed to linear data stored in B-trees. -As with most trees, the searching algorithms (e.g., intersection, containment, nearest neighbor search) are rather simple. The key idea is to use the bounding boxes to decide whether or not to search inside a subtree. In this way, most of the nodes in the tree are never read during a search. Like B-trees, this makes R-trees suitable for large data sets and databases, where nodes can be paged to memory when needed, and the whole tree cannot be kept in main memory. -The key difficulty of R-trees is to build an efficient tree that on one hand is balanced (so the leaf nodes are at the same height) on the other hand the rectangles do not cover too much empty space and do not overlap too much (so that during search, fewer subtrees need to be processed). For example, the original idea for inserting elements to obtain an efficient tree is to always insert into the subtree that requires least enlargement of its bounding box. Once that page is full, the data is split into two sets that should cover the minimal area each. Most of the research and improvements for R-trees aims at improving the way the tree is built and can be grouped into two objectives: building an efficient tree from scratch (known as bulk-loading) and performing changes on an existing tree (insertion and deletion). -R-trees do not guarantee good worst-case performance, but generally perform well with real-world data.[5] While more of theoretical interest, the (bulk-loaded) Priority R-tree variant of the R-tree is worst-case optimal,[6] but due to the increased complexity, has not received much attention in practical applications so far. -When data is organized in an R-tree, the k nearest neighbors (for any Lp-Norm) of all points can efficiently be computed using a spatial join.[7] This is beneficial for many algorithms based on the k nearest neighbors, for example the Local Outlier Factor. DeLi-Clu,[8] Density-Link-Clustering is a cluster analysis algorithm that uses the R-tree structure for a similar kind of spatial join to efficiently compute an OPTICS clustering.
BSP tree (binary space partitioning)*
space partitioning or binary space partitioning data structure. BSP is a method for recursively subdividing a space into convex sets (the region such that, for every pair of points within the region, every point on the straight line segment that joins the pair of points is also within the region) by hyperplanes (a subspace of one dimension less than its ambient space). This subdivision gives rise to a representation of objects within the space by means of a tree data structure known as a BSP tree. -Binary space partitioning was developed in the context of 3D computer graphics,[1][2] where the structure of a BSP tree allows spatial information about the objects in a scene that is useful in rendering, such as their ordering from front-to-back with respect to a viewer at a given location, to be accessed rapidly. Other applications include performing geometrical operations with shapes (constructive solid geometry) in CAD,[3] collision detection in robotics and 3-D video games, ray tracing and other computer applications that involve handling of complex spatial scenes. -Binary space partitioning is a generic process of recursively dividing a scene into two until the partitioning satisfies one or more requirements. It can be seen as a generalisation of other spatial tree structures such as k-d trees and quadtrees, one where hyperplanes that partition the space may have any orientation, rather than being aligned with the coordinate axes as they are in k-d trees or quadtrees. When used in computer graphics to render scenes composed of planar polygons, the partitioning planes are frequently (but not always) chosen to coincide with the planes defined by polygons in the scene. -The specific choice of partitioning plane and criterion for terminating the partitioning process varies depending on the purpose of the BSP tree. For example, in computer graphics rendering, the scene is divided until each node of the BSP tree contains only polygons that can render in arbitrary order. When back-face culling is used, each node therefore contains a convex set of polygons, whereas when rendering double-sided polygons, each node of the BSP tree contains only polygons in a single plane. In collision detection or ray tracing, a scene may be divided up into primitives on which collision or ray intersection tests are straightforward. -Binary space partitioning arose from the computer graphics need to rapidly draw three-dimensional scenes composed of polygons. A simple way to draw such scenes is the painter's algorithm, which produces polygons in order of distance from the viewer, back to front, painting over the background and previous polygons with each closer object. This approach has two disadvantages: time required to sort polygons in back to front order, and the possibility of errors in overlapping polygons. Fuchs and co-authors[2] showed that constructing a BSP tree solved both of these problems by providing a rapid method of sorting polygons with respect to a given viewpoint (linear in the number of polygons in the scene) and by subdividing overlapping polygons to avoid errors that can occur with the painter's algorithm. A disadvantage of binary space partitioning is that generating a BSP tree can be time-consuming. Typically, it is therefore performed once on static geometry, as a pre-calculation step, prior to rendering or other realtime operations on a scene. The expense of constructing a BSP tree makes it difficult and inefficient to directly implement moving objects into a tree. -BSP trees are often used by 3D video games, particularly first-person shooters and those with indoor environments. Game engines utilising BSP trees include the Doom engine (probably the earliest game to use a BSP data structure was Doom), the Quake engine and its descendants. In video games, BSP trees containing the static geometry of a scene are often used together with a Z-buffer, to correctly merge movable objects such as doors and characters onto the background scene. While binary space partitioning provides a convenient way to store and retrieve spatial information about polygons in a scene, it does not solve the problem of visible surface determination. -The canonical use of a BSP tree is for rendering polygons (that are double-sided, that is, without back-face culling) with the painter's algorithm. Each polygon is designated with a front side and a back side which could be chosen arbitrarily and only affects the structure of the tree but not the required result
B sharp tree
b-tree data structure. No article.
Suffix array
tree data structure where each tree node compares a bit slice of key values. a sorted array of all suffixes of a string. It is a data structure used, among others, in full text indices, data compression algorithms and within the field of bioinformatics.[1] -Suffix arrays were introduced by Manber & Myers (1990) as a simple, space efficient alternative to suffix trees. They have independently been discovered by Gaston Gonnet in 1987 under the name PAT array (Gonnet, Baeza-Yates & Snider 1992).
AList
b-tree data structure. No article.
Binary tree
tree data structure.
Fibonacci heap
heap data structure. a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. They named Fibonacci heaps after the Fibonacci numbers, which are used in their running time analysis. -For the Fibonacci heap, the find-minimum operation takes constant (O(1)) amortized time.[1] The insert and decrease key operations also work in constant amortized time.[2] Deleting an element (most often used in the special case of deleting the minimum element) works in O(log n) amortized time, where n is the size of the heap.[2] This means that starting from an empty data structure, any sequence of a insert and decrease key operations and b delete operations would take O(a + b log n) worst case time, where n is the maximum heap size. In a binary or binomial heap such a sequence of operations would take O((a + b) log n) time. A Fibonacci heap is thus better than a binary or binomial heap when b is smaller than a by a non-constant factor. It is also possible to merge two Fibonacci heaps in constant amortized time, improving on the logarithmic merge time of a binomial heap, and improving on binary heaps which cannot handle merges efficiently. -Using Fibonacci heaps for priority queues improves the asymptotic running time of important algorithms, such as Dijkstra's algorithm for computing the shortest path between two nodes in a graph, compared to the same algorithm using other slower priority queue data structures. |Structure| -A Fibonacci heap is a collection of trees satisfying the minimum-heap property, that is, the key of a child is always greater than or equal to the key of the parent. This implies that the minimum key is always at the root of one of the trees. Compared with binomial heaps, the structure of a Fibonacci heap is more flexible. The trees do not have a prescribed shape and in the extreme case the heap can have every element in a separate tree. This flexibility allows some operations to be executed in a lazy manner, postponing the work for later operations. For example, merging heaps is done simply by concatenating the two lists of trees, and operation decrease key sometimes cuts a node from its parent and forms a new tree. -However at some point some order needs to be introduced to the heap to achieve the desired running time. In particular, degrees of nodes (here degree means the number of children) are kept quite low: every node has degree at most O(log n) and the size of a subtree rooted in a node of degree k is at least Fk+2, where Fk is the kth Fibonacci number. This is achieved by the rule that we can cut at most one child of each non-root node. When a second child is cut, the node itself needs to be cut from its parent and becomes the root of a new tree (see Proof of degree bounds, below). The number of trees is decreased in the operation delete minimum, where trees are linked together. -As a result of a relaxed structure, some operations can take a long time while others are done very quickly. For the amortized running time analysis we use the potential method, in that we pretend that very fast operations take a little bit longer than they actually do. This additional time is then later combined and subtracted from the actual running time of slow operations. The amount of time saved for later use is measured at any given moment by a potential function. The potential of a Fibonacci heap is given by Potential = t + 2m where t is the number of trees in the Fibonacci heap, and m is the number of marked nodes. A node is marked if at least one of its children was cut since this node was made a child of another node (all roots are unmarked). The amortized time for an operation is given by the sum of the actual time and c times the difference in potential, where c is a constant (chosen to match the constant factors in the O notation for the actual time). -Thus, the root of each tree in a heap has one unit of time stored. This unit of time can be used later to link this tree with another tree at amortized time 0. Also, each marked node has two units of time stored. One can be used to cut the node from its parent. If this happens, the node becomes a root and the second unit of time will remain stored in it as in any other root.
Ternary heap
heap data structure. 3-heap data structure. the nodes have 3 children.
Brodal queue
heap data structure. a heap/priority queue structure with very low worst case time bounds: O(1) for insertion, find-minimum, meld (merge two queues) and decrease-key and O(log (n)) for delete-minimum and general deletion; they are the first heap variant with these bounds. Brodal queues are named after their inventor Gerth Stølting Brodal. -While having better asymptotic bounds than other priority queue structures, they are, in the words of Brodal himself, "quite complicated" and "[not] applicable in practice."[1] Brodal and Okasaki describe a persistent (purely functional) version of Brodal queues
X-tree (for eXtended node tree)
space partitioning or binary space partitioning data structure. an index tree structure based on the R-tree used for storing data in many dimensions. It appeared in 1996,[2] and differs from R-trees (1984), R+-trees (1987) and R*-trees (1990) because it emphasizes prevention of overlap in the bounding boxes, which increasingly becomes a problem in high dimensions. In cases where nodes cannot be split without preventing overlap, the node split will be deferred, resulting in super-nodes. In extreme cases, the tree will linearize, which defends against worst-case behaviors observed in some other data structures.
M-tree
space partitioning or binary space partitioning data structure. tree data structures that are similar to R-trees and B-trees. It is constructed using a metric and relies on the triangle inequality for efficient range and k-nearest neighbor (k-NN) queries. While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used for distance functions that satisfy the triangle inequality, while many advanced dissimilarity functions used in information retrieval do not satisfy this
B-trie
tree data structure where each tree node compares a bit slice of key values. Several trie variants are suitable for maintaining sets of strings in external memory, including suffix trees. A trie/B-tree combination called the B-trie has also been suggested for this task; compared to suffix trees, they are limited in the supported operations but also more compact, while performing update operations faster
And-or tree
multiway tree data structure. a graphical representation of the reduction of problems (or goals) to conjunctions and disjunctions of subproblems (or subgoals).
Rose tree (multi-way tree)
multiway tree data structure. a tree data structure with a variable and unbounded number of branches per node.[1][better source needed] The name rose tree for this structure is prevalent in the functional programming community, e.g., in the context of the Bird-Meertens formalism.[2] It was coined by Lambert Meertens to evoke the similarly-named, and similarly-structured, common rhododendron
Exponential tree
multiway tree data structure. almost identical to a binary search tree, with the exception that the dimension of the tree is not the same at all levels. In a normal binary search tree, each node has a dimension (d) of 1, and has 2d children. In an exponential tree, the dimension equals the depth of the node, with the root node having a d = 1. So the second level can hold two nodes, the third can hold eight nodes, the fourth 64 nodes, and so on.
Trie (digital tree and sometimes radix tree or prefix tree)
tree data structure where each tree node compares a bit slice of key values. a kind of search tree -- an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend only to be associated with leaves, and with some inner nodes that correspond to keys of interest. For the space-optimized presentation of prefix tree, see compact prefix tree. -In the example shown, keys are listed in the nodes and values below them. Each complete English word has an arbitrary integer value associated with it. A trie can be seen as a tree-shaped deterministic finite automaton. Each finite language is generated by a trie automaton, and each trie can be compressed into a deterministic acyclic finite state automaton. -Though tries are usually keyed by character strings, they need not be. The same algorithms can be adapted to serve similar functions of ordered lists of any construct, e.g. permutations on a list of digits or shapes. In particular, a bitwise trie is keyed on the individual bits making up any fixed-length binary datum, such as an integer or memory address. -As discussed below, a trie has a number of advantages over binary search trees.[6] A trie can also be used to replace a hash table, over which it has the following advantages: 1. Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash. 2. There are no collisions of different keys in a trie. 3. Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value. 4. There is no need to provide a hash function or to change hash functions as more keys are added to a trie. 5. A trie can provide an alphabetical ordering of the entries by key. -Tries do have some drawbacks as well: 1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random-access time is high compared to main memory.[7] 2. Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless, a bitwise trie can handle standard IEEE single and double format floating point numbers. 3. Some tries can require more space than a hash table, as memory may be allocated for each character in the search string, rather than a single chunk of memory for the whole entry, as in most hash tables. Dictionary Representation -A common application of a trie is storing a predictive text or autocomplete dictionary, such as found on a mobile telephone. Such applications take advantage of a trie's ability to quickly search for, insert, and delete entries; however, if storing dictionary words is all that is required (i.e., storage of information auxiliary to each word is not required), a minimal deterministic acyclic finite state automaton (DAFSA) would use less space than a trie. This is because a DAFSA can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored. -Tries are also well suited for implementing approximate matching algorithms,[8] including those used in spell checking and hyphenation[4] software. Term Indexing -A discrimination tree term index stores its information in a trie data structure.
data types
Primitive types, Composite types, Abstract data types. a classification identifying one of various types of data that determines the possible values for that type, the operations that can be done on values of that type, the meaning of the data, and the way values of that type can be stored. Data types are used within type systems, which offer various ways of defining, implementing and using them. Different type systems ensure varying degrees of type safety.
XOR linked list*
List data structure. a data structure that takes advantage of the bitwise XOR operation to decrease storage requirements for doubly linked lists. A bitwise XOR takes two bit patterns of equal length and performs the logical exclusive OR operation on each pair of corresponding bits. The result in each position is 1 if only the first bit is 1 or only the second bit is 1, but will be 0 if both are 0 or both are 1. In this we perform the comparison of two bits, being 1 if the two bits are different, and 0 if they are the same. 0101 (decimal 5) XOR 0011 (decimal 3) = 0110 (decimal 6) The bitwise XOR may be used to invert selected bits in a register (also called toggle or flip). Any bit may be toggled by XORing it with 1. For example, given the bit pattern 0010 (decimal 2) the second and fourth bits may be toggled by a bitwise XOR with a bit pattern containing 1 in the second and fourth positions: 0010 (decimal 2) XOR 1010 (decimal 10) = 1000 (decimal 8) This technique may be used to manipulate bit patterns representing sets of Boolean states. -Assembly language programmers sometimes use XOR as a short-cut to setting the value of a register to zero. Performing XOR on a value against itself always yields zero, and on many architectures this operation requires fewer clock cycles and memory than loading a zero value and saving it to the register. -An ordinary doubly linked list stores addresses of the previous and next list items in each list node, requiring two address fields: ... A B C D E ... -> next -> next -> next -> <- prev <- prev <- prev <- An XOR linked list compresses the same information into one address field by storing the bitwise XOR (here denoted by ⊕) of the address for previous and the address for next in one field: ... A B C D E ... <-> A⊕C <-> B⊕D <-> C⊕E <-> When you traverse the list from left to right: supposing you are at C, you can take the address of the previous item, B, and XOR it with the value in the link field (B⊕D). You will then have the address for D and you can continue traversing the list. The same pattern applies in the other direction. -To start traversing the list in either direction from some point, you need the address of two consecutive items, not just one. If the addresses of the two consecutive items are reversed, you will end up traversing the list in the opposite direction. -Drawbacks: ...General-purpose debugging tools cannot follow the XOR chain, making debugging more difficult; [1] ...The price for the decrease in memory usage is an increase in code complexity, making maintenance more expensive; ...Most garbage collection schemes do not work with data structures that do not contain literal pointers; ...XOR of pointers is not defined in some contexts (e.g., the C language), although many languages provide some kind of type conversion between pointers and integers; ...The pointers will be unreadable if one isn't traversing the list — for example, if the pointer to a list item was contained in another data structure; ...While traversing the list you need to remember the address of the previously accessed node in order to calculate the next node's address. ...XOR linked lists do not provide some of the important advantages of doubly linked lists, such as the ability to delete a node from the list knowing only its address or the ability to insert a new node before or after an existing node when knowing only the address of the existing node. -Computer systems have increasingly cheap and plentiful memory, and storage overhead is not generally an overriding issue outside specialized embedded systems. Where it is still desirable to reduce the overhead of a linked list, unrolling provides a more practical approach (as well as other advantages, such as increasing cache performance and speeding random access).
Self-organizing list
List data structure. a list that reorders its elements based on some self-organizing heuristic to improve average access time. The aim of a self-organizing list is to improve efficiency of linear search by moving more frequently accessed items towards the head of the list. A self-organizing list achieves near constant time for element access in the best case. A self-organizing list uses a reorganizing algorithm to adapt to various query distributions at runtime.
Unrolled linked list*
List data structure. a variation on the linked list which stores multiple elements in each node. It can dramatically increase cache performance, while decreasing the memory overhead associated with storing list metadata such as references. It is related to the B-tree.
machine data type
All data in computers based on digital electronics is represented as bits (alternatives 0 and 1) on the lowest level. The smallest addressable unit of data is usually a group of bits called a byte (usually an octet, which is 8 bits). The unit processed by machine code instructions is called a word (as of 2011, typically 32 or 64 bits). Most instructions interpret the word as a binary number, such that a 32-bit word can represent unsigned integer values from 0 to (2^32) - 1 or signed integer values from -2^31 to (2^31) - 1. Because of two's complement, the machine language and machine doesn't need to distinguish between these unsigned and signed data types for the most part.
Merkle Tree (hash tree)
tree data structure where each tree node compares a bit slice of key values. a tree in which every non-leaf node is labelled with the hash of the labels or values (in case of leaves) of its child nodes. Hash trees allow efficient and secure verification of the contents of large data structures. Hash trees are a generalization of hash lists and hash chains. -Demonstrating that a leaf node is a part of the given hash tree requires processing an amount of data proportional to the logarithm of the number of nodes of the tree;[1] this contrasts with hash lists, where the amount is proportional to the number of nodes. -The concept of hash trees is named after Ralph Merkle who patented it in 1979
Directed acyclic graph (DAG)
Graph data structure. a finite directed graph with no directed cycles. That is, it consists of finitely many vertices and edges, with each edge directed from one vertex to another, such that there is no way to start at any vertex v and follow a consistently-directed sequence of edges that eventually loops back to v again. Equivalently, a DAG is a directed graph that has a topological ordering, a sequence of the vertices such that every edge is directed from earlier to later in the sequence. -The graph-structured stack is an essential part of Tomita's algorithm, where it replaces the usual stack of a pushdown automaton. This allows the algorithm to encode the nondeterministic choices in parsing an ambiguous grammar, sometimes with greater efficiency. -look at
Dynamic array (or growable array, resizable array, dynamic table, mutable array, or array list)
array data structure. a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages. -A dynamic array is not the same thing as a dynamically allocated array, which is an array whose size is fixed when the array is allocated, although a dynamic array may use such a fixed-size array as a back end. -The simplest dynamic array is constructed by allocating a fixed-size array and then dividing it into two parts: the first stores the elements of the dynamic array and the second is reserved, or unused. We can then add or remove elements at the end of the dynamic array in constant time by using the reserved space, until this space is completely consumed. The number of elements used by the dynamic array contents is its logical size or size, while the size of the underlying array is called the dynamic array's capacity or physical size, which is the maximum possible size without relocating data. -In applications where the logical size is bounded, the fixed-size data structure suffices. This may be short-sighted, as more space may be needed later. A philosophical programmer may prefer to write the code to make every array capable of resizing from the outset, then return to using fixed-size arrays during program optimization. Resizing the underlying array is an expensive task, typically involving copying the entire contents of the array. -To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, such as doubling in size, and use the reserved space for future expansion. -The dynamic array has performance similar to an array, with the addition of new operations to add and remove elements: 1) Getting or setting the value at a particular index (constant time) 2) Iterating over the elements in order (linear time, good cache performance) 3) Inserting or deleting an element in the middle of the array (linear time) 4) Inserting or deleting an element at the end of the array (constant amortized time) -Dynamic arrays benefit from many of the advantages of arrays, including good locality of reference and data cache utilization, compactness (low memory use), and random access. They usually have only a small fixed additional overhead for storing information about the size and capacity. This makes dynamic arrays an attractive tool for building cache-friendly data structures. However, in languages like Python or Java that enforce reference semantics, the dynamic array generally will not store the actual data, but rather it will store references to the data that resides in other areas of memory. In this case, accessing items in the array sequentially will actually involve accessing multiple non-contiguous areas of memory, so the many advantages of the cache-friendliness of this data structure are lost. -Compared to linked lists, dynamic arrays have faster indexing (constant time versus linear time) and typically faster iteration due to improved locality of reference; however, dynamic arrays require linear time to insert or delete at an arbitrary location, since all following elements must be moved, while linked lists can do this in constant time. This disadvantage is mitigated by the gap buffer and tiered vector variants discussed under Variants below. Also, in a highly fragmented memory region, it may be expensive or impossible to find contiguous space for a large dynamic array, whereas linked lists do not require the whole data structure to be stored contiguously. -A balanced tree can store a list while providing all operations of both dynamic arrays and linked lists reasonably efficiently, but both insertion at the end and iteration over the list are slower than for a dynamic array, in theory and in practice, due to non-contiguous storage and tree traversal/manipulation overhead.
Suffix tree (also called PAT tree or, in an earlier form, position tree)
tree data structure where each tree node compares a bit slice of key values. a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations. The construction of such a tree for the string S takes time and space linear in the length of S. Once constructed, several operations can be performed quickly, for instance locating a substring in S, locating a substring if a certain number of mistakes are allowed, locating matches for a regular expression pattern etc. Suffix trees also provide one of the first linear-time solutions for the longest common substring problem. These speedups come at a cost: storing a string's suffix tree typically requires significantly more space than storing the string itself.
Multimap (sometimes also multihash)
abstract data type. a generalization of a map or associative array abstract data type in which more than one value may be associated with and returned for a given key. Both map and multimap are particular cases of containers (for example, see C++ Standard Template Library containers). Often the multimap is implemented as a map with lists or sets as the map values. C++, Dart, Java, OCaml, Scala.
Lookup table
array data structure. an array that replaces runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than undergoing an "expensive" computation or input/output operation. The tables may be precalculated and stored in static program storage, calculated (or "pre-fetched") as part of a program's initialization phase (memorization), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input. -Examples 1) Simple lookup in an array, an associative array or a linked list (unsorted list) This is known as a linear search or brute-force search, each element being checked for equality in turn and the associated value, if any, used as a result of the search. This is often the slowest search method unless frequently occurring values occur early in the list. For a one-dimensional array or linked list, the lookup is usually to determine whether or not there is a match with an 'input' data value. 2) Binary search in an array or an associative array (sorted list) An example of a "divide and conquer algorithm", binary search involves each element being found by determining which half of the table a match may be found in and repeating until either success or failure. This is only possible if the list is sorted but gives good performance even if the list is lengthy. 3) Trivial hash function For a trivial hash function lookup, the unsigned raw data value is used directly as an index to a one-dimensional table to extract a result. For small ranges, this can be amongst the fastest lookup, even exceeding binary search speed with zero branches and executing in constant time.
Control table
array data structure. tables that control the control flow or play a major part in program control. There are no rigid rules about the structure or content of a control table—its qualifying attribute is its ability to direct control flow in some way through "execution" by a processor or interpreter. The design of such tables is sometimes referred to as table-driven design[1][2] (although this typically refers to generating code automatically from external tables rather than direct run-time tables). In some cases, control tables can be specific implementations of finite-state-machine-based automata-based programming. If there are several hierarchical levels of control table they may behave in a manner equivalent to UML state machines -control flow (or alternatively, flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an imperative programming language from a declarative programming language. Within an imperative programming language, a control flow statement is a statement whose execution results in a choice being made as to which of two or more paths should be followed. For non-strict functional languages, functions and language constructs exist to achieve the same result, but they are not necessarily called control flow statements.
Top tree
binary tree data structure. a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median. -A top tree R is defined for an underlying tree T and a set delta T (?) of at most two vertices called as External Boundary Vertices -The treap was first described by Cecilia R. Aragon and Raimund Seidel in 1989;[1][2] its name is a portmanteau of tree and heap. It is a Cartesian tree in which each key is given a (randomly chosen) numeric priority. As with any binary search tree, the inorder traversal order of the nodes is the same as the sorted order of the keys. The structure of the tree is determined by the requirement that it be heap-ordered: that is, the priority number for any non-leaf node must be greater than or equal to the priority of its children. Thus, as with Cartesian trees more generally, the root node is the maximum-priority node, and its left and right subtrees are formed in the same manner from the subsequences of the sorted order to the left and right of that node. An equivalent way of describing the treap is that it could be formed by inserting the nodes highest-priority-first into a binary search tree without doing any rebalancing. Therefore, if the priorities are independent random numbers (from a distribution over a large enough space of possible priorities to ensure that two nodes are very unlikely to have the same priority) then the shape of a treap has the same probability distribution as the shape of a random binary search tree, a search tree formed by inserting the nodes without rebalancing in a randomly chosen insertion order. Because random binary search trees are known to have logarithmic height with high probability, the same is true for treaps. Aragon and Seidel also suggest assigning higher priorities to frequently accessed nodes, for instance by a process that, on each access, chooses a random number and replaces the priority of the node with that number if it is higher than the previous priority. This modification would cause the tree to lose its random shape; instead, frequently accessed nodes would be more likely to be near the root of the tree, causing searches for them to be faster. Naor and Nissim[3] describe an application in maintaining authorization certificates in public-key cryptosystems.
Pagoda
binary tree data structure. a priority queue implemented with a variant of a binary tree. The root points to its children, as in a binary tree. Every other node points back to its parent and down to its leftmost (if it is a right child) or rightmost (if it is a left child) descendant leaf. The basic operation is merge or meld, which maintains the heap property. An element is inserted by merging it as a singleton. The root is removed by merging its right and left children. Merging is bottom-up, merging the leftmost edge of one with the rightmost edge of the other.
Arrays data structure
linear data structure. a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key. An array is stored so that the position of each element can be computed from its index tuple by a mathematical formula.[1][2][3] The simplest type of data structure is a linear array, also called one-dimensional array.
Minimax tree (sometimes MinMax or MM)
Application-specific tree data structure. a decision rule used in decision theory, game theory, statistics and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. Originally formulated for two-player zero-sum game theory, covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision-making in the presence of uncertainty.
Application-specific trees
Abstract syntax tree Parse tree Decision tree Alternating decision tree Minimax tree Expectiminimax tree Finger tree Expression tree Log-structured merge-tree
B-trees
B-tree B+ tree B*-tree B sharp tree Dancing tree 2-3 tree 2-3-4 tree Queap Fusion tree Bx-tree AList
Doubly linked list*
List data structure. a linked data structure that consists of a set of sequentially linked records called nodes. Each node contains two fields, called links, that are references to the previous and to the next node in the sequence of nodes. The beginning and ending nodes' previous and next links, respectively, point to some kind of terminator, typically a sentinel node or null, to facilitate traversal of the list. If there is only one sentinel node, then the list is circularly linked via the sentinel node. It can be conceptualized as two singly linked lists formed from the same data items, but in opposite sequential orders. -The two node links allow traversal of the list in either direction. While adding or removing a node in a doubly linked list requires changing more links than the same operations on a singly linked list, the operations are simpler and potentially more efficient (for nodes other than first nodes) because there is no need to keep track of the previous node during traversal or no need to traverse the list to find the previous node, so that its link can be modified. The concept is also the basis for the mnemonic link system memorization technique.
Multiset (or bag)
abstract data type. similar to a set but allows repeated ("equal") values (duplicates). This is used in two distinct senses: either equal values are considered identical, and are simply counted, or equal values are considered equivalent, and are stored as distinct items.
Array data type
composite data type. a data type that is meant to describe a collection of elements (values or variables), each selected by one or more indices (identifying keys) that can be computed at run time by the program. Such a collection is usually called an array variable, array value, or simply array.[1] By analogy with the mathematical concepts of vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively.
data structures
concrete representations of data from the point of view of an implementer, not a user
Lists data structure
linear data structure.
built-in type
a data type for which the programming language provides built-in support.
basic type
a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
Log-structured merge-tree (or LSM tree)
Application-specific tree data structure. a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches. -One simple version of the LSM tree is a two-level LSM tree.[1] As described by Patrick O'Neil, a two-level LSM tree comprises two tree-like structures, called C0 and C1. C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If the insertion causes the C0 component to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk. The performance characteristics of LSM trees stem from the fact that each component is tuned to the characteristics of its underlying storage medium, and that data is efficiently migrated across media in rolling batches, using an algorithm reminiscent of merge sort. -Most LSM trees used in practice employ multiple levels. Level 0 is kept in main memory, and might be represented using a tree. The on-disk data is organized into sorted runs of data. Each run contains data sorted by the index key. A run can be represented on disk as a single file, or alternatively as a collection of files with non-overlapping key ranges. To perform a query on a particular key to get its associated value, one must search in the Level 0 tree, as well as each run. -A particular key may appear in several runs, and what happens depends on the application. Some applications simply want the newest key-value pair with a given key. Some applications must combine the values in some way to get the proper aggregate value to return. For example, in Apache Cassandra, each value represents a row in a database, and different versions of the row may have different sets of columns.[2] -In order to keep down the cost of queries, the system must avoid a situation where there are too many runs. -Extensions to the 'levelled' method to incorporate B+ structures have been suggested, for example bLSM[3] and Diff-Index.[4] -LSM trees are used in database management systems such as BigTable, HBase, LevelDB, MongoDB, SQLite4, RocksDB, WiredTiger,[5] Apache Cassandra, and InfluxDB
2-3-4 tree (also called a 2-4 tree)
b-tree data structure. a self-balancing data structure that is commonly used to implement dictionaries. -The numbers mean a tree where every node with children (internal node) has either two, three, or four child nodes: 1) a 2-node has one data element, and if internal has two child nodes; 2) a 3-node has two data elements, and if internal has three child nodes; 3) a 4-node has three data elements, and if internal has four child nodes. -2-3-4 trees are B-trees of order 4;[1] like B-trees in general, they can search, insert and delete in O(log n) time. One property of a 2-3-4 tree is that all external nodes are at the same depth. -2-3-4 trees are an isometry of red-black trees, meaning that they are equivalent data structures. In other words, for every 2-3-4 tree, there exists at least one red-black tree with data elements in the same order. Moreover, insertion and deletion operations on 2-3-4 trees that cause node expansions, splits and merges are equivalent to the color-flipping and rotations in red-black trees. Introductions to red-black trees usually introduce 2-3-4 trees first, because they are conceptually simpler. 2-3-4 trees, however, can be difficult to implement in most programming languages because of the large number of special cases involved in operations on the tree. Red-black trees are simpler to implement,[2] so tend to be used instead.
lists data structure
Doubly linked list Array list Linked list Self-organizing list Skip list Unrolled linked list VList Conc-Tree list Xor linked list Zipper Doubly connected edge list Difference list Free list
Doubly connected edge list (DCEL, also known as half-edge data structure)*
List data structure. a data structure to represent an embedding of a planar graph in the plane, and polytopes in 3D -This data structure provides efficient manipulation of the topological information associated with the objects in question (vertices, edges, faces). It is used in many algorithms of computational geometry to handle polygonal subdivisions of the plane, commonly called planar straight-line graphs (PSLG).[1] For example, a Voronoi diagram is commonly represented by a DCEL inside a bounding box. -This data structure was originally suggested by Muller and Preparata[2] for representations of 3D convex polyhedra. -Later a somewhat different data structuring was suggested, but the name "DCEL" was retained. -For simplicity, only connected graphs are considered, however the DCEL structure may be extended to handle disconnected graphs as well. -DCEL is more than just a doubly linked list of edges. In the general case, a DCEL contains a record for each edge, vertex and face of the subdivision. Each record may contain additional information, for example, a face may contain the name of the area. Each edge usually bounds two faces and it is therefore convenient to regard each edge as two half-edges. Each half-edge bounds a single face and thus has a pointer to that face. A half-edge has a pointer to the next half-edge and previous half-edge of the same face. To reach the other face, we can go to the twin of the half-edge and then traverse the other face. Each half-edge also has a pointer to its origin vertex (the destination vertex can be obtained by querying the origin of its twin, or of the next half-edge). -Each vertex contains the coordinates of the vertex and also stores a pointer to an arbitrary edge that has the vertex as its origin. Each face stores a pointer to some half-edge of its outer boundary (if the face is unbounded then pointer is null). It also has a list of half-edges, one for each hole that may be incident within the face. If the vertices or faces do not hold any interesting information, there is no need to store them, thus saving space and reducing the data structure's complexity.
Space-partitioning trees
These are data structures used for space partitioning or binary space partitioning. Segment tree Interval tree Range tree Bin Kd-tree Implicit kd-tree Min/max kd-tree Relaxed Kd-tree Adaptive k-d tree Quadtree Octree Linear octree Z-order UB-tree R-tree R+ tree R* tree Hilbert R-tree X-tree Metric tree Cover tree M-tree VP-tree BK-tree Bounding interval hierarchy BSP tree Rapidly exploring random tree
linear data structures
arrays and lists. its elements form a sequence
Threaded binary tree
binary tree data structure. a binary tree variant that allows fast traversal: given a pointer to a node in a threaded tree, it is possible to cheaply find its in-order successor (and/or predecessor). -types of threaded binary trees: 1) Single Threaded: each node is threaded towards either the in-order predecessor or successor (left or right). 2) Double threaded: each node is threaded towards both the in-order predecessor and successor (left and right).
Double-precision floating-point
primitive data type. a computer number format that occupies 8 bytes (64 bits) in computer memory and represents a wide, dynamic range of values by using a floating point. Double-precision floating-point format usually refers to binary64, as specified by the IEEE 754 standard, not to the 64-bit decimal format decimal64. In older computers, different floating-point formats of 8 bytes were used, e.g., GW-BASIC's double-precision data type was the 64-bit MBF floating-point format. Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. As with single-precision floating-point format, it lacks precision on integer numbers when compared with an integer format of the same size. It is commonly known simply as double. The IEEE 754 standard specifies a binary64 as having: Sign bit: 1 bit Exponent width: 11 bits Significand precision: 53 bits (52 explicitly stored)
B*-tree
b-tree data structure. balances more neighboring internal nodes to keep the internal nodes more densely packed (Comer 1979, p. 129). This variant requires non-root nodes to be at least 2/3 full instead of 1/2 (Knuth 1998, p. 488). To maintain this, instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. When both nodes are full, then the two nodes are split into three. Deleting nodes is somewhat more complex than inserting however.
Parallel array (SoA)
array data structure. a data structure for representing arrays of records. It keeps a separate, homogeneous data array for each field of the record, each having the same number of elements. Then, objects located at the same index in each array are implicitly the fields of a single record. Pointers from one object to another are replaced by array indices. This contrasts with the normal approach of storing all fields of each record together in memory (also known as AoS). For example, one might declare an array of 100 names, each a string, and 100 ages, each an integer, associating each name with the age that has the same index.
Set
abstract data type. an abstract data type that can store certain values, without any particular order, and no repeated values. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set. -Some set data structures are designed for static or frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements — such as checking whether a given value is in the set, or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, allow also the insertion and deletion of elements from the set. -An abstract data structure is a collection, or aggregate, of data. The data may be booleans, numbers, characters, or other data structures. If one considers the structure yielded by packaging or indexing, there are four basic data structures: unpackaged, unindexed: bunch packaged, unindexed: set unpackaged, indexed: string (sequence) packaged, indexed: list (array)
Gap buffer
array data structure. a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor. The text is stored in a large buffer in two contiguous segments, with a gap between them for inserting new text. Moving the cursor involves copying text from one side of the gap to the other (sometimes copying is delayed until the next operation that changes the text). Insertion adds new text at the end of the first segment; deletion deletes it. Text in a gap buffer is represented as two strings, which take very little extra space and which can be searched and displayed very quickly, compared to more sophisticated data structures such as linked lists. However, operations at different locations in the text and ones that fill the gap (requiring a new gap to be created) may require copying most of the text, which is especially inefficient for large files. The use of gap buffers is based on the assumption that such recopying occurs rarely enough that its cost can be amortized over the more common cheap operations. This makes the gap buffer a simpler alternative to the rope for use in text editors such as Emacs.
primitive data types
a basic type or a built-in type. In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types. Opinions vary as to whether a built-in type that is not basic should be considered "primitive". Boolean, true or false Character Floating-point, single-precision real number values Double, a wider floating-point size Integer, integral or fixed-precision values Enumerated type, a small set of uniquely-named values
abstract data type (ADT)
a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects. a data type is defined by its behavior (semantics), specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user. Container List Associative array Multimap Set Multiset Stack Queue Double-ended queue Priority queue Tree Graph
Tree data structure
abstract data type. a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes. -a (possibly non-linear) data structure made up of nodes or vertices and edges without having any cycle. The tree with no nodes is called the null or empty tree. A tree that is not empty consists of a root node and potentially many levels of additional nodes that form a hierarchy. -A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root. -Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree, with a value assigned to each node. Both these perspectives are useful: while a tree can be analyzed mathematically as a whole, when actually represented as a data structure it is usually represented and worked with separately by node (rather than as a list of nodes and an adjacency list of edges between nodes, as one may represent a digraph, for instance).
Image (or system image)
array data structure. a serialized copy of the entire state of a computer system stored in some non-volatile form such as a file. A system is said to be capable of using system images if it can be shut down and later restored to exactly the same state. In such cases, system images can be used for backup. Hibernation is an example that uses an image of the entire machine's RAM. -If a system has all its state written to a disk, then a system image can be produced by simply copying that disk to a file elsewhere, often with disk cloning applications. On many systems a complete system image cannot be created by a disk cloning program running within that system because information can be held outside of disks and volatile memory, for example in non-volatile memory like boot ROMs.
Splay tree*
binary tree data structure. a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For many sequences of non-random operations, splay trees perform better than other search trees, even when the specific pattern of the sequence is unknown. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.[1] -All normal operations on a binary search tree are combined with one basic operation, called splaying. Splaying the tree for a certain element rearranges the tree so that the element is placed at the root of the tree. One way to do this is to first perform a standard binary tree search for the element in question, and then use tree rotations in a specific fashion to bring the element to the top. Alternatively, a top-down algorithm can combine the search and the tree reorganization into a single phase.
Scapegoat tree
binary tree data structure. a self-balancing binary search tree, invented by Arne Andersson[1] and again by Igal Galperin and Ronald L. Rivest.[2] It provides worst-case O(log n) lookup time, and O(log n) amortized insertion and deletion time. -Unlike most other self-balancing binary search trees that provide worst case O(log n) lookup time, scapegoat trees have no additional per-node memory overhead compared to a regular binary search tree: a node stores only a key and two pointers to the child nodes. This makes scapegoat trees easier to implement and, due to data structure alignment, can reduce node overhead by up to one-third.
Alternating decision tree (ADTree)
Application-specific tree data structure. a machine learning method for classification. It generalizes decision trees and has connections to boosting. -An ADTree consists of an alternation of decision nodes, which specify a predicate condition, and prediction nodes, which contain a single number. An instance is classified by an ADTree by following all paths for which all decision nodes are true, and summing any prediction nodes that are traversed. -An alternating decision tree consists of decision nodes and prediction nodes. Decision nodes specify a predicate condition. Prediction nodes contain a single number. ADTrees always have prediction nodes as both root and leaves. An instance is classified by an ADTree by following all paths for which all decision nodes are true and summing any prediction nodes that are traversed. This is different from binary classification trees such as CART (Classification and regression tree) or C4.5 in which an instance follows only one path through the tree.
Hash tree
Hash data structure. may refer to: -Hashed array tree -Hash tree (persistent data structure), an implementation strategy for sets and maps -Merkle tree
Composite types
a data type. a composite data type or compound data type is any data type that can be constructed in a program using the programming language's primitive data types and other composite types. It is sometimes called a structure or aggregate data type, although the latter term may also refer to arrays, lists, etc. The act of constructing a composite type is known as composition. Array Record (also called tuple or struct) Union Tagged union (also called variant, variant record, discriminated union, or disjoint union)
trees data structure
a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes. binary trees, b-trees, heaps, trees, multiway trees, space-partitioning trees, application-specific trees
two's complement
a mathematical operation on binary numbers, as well as a binary signed number representation based on this operation. Its wide use in computing makes it the most important example of a radix complement. The two's-complement system has the advantage that the fundamental arithmetic operations of addition, subtraction, and multiplication are identical to those for unsigned binary numbers (as long as the inputs are represented in the same number of bits and any overflow beyond those bits is discarded from the result). This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
Queue
abstract data type. a particular kind of abstract data type or collection in which the entities in the collection are kept in order and the principal (or only) operations on the collection are the addition of entities to the rear terminal position, known as enqueue, and removal of entities from the front terminal position, known as dequeue. This makes the queue a First-In-First-Out (FIFO) data structure. In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is equivalent to the requirement that once a new element is added, all elements that were added before have to be removed before the new element can be removed. Often a peek or front operation is also entered, returning the value of the front element without dequeuing it. A queue is an example of a linear data structure, or more abstractly a sequential collection. -Queues provide services in computer science, transport, and operations research where various entities such as data, objects, persons, or events are stored and held to be processed later. In these contexts, the queue performs the function of a buffer. -Queues are common in computer programs, where they are implemented as data structures coupled with access routines, as an abstract data structure or in object-oriented languages as classes. Common implementations are circular buffers and linked lists.
Variable-length array (VLA or also called variable-sized, runtime-sized)
array data structure. an array data structure of automatic storage duration whose length is determined at run time (instead of at compile time). supported in older programming languages like Ada, Algol 68, APL, C99, etc.
Binary search tree (BST, sometimes called ordered or sorted binary trees)
binary tree data structure. BST are a particular type of containers: data structures that store "items" (such as numbers, names etc.) in memory. They allow fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that allow finding an item by its key (e.g., finding the phone number of a person by name). -Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, based on the comparison, to continue searching in the left or right subtrees. On average, this means that each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables. -Several variants of the binary search tree have been studied in computer science; this article deals primarily with the basic type, making references to more advanced types when appropriate.
Cartesian tree*
binary tree data structure. a binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequence. Introduced by Vuillemin (1980) in the context of geometric range searching data structures, Cartesian trees have also been used in the definition of the treap and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack-based algorithm for finding all nearest smaller values in a sequence. -Cartesian trees may be used as part of an efficient data structure for range minimum queries, a range searching problem involving queries that ask for the minimum value in a contiguous subsequence of the original sequence.[2] In a Cartesian tree, this minimum value may be found at the lowest common ancestor of the leftmost and rightmost values in the subsequence.
AVL tree
binary tree data structure. a self-balancing binary search tree. It was the first such data structure to be invented.[2] In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations. -The AVL tree is named after its two Soviet inventors, Georgy Adelson-Velsky and Evgenii Landis, who published it in their 1962 paper "An algorithm for the organization of information". -AVL trees are often compared with red-black trees because both support the same set of operations and take O(log n) time for the basic operations. For lookup-intensive applications, AVL trees are faster than red-black trees because they are more rigidly balanced. Similar to red-black trees, AVL trees are height-balanced. Both are in general not weight-balanced nor μ-balanced for any μ≤1⁄2;[5] that is, sibling nodes can have hugely differing numbers of descendants. -Both AVL trees and red-black trees are self-balancing binary search trees and they are very similar mathematically.[9] The operations to balance the trees are different, but both occur on the average in O(1) with maximum in O(log n). The real difference between the two is the limiting height. -AVL trees are more rigidly balanced than red-black trees, leading to faster retrieval but slower insertion and deletion.
Free list
List data structure. a data structure used in a scheme for dynamic memory allocation. It operates by connecting unallocated regions of memory together in a linked list, using the first word of each unallocated region as a pointer to the next. It is most suitable for allocating from a memory pool, where all objects have the same size. -Free lists make the allocation and deallocation operations very simple. To free a region, one would just link it to the free list. To allocate a region, one would simply remove a single region from the end of the free list and use it. If the regions are variable-sized, one may have to search for a region of large enough size, which can be expensive. -Free lists have the disadvantage, inherited from linked lists, of poor locality of reference and so poor data cache utilization, and they do not automatically consolidate adjacent regions to fulfill allocation requests for large regions, unlike the buddy allocation system. Nevertheless, they're still useful in a variety of simple applications where a full-blown memory allocator is unnecessary or requires too much overhead.
Graph data structure
abstract data type. an abstract data type that is meant to implement the undirected graph and directed graph concepts from mathematics. -A graph data structure consists of a finite (and possibly mutable) set of vertices or nodes or points, together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges, arcs, or lines for an undirected graph and as arrows, directed edges, directed arcs, or directed lines for a directed graph. The vertices may be part of the graph structure, or may be external entities represented by integer indices or references. -A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.). -Different data structures for the representation of graphs are used in practice: 1) Adjacency list Vertices are stored as records or objects, and every vertex stores a list of adjacent vertices. This data structure allows the storage of additional data on the vertices. Additional data can be stored if edges are also stored as objects, in which case each vertex stores its incident edges and each edge stores its incident vertices. 2) Adjacency matrix A two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices. 3) Incidence matrix A two-dimensional Boolean matrix, in which the rows represent the vertices and columns represent the edges. The entries indicate whether the vertex at a row is incident to the edge at a column. -look at table for time complexity cost -Adjacency lists are generally preferred because they efficiently represent sparse graphs. An adjacency matrix is preferred if the graph is dense, that is the number of edges |E | is close to the number of vertices squared, |V |2, or if one must be able to quickly look up if there is an edge connecting two vertices. -some properties of abstract data types: order, unique, associative
Stack
abstract data type. an abstract data type that serves as a collection of elements, with two principal operations: push, which adds an element to the collection, and pop, which removes the most recently added element that was not yet removed. The order in which elements come off a stack gives rise to its alternative name, LIFO (for last in, first out). Additionally, a peek operation may give access to the top without modifying the stack. -The name "stack" for this type of structure comes from the analogy to a set of physical items stacked on top of each other, which makes it easy to take an item off the top of the stack, while getting to an item deeper in the stack may require taking off multiple other items first. -Considered as a linear data structure, or more abstractly a sequential collection, the push and pop operations occur only at one end of the structure, referred to as the top of the stack. This makes it possible to implement a stack as a singly linked list and a pointer to the top element.
Sparse array
array data structure. an array in which most of the elements have the default value (usually 0 or null). The occurrence of zero-value elements in a large array is inefficient for both computation and storage. An array in which there is a large number of zero elements is referred to as being sparse. -In the case of sparse arrays, one can ask for a value from an "empty" array position. If one does this, then for an array of numbers, a value of zero should be returned, and for an array of objects, a value of null should be returned. -A naive implementation of an array may allocate space for the entire array, but in the case where there are few non-default values, this implementation is inefficient. Typically the algorithm used instead of an ordinary array is determined by other known features (or statistical features) of the array. For instance, if the sparsity is known in advance or if the elements are arranged according to some function (e.g., the elements occur in blocks). A heap memory allocator in a program might choose to store regions of blank space in a linked list rather than storing all of the allocated regions in, say a bit array. -An obvious question that might be asked is why we need a linked list to represent a sparse array if we can represent it easily using a normal array. The answer to this question lies in the fact that while representing a sparse array as a normal array, a lot of space is allocated for zero or null elements. For example, consider the following array declaration: double arr[1000][1000] -When we define this array, enough space for 1,000,000 doubles is allocated. If each double requires 8 bytes of memory, this array will require 8 million bytes of memory. Because this is a sparse array, most of its elements will have a value of zero (or null). Hence, defining this array will soak up all this space and waste memory (compared to an array in which memory has been allocated only for the nonzero elements). An effective way to overcome this problem is to represent the array using a linked list which requires less memory as only elements having non-zero value are stored. This involves a time-space trade-off: though less memory is used, average access and insertion time becomes linear in the number of elements stored because the previous elements in the list must be traversed to find the desired element. A normal array has constant access and insertion time. -A sparse array as a linked list contains nodes linked to each other. In a one-dimensional sparse array, each node includes the non-zero element's "index" (position), the element's "value", and a node pointer "next" (for linking to the next node). Nodes are linked in order as per the index. In the case of a two-dimensional sparse array, each node contains a row index, a column index (which together give its position), a value at that position and a pointer to the next node.
Character
primitive data type. a unit of information that roughly corresponds to a grapheme (the smallest unit of a writing system of any given language), grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language. Examples of characters include letters, numerical digits, common punctuation marks (such as "." or "-"), and whitespace. The concept also includes control characters, which do not correspond to symbols in a particular natural language, but rather to other bits of information used to process text in one or more languages. Examples of control characters include carriage return or tab, as well as instructions to printers or other devices that display or otherwise process text. Characters are typically combined into strings. Computers and communication equipment represent characters using a character encoding that assigns each character to something — an integer quantity represented by a sequence of digits, typically — that can be stored or transmitted through a network. Two examples of usual encodings are ASCII and the UTF-8 encoding for Unicode. While most character encodings map characters to numbers and/or bit sequences, Morse code instead represents characters using a series of electrical impulses of varying length.
floating point
primitive data type. single-precision real number values. In mathematics, a real number is a value that represents a quantity along a continuous line (vs imaginary). A number is, in general, represented approximately to a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base; the base for the scaling is normally two, ten, or sixteen. The term floating point refers to the fact that a number's radix point (decimal point, or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated as the exponent component, and thus the floating-point representation can be thought of as a kind of scientific notation.
fractional cascading
variation in binary search. a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986 (Chazelle & Guibas 1986a; Chazelle & Guibas 1986b), combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events. -In general, fractional cascading begins with a catalog graph, a directed graph in which each vertex is labeled with an ordered list. A query in this data structure consists of a path in the graph and a query value q; the data structure must determine the position of q in each of the ordered lists associated with the vertices of the path. For the simple example above, the catalog graph is itself a path, with just four nodes. It is possible for later vertices in the path to be determined dynamically as part of a query, in response to the results found by the searches in earlier parts of the path. -To handle queries of this type, for a graph in which each vertex has at most d incoming and at most d outgoing edges for some constant d, the lists associated with each vertex are augmented by a fraction of the items from each outgoing neighbor of the vertex; the fraction must be chosen to be smaller than 1/d, so that the total amount by which all lists are augmented remains linear in the input size. Each item in each augmented list stores with it the position of that item in the unaugmented list stored at the same vertex, and in each of the outgoing neighboring lists. In the simple example above, d = 1, and we augmented each list with a 1/2 fraction of the neighboring items. -A query in this data structure consists of a standard binary search in the augmented list associated with the first vertex of the query path, together with simpler searches at each successive vertex of the path. If a 1/r fraction of items are used to augment the lists from each neighboring item, then each successive query result may be found within at most r steps of the position stored at the query result from the previous path vertex, and therefore may be found in constant time without having to perform a full binary search.
Binary trees
AA tree AVL tree Binary search tree Binary tree Cartesian tree Left-child right-sibling binary tree Order statistic tree Pagoda Randomized binary search tree Red-black tree Rope Scapegoat tree Self-balancing binary search tree Splay tree T-tree Tango tree Threaded binary tree Top tree Treap WAVL tree Weight-balanced tree
Array list (dynamic array?)
List data structure.
Ctrie (concurrent hash-trie)
*tree data structure where each tree node compares a bit slice of key values. *hash data structure. a concurrent thread-safe lock-free implementation of a hash array mapped trie. It is used to implement the concurrent map abstraction. It has particularly scalable concurrent insert and remove operations and is memory-efficient.[3] It is the first known concurrent data-structure that supports O(1), atomic, lock-free snapshots. |Advantages| Ctries have been shown to be comparable in performance with concurrent skip lists,[2][4] concurrent hash tables and similar data structures in terms of the lookup operation, being slightly slower than hash tables and faster than skip lists due to the lower level of indirections. However, they are far more scalable than most concurrent hash tables where the insertions are concerned.[1] Most concurrent hash tables are bad at conserving memory - when the keys are removed from the hash table, the underlying array is not shrunk. Ctries have the property that the allocated memory is always a function of only the current number of keys in the data-structure.[1] -Ctries have logarithmic complexity bounds of the basic operations, albeit with a low constant factor due to the high branching level (usually 32). -Ctries support a lock-free, linearizable, constant-time snapshot operation,[2] based on the insight obtained from persistent data structures. This is a breakthrough in concurrent data-structure design, since existing concurrent data-structures do not support snapshots. The snapshot operation allows implementing lock-free, linearizable iterator, size and clear operations - existing concurrent data-structures have implementations which either use global locks or are correct only given that there are no concurrent modifications to the data-structure. In particular, Ctries have an O(1) iterator creation operation, O(1) clear operation, O(1) duplicate operation and an amortized O(logn) size retrieval operation. |Problems| Most concurrent data structures require dynamic memory allocation, and lock-free concurrent data structures rely on garbage collection on most platforms. The current implementation[4] of the Ctrie is written for the JVM, where garbage collection is provided by the platform itself. While it's possible to keep a concurrent memory pool for the nodes shared by all instances of Ctries in an application or use reference counting to properly deallocate nodes, the only implementation so far to deal with manual memory management of nodes used in Ctries is the common-lisp implementation cl-ctrie, which implements several stop-and-copy and mark-and-sweep garbage collection techniques for persistent, memory-mapped storage. Hazard pointers are another possible solution for a correct manual management of removed nodes. Such a technique may be viable for managed environments as well, since it could lower the pressure on the GC. A Ctrie implementation in Rust makes use of hazard pointers for this purpose
Recursion
A common method of simplification is to divide a problem into subproblems of the same type. As a computer programming technique, this is called divide and conquer and is key to the design of many important algorithms. Divide and conquer serves as a top-down approach to problem solving, where problems are solved by solving smaller and smaller instances. A contrary approach is dynamic programming. This approach serves as a bottom-up approach, where problems are solved by solving larger and larger instances, until the desired size is reached. -Recursion in computer programming is exemplified when a function is defined in terms of simpler, often smaller versions of itself. The solution to the problem is then devised by combining the solutions obtained from the simpler versions of the problem.
Decision tree
Application-specific tree data structure. a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. -Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.
Finger tree
Application-specific tree data structure. a purely functional data structure used in efficiently implementing other functional data structures. A finger tree gives amortized constant time access to the "fingers" (leaves) of the tree, where data is stored, and also stores in each internal node the result of applying some associative operation to its descendants. This "summary" data stored in the internal nodes can be used to provide the functionality of data structures other than trees. For example, a priority queue can be implemented by labeling the internal nodes by the minimum priority of its children in the tree, or an indexed list/array can be implemented with a labeling of nodes by the count of the leaves in their children. -Finger trees can provide amortized O(1) pushing, reversing, popping, O(log n) append and split; and can be adapted to be indexed or ordered sequences. And like all functional data structures, it is inherently persistent; that is, older versions of the tree are always preserved. -They have since been used in the Haskell core libraries (in the implementation of Data.Sequence), and an implementation in OCaml exists[1] which was derived from a proven-correct Coq specification;[2] and a C# implementation of finger trees was published in 2008; the Yi text editor specializes finger trees to finger strings for efficient storage of buffer text. Finger trees can be implemented with or without lazy evaluation,[3] but laziness allows for simpler implementations. -They were first published in 1977 by Leonidas J. Guibas,[4] and periodically refined since (e.g. a version using AVL trees,[5] non-lazy finger trees, simpler 2-3 finger trees,[6] B-Trees and so on)
Expectiminimax tree
Application-specific tree data structure. a specialized variation of a minimax game tree for use in artificial intelligence systems that play two-player zero-sum games such as backgammon, in which the outcome depends on a combination of the player's skill and chance elements such as dice rolls. In addition to "min" and "max" nodes of the traditional minimax tree, this variant has "chance" ("move by nature") nodes, which take the expected value of a random event occurring.[1] In game theory terms, an expectiminimax tree is the game tree of an extensive-form game of perfect, but incomplete information. -In the traditional minimax method, the levels of the tree alternate from max to min until the depth limit of the tree has been reached. In an expectiminimax tree, the "chance" nodes are interleaved with the max and min nodes. Instead of taking the max or min of the utility values of their children, chance nodes take a weighted average, with the weight being the probability that that child is reached.[1] -The interleaving depends on the game. Each "turn" of the game is evaluated as a "max" node (representing the AI player's turn), a "min" node (representing a potentially-optimal opponent's turn), or a "chance" node (representing a random effect or player).[1] -For example, consider a game in which each round consists of a single dice throw, and then decisions made by first the AI player, and then another intelligent opponent. The order of nodes in this game would alternate between "chance", "max" and then "min"
Expression tree
Application-specific tree data structure. a specific kind of a binary tree used to represent expressions. Two common types of expressions that a binary expression tree can represent are algebraic[1] and boolean. These trees can represent expressions that contain both unary (an operation with only one operand, i.e. a single input) and binary (a function that takes two inputs) operators.[1] -Each node of a binary tree, and hence of a binary expression tree, has zero, one, or two children. This restricted structure simplifies the processing of expression trees. -The leaves of a binary expression tree are operands, such as constants or variable names, and the other nodes contain operators. These particular trees happen to be binary, because all of the operations are binary, and although this is the simplest case, it is possible for nodes to have more than two children. It is also possible for a node to have only one child, as is the case with the unary minus operator. An expression tree, T, can be evaluated by applying the operator at the root to the values obtained by recursively evaluating the left and right subtrees
Abstract syntax tree (AST, or just syntax tree)
Application-specific tree data structure. a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is "abstract" in not representing every detail appearing in the real syntax. For instance, grouping parentheses are implicit in the tree structure, and a syntactic construct like an if-condition-then expression may be denoted by means of a single node with three branches. -This distinguishes abstract syntax trees from concrete syntax trees, traditionally designated parse trees, which are often[citation needed] built by a parser during the source code translation and compiling process. Once built, additional information is added to the AST by means of subsequent processing, e.g., contextual analysis. -Abstract syntax trees are also used in program analysis and program transformation systems.
Parse tree (or parsing tree or derivation tree or (concrete) syntax tree)
Application-specific tree data structure. an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax the term syntax tree is more common. Parse trees are distinct from the abstract syntax trees used in computer programming, in that their structure and elements more concretely reflect the syntax of the input language. They are also distinct from (although based on similar principles to) the sentence diagrams (such as Reed-Kellogg diagrams) sometimes used for grammar teaching in schools. -Parse trees are usually constructed based on either the constituency relation of constituency grammars (phrase structure grammars) or the dependency relation of dependency grammars. Parse trees may be generated for sentences in natural languages (see natural language processing), as well as during processing of computer languages, such as programming languages. -A related concept is that of phrase marker or P-marker, as used in transformational generative grammar. A phrase marker is a linguistic expression marked as to its phrase structure. This may be presented in the form of a tree, or as a bracketed expression. Phrase markers are generated by applying phrase structure rules, and themselves are subject to further transformational rules.
Adjacency list
Graph data structure. a collection of unordered lists used to represent a finite graph. Each list describes the set of neighbors of a vertex in the graph. This is one of several commonly used representations of graphs for use in computer programs. |Data Structures| For use as a data structure, the main alternative to the adjacency list is the adjacency matrix. Because each entry in the adjacency matrix requires only one bit, it can be represented in a very compact way, occupying only |V|^2/8 bytes of contiguous space, where | V| is the number of vertices of the graph. Besides avoiding wasted space, this compactness encourages locality of reference. -However, for a sparse graph, adjacency lists require less space, because they do not waste any space to represent edges that are not present. Using a naïve array implementation on a 32-bit computer, an adjacency list for an undirected graph requires about 2(32/8)| E| = 8| E| bytes of space, where | E| is the number of edges of the graph. -Noting that an undirected simple graph can have at most | V|^2/2 edges, allowing loops, we can let d = |E|/| V|^2 denote the density of the graph. Then, 8| E | > | V |^2/8 when | E|/| V|^2 > 1/64, that is the adjacency list representation occupies more space than the adjacency matrix representation when d > 1/64. Thus a graph must be sparse enough to justify an adjacency list representation. -Besides the space trade-off, the different data structures also facilitate different operations. Finding all vertices adjacent to a given vertex in an adjacency list is as simple as reading the list. With an adjacency matrix, an entire row must instead be scanned, which takes O(| V |) time. Whether there is an edge between two given vertices can be determined at once with an adjacency matrix, while requiring time proportional to the minimum degree of the two vertices with the adjacency list.
Binary decision diagram (BDD or branching program)
Graph data structure. a data structure that is used to represent a Boolean function. On a more abstract level, BDDs can be considered as a compressed representation of sets or relations. Unlike other compressed representations, operations are performed directly on the compressed representation, i.e. without decompression. Other data structures used to represent a Boolean function include negation normal form (NNF), and propositional directed acyclic graph (PDAG). |Definition| A Boolean function can be represented as a rooted, directed, acyclic graph, which consists of several decision nodes and terminal nodes. There are two types of terminal nodes called 0-terminal and 1-terminal. Each decision node N is labeled by Boolean variable V_N and has two child nodes called low child and high child. The edge from node V_N to a low (or high) child represents an assignment of V_N to 0 (resp. 1). Such a BDD is called 'ordered' if different variables appear in the same order on all paths from the root. A BDD is said to be 'reduced' if the following two rules have been applied to its graph: 1.Merge any isomorphic subgraphs. 2.Eliminate any node whose two children are isomorphic. -In popular usage, the term BDD almost always refers to Reduced Ordered Binary Decision Diagram (ROBDD in the literature, used when the ordering and reduction aspects need to be emphasized). The advantage of an ROBDD is that it is canonical (unique) for a particular function and variable order.[1] This property makes it useful in functional equivalence checking and other operations like functional technology mapping. -A path from the root node to the 1-terminal represents a (possibly partial) variable assignment for which the represented Boolean function is true. As the path descends to a low (or high) child from a node, then that node's variable is assigned to 0 (resp. 1).
Graph-structured stack
Graph data structure. a directed acyclic graph (DAG) where each directed path represents a stack. -The graph-structured stack is an essential part of Tomita's algorithm, where it replaces the usual stack of a pushdown automaton. This allows the algorithm to encode the nondeterministic choices in parsing an ambiguous grammar, sometimes with greater efficiency. |Tomita's algorithm| -A GLR parser (GLR standing for "generalized LR", where L stands for "left-to-right" and R stands for "rightmost (derivation)") is an extension of an LR parser algorithm to handle nondeterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper[1] by Bernard Lang (along with other general Context-Free parsers such as GLL). It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work,[2] though in practice it is the second stage that is recognized as the GLR parser. -Though the algorithm has evolved since its original forms, the principles have remained intact. As shown by an earlier publication,[3] Lang was primarily interested in more easily used and more flexible parsers for extensible programming languages. Tomita's goal was to parse natural language text thoroughly and efficiently. Standard LR parsers cannot accommodate the nondeterministic and ambiguous nature of natural language, and the GLR algorithm can. |Advantages of GLR| Recognition using the GLR algorithm has the same worst-case time complexity as the CYK algorithm and Earley algorithm: O(n3).[citation needed] However, GLR carries two additional advantages: 1.The time required to run the algorithm is proportional to the degree of nondeterminism in the grammar: on deterministic grammars the GLR algorithm runs in O(n) time (this is not true of the Earley[citation needed] and CYK algorithms, but the original Earley algorithms can be modified to ensure it) 2.The GLR algorithm is "online" - that is, it consumes the input tokens in a specific order and performs as much work as possible after consuming each token. -In practice, the grammars of most programming languages are deterministic or "nearly deterministic", meaning that any nondeterminism is usually resolved within a small (though possibly unbounded) number of tokens. Compared to other algorithms capable of handling the full class of context-free grammars (such as Earley or CYK), the GLR algorithm gives better performance on these "nearly deterministic" grammars, because only a single stack will be active during the majority of the parsing process. -GLR can be combined with the LALR(1) algorithm, in a hybrid parser, allowing still higher performance
And-inverter graph (AIG)
Graph data structure. a directed, acyclic graph that represents a structural implementation of the logical functionality of a circuit or network. An AIG consists of two-input nodes representing logical conjunction, terminal nodes labeled with variable names, and edges optionally containing markers indicating logical negation. This representation of a logic function is rarely structurally efficient for large circuits, but is an efficient representation for manipulation of boolean functions. Typically, the abstract graph is represented as a data structure in software. -Conversion from the network of logic gates to AIGs is fast and scalable. It only requires that every gate be expressed in terms of AND gates and inverters. This conversion does not lead to unpredictable increase in memory use and runtime. This makes the AIG an efficient representation in comparison with either the binary decision diagram (BDD) or the "sum-of-product" (ΣoΠ) form,[citation needed] that is, the canonical form in Boolean algebra known as the disjunctive normal form (DNF). The BDD and DNF may also be viewed as circuits, but they involve formal constraints that deprive them of scalability. For example, ΣoΠs are circuits with at most two levels while BDDs are canonical, that is, they require that input variables be evaluated in the same order on all paths.
Scene graph
Graph data structure. a general data structure commonly used by vector-based graphics editing applications and modern computer games, which arranges the logical and often (but not necessarily) spatial representation of a graphical scene. Examples of such programs include Acrobat 3D, Adobe Illustrator, AutoCAD, CorelDRAW, OpenSceneGraph, OpenSG, VRML97, X3D, Hoops and Open Inventor. -A scene graph is a collection of nodes in a graph or tree structure. A tree node (in the overall tree structure of the scene graph) may have many children but often only a single parent, with the effect of a parent applied to all its child nodes; an operation performed on a group automatically propagates its effect to all of its members. In many programs, associating a geometrical transformation matrix (see also transformation and matrix) at each group level and concatenating such matrices together is an efficient and natural way to process such operations. A common feature, for instance, is the ability to group related shapes/objects into a compound object that can then be moved, transformed, selected, etc. as easily as a single object. -It also happens that in some scene graphs, a node can have a relation to any node including itself, or at least an extension that refers to another node (for instance Pixar's PhotoRealistic RenderMan because of its usage of Reyes rendering algorithm, or Adobe Systems's Acrobat 3D for advanced interactive manipulation). -The term scene graph is sometimes confused with Canvas (GUI), since some canvas implementations include scene graph functionality.
Hypergraph
Graph data structure. a generalization of a graph in which an edge can connect any number of vertices. Formally, a hypergraph {\displaystyle H} H is a pair {\displaystyle H=(X,E)} H = (X,E) where {\displaystyle X} X is a set of elements called nodes or vertices, and {\displaystyle E} E is a set of non-empty subsets of {\displaystyle X} X called hyperedges or edges. Therefore, {\displaystyle E} E is a subset of {\displaystyle {\mathcal {P}}(X)\setminus \{\emptyset \}} \mathcal{P}(X) \setminus\{\emptyset\}, where {\displaystyle {\mathcal {P}}(X)} {\mathcal {P}}(X) is the power set of {\displaystyle X} X. -While graph edges are pairs of nodes, hyperedges are arbitrary sets of nodes, and can therefore contain an arbitrary number of nodes. However, it is often desirable to study hypergraphs where all hyperedges have the same cardinality; a k-uniform hypergraph is a hypergraph such that all its hyperedges have size k. (In other words, one such hypergraph is a collection of sets, each such set a hyperedge connecting k nodes.) So a 2-uniform hypergraph is a graph, a 3-uniform hypergraph is a collection of unordered triples, and so on. -A hypergraph is also called a set system or a family of sets drawn from the universal set X. The difference between a set system and a hypergraph is in the questions being asked. Hypergraph theory tends to concern questions similar to those of graph theory, such as connectivity and colorability, while the theory of set systems tends to ask non-graph-theoretical questions, such as those of Sperner theory. -There are variant definitions; sometimes edges must not be empty, and sometimes multiple edges, with the same set of nodes, are allowed. -Hypergraphs can be viewed as incidence structures. In particular, there is a bipartite "incidence graph" or "Levi graph" corresponding to every hypergraph, and conversely, most, but not all, bipartite graphs can be regarded as incidence graphs of hypergraphs. -Hypergraphs have many other names. In computational geometry, a hypergraph may sometimes be called a range space and then the hyperedges are called ranges.[1] In cooperative game theory, hypergraphs are called simple games (voting games); this notion is applied to solve problems in social choice theory. In some literature edges are referred to as hyperlinks or connectors.[2] -Special kinds of hypergraphs include, besides k-uniform ones, clutters, where no edge appears as a subset of another edge; and abstract simplicial complexes, which contain all subsets of every edge. -The collection of hypergraphs is a category with hypergraph homomorphisms as morphisms.
Multigraph
Graph data structure. a graph which is permitted to have multiple edges (also called parallel edges[1]), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge. -There are two distinct notions of multiple edges: 1.Edges without own identity: The identity of an edge is defined solely by the two nodes it connects. In this case, the term "multiple edges" means that the same edge can occur several times between these two nodes. 2.Edges with own identity: Edges are primitive entities just like nodes. When multiple edges connect two nodes, these are different edges. A multigraph is different from a hypergraph, which is a graph in which an edge can connect any number of nodes, not just two. -For some authors, the terms pseudograph and multigraph are synonymous. For others, a pseudograph is a multigraph with loops.
Directed graph (or digraph)
Graph data structure. a graph, or set of vertices connected by edges, where the edges have a direction associated with them. In formal terms, a directed graph is an ordered pair G = (V, A) (sometimes G = (V, E)) where[1] 1.V is a set whose elements are called vertices, nodes, or points; 2.A is a set of ordered pairs of vertices, called arrows, directed edges (sometimes simply edges with the corresponding set named E instead of A), directed arcs, or directed lines. -It differs from an ordinary or undirected graph, in that the latter is defined in terms of unordered pairs of vertices, which are usually called edges, arcs, or lines. -A directed graph is called a simple digraph if it has no multiple arrows (two or more edges that connect the same two vertices in the same direction) and no loops (edges that connect vertices to themselves). A directed graph is called a directed multigraph or multidigraph if it may have multiple arrows (and sometimes loops). In the latter case the arrow set forms a multiset, rather than a set, of ordered pairs of vertices.
Adjacency matrix
Graph data structure. a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. -In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric. The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied in spectral graph theory. -The adjacency matrix should be distinguished from the incidence matrix (a matrix that shows the relationship between two classes of objects) for a graph, a different matrix representation whose elements indicate whether vertex-edge pairs are incident or not.
Zero-suppressed decision diagram (ZSDD or ZDD)
Graph data structure. a type of binary decision diagram (BDD) where instead of nodes being introduced when the positive and the negative part are different, they are introduced when positive part is different from constant 0. A zero-suppressed decision diagram is also commonly referred to as a zero-suppressed binary decision diagram (ZBDD). -They are useful when dealing with functions that are almost everywhere 0. -In a 2011 talk "All Questions Answered",[1] Donald Knuth referred to ZDD as the most beautiful construct in computer science. -In The Art of Computer Programming, volume 4, Knuth introduces his Simpath algorithm for constructing a ZDD representing all simple paths between two vertices in a graph.
Koorde
Hash data structure. In peer-to-peer networks, Koorde is a Distributed hash table (DHT) system based on the Chord DHT and the De Bruijn graph (De Bruijn sequence). Inheriting the simplicity of Chord, Koorde meets O(log n) hops per node (where n is the number of nodes in the DHT), and O(log n/ log log n) hops per lookup request with O(log n) neighbors per node. -The Chord concept is based on a wide range of identifiers (e.g. 2^160) in a structure of a ring where an identifier can stand for both node and data. Node-successor is responsible for the whole range of IDs between itself and its predecessor.
Distributed hash table (DHT)
Hash data structure. a class of a decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures. -DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative Web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Coral Content Distribution Network, the Kad network, the Storm botnet, the Tox instant messenger, Freenet and the YaCy search engine.
Double Hashing
Hash data structure. a computer programming technique used in hash tables to resolve hash collisions (a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest. Collisions are unavoidable whenever members of a very large set (such as all possible person names, or all possible computer files) are mapped to a relatively short bit string. This is merely an instance of the pigeonhole principle, which states that if n items are put into m containers, with n > m, then at least one container must contain more than one item), in cases when two different values to be searched for produce the same hash key. It is a popular collision-resolution technique in open-addressed hash tables. Double hashing is implemented in many popular libraries. -Like linear probing, it uses one hash value as a starting point and then repeatedly steps forward an interval until the desired value is located, an empty location is reached, or the entire table has been searched; but this interval is decided using a second, independent hash function (hence the name double hashing). Unlike linear probing and quadratic probing, the interval depends on the data, so that even values mapping to the same location have different bucket sequences; this minimizes repeated collisions and the effects of clustering. -Given two randomly, uniformly, and independently selected hash functions h_{1} and h_{2}, the ith location in the bucket sequence for value k in a hash table T is: h(i,k)=(h_1(k) + i * h_2(k)) mod |T|. Generally, h_{1} and h_{2} are selected from a set of universal hash functions.
Prefix hash tree (PHT)
Hash data structure. a distributed data structure that enables more sophisticated queries over a distributed hash table (DHT). The prefix hash tree uses the lookup interface of a DHT to construct a trie-based data structure that is both efficient (updates are doubly logarithmic in the size of the domain being indexed), and resilient (the failure of any given node in a prefix hash tree does not affect the availability of data stored at other nodes).
Rolling hash (also known as recursive hashing or rolling checksum)
Hash data structure. a hash function where the input is hashed in a window that moves through the input. -A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a moving average function can be computed much more quickly than other low-pass filters. -One of the main applications is the Rabin-Karp string search algorithm, which uses the rolling hash described below. -Another popular application is rsync program which uses a checksum based on Mark Adler's adler-32 as its rolling hash. -Another application is the Low Bandwidth Network Filesystem (LBFS), which uses a Rabin fingerprint as its rolling hash. -At best, rolling hash values are pairwise independent[1] or strongly universal. They cannot be 3-wise independent, for example.
Count-Min sketch (CM sketch)
Hash data structure. a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count-min sketch was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan[1] and described by them in a 2005 paper.[2] Count-min sketches are essentially the same data structure as the counting Bloom filters introduced in 1998 by Fan et al.[3] However, they are used differently and therefore sized differently: a count-min sketch typically has a sublinear number of cells, related to the desired approximation quality of the sketch, while a counting Bloom filter is more typically sized to match the number of elements in the set.
Dynamic perfect hash table
Hash data structure. a programming technique for resolving collisions in a hash table data structure.[1][2][3] While more memory-intensive than its hash table counterparts, this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.
Bloom filter
Hash data structure. a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate. In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter). The more elements that are added to the set, the larger the probability of false positives. Bloom proposed the technique for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash could be used to eliminate all unnecessary disk accesses; on the other hand, with limited core memory, Bloom's technique uses a smaller hash area but still eliminates most unnecessary accesses. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses, an 85-15 form of the Pareto principle.[1] More generally, fewer than 10 bits per element are required for a 1% false positive probability, independent of the size or number of elements in the set.
Hash array mapped trie (HAMT)
Hash data structure. an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie.[1] It is a refined version of the more general notion of a hash tree. |Operation| A HAMT is an array mapped trie where the keys are first hashed in order to ensure an even distribution of keys and a constant key length. -In a typical implementation of HAMT's array mapped trie, each node contains a table with some fixed number N of slots with each slot containing either a nil pointer or a pointer to another node. N is commonly 32. As allocating space for N pointers for each node would be expensive, each node instead contains a bitmap which is N bits long where each bit indicates the presence of a non-nil pointer. This is followed by an array of pointers equal in length to the number of ones in the bitmap, (its Hamming weight). |Advantages of HAMTs| The hash array mapped trie achieves almost hash table-like speed while using memory much more economically. Also, a hash table may have to be periodically resized, an expensive operation, whereas HAMTs grow dynamically. Generally, HAMT performance is improved by a larger root table with some multiple of N slots; some HAMT variants allow the root to grow lazily[1] with negligible impact on performance.
Hash list
Hash data structure. typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup (hash tables) and distributed databases (distributed hash tables). This article covers hash lists that are used to guarantee data integrity. -A hash list is an extension of the old concept of hashing an item (for instance, a file). A hash list is usually sufficient for most needs, but a more advanced form of the concept is a hash tree. -Hash lists can be used to protect any kind of data stored, handled and transferred in and between computers. An important use of hash lists is to make sure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and to check that the other peers do not "lie" and send fake blocks. -Usually a cryptographic hash function such as SHA-256 is used for the hashing. If the hash list only needs to protect against unintentional damage less secure checksums such as CRCs can be used. -Hash lists are better than a simple hash of the entire file since, in the case of a data block being damaged, this is noticed, and only the damaged block needs to be redownloaded. With only a hash of the file, many undamaged blocks would have to be redownloaded, and the file reconstructed and tested until the correct hash of the entire file is obtained. Hash lists also protect against nodes that try to sabotage by sending fake blocks, since in such a case the damaged block can be acquired from some other source.
Lightmap
Other data structure. a data structure used in lightmapping, a form of surface caching in which the brightness of surfaces in a virtual scene is pre-calculated and stored in texture maps for later use. Lightmaps are most commonly applied to static objects in realtime 3d graphics applications, such as video games, in order to provide lighting effects such as global illumination at a relatively low computational cost.
Routing table (routing information base or RIB)
Other data structure. in computer networking, a routing table is a data table stored in a router or a networked computer that lists the routes to particular network destinations, and in some cases, metrics (distances) associated with those routes. The routing table contains information about the topology of the network immediately around it. The construction of routing tables is the primary goal of routing protocols. Static routes are entries made in a routing table by non-automatic means and which are fixed rather than being the result of some network topology "discovery" procedure.
Associative array (or map, symbol table, or dictionary)
abstract data type. an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection. -Operations associated with this data type allow: the addition of a pair to the collection the removal of a pair from the collection the modification of an existing pair the lookup of a value associated with a particular key -The dictionary problem is a classic computer science problem: the task of designing a data structure that maintains a set of data during 'search', 'delete', and 'insert' operations. The two major solutions to the dictionary problem are a hash table or a search tree. In some cases it is also possible to solve the problem using directly addressed arrays, binary search trees, or other more specialized structures. -Many programming languages include associative arrays as primitive data types, and they are available in software libraries for many others. Content-addressable memory is a form of direct hardware-level support for associative arrays. -Associative arrays have many applications including such fundamental programming patterns as memoization and the decorator pattern.
Dijkstra's algorithm
an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.[1][2] The algorithm exists in many variants; Dijkstra's original variant found the shortest path between two nodes,[2] but a more common variant fixes a single node as the "source" node and finds shortest paths from the source to all other nodes in the graph, producing a shortest-path tree. For a given source node in the graph, the algorithm finds the shortest path between that node and every other.[3]:196-206 It can also be used for finding the shortest paths from a single node to a single destination node by stopping the algorithm once the shortest path to the destination node has been determined. For example, if the nodes of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities. As a result, the shortest path algorithm is widely used in network routing protocols, most notably IS-IS and Open Shortest Path First (OSPF). It is also employed as a subroutine in other algorithms such as Johnson's. Dijkstra's original algorithm does not use a min-priority queue and runs in time {\displaystyle O(|V|^{2})} O(|V|^{2}) (where {\displaystyle |V|} |V| is the number of nodes). The idea of this algorithm is also given in (Leyzorek et al. 1957). The implementation based on a min-priority queue implemented by a Fibonacci heap and running in {\displaystyle O(|E|+|V|\log |V|)} O(|E|+|V|\log |V|) (where {\displaystyle |E|} |E| is the number of edges) is due to (Fredman & Tarjan 1984). This is asymptotically the fastest known single-source shortest-path algorithm for arbitrary directed graphs with unbounded non-negative weights. However, specialized cases (such as bounded/integer weights, directed acyclic graphs etc.) can indeed be improved further as detailed in § Specialized variants. In some fields, artificial intelligence in particular, Dijkstra's algorithm or a variant of it is known as uniform-cost search and formulated as an instance of the more general idea of best-first search
Queap
b-tree data structure. a priority queue data structure. The data structure allows insertions and deletions of arbitrary elements, as well as retrieval of the highest-priority element. Each deletion takes amortized time logarithmic in the number of items that have been in the structure for a longer time than the removed item. Insertions take constant amortized time. -The data structure consists of a doubly linked list and a 2-4 tree data structure, each modified to keep track of its minimum-priority element. The basic operation of the structure is to keep newly inserted elements in the doubly linked list, until a deletion would remove one of the list items, at which point they are all moved into the 2-4 tree. The 2-4 tree stores its elements in insertion order, rather than the more conventional priority-sorted order. -Both the data structure and its name were devised by John Iacono and Stefan Langerman.
Bx-tree
b-tree data structure. a query and update efficient B+ tree-based index structure for moving objects. -The base structure of the Bx-tree is a B+ tree in which the internal nodes serve as a directory, each containing a pointer to its right sibling. In the earlier version of the Bx-tree,[1] the leaf nodes contained the moving-object locations being indexed and corresponding index time. In the optimized version,[2] each leaf node entry contains the id, velocity, single-dimensional mapping value and the latest update time of the object. The fanout is increased by not storing the locations of moving objects, as these can be derived from the mapping values.
Dancing tree
b-tree data structure. a tree data structure similar to B+ trees. It was invented by Hans Reiser, for use by the Reiser4 file system. As opposed to self-balancing binary search trees that attempt to keep their nodes balanced at all times, dancing trees only balance their nodes when flushing data to a disk (either because of memory constraints or because a transaction has completed).[1] The idea behind this is to speed up file system operations by delaying optimization of the tree and only writing to disk when necessary, as writing to disk is thousands of times slower than writing to memory. Also, because this optimization is done less often than with other tree data structures, the optimization can be more extensive. In some sense, this can be considered to be a self-balancing binary search tree that is optimized for storage on a slow medium, in that the on-disc form will always be balanced but will get no mid-transaction writes; doing so eases the difficulty (at the time) of adding and removing nodes, and instead performs these (slow) rebalancing operations at the same time as the (much slower) write to the storage medium. However, a (negative) side effect of this behavior is witnessed in cases of unexpected shutdown, incomplete data writes, and other occurrences that may prevent the final (balanced) transaction from completing. In general, dancing trees will pose a greater difficulty for data recovery from incomplete transactions than a normal tree; though this can be addressed by either adding extra transaction logs or developing an algorithm to locate data on disk not previously present, then going through with the optimizations once more before continuing with any other pending operations/transactions.
2-3 tree
b-tree data structure. a tree data structure, where every node with children (internal node) has either two children (2-node) and one data element or three children (3-nodes) and two data elements. According to Knuth, "a B-tree of order 3 is a 2-3 tree." Nodes on the outside of the tree (leaf nodes) have no children and one or two data elements.[1][2] 2−3 trees were invented by John Hopcroft in 1970
B+ tree
b-tree data structure. an n-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves.[1] The root may be either a leaf or a node with two or more children.[2] -A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves. -The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node,[1] typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree. -The ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata indexing; BFS also uses B+ trees for storing directories. NTFS uses B+ trees for directory indexing. EXT4 uses extent trees (a modified B+ tree data structure) for file extent indexing.[3] Relational database management systems such as IBM DB2,[4] Informix,[4] Microsoft SQL Server,[4] Oracle 8,[4] Sybase ASE,[4] and SQLite[5] support this type of tree for table indices. Key-value database management systems such as CouchDB[6] and Tokyo Cabinet[7] support this type of tree for data access.
Fusion tree
b-tree data structure. multiway tree data structure. a type of tree data structure that implements an associative array on w-bit integers. When operating on a collection of n key-value pairs, it uses O(n) space and performs searches in O(logw n) time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of w. It achieves this speed by exploiting certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.[1] -Several advances have been made since Fredman and Willard's original 1990 paper. In 1999[2] it was shown how to implement fusion trees under a model of computation in which all of the underlying operations of the algorithm belong to AC0, a model of circuit complexity that allows addition and bitwise Boolean operations but disallows the multiplication operations used in the original fusion tree algorithm. A dynamic version of fusion trees using hash tables was proposed in 1996[3] which matched the original structure's O(logw n) runtime in expectation. Another dynamic version using exponential tree was proposed in 2007[4] which yields worst-case runtimes of O(logw n + log log u) per operation, where u is the size of the largest key. It remains open whether dynamic fusion trees can achieve O(logw n) per operation with high probability. -A fusion tree is essentially a B-tree with branching factor of w1/5 (any small exponent is also possible), which gives it a height of O(logw n). To achieve the desired runtimes for updates and queries, the fusion tree must be able to search a node containing up to w1/5 keys in constant time. This is done by compressing ("sketching") the keys so that all can fit into one machine word, which in turn allows comparisons to be done in parallel.
Self-balancing binary search tree (or height-balanced binary search tree)
binary tree data structure. any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions -These structures provide efficient implementations for mutable ordered lists, and can be used for other abstract data structures such as associative arrays, priority queues and sets. -The red-black tree, which is a type of self-balancing binary search tree, was called symmetric binary B-tree[2] and was renamed but can still be confused with the generic concept of self-balancing binary search tree because of the initials. -popular implementations: 2-3 tree AA tree AVL tree Red-black tree Scapegoat tree Splay tree Treap
Leonardo Heap (smoothsort)
heap data structure. a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981.[1] Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n),[2] but it is not a stable sort.[3][self-published source?] The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.
Beap (bi-parental heap)
heap data structure. a data structure where a node usually has two parents (unless it is the first or last on a level) and two children (unless it is on the last level). Unlike a heap, a beap allows sublinear search. The beap was introduced by Ian Munro and Hendra Suwanda. A related data structure is the Young tableau.
2-3 heap
heap data structure. a data structure, a variation on the heap, designed by Tadao Takaoka in 1999. The structure is similar to the Fibonacci heap, and borrows from the 2-3 tree. -Time costs for some common heap operations are: 1) Delete-min takes O(log(n)) amortized time. 2) Decrease-key takes constant amortized time. 3) Insertion takes constant amortized time.
Binary heap
heap data structure. a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues -A binary heap is defined as a binary tree with two additional constraints: 1. Shape property: a binary heap is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right. 2. Heap property: the key stored in each node is either greater than or equal to or less than or equal to the keys in the node's children, according to some total order. -Heaps where the parent key is greater than or equal to (≥) the child keys are called max-heaps; those where it is less than or equal to (≤) are called min-heaps. Efficient (logarithmic-time) algorithms are known for the two operations needed to implement a priority queue on a binary heap: inserting an element, and removing the smallest (largest) element from a min-heap (max-heap). Binary heaps are also commonly employed in the heapsort sorting algorithm, which is an in-place algorithm owing to the fact that binary heaps can be implemented as an implicit data structure, storing keys in an array and using their relative positions within that array to represent child-parent relationships.
Binomial heap
heap data structure. a heap similar to a binary heap but also supports quick merging of two heaps. This is achieved by using a special tree structure. It is important as an implementation of the mergeable heap abstract data type (also called meldable heap), which is a priority queue supporting merge operation. -A binomial heap is implemented as a collection of binomial trees (compare with a binary heap, which has a shape of a single binary tree), which are defined recursively as follows: 1. A binomial tree of order 0 is a single node 2. A binomial tree of order k has a root node whose children are roots of binomial trees of orders k−1, k−2, ..., 2, 1, 0 (in this order). -A binomial tree of order k has 2k nodes, height k. -Because of its unique structure, a binomial tree of order k can be constructed from two trees of order k−1 trivially by attaching one of them as the leftmost child of the root of the other tree. This feature is central to the merge operation of a binomial heap, which is its major advantage over other conventional heaps. -The name comes from the shape: a binomial tree of order n has matrix parenthesis {n d} nodes at depth d.
D-ary heap (d-heap)
heap data structure. a priority queue data structure, a generalization of the binary heap in which the nodes have d children instead of 2. Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan[2] and Jensen et al.,[4] d-ary heaps were invented by Donald B. Johnson in 1975.[1] This data structure allows decrease priority operations to be performed more quickly than binary heaps, at the expense of slower delete minimum operations. This tradeoff leads to better running times for algorithms such as Dijkstra's algorithm in which decrease priority operations are more common than delete min operations.[1][5] Additionally, d-ary heaps have better memory cache behavior than a binary heap, allowing them to run more quickly in practice despite having a theoretically larger worst-case running time.[6][7] Like binary heaps, d-ary heaps are an in-place data structure that uses no additional storage beyond that needed to store the array of items in the heap
Leftist heap (leftist heap)
heap data structure. a priority queue implemented with a variant of a binary heap. Every node has an s-value which is the distance to the nearest leaf. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value. -The height-biased leftist tree was invented by Clark Allan Crane.[1] The name comes from the fact that the left subtree is usually taller than the right subtree. -When inserting a new node into a tree, a new one-node tree is created and merged into the existing tree. To delete a minimum item, we remove the root and the left and right sub-trees are then merged. Both these operations take O(log n) time. For insertions, this is slower than binomial heaps which support insertion in amortized constant time, O(1) and O(log n) worst-case. -Leftist trees are advantageous because of their ability to merge quickly, compared to binary heaps which take Θ(n). In almost all cases, the merging of skew heaps has better performance. However merging leftist heaps has worst-case O(log n) complexity while merging skew heaps has only amortized O(log n) complexity.
Pairing heap
heap data structure. a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986.[1] Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm,[2] and support the following operations (assuming a min-heap): 1. find-min: simply return the top element of the heap. 2. merge: compare the two root elements, the smaller remains the root of the result, the larger element and its subtree is appended as a child of this root. 3. insert: create a new heap for the inserted element and merge into the original heap. 4. decrease-key (optional): remove the subtree rooted at the key to be decreased, replace the key with a smaller key, then merge the result back into the heap. 5. delete-min: remove the root and merge its subtrees. Various strategies are employed. The analysis of pairing heaps' time complexity was initially inspired by that of splay trees.[1] The amortized time per delete-min is O(log n), and the operations find-min, merge, and insert run in O(1) amortized time.[3] -Determining the precise asymptotic running time of pairing heaps when a decrease-key operation is needed has turned out to be difficult. Initially, the time complexity of this operation was conjectured on empirical grounds to be O(1),[4] but Fredman proved that the amortized time per decrease-key is at least Omega (\log \log n) for some sequences of operations.[5] Using a different amortization argument, Pettie then proved that insert, meld, and decrease-key all run in O(2^{{2{\sqrt {\log \log n}}}}) amortized time, which is o(\log n).[6] Elmasry later introduced a variant of pairing heaps for which decrease-key runs in O(\log \log n) amortized time and with all other operations matching Fibonacci heaps,[7] but no tight Theta (\log \log n) bound is known for the original data structure.[6][3] Moreover, it is an open question whether a o(\log n) amortized time bound for decrease-key and a O(1) amortized time bound for insert can be achieved simultaneously.[8] -Although this is worse than other priority queue algorithms such as Fibonacci heaps, which perform decrease-key in O(1) amortized time, the performance in practice is excellent. Stasko and Vitter,[4] Moret and Shapiro,[9] and Larkin, Sen, and Tarjan[8] conducted experiments on pairing heaps and other heap data structures. They concluded that pairing heaps are often faster in practice than array-based binary heaps and d-ary heaps, and almost always faster in practice than other pointer-based heaps, including data structures like Fibonacci heaps that are theoretically more efficient.
AF-heap
heap data structure. a type of priority queue for integer data, an extension of the fusion tree using an atomic heap proposed by M. L. Fredman and D. E. Willard.[1] Using an AF-heap, it is possible to perform m insert or decrease-key operations and n delete-min operations on machine-integer keys in time O(m + n log n / log log n). This allows Dijkstra's algorithm to be performed in the same O(m + n log n / log log n) time bound on graphs with n edges and m vertices, and leads to a linear time algorithm for minimum spanning trees, with the assumption for both problems that the edge weights of the input graph are machine integers in the transdichotomous model.
Soft heap
heap data structure. a variant on the simple heap data structure that has constant amortized time for 5 types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a certain number of values in the heap. The constant time operations are: 1. create(S): Create a new soft heap 2. insert(S, x): Insert an element into a soft heap 3. meld(S, S' ): Combine the contents of two soft heaps into one, destroying both 4. delete(S, x): Delete an element from a soft heap 5. findmin(S): Get the element with minimum key in the soft heap Other heaps such as Fibonacci heaps achieve most of these bounds without any corruption, but cannot provide a constant-time bound on the critical delete operation. -The amount of corruption can be controlled by the choice of a parameter ε, but the lower this is set, the more time insertions require (O(log 1/ε) for an error rate of ε). -More precisely, the guarantee offered by the soft heap is the following: for a fixed value ε between 0 and 1/2, at any point in time there will be at most ε*n corrupted keys in the heap, where n is the number of elements inserted so far. Note that this does not guarantee that only a fixed percentage of the keys currently in the heap are corrupted: in an unlucky sequence of insertions and deletions, it can happen that all elements in the heap will have corrupted keys. Similarly, we have no guarantee that in a sequence of elements extracted from the heap with findmin and delete, only a fixed percentage will have corrupted keys: in an unlucky scenario only corrupted elements are extracted from the heap. -The soft heap was designed by Bernard Chazelle in 2000. The term "corruption" in the structure is the result of what Chazelle called "carpooling" in a soft heap. Each node in the soft heap contains a linked-list of keys and one common key. The common key is an upper bound on the values of the keys in the linked-list. Once a key is added to the linked-list, it is considered corrupted because its value is never again relevant in any of the soft heap operations: only the common keys are compared. This is what makes soft heaps "soft"; you can't be sure whether or not any particular value you put into it will be corrupted. The purpose of these corruptions is effectively to lower the information entropy of the data, enabling the data structure to break through information-theoretic barriers regarding heaps.
Weak heap
heap data structure. a variation of the a binary heap data structure. A weak max-heap on a set of n values is defined to be a binary tree with n nodes, one for each value, satisfying the following constraints: 1. The root node has no left child 2. For every node, the value associated with that node is greater or equal to than the values associated with all nodes in its right subtree. 3. The leaves of the tree have heights that are all within one of each other. A weak min-heap is similar, but reverses the required order relationship between the value at each node and in its right subtree. -In a weak max-heap, the maximum value can be found (in constant time) as the value associated with the root node; similarly, in a weak min-heap, the minimum value can be found at the root. As with binary heaps, weak heaps can support the typical operations of a priority queue data structure: insert, delete-min, delete, or decrease-key, in logarithmic time per operation. Variants of the weak heap structure allow constant amortized time insertions and decrease-keys, matching the time for Fibonacci heaps.
Enfilade (Xanadu)
multiway tree data structure. a class of tree data structures used in Project Xanadu "Green" designs of the 1970s and 1980s. Enfilades allow quick editing, versioning, retrieval and inter-comparison operations in a large, cross-linked hypertext database. The Xanadu "Gold" design starting in the 1990s used a related data structure called the Ent. -Although the principles of enfilades can be applied to any tree data structure, the particular structure used in the Xanadu system was much like a B-Tree. What distinguishes enfilades is the use of dsps and wids in the indexing information within tree nodes. -Dsps are displacements, offsets or relative keys. A dsp is the difference in key between a containing node and that of a subtree or leaf. For instance, the leaf for a grid square in a map might have a certain longitude and latitude offset relative to the larger grid represented by the subtree the leaf is part of. The key of any leaf of an enfilade is found by combining all the dsps on the path down the tree to that leaf. Dsps can also be used for other context information that is imposed top-down on entire subtrees or ranges of content at once. -Wids are widths, ranges, or bounding boxes. A wid is relative to the key of a subtree or leaf, but specifies a range of addresses covering all items within the subtree. Wids identify the interesting parts of sparsely populated address spaces. In some enfilades, the wids of subtrees under a given node can overlap, and in any case, a search for data within a range of addresses must visit any subtrees whose wids intersect the search range. Wids are combined from the leaves of the tree, upward through all layers to the root (although they are maintained incrementally). Wids can also contain other summaries such as totals or maxima of data. -The relative nature of wids and dsps allows subtrees to be rearranged within an enfilade. By changing the dsp at the top of a subtree, the keys of all the data underneath are implicitly changed. Edit operations in enfilades are performed by "cutting," or splitting the tree down relevant access paths, inserting, deleting or rearranging subtrees, and splicing the pieces back together. The cost of cutting and splicing operations is generally log-like in 1-D trees and between log-like and square-root-like in 2-D trees. -Subtrees can also be shared between trees, or be linked from multiple places within a tree. This makes the enfilade a fully persistent data structure with virtual copying and versioning of content. Each use of a subtree inherits a different context from the chain of dsps down to it. Changes to a copy create new nodes only along the cut paths, and leave the entire original in place. The overhead for a version is very small, a new version's tree is balanced and fast, and its storage cost is related only to changes from the original. -One-dimensional enfilades are intermediate between arrays' direct addressability and linked lists' ease of insertion, deletion and rearrangement. Multidimensional enfilades resemble loose, rearrangeable, versionable Quad trees, Oct trees or k-d trees.
Ternary tree
multiway tree data structure. a tree data structure in which each node has at most three child nodes, usually distinguished as "left", "mid" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by starting at root node and repeatedly following references to either the left, mid or right child. -Ternary trees are used to implement Ternary search trees and Ternary heaps.
SPQR-tree
multiway tree data structure. a tree data structure used in computer science, and more specifically graph algorithms, to represent the triconnected components of a graph. The SPQR tree of a graph may be constructed in linear time[1] and has several applications in dynamic graph algorithms and graph drawing. -The basic structures underlying the SPQR tree, the triconnected components of a graph, and the connection between this decomposition and the planar embeddings of a planar graph, were first investigated by Saunders Mac Lane (1937); these structures were used in efficient algorithms by several other researchers[2] prior to their formalization as the SPQR tree by Di Battista and Tamassia (1989, 1990, 1996). |Structure| An SPQR tree takes the form of an unrooted tree in which for each node x there is associated an undirected graph or multigraph Gx. The node, and the graph associated with it, may have one of four types, given the initials SPQR: 1.In an S node, the associated graph is a cycle graph with three or more vertices and edges. This case is analogous to series composition in series-parallel graphs; the S stands for "series".[3] 2.In a P node, the associated graph is a dipole graph, a multigraph with two vertices and three or more edges, the planar dual to a cycle graph. This case is analogous to parallel composition in series-parallel graphs; the P stands for "parallel".[3] 3.In a Q node, the associated graph has a single real edge. This trivial case is necessary to handle the graph that has only one edge. In some works on SPQR trees, this type of node does not appear in the SPQR trees of graphs with more than one edge; in other works, all non-virtual edges are required to be represented by Q nodes with one real and one virtual edge, and the edges in the other node types must all be virtual. 4.In an R node, the associated graph is a 3-connected graph that is not a cycle or dipole. The R stands for "rigid": in the application of SPQR trees in planar graph embedding, the associated graph of an R node has a unique planar embedding.[3] -Each edge xy between two nodes of the SPQR tree is associated with two directed virtual edges, one of which is an edge in Gx and the other of which is an edge in Gy. Each edge in a graph Gx may be a virtual edge for at most one SPQR tree edge. -An SPQR tree T represents a 2-connected graph GT, formed as follows. Whenever SPQR tree edge xy associates the virtual edge ab of Gx with the virtual edge cd of Gy, form a single larger graph by merging a and c into a single supervertex, merging b and d into another single supervertex, and deleting the two virtual edges. That is, the larger graph is the 2-clique-sum of Gx and Gy. Performing this gluing step on each edge of the SPQR tree produces the graph GT; the order of performing the gluing steps does not affect the result. Each vertex in one of the graphs Gx may be associated in this way with a unique vertex in GT, the supervertex into which it was merged. -Typically, it is not allowed within an SPQR tree for two S nodes to be adjacent, nor for two P nodes to be adjacent, because if such an adjacency occurred the two nodes could be merged into a single larger node. With this assumption, the SPQR tree is uniquely determined from its graph. When a graph G is represented by an SPQR tree with no adjacent P nodes and no adjacent S nodes, then the graphs Gx associated with the nodes of the SPQR tree are known as the triconnected components of G.
Spaghetti stack (cactus stack or saguaro stack)
multiway tree data structure. an in-tree or parent pointer tree is an N-ary tree data structure in which each node has a pointer to its parent node, but no pointers to child nodes. When used to implement a set of stacks, the structure is called a spaghetti stack, cactus stack or saguaro stack (after the saguaro, a kind of cactus).[1] Parent pointer trees are also used as disjoint-set data structures. -The structure can be regarded as a set of singly linked lists that share part of their structure, in particular, their tails. From any node, one can traverse to ancestors of the node, but not to any other node.
Link/cut tree
multiway tree data structure. data structure for representing a forest, a set of rooted trees, and offers the following operations: -Add a tree consisting of a single node to the forest. Given a node in one of the trees, disconnect it (and its subtree) from the tree of which it is part. Attach a node to another node as its child. Given a node, find the root of the tree to which it belongs. By doing this operation on two distinct nodes, one can check whether they belong to the same tree. The represented forest may consist of very deep trees, so if we represent the forest as a plain collection of parent pointer trees, it might take us a long time to find the root of a given node. However, if we represent each tree in the forest as a link/cut tree, we can find which tree an element belongs to in O(log(n)) amortized time. Moreover, we can quickly adjust the collection of link/cut trees to changes in the represented forest. In particular, we can adjust it to merge (link) and split (cut) in O(log(n)) amortized time. -Link/cut trees divide each tree in the represented forest into vertex-disjoint paths, where each path is represented by an auxiliary tree (often splay trees, though the original paper predates splay trees and thus uses biased binary search trees). The nodes in the auxiliary trees are keyed by their depth in the corresponding represented tree. In one variation, Naive Partitioning, the paths are determined by the most recently accessed paths and nodes, similar to Tango Trees. In Partitioning by Size paths are determined by the heaviest child (child with the most children) of the given node. This gives a more complicated structure, but reduces the cost of the operations from amortized O(log n) to worst case O(log n). It has uses in solving a variety of network flow problems and to jive data sets. -In the original publication, Sleator and Tarjan referred to link/cut trees as "dynamic trees", or "dynamic dyno trees".
VP-tree (vantage-point tree)
space partitioning or binary space partitioning data structure. a BSP tree which segregates data in a metric space by choosing a position in the space (the "vantage point") and dividing the data points into two partitions: those that are nearer to the vantage point than a threshold, and those that are not. By repeatedly applying this procedure to partition the data into smaller and smaller sets, a tree data structure is created where neighbors in the tree are likely to be neighbors in the space.[1] -One of its declination is called the multi-vantage point tree, or MVP tree: an abstract data structure for indexing objects from large metric spaces for similarity search queries. It uses more than one point to partition each level -The way a VP tree stores data can be represented by a circle.[5] First, understand that each node of this tree contains an input point and a radius. All the left children of a given node are the points inside the circle and all the right children of a given node are outside of the circle. The tree itself does not need to know any other information about what is being stored. All it needs is the distance function that satisfies the properties of the metric space.[5] Just imagine a circle with a radius. The left children are all located inside the circle and the right children are located outside the circle. |Advantages| 1.Instead of inferring multidimensional points for domain before the index being built, we build the index directly based on the distance.[5] Doing this, avoids pre-processing steps. 2.Updating a VP tree is relatively easy compared to the fast-map approach. For fast maps, after inserting or deleting data, there will come a time when fast-map will have to rescan itself. That takes up too much time and it is unclear to know when the rescanning will start. 3.Distance based methods are flexible. It is "able to index objects that are represented as feature vectors of a fixed number of dimensions.
UB-tree
space partitioning or binary space partitioning data structure. a balanced tree for storing and efficiently retrieving multidimensional data. It is basically a B+ tree (information only in the leaves) with records stored according to Z-order, also called Morton order. Z-order is simply calculated by bitwise interlacing the keys. -Insertion, deletion, and point query are done as with ordinary B+ trees. To perform range searches in multidimensional point data, however, an algorithm must be provided for calculating, from a point encountered in the data base, the next Z-value which is in the multidimensional search range. -The original algorithm to solve this key problem was exponential with the dimensionality and thus not feasible[1] ("GetNextZ-address"). A solution to this "crucial part of the UB-tree range query" linear with the z-address bit length has been described later.[2] This method has already been described in an older paper[3] where using Z-order with search trees has first been proposed. -proposed by Rudolf Bayer and Volker Markl
Relaxed Kd-tree
space partitioning or binary space partitioning data structure. a data structure presented as a variant of the well known K-d trees. As any other variant of K-dimensional trees, a relaxed K-dimensional tree stores a set of n-multidimensional records, each one having a unique K-dimensional key x=(x0,... ,xK−1). Different to K-d trees, in a relaxed K-d tree, the discriminants in each node are arbitrary. Relaxed K-d trees were introduced in 1998
Bin*
space partitioning or binary space partitioning data structure. a data structure which allows efficient region queries. Each time a data point falls into a bin, the frequency of that bin is increased by one.[1] -For example, if there are some axis-aligned rectangles on a 2D plane, answers the question: "Given a query rectangle, what are the rectangles intersecting it?". In the example in the figure, A, B, C, D, E and F are existing rectangles, the query with the rectangle Q should return C, D, E and F, if we define all rectangles as closed intervals. -The data structure partitions a region of the 2D plane into uniform-sized bins. The bounding box of the bins encloses all candidate rectangles to be queried. All the bins are arranged in a 2D array. All the candidates are represented also as 2D arrays. The size of a candidate's array is the number of bins it intersects. For example, in the figure, candidate B has 6 elements arranged in a 3 row by 2 column array because it intersects 6 bins in such an arrangement. Each bin contains the head of a singly linked list. If a candidate intersects a bin, it is chained to the bin's linked list. Each element in a candidate's array is a link node in the corresponding bin's linked list.
Z-order (Morton order, or Morton code)*
space partitioning or binary space partitioning data structure. a function which maps multidimensional data to one dimension while preserving locality of the data points. It was introduced in 1966 by G. M. Morton.[1] The z-value of a point in multidimensions is simply calculated by interleaving (interleave sequence is obtained by merging or shuffling two sequences) the binary representations of its coordinate values. Once the data are sorted into this ordering, any one-dimensional data structure can be used such as binary search trees, B-trees, skip lists or (with low significant bits truncated) hash tables. The resulting ordering can equivalently be described as the order one would get from a depth-first traversal of a quadtree. -The figure below shows the Z-values for the two dimensional case with integer coordinates 0 ≤ x ≤ 7, 0 ≤ y ≤ 7 (shown both in decimal and binary). Interleaving the binary coordinate values yields binary z-values as shown. Connecting the z-values in their numerical order produces the recursively Z-shaped curve. Two-dimensional Z-values are also called as quadkey ones. |applications| -linear algebra -texture mapping (Some GPUs store texture maps in Z-order to increase spatial locality of reference during texture mapped rasterization)
Implicit kd-tree
space partitioning or binary space partitioning data structure. a k-d tree defined by an implicit splitting function rather than an explicitly-stored set of splits -a k-d tree defined implicitly above a rectilinear grid. Its split planes' positions and orientations are not given explicitly but implicitly by some recursive splitting-function defined on the hyperrectangles belonging to the tree's nodes. Each inner node's split plane is positioned on a grid plane of the underlying grid, partitioning the node's grid into two subgrids. -Implicit max-k-d trees are used for ray casting isosurfaces/MIP (maximum intensity projection). The attribute assigned to each inner node is the maximal scalar value given in the subgrid belonging to the node. Nodes are not traversed if their scalar values are smaller than the searched iso-value/current maximum intensity along the ray. The low storage requirements of the implicit max kd-tree and the favorable visualization complexity of ray casting allow to ray cast (and even change the isosurface for) very large scalar fields at interactive framerates on commodity PCs. Similarly an implicit min/max kd-tree may be used to efficiently evaluate queries such as terrain line of sight
Min/max kd-tree
space partitioning or binary space partitioning data structure. a k-d tree that associates a minimum and maximum value with each of its nodes -a k-d tree with two scalar values - a minimum and a maximum - assigned to its nodes. The minimum/maximum of an inner node is equal to the minimum/maximum of its children's minima/maxima. |Construction| Min/max kd-trees may be constructed recursively. Starting with the root node, the splitting plane orientation and position is evaluated. Then the children's splitting planes and min/max values are evaluated recursively. The min/max value of the current node is simply the minimum/maximum of its children's minima/maxima. |Properties| The min/max kdtree has - besides the properties of an kd-tree - the special property that an inner node's min/max values coincide each with a min/max value of either one child. This allows to discard the storage of min/max values at the leaf nodes by storing two bits at inner nodes, assigning min/max values to the children: Each inner node's min/max values will be known in advance, where the root node's min/max values are stored separately. Each inner node has besides two min/max values also two bits given, defining to which child those min/max values are assigned (0: to the left child 1: to the right child). The non-assigned min/max values of the children are the from the current node already known min/max values. The two bits may also be stored in the least significant bits of the min/max values which have therefore to be approximated by fractioning them down/up. -The resulting memory reduction is not minor, as the leaf nodes of full binary kd-trees are one half of the tree's nodes.
R+ tree
space partitioning or binary space partitioning data structure. a method for looking up data using a location, often (x, y) coordinates, and often for locations on the surface of the earth. Searching on one number is a solved problem; searching on two or more, and asking for locations that are nearby in both x and y directions, requires craftier algorithms. -Fundamentally, an R+ tree is a tree data structure, a variant of the R tree, used for indexing spatial information. |Difference between R+ trees and R trees| -R+ trees are a compromise between R-trees and kd-trees: they avoid overlapping of internal nodes by inserting an object into multiple leaves if necessary. Coverage is the entire area to cover all related rectangles. Overlap is the entire area which is contained in two or more nodes.[1] Minimal coverage reduces the amount of "dead space" (empty area) which is covered by the nodes of the R-tree. Minimal overlap reduces the set of search paths to the leaves (even more critical for the access time than minimal coverage). Efficient search requires minimal coverage and overlap. R+ trees differ from R trees in that: nodes are not guaranteed to be at least half filled, the entries of any internal node do not overlap, and an object ID may be stored in more than one leaf node. |Advantages| Because nodes are not overlapped with each other, point query performance benefits since all spatial regions are covered by at most one node. A single path is followed and fewer nodes are visited than with the R-tree |Disadvantages| Since rectangles are duplicated, an R+ tree can be larger than an R tree built on same data set. Construction and maintenance of R+ trees is more complex than the construction and maintenance of R trees and other variants of the R tree.
Bounding interval hierarchy (BIH)
space partitioning or binary space partitioning data structure. a partitioning data structure similar to that of bounding volume hierarchies or kd-trees. Bounding interval hierarchies can be used in high performance (or real-time) ray tracing and may be especially useful for dynamic scenes. -The BIH was first presented under the name of SKD-Trees,[1] presented by Ooi et al., and BoxTrees,[2] independently invented by Zachmann. Bounding interval hierarchies (BIH) exhibit many of the properties of both bounding volume hierarchies (BVH) and kd-trees. Whereas the construction and storage of BIH is comparable to that of BVH, the traversal of BIH resemble that of kd-trees. Furthermore, BIH are also binary trees just like kd-trees (and in fact their superset, BSP trees). Finally, BIH are axis-aligned as are its ancestors. Although a more general non-axis-aligned implementation of the BIH should be possible (similar to the BSP-tree, which uses unaligned planes), it would almost certainly be less desirable due to decreased numerical stability and an increase in the complexity of ray traversal. -The key feature of the BIH is the storage of 2 planes per node (as opposed to 1 for the kd tree and 6 for an axis aligned bounding box hierarchy), which allows for overlapping children (just like a BVH), but at the same time featuring an order on the children along one dimension/axis (as it is the case for kd trees). -It is also possible to just use the BIH data structure for the construction phase but traverse the tree in a way a traditional axis aligned bounding box hierarchy does. This enables some simple speed up optimizations for large ray bundles [3] while keeping memory/cache usage low. -Some general attributes of bounding interval hierarchies (and techniques related to BIH) as described by [4] are: Very fast construction times Low memory footprint Simple and fast traversal Very simple construction and traversal algorithms High numerical precision during construction and traversal Flatter tree structure (decreased tree depth) compared to kd-trees
Kd-tree (short for k-dimensional tree)*
space partitioning or binary space partitioning data structure. a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches). k-d trees are a special case of binary space partitioning trees. -The k-d tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node and points right of the hyperplane are represented by the right subtree. The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k-dimensions, with the hyperplane perpendicular to that dimension's axis. So, for example, if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right subtree. In such a case, the hyperplane would be set by the x-value of the point, and its normal would be the unit x-axis
Segment tree
space partitioning or binary space partitioning data structure. a tree data structure for storing intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, its structure cannot be modified once it is built. A similar data structure is the interval tree. -A segment tree for a set I of n intervals uses O(n log n) storage and can be built in O(n log n) time. Segment trees support searching for all the intervals that contain a query point in O(log n + k), k being the number of retrieved intervals or segments.[1] -Applications of the segment tree are in the areas of computational geometry, and geographic information systems. -The segment tree can be generalized to higher dimension spaces as well.
Octree*
space partitioning or binary space partitioning data structure. a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three dimensional space by recursively subdividing it into eight octants. Octrees are the three-dimensional analog of quadtrees. The name is formed from oct + tree, but note that it is normally written "octree" with only one "t". Octrees are often used in 3D graphics and 3D game engines. -Each node in an octree subdivides the space it represents into eight octants. In a point region (PR) octree, the node stores an explicit 3-dimensional point, which is the "center" of the subdivision for that node; the point defines one of the corners for each of the eight children. In a matrix based (MX) octree, the subdivision point is implicitly the center of the space the node represents. The root node of a PR octree can represent infinite space; the root node of an MX octree must represent a finite bounded space so that the implicit centers are well-defined. Note that Octrees are not the same as k-d trees: k-d trees split along a dimension and octrees split around a point. Also k-d trees are always binary, which is not the case for octrees. By using a depth-first search the nodes are to be traversed and only required surfaces are to be viewed.
Quadtree*
space partitioning or binary space partitioning data structure. a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This data structure was named a quadtree by Raphael Finkel and J.L. Bentley in 1974. A similar partitioning is also known as a Q-tree. -All forms of quadtrees share some common features: 1.They decompose space into adaptable cells 2.Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits 3.The tree directory follows the spatial decomposition of the quadtree. -Quadtrees are the two-dimensional analog of octrees. -some common quadtrees: region quadtree, point quadtree, edge quadtree, polygonal map quadtree -some common uses of quadtrees: Image representation Spatial indexing Efficient collision detection in two dimensions View frustum culling of terrain data Storing sparse data, such as a formatting information for a spreadsheet[2] or for some matrix calculations[citation needed] Solution of multidimensional fields (computational fluid dynamics, electromagnetism) Conway's Game of Life simulation program.[3] State estimation[4] Quadtrees are also used in the area of fractal image analysis Maximum disjoint sets
Interval tree
space partitioning or binary space partitioning data structure. a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree. -The trivial solution is to visit each interval and test whether it intersects the given point or interval, which requires O(n) time, where n is the number of intervals in the collection. Since a query may return all intervals, for example if the query is a large interval intersecting all intervals in the collection, this is asymptotically optimal; however, we can do better by considering output-sensitive algorithms, where the runtime is expressed in terms of m, the number of intervals produced by the query. Interval trees have a query time of O(log n + m) and an initial creation time of O(n log n), while limiting memory consumption to O(n). After creation, interval trees may be dynamic, allowing efficient insertion and deletion of an interval in O(log n). If the endpoints of intervals are within a small integer range (e.g., in the range [1,...,O(n)]), faster data structures exist[1] with preprocessing time O(n) and query time O(1+m) for reporting m intervals containing a given query point
Adaptive k-d tree
space partitioning or binary space partitioning data structure. a tree for multidimensional points where successive levels may be split along different dimensions.
Cover tree
space partitioning or binary space partitioning data structure. a type of data structure in computer science that is specifically designed to facilitate the speed-up of a nearest neighbor search. It is a refinement of the Navigating Net data structure, and related to a variety of other data structures developed for indexing intrinsically low-dimensional data.[1] -The tree can be thought of as a hierarchy of levels with the top level containing the root point and the bottom level containing every point in the metric space. Each level C is associated with an integer value i that decrements by one as the tree is descended. Each level C in the cover tree has three important properties: 1.Nesting: C_{i}\subseteq C_{i-1} 2.Covering: For every point p [ ] C_{{i-1}}, there exists a point q [ ] C_{i} such that the distance from p to q is less than or equal to 2^{i} and exactly one such q is a parent of p. 3.Separation: For all points p,q [ ] C_{i}, the distance from p to q is greater than 2^{i}
R* tree*
space partitioning or binary space partitioning data structure. a variant of R-trees used for indexing spatial information. R*-trees have slightly higher construction cost than standard R-trees, as the data may need to be reinserted; but the resulting tree will usually have a better query performance. Like the standard R-tree, it can store both point and spatial data. It was proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990 |Difference between R* Trees and R Trees| Minimization of both coverage and overlap is crucial to the performance of R-trees. Overlap means that, on data query or insertion, more than one branch of the tree needs to be expanded (due to the way data is being split in regions which may overlap). A minimized coverage improves pruning performance, allowing to exclude whole pages from search more often, in particular for negative range queries. The R*-tree attempts to reduce both, using a combination of a revised node split algorithm and the concept of forced reinsertion at node overflow. This is based on the observation that R-tree structures are highly susceptible to the order in which their entries are inserted, so an insertion-built (rather than bulk-loaded) structure is likely to be sub-optimal. Deletion and reinsertion of entries allows them to "find" a place in the tree that may be more appropriate than their original location. -When a node overflows, a portion of its entries are removed from the node and reinserted into the tree. (In order to avoid an indefinite cascade of reinsertions caused by subsequent node overflow, the reinsertion routine may be called only once in each level of the tree when inserting any one new entry.) This has the effect of producing more well-clustered groups of entries in nodes, reducing node coverage. Furthermore, actual node splits are often postponed, causing average node occupancy to rise. Re-insertion can be seen as a method of incremental tree optimization triggered on node overflow. |Performance| -Improved split heuristic produces pages that are more rectangular and thus better for many applications. -Reinsertion method optimizes the existing tree, but increases complexity. -Efficiently supports point and spatial data at the same time.
Hilbert R-tree*
space partitioning or binary space partitioning data structure. an R-tree variant, is an index for multidimensional objects like lines, regions, 3-D objects, or high-dimensional feature-based parametric objects. It can be thought of as an extension to B+-tree for multidimensional objects. -The performance of R-trees depends on the quality of the algorithm that clusters the data rectangles on a node. Hilbert R-trees use space-filling curves, and specifically the Hilbert curve, to impose a linear ordering on the data rectangles. |Hilbert curve or Hilbert space-filling curve| a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891,[1] as a variant of the space-filling Peano curves discovered by Giuseppe Peano in 1890.[2] -Because it is space-filling, its Hausdorff dimension is 2 (precisely, its image is the unit square, whose dimension is 2 in any definition of dimension; its graph is a compact set homeomorphic to the closed unit interval, with Hausdorff dimension 2). -H_{n} is the nth approximation to the limiting curve. The Euclidean length of H_{n} is 2^{n} - (1/(2^{n})), i.e., it grows exponentially with n, while at the same time always being bounded by a square with a finite area. |Back to Hilbert R-Trees| -There are two types of Hilbert R-trees, one for static databases, and one for dynamic databases. In both cases Hilbert space-filling curves are used to achieve better ordering of multidimensional objects in the node. This ordering has to be 'good', in the sense that it should group 'similar' data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Packed Hilbert R-trees are suitable for static databases in which updates are very rare or in which there are no updates at all. -The dynamic Hilbert R-tree is suitable for dynamic databases where insertions, deletions, or updates may occur in real time. Moreover, dynamic Hilbert R-trees employ flexible deferred splitting mechanism to increase the space utilization. Every node has a well defined set of sibling nodes. By adjusting the split policy the Hilbert R-tree can achieve a degree of space utilization as high as is desired. This is done by proposing an ordering on the R-tree nodes. The Hilbert R-tree sorts rectangles according to the Hilbert value of the center of the rectangles (i.e., MBR). (The Hilbert value of a point is the length of the Hilbert curve from the origin to the point.) Given the ordering, every node has a well-defined set of sibling nodes; thus, deferred splitting can be used. By adjusting the split policy, the Hilbert R-tree can achieve as high utilization as desired. To the contrary, other R-tree variants have no control over the space utilization.
Rapidly exploring random tree (RRT)
space partitioning or binary space partitioning data structure. an algorithm designed to efficiently search nonconvex, high-dimensional spaces by randomly building a space-filling tree. The tree is constructed incrementally from samples drawn randomly from the search space and is inherently biased to grow towards large unsearched areas of the problem. RRTs were developed by Steven M. LaValle and James J. Kuffner Jr. [1] .[2] They easily handle problems with obstacles and differential constraints (nonholonomic and kinodynamic) and have been widely used in autonomous robotic path planning. -RRTs can be viewed as a technique to generate open-loop trajectories for nonlinear systems with state constraints. An RRT can also be considered as a Monte-Carlo method to bias search into the largest Voronoi regions of a graph in a configuration space. Some variations can even be considered stochastic fractals. -An RRT grows a tree rooted at the starting configuration by using random samples from the search space. As each sample is drawn, a connection is attempted between it and the nearest state in the tree. If the connection is feasible (passes entirely through free space and obeys any constraints), this results in the addition of the new state to the tree. With uniform sampling of the search space, the probability of expanding an existing state is proportional to the size of its Voronoi region. As the largest Voronoi regions belong to the states on the frontier of the search, this means that the tree preferentially expands towards large unsearched areas. -The length of the connection between the tree and a new state is frequently limited by a growth factor. If the random sample is further from its nearest state in the tree than this limit allows, a new state at the maximum distance from the tree along the line to the random sample is used instead of the random sample itself. The random samples can then be viewed as controlling the direction of the tree growth while the growth factor determines its rate. This maintains the space-filling bias of the RRT while limiting the size of the incremental growth. -RRT growth can be biased by increasing the probability of sampling states from a specific area. Most practical implementations of RRTs make use of this to guide the search towards the planning problem goals. This is accomplished by introducing a small probability of sampling the goal to the state sampling procedure. The higher this probability, the more greedily the tree grows towards the goal.
Linear octree
space partitioning or binary space partitioning data structure. an octree that is represented by a linear array instead of a tree data structure. -To simplify implementation, a linear octree is usually complete (that is, every internal node has exactly 8 child nodes) and where the maximum permissible depth is fixed a priori (making it sufficient to store the complete list of leaf nodes). That is, all the nodes of the octree can be generated from the list of its leaf nodes. Space filling curves are often used to represent linear octrees.
Range tree
space partitioning or binary space partitioning data structure. an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979.[1] Similar data structures were discovered independently by Lueker,[2] Lee and Wong,[3] and Willard.[4] The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of (in Big O notation) O(log^d n + k) but worse storage of O(n log^{d-1} n), where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query. -Bernard Chazelle improved this to query time O(log^{d-1} n + k) and space complexity O(n ((log n)/(log log n))^(d-1)
Metric tree
space partitioning or binary space partitioning data structure. any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle inequality to make accesses to the data more efficient. Examples include the M-tree, vp-trees, cover trees, MVP Trees, and BK-trees -If there is no structure to the similarity measure then a brute force search requiring the comparison of the query image to every image in the dataset is the best that can be done[citation needed]. If, however, the similarity function satisfies the triangle inequality then it is possible to use the result of each comparison to prune the set of candidates to be examined. |Triangle Inequality| states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side |Back to metric tree| -The first article on metric trees, as well as the first use of the term "metric tree", published in the open literature was by Jeffrey Uhlmann in 1991.[2] Other researchers were working independently on similar data structures. In particular, Peter Yianilos claimed to have independently discovered the same method, which he called a vantage point tree (VP-tree).[3] The research on metric tree data structures blossomed in the late 1990s and included an examination by Google co-founder Sergey Brin of their use for very large databases.[4] The first textbook on metric data structures was published in 2006
Compressed suffix array
tree data structure where each tree node compares a bit slice of key values. a compressed data structure for pattern matching. Compressed suffix arrays are a general class of data structure that improve on the suffix array.[1][2] These data structures enable quick search for an arbitrary string with a comparatively small index. -Given a text T of n characters from an alphabet Σ, a compressed suffix array supports searching for arbitrary patterns in T. For an input pattern P of m characters, the search time is typically O(m) or O(m + log(n)). The space used is typically O(nH_{k}(T))+o(n), where H_{k}(T) is the k-th order empirical entropy of the text T. The time and space to construct a compressed suffix array are normally O(n). -The original instantiation of the compressed suffix array[1] solved a long-standing open problem by showing that fast pattern matching was possible using only a linear-space data structure, namely, one proportional to the size of the text T, which takes O(n log|Sigma|}) bits. The conventional suffix array and suffix tree use Omega (n log n) bits, which is substantially larger. The basis for the data structure is a recursive decomposition using the "neighbor function," which allows a suffix array to be represented by one of half its length. The construction is repeated multiple times until the resulting suffix array uses a linear number of bits. Following work showed that the actual storage space was related to the zeroth-order entropy and that the index supports self-indexing.[4] The space bound was further improved achieving the ultimate goal of higher-order entropy; the compression is obtained by partitioning the neighbor function by high-order contexts, and compressing each partition with a wavelet tree.[3] The space usage is extremely competitive in practice with other state-of-the-art compressors,[5] and it also supports fast pattern matching. -The memory accesses made by compressed suffix arrays and other compressed data structures for pattern matching are typically not localized, and thus these data structures have been notoriously hard to design efficiently for use in external memory. Recent progress using geometric duality takes advantage of the block access provided by disks to speed up the I/O time significantly[6] In addition, potentially practical search performance for a compressed suffix array in external-memory has been demonstrated.
X-fast trie
tree data structure where each tree node compares a bit slice of key values. a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982,[1] along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.
Y-fast trie
tree data structure where each tree node compares a bit slice of key values. a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982[1] to decrease the O(n log M) space used by an x-fast trie.
Judy array
tree data structure where each tree node compares a bit slice of key values. a data structure that has high performance, low memory usage and implements an associative array. Unlike normal arrays, Judy arrays may be sparse, that is, they may have large ranges of unassigned indices. They can be used for storing and looking up values using integer or string keys. The key benefits of using a Judy array is its scalability, high performance, memory efficiency and ease of use.[1] -Judy arrays are both speed- and memory-efficient,[clarification needed] and therefore they can sometimes replace common in-memory dictionary implementations (like red-black trees or hash tables). -Roughly speaking, Judy arrays are highly optimized 256-ary radix trees.[2] Judy arrays use over 20 different compression techniques on trie nodes to reduce memory usage. -The Judy array was invented by Douglas Baskins and named after his sister.
Radix tree (also radix trie or compact prefix tree)
tree data structure where each tree node compares a bit slice of key values. a data structure that represents a space-optimized trie in which each node that is the only child is merged with its parent. The result is that the number of children of every internal node is at least the radix r of the radix tree, where r is a positive integer and a power x of 2, having x ≥ 1. Unlike in regular tries, edges can be labeled with sequences of elements as well as single elements. This makes radix trees much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes. -Unlike regular trees (where whole keys are compared en masse from their beginning up to the point of inequality), the key at each node is compared chunk-of-bits by chunk-of-bits, where the quantity of bits in that chunk at that node is the radix r of the radix trie. When the r is 2, the radix trie is binary (i.e., compare that node's 1-bit portion of the key), which minimizes sparseness at the expense of maximizing trie depth—i.e., maximizing up to conflation of nondiverging bit-strings in the key. When r is an integer power of 2 greater or equal to 4, then the radix trie is an r-ary trie, which lessens the depth of the radix trie at the expense of potential sparseness. -As an optimization, edge labels can be stored in constant size by using two pointers to a string (for the first and last elements).[1] -Note that although the examples in this article show strings as sequences of characters, the type of the string elements can be chosen arbitrarily; for example, as a bit or byte of the string representation when using multibyte character encodings or Unicode.
B-trees*
tree data structure. a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children (Comer 1979, p. 123). Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. B-trees are a good example of a data structure for external memory. It is commonly used in databases and filesystems.
Conc-Tree list*
List data structure. a data-structure that stores element sequences, and provides amortized O(1) time append and prepend operations, O(log n) time insert and remove operations and O(log n) time concatenation. This data structure is particularly viable for functional task-parallel and data-parallel programming, and is relatively simple to implement compared to other data-structures with similar asymptotic complexity.[1] Conc-Trees were designed to improve efficiency of data-parallel operations that do not depend on the iteration order,[3] and improve constant factors in these operations by avoiding unnecessary copies of the data.[2] Orthogonally, they are used to efficiently aggregate data in functional-style task-parallel algorithms, as an implementation of the conc-list data abstraction.[4] Conc-list is a parallel programming counterpart to functional cons-lists, and was originally introduced by the Fortress language Fortress (programming language).
Linked list*
List data structure. a linear collection of data elements, called nodes, pointing to the next node by means of a pointer. It is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of data and a reference (in other words, a link) to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing efficient insertion or removal from arbitrary element references. -Linked lists are among the simplest and most common data structures. They can be used to implement several other common abstract data types, including lists (the abstract data type), stacks, queues, associative arrays, and S-expressions, though it is not uncommon to implement the other data structures directly without using a list as the basis of implementation. -The principal benefit of a linked list over a conventional array is that the list elements can easily be inserted or removed without reallocation or reorganization of the entire structure because the data items need not be stored contiguously in memory or on disk, while an array has to be declared in the source code, before compiling and running the program. Linked lists allow insertion and removal of nodes at any point in the list, and can do so with a constant number of operations if the link previous to the link being added or removed is maintained during list traversal. -On the other hand, simple linked lists by themselves do not allow random access to the data, or any form of efficient indexing. Thus, many basic operations — such as obtaining the last node of the list (assuming that the last node is not maintained as separate node reference in the list structure), or finding a node that contains a given datum, or locating the place where a new node should be inserted — may require sequential scanning of most or all of the list elements. The advantages and disadvantages of using linked lists are given below.
VList
List data structure. a persistent data structure designed by Phil Bagwell in 2002 that combines the fast indexing of arrays with the easy extension of cons-based (or singly linked) linked lists. -Like arrays, VLists have constant-time lookup on average and are highly compact, requiring only O(log n) storage for pointers, allowing them to take advantage of locality of reference. Like singly linked or cons-based lists, they are persistent, and elements can be added to or removed from the front in constant time. Length can also be found in O(log n) time. -The primary operations of a VList are: 1) Locate the kth element (O(1) average, O(log n) worst-case) 2) Add an element to the front of the VList (O(1) average, with an occasional allocation) 3) Obtain a new array beginning at the second element of an old array (O(1)) 4) Compute the length of the list (O(log n)) -The primary advantage VLists have over arrays is that different updated versions of the VList automatically share structure. Because VLists are immutable, they are most useful in functional programming languages, where their efficiency allows a purely functional implementation of data structures traditionally thought to require mutable arrays, such as hash tables. -However, VLists also have a number of disadvantages over their competitors: ...While immutability is a benefit, it is also a drawback, making it inefficient to modify elements in the middle of the array. ...Access near the end of the list can be as expensive as O(log n); it is only constant on average over all elements. This is still, however, much better than performing the same operation on cons-based lists. ...Wasted space in the first block is proportional to n. This is similar to linked lists, but there are data structures with less overhead. When used as a fully persistent data structure, the overhead may be considerably higher and this data structure may not be appropriate. -VList may be modified to support the implementation of a growable array. In the application of a growable array, immutability is no longer required. Instead of growing at the beginning of the list, the ordering interpretation is reversed to allow growing at the end of the array.
Zipper
List data structure. a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was described by Gérard Huet in 1997.[1] It includes and generalizes the gap buffer technique sometimes used with arrays. -The zipper technique is general in the sense that it can be adapted to lists, trees, and other recursively defined data structures. Such modified data structures are usually referred to as "a tree with zipper" or "a list with zipper" to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation. -A layman's explanation for a tree with zipper would be an ordinary computer filesystem with operations to go to parent (often cd ..), and the possibility to go downwards (cd subdirectory). The zipper is the pointer to the current path. Behind the scenes the zippers are efficient when making (functional) changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation (instead of making a change in the current data structure). -Uses: -The zipper is often used where there is some concept of focus or of moving around in some set of data, since its semantics reflect that of moving around but in a functional non-destructive manner. -The zipper has been used in ...Xmonad, to manage focus and placement of windows ...Huet's papers cover a structural editor based on zippers and a theorem prover ...A filesystem (ZipperFS) written in Haskell offering "...transactional semantics; undo of any file and directory operation; snapshots; statically guaranteed the strongest, repeatable read, isolation mode for clients; pervasive copy-on-write for files and directories; built-in traversal facility; and just the right behavior for cyclic directory references." ...Clojure has extensive support for zippers.
Difference list
List data structure. may refer to one of two data structures for representing lists. One of these data structures contains two lists, and represents the difference of those two lists. -The second data structure is a functional representation of a list with an efficient concatenation operation. In the second approach, difference lists are implemented as single-argument functions, which take a list as argument and prepend to that list. As a consequence, concatenation of difference lists of the second type is implemented essentially as function composition, which is O(1). However, of course the list still has to be constructed eventually (assuming all of its elements are needed), which is plainly at least O(n). -A difference list of the second sort represents lists as a function f, which when given a list x, returns the list that f represents, prepended to x. It is typically used in functional programming languages such as Haskell, although it could be used in imperative languages as well. Whether this kind of difference list is more efficient than another list representations depends on usage patterns. If an algorithm builds a list by concatenating smaller lists, which are themselves built by concatenating still smaller lists, then use of difference lists can improve performance by effectively "flattening" the list building computations.
Container
abstract data type. a class, a data structure,[1][2] or an abstract data type (ADT) whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules. The size of the container depends on the number of objects (elements) it contains. Containers can be looked at in three ways: -access, that is the way of accessing the objects of the container. In the case of arrays, access is done with the array index. In the case of stacks, access is done according to the LIFO (last in, first out) order (alternative name: FILO, first in, last out)[3] and in the case of queues it is done according to the FIFO (first in, first out) order (alternative name: LILO, last in, last out); -storage, that is the way of storing the objects of the container; -traversal, that is the way of traversing the objects of the container. -Container classes are expected to implement methods to do the following: create an empty container; insert objects into the container; delete objects from the container; delete all the objects in the container (clear); access the objects in the container; access the number of objects in the container (size). Containers are sometimes implemented in conjunction with iterators. -Containers can be divided into two groups: single value containers and associative containers.
Double-ended queue (dequeue, often abbreviated to deque, pronounced deck)
abstract data type. an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). It is also often called a head-tail linked list, though properly this refers to a specific data structure implementation. -This differs from the queue abstract data type or First-In-First-Out List (FIFO), where elements can only be added to one end and removed from the other. This general data class has some possible sub-types: 1) An input-restricted deque is one where deletion can be made from both ends, but insertion can be made at one end only. 2) An output-restricted deque is one where insertion can be made at both ends, but deletion can be made from one end only. -Both the basic and most common list types in computing, queues and stacks can be considered specializations of deques, and can be implemented using deques. -There are at least two common ways to efficiently implement a deque: with a modified dynamic array or with a doubly linked list. The dynamic array approach uses a variant of a dynamic array that can grow from both ends, sometimes called array deques. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends, instead of just one end. Three common implementations include: 1) Storing deque contents in a circular buffer, and only resizing when the buffer becomes full. This decreases the frequency of resizings. 2) Allocating deque contents from the center of the underlying array, and resizing the underlying array when either end is reached. This approach may require more frequent resizings and waste more space, particularly when elements are only inserted at one end. 3) Storing contents in multiple smaller arrays, allocating additional arrays at the beginning or end as needed. Indexing is implemented by keeping a dynamic array containing pointers to each of the smaller arrays.
List (or sequence)
abstract data type. an abstract data type that represents an ordered sequence of values, where the same value may occur more than once. An instance of a list is a computer representation of the mathematical concept of a finite sequence; the (potentially) infinite analog of a list is a stream.[1]:§3.5 Lists are a basic example of containers, as they contain other values. If the same value occurs multiple times, each occurrence is considered a distinct item. -Implementation of the list data structure may provide some of the following operations: a constructor for creating an empty list; an operation for testing whether or not a list is empty; an operation for prepending an entity to a list an operation for appending an entity to a list an operation for determining the first component (or the "head") of a list an operation for referring to the list consisting of all the components of a list except for its first (this is called the "tail" of the list.)
Priority queue
abstract data type. an abstract data type which is like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. If two elements have the same priority, they are served according to their order in the queue. -While priority queues are often implemented with heaps (it is the maximally efficient implementation of it), they are conceptually distinct from heaps. A priority queue is an abstract concept like "a list" or "a map"; just as a list can be implemented with a linked list or an array, a priority queue can be implemented with a heap or a variety of other methods such as an unordered array. -One can imagine a priority queue as a modified queue, but when one would get the next element off the queue, the highest-priority element is retrieved first. -Stacks and queues may be modeled as particular kinds of priority queues. As a reminder, here is how stacks and queues behave: 1) stack - elements are pulled in last-in first-out-order (e.g., a stack of papers) 2) queue - elements are pulled in first-in first-out-order (e.g., a line in a cafeteria) In a stack, the priority of each inserted element is monotonically increasing; thus, the last element inserted is always the first retrieved. In a queue, the priority of each inserted element is monotonically decreasing; thus, the first element inserted is always the first retrieved.
Bit field
array data structure. -can be distinguished (need verification) from a bit array in that the latter is used to store a large set of bits indexed by integers and is often wider than any integral type supported by the language. Bit fields, on the other hand, typically fit within a machine word, and the denotation of bits is independent of their numerical index. -Bit field can be used to reduce memory consumption when it is known that only some bits would be used for a variable. Bit fields allow efficient packaging of data in the memory. As we know, integer takes two bytes(16-bits) in memory. Some times we need to store value that takes less than 2-bytes. In such cases, there is wastages of memory. For example, if we use a variable temp to store value either 0 or 1. In this case only one bit of memory will be used rather than 16-bits. By using bit field, we can save lot of memory.
Bitboard
array data structure. A bitboard, often used for boardgames such as chess, checkers, othello and word games, is a specialization of the bit array data structure, where each bit represents a game position or state, designed for optimization of speed and/or memory or disk use in mass calculations. Bits in the same bitboard relate to each other in the rules of the game, often forming a game position when taken together. Other bitboards are commonly used as masks to transform or answer queries about positions. The "game" may be any game-like system where information is tightly packed in a structured form with "rules" affecting how the individual units or pieces relate. -Bitboards are used in many of the world's highest-rated chess playing programs such as Houdini, Stockfish, and Critter. They help the programs analyze chess positions with few CPU instructions and hold a massive number of positions in memory efficiently. -Bitboards allow the computer to answer some questions about game state with one logical operation. For example, if a chess program wants to know if the white player has any pawns in the center of the board (center four squares) it can just compare a bitboard for the player's pawns with one for the center of the board using a logical AND operation. If there are no center pawns then the result will be zero. -Query results can also be represented using bitboards. For example, the query "What are the squares between X and Y?" can be represented as a bitboard. These query results are generally pre-calculated, so that a program can simply retrieve a query result with one memory load. However, as a result of the massive compression and encoding, bitboard programs are not easy for software developers to either write or debug.
Circular buffer (or circular queue, cyclic buffer or ring buffer)
array data structure. a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams. -a data buffer (or just buffer) is a region of a physical memory storage used to temporarily store data while it is being moved from one place to another. -the useful property of a circular buffer is that it does not need to have its elements shuffled around when one is consumed. (If a non-circular buffer were used then it would be necessary to shift all elements when one is consumed.) In other words, the circular buffer is well-suited as a FIFO buffer while a standard, non-circular buffer is well suited as a LIFO buffer. -Circular buffering makes a good implementation strategy for a queue that has fixed maximum size. Should a maximum size be adopted for a queue, then a circular buffer is a completely ideal implementation; all queue operations are constant time. However, expanding a circular buffer requires shifting memory, which is comparatively costly. For arbitrarily expanding queues, a linked list approach may be preferred instead. -In some situations, overwriting circular buffer can be used, e.g. in multimedia. If the buffer is used as the bounded buffer in the producer-consumer problem then it is probably desired for the producer (e.g., an audio generator) to overwrite old data if the consumer (e.g., the sound card) is unable to momentarily keep up. Also, the LZ77 family of lossless data compression algorithms operates on the assumption that strings seen more recently in a data stream are more likely to occur soon in the stream. Implementations store the most recent data in a circular buffer. -A circular buffer can be implemented using four pointers, or two pointers and two integers: buffer start in memory buffer end in memory, or buffer capacity start of valid data (index or pointer) end of valid data (index or pointer), or amount of data currently in the buffer (integer) -A circular-buffer implementation may be optimized by mapping the underlying buffer to two contiguous regions of virtual memory. (Naturally, the underlying buffer's length must then equal some multiple of the system's page size.) Reading from and writing to the circular buffer may then be carried out with greater efficiency by means of direct memory access; those accesses which fall beyond the end of the first virtual-memory region will automatically wrap around to the beginning of the underlying buffer. When the read offset is advanced into the second virtual-memory region, both offsets—read and write—are decremented by the length of the underlying buffer.
Dope vector
array data structure. a data structure used to hold information about a data object, e.g. an array, especially its memory layout. -A dope vector typically contains information about the type of array element, rank of an array, the extents of an array, and the stride of an array as well as a pointer to the block in memory containing the array elements. -It is often used in compilers to pass entire arrays between procedures in a high level language like Fortran. -The dope vector includes an identifier, a length, a parent address, and a next child address. The identifier is an assigned name and may be unused. The length is the amount of allocated storage to this vector from the end of the dope vector that contains data of use to the internal processes of the computer. This length is called the offset, span, or vector length. The parent and child references are absolute memory addresses, or register and offset settings to the parent or child depending on the type of computer. -Dope vectors are often managed internally by the operating system and allow the processor to allocate and de-allocate storage in specific segments as needed. -Dope vectors may also have a status bit that tells the system if they are active; if it is not active it can be reallocated when needed. Using this technology the computer can perform a more granular memory management.
Iliffe vector (or display)
array data structure. a data structure used to implement multi-dimensional arrays. An Iliffe vector for an n-dimensional array (where n ≥ 2) consists of a vector (or 1-dimensional array) of pointers to an (n − 1)-dimensional array. They are often used to avoid the need for expensive multiplication operations when performing address calculation on an array element. They can also be used to implement jagged arrays, such as triangular arrays, triangular matrices and other kinds of irregularly shaped arrays. The data structure is named after John K. Iliffe. Their disadvantages include the need for multiple chained pointer indirections to access an element, and the extra work required to determine the next row in an n-dimensional array to allow an optimising compiler to prefetch it. Both of these are a source of delays on systems where the CPU is significantly faster than main memory. The Iliffe vector for a 2-dimensional array is simply a vector of pointers to vectors of data, i.e., the Iliffe vector represents the columns of an array where each column element is a pointer to a row vector. Multidimensional arrays in languages such as Java, Python (multidimensional lists), Ruby, Visual Basic .NET, Perl, PHP, JavaScript, Objective-C (when using NSArray, not a row-major C-style array), Swift, and Atlas Autocode are implemented as Iliffe vectors. Iliffe vectors are contrasted with dope vectors in languages such as Fortran, which contain the stride factors and offset values for the subscripts in each dimension.
Hashed array tree (HAT)
array data structure. a dynamic array data-structure published by Edward Sitarski in 1996,[1] maintaining an array of separate memory fragments (or "leaves") to store the data elements, unlike simple dynamic arrays which maintain their data in one contiguous memory area. Its primary objective is to reduce the amount of element copying due to automatic array resizing operations, and to improve memory usage patterns. -Whereas simple dynamic arrays based on geometric expansion waste linear (Ω(n)) space, where n is the number of elements in the array, hashed array trees waste only order O(√n) storage space. An optimization of the algorithm allows to eliminate data copying completely, at a cost of increasing the wasted space. -It can perform access in constant (O(1)) time, though slightly slower than simple dynamic arrays. The algorithm has O(1) amortized performance when appending a series of objects to the end of a hashed array tree. Contrary to its name, it does not use hash functions. -As defined by Sitarski, a hashed array tree has a top-level directory containing a power of two number of leaf arrays. All leaf arrays are the same size as the top-level directory. This structure superficially resembles a hash table with array-based collision chains, which is the basis for the name hashed array tree. A full hashed array tree can hold m2 elements, where m is the size of the top-level directory. The use of powers of two enables faster physical addressing through bit operations instead of arithmetic operations of quotient and remainder[1] and ensures the O(1) amortized performance of append operation in the presence of occasional global array copy while expanding. -Brodnik et al. presented a dynamic array algorithm with a similar space wastage profile to hashed array trees. Brodnik's implementation retains previously allocated leaf arrays, with a more complicated address calculation function as compared to hashed array trees.
Bitmap
array data structure. a mapping from some domain (for example, a range of integers) to bits, that is, values which are zero or one. It is also called a bit array or bitmap index. In computer graphics, when the domain is a rectangle (indexed by two coordinates) a bitmap gives a way to store a binary image, that is, an image in which each pixel is either black or white (or any two colors). -The more general term pixmap refers to a map of pixels, where each one may store more than two colors, thus using more than one bit per pixel. Often bitmap is used for this as well. In some contexts, the term bitmap implies one bit per pixel, while pixmap is used for images with multiple bits per pixel -A bitmap is a type of memory organization or image file format used to store digital images. The term bitmap comes from the computer programming terminology, meaning just a map of bits, a spatially mapped array of bits. Now, along with pixmap, it commonly refers to the similar concept of a spatially mapped array of pixels. Raster images in general may be referred to as bitmaps or pixmaps, whether synthetic or photographic, in files or memory. -Many graphical user interfaces use bitmaps in their built-in graphics subsystems -Similarly, most other image file formats, such as JPEG, TIFF, PNG, and GIF, also store bitmap images (as opposed to vector graphics), but they are not usually referred to as bitmaps, since they use compressed formats internally. -In typical uncompressed bitmaps, image pixels are generally stored with a color depth of 1, 4, 8, 16, 24, 32, 48, or 64 bits per pixel. Pixels of 8 bits and fewer can represent either grayscale or indexed color. An alpha channel (for transparency) may be stored in a separate bitmap, where it is similar to a grayscale bitmap, or in a fourth channel that, for example, converts 24-bit images to 32 bits per pixel. -The bits representing the bitmap pixels may be packed or unpacked (spaced out to byte or word boundaries), depending on the format or device requirements. Depending on the color depth, a pixel in the picture will occupy at least n/8 bytes, where n is the bit depth.
Sparse matrix
array data structure. a matrix in which most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered dense. The number of zero-valued elements divided by the total number of elements (e.g., m × n for an m × n matrix) is called the sparsity of the matrix (which is equal to 1 minus the density of the matrix). -Conceptually, sparsity corresponds to systems which are loosely coupled. Consider a line of balls connected by springs from one to the next: this is a sparse system as only adjacent balls are coupled. By contrast, if the same line of balls had springs connecting each ball to all other balls, the system would correspond to a dense matrix. The concept of sparsity is useful in combinatorics and application areas such as network theory, which have a low density of significant data or connections. -Large sparse matrices often appear in scientific or engineering applications when solving partial differential equations. -When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are slow and inefficient when applied to large sparse matrices as processing and memory are wasted on the zeroes. Sparse data is by nature more easily compressed and thus require significantly less storage. Some very large sparse matrices are infeasible to manipulate using standard dense-matrix algorithms.
Heightmap (or heightfield)
array data structure. a raster image used to store values, such as surface elevation data, for display in 3D computer graphics. A heightmap can be used in bump mapping to calculate where this 3D data would create shadow in a material, in displacement mapping to displace the actual geometric position of points over the textured surface, or for terrain where the heightmap is converted into a 3D mesh. -A heightmap contains one channel interpreted as a distance of displacement or "height" from the "floor" of a surface and sometimes visualized as luma of a grayscale image, with black representing minimum height and white representing maximum height. When the map is rendered, the designer can specify the amount of displacement for each unit of the height channel, which corresponds to the "contrast" of the image. Heightmaps can be stored by themselves in existing grayscale image formats, with or without specialized metadata, or in specialized file formats such as Daylon Leveller, GenesisIV and Terragen documents. -One may also exploit the use of individual color channels to increase detail. For example, a standard RGB 8-bit image can only show 256 values of grey and hence only 256 heights. By using colors, a greater number of heights can be stored (for an 24-bit image, 2563 = 16,777,216 heights can be represented (2564 = 4,294,967,296 if the alpha channel is also used)). This technique is especially useful where height varies slightly over a large area. Using only grey values, because the heights must be mapped to only 256 values, the rendered terrain appears flat, with "steps" in certain places. -Heightmap of Earth's surface (including water and ice) in equirectangular projection, normalized as 8-bit grayscale Heightmaps are commonly used in geographic information systems, where they are called digital elevation models.
Matrix
array data structure. a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. The individual items in a matrix are called its elements or entries. -Applications of matrices are found in most scientific fields. In every branch of physics, including classical mechanics, optics, electromagnetism, quantum mechanics, and quantum electrodynamics, they are used to study physical phenomena, such as the motion of rigid bodies. In computer graphics, they are used to project a 3D model onto a 2 dimensional screen. In probability theory and statistics, stochastic matrices are used to describe sets of probabilities; for instance, they are used within the PageRank algorithm that ranks the pages in a Google search.[5] Matrix calculus generalizes classical analytical notions such as derivatives and exponentials to higher dimensions. -A major branch of numerical analysis is devoted to the development of efficient algorithms for matrix computations, a subject that is centuries old and is today an expanding area of research. Matrix decomposition methods simplify computations, both theoretically and practically. Algorithms that are tailored to particular matrix structures, such as sparse matrices and near-diagonal matrices, expedite computations in finite element method and other computations. Infinite matrices occur in planetary theory and in atomic theory. A simple example of an infinite matrix is the matrix representing the derivative operator, which acts on the Taylor series of a function.
Sorted array
array data structure. an array data structure in which each element is sorted in numerical, alphabetical, or some other order, and placed at equally spaced addresses in computer memory. It is typically used in computer science to implement static lookup tables to hold multiple values which have the same data type. Sorting an array is useful in organising data in ordered form and recovering them rapidly. -There are many well-known methods by which an array can be sorted, which include, but are not limited to: selection sort, bubble sort, insertion sort, merge sort, quicksort, heapsort, and counting sort. -Sorted arrays are the most space-efficient data structure with the best locality of reference for sequentially stored data.[citation needed] -Elements within a sorted array are found using a binary search, in O(log n); thus sorted arrays are suited for cases when one needs to be able to look up elements quickly, e.g. as a set or multiset data structure. This complexity for lookups is the same as for self-balancing binary search trees. -In some data structures, an array of structures is used. In such cases, the same sorting methods can be used to sort the structures according to some key as a structure element; for example, sorting records of students according to roll numbers or names or grades. -If one is using a sorted dynamic array, then it is possible to insert and delete elements. The insertion and deletion of elements in a sorted array executes at O(n), due to the need to shift all the elements following the element to be inserted or deleted; in comparison a self-balancing binary search tree inserts and deletes at O(log n). In the case where elements are deleted or inserted at the end, a sorted dynamic array can do this in amortized O(1) time while a self-balancing binary search tree always operates at O(log n). -Elements in a sorted array can be looked up by their index (random access) at O(1) time, an operation taking O(log n) or O(n) time for more complex data structures.
Bit array (also known as bitmap, bitset, bit string, or bit vector)
array data structure. an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation. -A bit array is a mapping from some domain (almost always a range of integers) to values in the set {0, 1}. The values can be interpreted as dark/light, absent/present, locked/unlocked, valid/invalid, et cetera. The point is that there are only two possible values, so they can be stored in one bit. As with other arrays, the access to a single bit can be managed by applying an index to the array. Assuming its size (or length) to be n bits, the array can be used to specify a subset of the domain (e.g. {0, 1, 2, ..., n−1}), where a 1-bit indicates the presence and a 0-bit the absence of a number in the set. This set data structure uses about n/w words of space, where w is the number of bits in each machine word. Whether the least significant bit (of the word) or the most significant bit indicates the smallest-index number is largely irrelevant, but the former tends to be preferred (on little-endian machines).
Left-child right-sibling binary tree (or child-sibling representation, doubly chained tree or filial-heir chain)
binary tree data structure. Every multi-way or k-ary tree structure studied in computer science admits a representation as a binary tree. In a binary tree that represents a multi-way tree T, each node corresponds to a node in T and has two pointers: one to the node's first child, and one to its next sibling in T. The children of a node thus form a singly-linked list. -use cases: LCRS representation fits two normal criteria: 1) LCRS representation uses much memory. 2) Random access of a node's children is not required. Case (1) is utilized when large multi-way trees are necessary, especially when the trees contains a large set of data. For example, if storing a phylogenetic tree, the LCRS representation might be suitable. Case (2) arises in specialized data structures in which the tree structure is being used in very specific ways. For example, many types of heap data structures that use multi-way trees can be space optimized by using the LCRS representation. The main reason for this is that in heap data structures, the most common operations tend to be 1) Remove the root of a tree and process each of its children, or 2) Join two trees together by making one tree a child of the other. Operation (1) it is very efficient. In LCRS representation, it organizes the tree to have a right child because it does not have a sibling, so it is easy to remove the root. Operation (2) it is also efficient. It is easy to join two trees together.
Rope (or cord)
binary tree data structure. a data structure composed of smaller strings that is used for efficiently storing and manipulating a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently -A rope is a binary tree having leaf nodes that contain a short string. Each node has a weight value equal to the length of its string plus the sum of all leaf nodes' weight in its left subtree, namely the weight of a node is the total string length in its left subtree for a non-leaf node, or the string length of itself for a leaf node. Thus a node with two children divides the whole string into two parts: the left subtree stores the first part of the string. The right subtree stores the second part and its weight is the sum of the left child's weight and the length of its contained string. -The binary tree can be seen as several levels of nodes. The bottom level contains all the nodes that contain a string. Higher levels have fewer and fewer nodes. The top level consists of a single "root" node. The rope is built by putting the nodes with short strings in the bottom level, then attaching a random half of the nodes to parent nodes in the next level.
AA tree*
binary tree data structure. a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named for Arne Andersson, their inventor. -AA trees are a variation of the red-black tree, a form of binary search tree which supports efficient addition and deletion of entries. Unlike red-black trees, red nodes on an AA tree can only be added as a right subchild. In other words, no red node can be a left sub-child. This results in the simulation of a 2-3 tree instead of a 2-3-4 tree, which greatly simplifies the maintenance operations. The maintenance algorithms for a red-black tree need to consider seven different shapes to properly balance the tree (see image in wiki). An AA tree on the other hand only needs to consider two shapes due to the strict requirement that only right links can be red (see image in wiki). -Balancing rotations: -Whereas red-black trees require one bit of balancing metadata per node (the color), AA trees require O(log(N)) bits of metadata per node, in the form of an integer "level". The following invariants hold for AA trees: 1. The level of every leaf node is one. 2. The level of every left child is exactly one less than that of its parent. 3. The level of every right child is equal to or one less than that of its parent. 4. The level of every right grandchild is strictly less than that of its grandparent. 5. Every node of level greater than one has two children. A link where the child's level is equal to that of its parent is called a horizontal link, and is analogous to a red link in the red-black tree. Individual right horizontal links are allowed, but consecutive ones are forbidden; all left horizontal links are forbidden. These are more restrictive constraints than the analogous ones on red-black trees, with the result that re-balancing an AA tree is procedurally much simpler than re-balancing a red-black tree. Insertions and deletions may transiently cause an AA tree to become unbalanced (that is, to violate the AA tree invariants). Only two distinct operations are needed for restoring balance: "skew" and "split". Skew is a right rotation to replace a subtree containing a left horizontal link with one containing a right horizontal link instead. Split is a left rotation and level increase to replace a subtree containing two or more consecutive right horizontal links with one containing two fewer consecutive right horizontal links. Implementation of balance-preserving insertion and deletion is simplified by relying on the skew and split operations to modify the tree only if needed, instead of making their callers decide whether to skew or split.
WAVL tree (or weak AVL tree)
binary tree data structure. a self-balancing binary search tree. WAVL trees are named after AVL trees, another type of balanced search tree, and are closely related both to AVL trees and red-black trees, which all fall into a common framework of rank balanced trees. Like other balanced binary search trees, WAVL trees can handle insertion, deletion, and search operations in time O(log n) per operation.[1][2] -WAVL trees are designed to combine some of the best properties of both AVL trees and red-black trees. One advantage of AVL trees over red-black trees is that they are more balanced: they have height at most log (phi?) n= approx 1.44 log _{2}n (for a tree with n data items, where phi is the golden ratio), while red-black trees have larger maximum height, 2log _{2}n. If a WAVL tree is created using only insertions, without deletions, then it has the same small height bound that an AVL tree has. On the other hand, red-black trees have the advantage over AVL trees that they perform less restructuring of their trees. In AVL trees, each deletion may require a logarithmic number of tree rotation operations, while red-black trees have simpler deletion operations that use only a constant number of tree rotations. WAVL trees, like red-black trees, use only a constant number of tree rotations, and the constant is even better than for red-black trees.[1][2] -WAVL trees were introduced by Haeupler, Sen & Tarjan (2015). The same authors also provided a common view of AVL trees, WAVL trees, and red-black trees as all being a type of rank-balanced tree.
Binary tree*
binary tree data structure. a tree data structure in which each node has at most two children, which are referred to as the left child and the right child. A recursive definition using just set theory notions is that a (non-empty) binary tree is a triple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set.[1] Some authors allow the binary tree to be the empty set as well.[2] -From a graph theory perspective, binary (and K-ary) trees as defined here are actually arborescences.[3] A binary tree may thus be also called a bifurcating arborescence[3]—a term which actually appears in some very old programming books,[4] before the modern computer science terminology prevailed. It is also possible to interpret a binary tree as an undirected, rather than a directed graph, in which case a binary tree is an ordered, rooted tree.[5] Some authors use rooted binary tree instead of binary tree to emphasize the fact that the tree is rooted, but as defined above, a binary tree is always rooted.[6] A binary tree is a special case of an ordered K-ary tree, where k is 2. -In computing, binary trees are seldom used solely for their structure. Much more typical is to define a labeling function on the nodes, which associates some value to each node.[7] Binary trees labelled this way are used to implement binary search trees and binary heaps, and are used for efficient searching and sorting. The designation of non-root nodes as left or right child even when there is only one child present matters in some of these applications, in particular it is significant in binary search trees.[8] In mathematics, what is termed binary tree can vary significantly from author to author. Some use the definition commonly used in computer science,[9] but others define it as every non-leaf having exactly two children and don't necessarily order (as left/right) the children either
Tango tree
binary tree data structure. a type of binary search tree proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Patrascu in 2004. -It is an online binary search tree that achieves an O(log log n) competitive ratio relative to the optimal offline binary search tree, while only using O(log log n) additional bits of memory per node. This improved upon the previous best known competitive ratio, which was O(log n). -Tango trees work by partitioning a binary search tree into a set of preferred paths, which are themselves stored in auxiliary trees (so the tango tree is represented as a tree of trees). -First, we define for each node its preferred child, which informally is the most-recently touched child by a traditional binary search tree lookup. More formally, consider a subtree T, rooted at p, with children l (left) and r (right). We set r as the preferred child of p if the most recently accessed node in T is in the subtree rooted at r, and l as the preferred child otherwise. Note that if the most recently accessed node of T is p itself, then l is the preferred child by definition. -A preferred path is defined by starting at the root and following the preferred children until reaching a leaf node. Removing the nodes on this path partitions the remainder of the tree into a number of subtrees, and we recurse on each subtree (forming a preferred path from its root, which partitions the subtree into more subtrees).
T-tree*
binary tree data structure. a type of binary tree data structure that is used by main-memory databases, such as Datablitz, EXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite. -A T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory, just as a B-tree is an index structure optimized for storage on block oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them. -T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields. -The 'T' in T-tree refers to the shape of the node data structures in the original paper that first described this type of index. -Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware. -The main reason seems to be that the traditional assumption of memory references having uniform cost is no longer valid given the current speed gap between cache access and main memory access. -A T-tree node usually consists of pointers to the parent node, the left and right child node, an ordered array of data pointers and some extra control data. Nodes with two subtrees are called internal nodes, nodes without subtrees are called leaf nodes and nodes with only one subtree are named half-leaf nodes. A node is called the bounding node for a value if the value is between the node's current minimum and maximum value, inclusively. -For each internal node, leaf or half leaf nodes exist that contain the predecessor of its smallest data value (called the greatest lower bound) and one that contains the successor of its largest data value (called the least upper bound). Leaf and half-leaf nodes can contain any number of data elements from one to the maximum size of the data array. Internal nodes keep their occupancy between predefined minimum and maximum numbers of elements
Treap
binary tree data structure. heap data structure. The treap was first described by Cecilia R. Aragon and Raimund Seidel in 1989;[1][2] its name is a portmanteau of tree and heap. It is a Cartesian tree in which each key is given a (randomly chosen) numeric priority. As with any binary search tree, the inorder traversal order of the nodes is the same as the sorted order of the keys. The structure of the tree is determined by the requirement that it be heap-ordered: that is, the priority number for any non-leaf node must be greater than or equal to the priority of its children. Thus, as with Cartesian trees more generally, the root node is the maximum-priority node, and its left and right subtrees are formed in the same manner from the subsequences of the sorted order to the left and right of that node. -An equivalent way of describing the treap is that it could be formed by inserting the nodes highest-priority-first into a binary search tree without doing any rebalancing. Therefore, if the priorities are independent random numbers (from a distribution over a large enough space of possible priorities to ensure that two nodes are very unlikely to have the same priority) then the shape of a treap has the same probability distribution as the shape of a random binary search tree, a search tree formed by inserting the nodes without rebalancing in a randomly chosen insertion order. Because random binary search trees are known to have logarithmic height with high probability, the same is true for treaps. -Aragon and Seidel also suggest assigning higher priorities to frequently accessed nodes, for instance by a process that, on each access, chooses a random number and replaces the priority of the node with that number if it is higher than the previous priority. This modification would cause the tree to lose its random shape; instead, frequently accessed nodes would be more likely to be near the root of the tree, causing searches for them to be faster. -Naor and Nissim[3] describe an application in maintaining authorization certificates in public-key cryptosystems.
Randomized binary search tree
binary tree data structure. introduced by Martínez and Roura subsequently to the work of Aragon and Seidel on treaps, stores the same nodes with the same random distribution of tree shape, but maintains different information within the nodes of the tree in order to maintain its randomized structure. Rather than storing random priorities on each node, the randomized binary search tree stores a small integer at each node, the number of its descendants (counting itself as one); these numbers may be maintained during tree rotation operations at only a constant additional amount of time per rotation. When a key x is to be inserted into a tree that already has n nodes, the insertion algorithm chooses with probability 1/(n + 1) to place x as the new root of the tree, and otherwise it calls the insertion procedure recursively to insert x within the left or right subtree (depending on whether its key is less than or greater than the root). The numbers of descendants are used by the algorithm to calculate the necessary probabilities for the random choices at each step. Placing x at the root of a subtree may be performed either as in the treap by inserting it at a leaf and then rotating it upwards, or by an alternative algorithm described by Martínez and Roura that splits the subtree into two pieces to be used as the left and right children of the new node. -The deletion procedure for a randomized binary search tree uses the same information per node as the insertion procedure, and like the insertion procedure it makes a sequence of O(log n) random decisions in order to join the two subtrees descending from the left and right children of the deleted node into a single tree. If the left or right subtree of the node to be deleted is empty, the join operation is trivial; otherwise, the left or right child of the deleted node is selected as the new subtree root with probability proportional to its number of descendants, and the join proceeds recursively.
Tagged union (also called variant, variant record, discriminated union, or disjoint union)
composite data type. a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases," each of which should be handled correctly when that type is manipulated. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time. -Tagged unions are most important in functional languages such as ML and Haskell, where they are called datatypes (see algebraic data type) and the compiler is able to verify that all cases of a tagged union are always handled, avoiding many types of errors. They can, however, be constructed in nearly any language, and are much safer than untagged unions, often simply called unions, which are similar but do not explicitly keep track of which member of the union is currently in use. -Tagged unions are often accompanied by the concept of a type constructor, which is similar but not the same as a constructor for a class. Type constructors produce a tagged union type, given the initial tag type and the corresponding type. -An enumerated type can be seen as a degenerate case: a tagged union of unit types. It corresponds to a set of nullary constructors and may be implemented as a simple tag variable, since it holds no additional data besides the value of the tag.
Record (also called tuple or struct)
composite data type. a record (also called struct or compound data) is a basic data structure. A record is a collection of fields ("members" in object-oriented programming or "elements"), possibly of different data types, typically in fixed number and sequence. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed. Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tapes or hard disks. Records are a fundamental component of most data structures, especially linked data structures. Many computer files are organized as arrays of logical records, often grouped into larger physical records or blocks for efficiency. An object in object-oriented language is essentially a record that contains procedures specialized to handle that record; and object types are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects, and are known as plain old data structures (PODSs), to contrast with objects that use OO features. A record can be viewed as the computer analog of a mathematical tuple. In the same vein, a record type can be viewed as the computer language analog of the Cartesian product of two or more mathematical sets, or the implementation of an abstract product type in a specific language.
Union
composite data type. a value that may have any of several representations or formats; or it is a data structure that consists of a variable that may hold such a value. Some programming languages support special data types, called union types, to describe such values and variables. In other words, a union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g., "float or long integer". Contrast with a record (or structure), which could be defined to contain a float and an integer; in a union, there is only one value at any given time. -A union can be pictured as a chunk of memory that is used to store variables of different data types. Once a new value is assigned to a field, the existing data is overwritten with the new data. The memory area storing the value has no intrinsic type (other than just bytes or words of memory), but the value can be treated as one of several abstract data types, having the type of the value that was last written to the memory area. -Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in a type-unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store a data type tag. -The name "union" stems from the type's formal definition. If a type is considered as the set of all values that that type can take on, a union type is simply the mathematical union of its constituting types, since it can take on any value any of its fields can. Also, because a mathematical union discards duplicates, if more than one field of the union can take on a single common value, it is impossible to tell from the value alone which field was last written. -However, one useful programming function of unions is to map smaller data elements to larger ones for easier manipulation. A data structure consisting, for example, of 4 bytes and a 32-bit integer, can form a union with an unsigned 64-bit integer, and thus be more readily accessed for purposes of comparison etc. -In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. -The primary use of a union is allowing access to a common location by different data types, for example hardware input/output access, perhaps bitfield and word sharing. Unions also provide crude polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables, possibly in an enclosing struct.
Boolean data type
primitive data type. a data type, having two values (usually denoted true and false), intended to represent the truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century. The Boolean data type is primarily associated with conditional statements, which allow different actions and change control flow depending on whether a programmer-specified Boolean condition evaluates to true or false. It is a special case of a more general logical data type; logic does not always have to be Boolean. In programming languages that have a built-in Boolean data type, such as Pascal and Java, the comparison operators such as > and ≠ are usually defined to return a Boolean value. Conditional and iterative commands may be defined to test Boolean-valued expressions. Languages without an explicit Boolean data type, like C90 and Lisp, may still represent truth values by some other data type. Common Lisp uses an empty list for false, and any other value for true. C uses an integer type, where relational expressions like i > j and logical expressions connected by && and || are defined to have value 1 if true and 0 if false, whereas the test parts of if, while, for, etc., treat any non-zero value as true.[1][2] Indeed, a Boolean variable may be regarded (and be implemented) as a numerical variable with a single binary digit (bit), which can store only two values. It is worth noting that the implementation of Booleans in computers are most likely represented as a full word, rather than a bit; this is usually due to the ways computers transfer blocks of information. Most programming languages, even those that do not have an explicit Boolean type, have support for Boolean algebraic operations such as conjunction (AND, &, *), disjunction (OR, |, +), equivalence (EQV, =, ==), exclusive or/non-equivalence (XOR, NEQV, ^, !=), and negation (NOT, ~, !). In some languages, like Ruby, Smalltalk, and Alice the "true" and "false" values belong to separate classes—i.e. True and False, resp.—so there is no single Boolean "type." In SQL, which uses a three-valued logic for explicit comparisons because of its special treatment of Nulls, the Boolean data type (introduced in SQL:1999) is also defined to include more than two truth values, so that SQL "Booleans" can store all logical values resulting from the evaluation of predicates in SQL. A column of Boolean type can also be restricted to just TRUE and FALSE though.
Enumerated type
primitive data type. a small set of uniquely-named values. a data type consisting of a set of named values called elements, members, enumeral, or enumerators of the type. The enumerator names are usually identifiers that behave as constants in the language. An enumerated type can be seen as a degenerate tagged union of unit type. A variable that has been declared as having an enumerated type can be assigned any of the enumerators as a value. In other words, an enumerated type has values that are different from each other, and that can be compared and assigned, but are not specified by the programmer as having any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily. (also called enumeration or enum, or factor in the R programming language, and a categorical variable in statistics)
Trees
In these data structures each tree node compares a bit slice of key values. Trie Radix tree Suffix tree Suffix array Compressed suffix array FM-index Generalised suffix tree B-trie Judy array X-fast trie Y-fast trie Merkle Tree Ctrie
Propositional directed acyclic graph (PDAG)
Graph data structure. a data structure that is used to represent a Boolean function. A Boolean function can be represented as a rooted, directed acyclic graph of the following form: 1.Leaves are labeled with [T] (true), [upsidedownT] (false), or a Boolean variable. 2.Non-leaves are [bigtriangleup] (logical and), [bigtriangledown with point on bottom] (logical or) and [Diamond] (logical not). 3.[bigtriangleup]- and [bigtriangledown]- nodes have at least one child. 4.[Diamond]- nodes have exactly one child. Leaves labeled with [T] ([upsidedownT]) represent the constant Boolean function which always evaluates to 1 (0). A leaf labeled with a Boolean variable x is interpreted as the assignment x=1, i.e. it represents the Boolean function which evaluates to 1 if and only if x=1. The Boolean function represented by a [bigtriangleup]-node is the one that evaluates to 1, if and only if the Boolean function of all its children evaluate to 1. Similarly, a [bigtriangledown]-node represents the Boolean function that evaluates to 1, if and only if the Boolean function of at least one child evaluates to 1. Finally, a [Diamond]-node represents the complemenatary Boolean function its child, i.e. the one that evaluates to 1, if and only if the Boolean function of its child evaluates to 0.
Hash trie
Hash data structure. may refer to: -Hash tree (persistent data structure), a trie used to map hash values to keys -A space-efficient implementation of a sparse trie, in which the descendants of each node may be interleaved in memory. (The name is suggested by a similarity to a closed hash table.) -A data structure which "combines features of hash tables and LC-tries (Least Compression tries) in order to perform efficient lookups and updates"
Quad-edge
Other data structure. a computer representation of the topology of a two-dimensional or three-dimensional map, that is, a graph drawn on a (closed) surface. The quad-edge data structure: 1.represents simultaneously both the map, its dual and mirror image. 2.can represent the most general form of a map, admitting vertices and faces of degree 1 and 2. 3.is a variant of the earlier winged edge data structure. -The fundamental idea behind the quad-edge structure is the recognition that a single edge, in a closed polygonal mesh topology, sits between exactly two faces and exactly two vertices. Thus, it can represent a dual of the graph simply by reversing the convention on what is a vertex and what is a face. |Uses| Much like Winged Edge, quad-edge structures are used in programs to store the topology of a 2D or 3D polygonal mesh. The mesh itself does not need to be closed in order to form a valid quad-edge structure. -Using a quad-edge structure, iterating through the topology is quite easy. Often, the interface to quad-edge topologies is through directed edges. This allows the two vertices to have explicit names (start and end), and this gives faces explicit names as well (left and right, relative to a person standing on start and looking in the direction of end). The four edges are also given names, based on the vertices and faces: start-left, start-right, end-left, and end-right. A directed edge can be reversed to generate the edge in the opposite direction. -Iterating around a particular face only requires having a single directed edge to which that face is on the left (by convention) and then walking through all of the start-left edges until the original edge is reached.
BK-tree
space partitioning or binary space partitioning data structure. a metric tree suggested by Walter Austin Burkhard and Robert M. Keller[1] specifically adapted to discrete metric spaces. For simplicity, let us consider integer discrete metric d(x,y). Then, BK-tree is defined in the following way. An arbitrary element a is selected as root node. The root node may have zero or more subtrees. The k-th subtree is recursively built of all elements b such that d(a,b)=k. BK-trees can be used for approximate string matching in a dictionary
Generalised suffix tree
tree data structure where each tree node compares a bit slice of key values. a suffix tree for a set of strings. Given the set of strings D=S_{1},S_{2},...,S_{d} of total length n, it is a Patricia tree containing all n suffixes of the strings. It is mostly used in bioinformatics.[1]
binary search algorithm
-binary search, also known as half-interval search[1] or logarithmic search,[2] is a search algorithm that finds the position of a target value within a sorted array. It compares the target value to the middle element of the array; if they are unequal, the half in which the target cannot lie is eliminated and the search continues on the remaining half until it is successful. -Binary search runs in at worst logarithmic time, making O(log n) comparisons, where n is the number of elements in the array and log is the binary logarithm; and using only constant (O(1)) space. Although specialized data structures designed for fast searching—such as hash tables—can be searched more efficiently, binary search applies to a wider range of search problems. -Although the idea is simple, implementing binary search correctly requires attention to some subtleties about its exit conditions and midpoint calculation. -There exist numerous variations of binary search. One variation in particular (fractional cascading) speeds up binary searches for the same value in multiple arrays.
Hash table (or hash map)*
Hash data structure. a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. -Ideally, the hash function will assign each key to a unique bucket, but it is possible that two keys will generate an identical hash causing both keys to point to the same bucket. Instead, most hash table designs assume that hash collisions—different keys that are assigned by the hash function to the same bucket—will occur and must be accommodated in some way. -In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at (amortized[2]) constant average cost per operation. -In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.
MinHash (or the min-wise independent permutations locality sensitive hashing scheme)
Hash data structure. a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997), and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words. -The Jaccard similarity coefficient is a commonly used indicator of the similarity between two sets. |Applications| The original applications for MinHash involved clustering and eliminating near-duplicates among web documents, represented as sets of the words occurring in those documents.[1][2] Similar techniques have also been used for clustering and near-duplicate elimination for other types of data, such as images: in the case of image data, an image can be represented as a set of smaller subimages cropped from it, or as sets of more complex image feature descriptions.[6] -In data mining, Cohen et al. (2001) use MinHash as a tool for association rule learning. Given a database in which each entry has multiple attributes (viewed as a 0-1 matrix with a row per database entry and a column per attribute) they use MinHash-based approximations to the Jaccard index to identify candidate pairs of attributes that frequently co-occur, and then compute the exact value of the index for only those pairs to determine the ones whose frequencies of co-occurrence are below a given strict threshold.
Quotient filter
Hash data structure. introduced by Bender et al. in 2011. the quotient filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set (an approximate member query filter, AMQ). A query will elicit a reply specifying either that the element is definitely not in the set or that the element is probably in the set. The former result is definitive; i.e., the test does not generate false negatives. But with the latter result there is some probability, ε, of the test returning "element is in the set" when in fact the element is not present in the set (i.e., a false positive). There is a tradeoff between ε, the false positive rate, and storage size; increasing the filter's storage size reduces ε. Other AMQ operations include "insert" and "optionally delete". The more elements are added to the set, the larger the probability of false positives. -An approximate member query (AMQ) filter used to speed up answers in a key-value storage system. Key-value pairs are stored on a disk which has slow access times. AMQ filter decisions are much faster. However some unnecessary disk accesses are made when the filter reports a positive (in order to weed out the false positives). Overall answer speed is better with the AMQ filter than without it. Use of an AMQ filter for this purpose, however, does increase memory usage. A typical application for quotient filters, and other AMQ filters, is to serve as a proxy for the keys in a database on disk. As keys are added to or removed from the database, the filter is updated to reflect this. Any lookup will first consult the fast quotient filter, then look in the (presumably much slower) database only if the quotient filter reported the presence of the key. If the filter returns absence, the key is known not to be in the database without any disk accesses having been performed. -A quotient filter has the usual AMQ operations of insert and query. In addition it can also be merged and re-sized without having to re-hash the original keys (thereby avoiding the need to access those keys from secondary storage). This property benefits certain kinds of log-structured merge-trees.
Winged edge
Other data structure. a data representation used to describe polygon models in computer graphics. It explicitly describes the geometry and topology of faces, edges, and vertices when three or more surfaces come together and meet at a common edge. The ordering is such that the surfaces are ordered counter-clockwise with respect to the innate orientation of the intersection edge. Moreover the representation allows numerically unstable situations like that depicted below. -The winged edge data structure allows for quick traversal between faces, edges, and vertices due to the explicitly linked structure of the network. This rich form of specifying an unstructured grid is in contrast to simpler specifications of polygon meshes such as a node and element list, or the implied connectivity of a regular grid. It has application in common modeling operations such as subdivision, extrusion etc.
Symbol table
Other data structure. a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source. |Implementation| A common implementation technique is to use a hash table. A compiler may use one large symbol table for all symbols or use separated, hierarchical symbol tables for different scopes. There are also trees, linear lists and self-organizing lists which can be used to implement a symbol table. It also simplifies the classification of literals in tabular format. The symbol table is accessed by most phases of a compiler, beginning with the lexical analysis to optimization. |Uses| An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references. -A symbol table may only exist during the translation process, or it may be embedded in the output of that process for later exploitation, for example, during an interactive debugging session, or as a resource for formatting a diagnostic report during or after execution of a program. -While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program. -At that time of accessing variables and allocating memory dynamically, a compiler should perform many works and as such the extended stack model requires the symbol table.
Weight-balanced tree (WBTs)
binary tree data structure. a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences.[1] These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees.[2][3] Their more common name is due to Knuth.[4] Like other self-balancing trees, WBTs store bookkeeping information pertaining to balance in their nodes and perform rotations to restore balance when it is disturbed by insertion or deletion operations. Specifically, each node stores the size of the subtree rooted at the node, and the sizes of left and right subtrees are kept within some factor of each other. Unlike the balance information in AVL trees (which store the height of subtrees) and red-black trees (which store a fictional "color" bit), the bookkeeping information in a WBT is an actually useful property for applications: the number of elements in a tree is equal to the size of its root, and the size information is exactly the information needed to implement the operations of an order statistic tree, viz., getting the n'th largest element in a set or determining an element's index in sorted order.[5] -Weight-balanced trees are popular in the functional programming community and are used to implement sets and maps in MIT Scheme, SLIB and implementations of Haskell. -A weight-balanced tree is a binary search tree that stores the sizes of subtrees in the nodes. That is, a node has fields ...key, of any ordered type ...value (optional, only for mappings) ...left, right, pointer to node ...size, of type integer. By definition, the size of a leaf (typically represented by a nil pointer) is zero. The size of an internal node is the sum of sizes of its two children, plus one (size[n] = size[n.left] + size[n.right] + 1). Based on the size, one defines the weight as weight[n] = size[n] + 1
Fenwick tree (binary indexed tree)
multiway tree data structure. a data structure that can efficiently update elements and calculate prefix sums in a table of numbers. This structure was proposed by Peter Fenwick in 1994 to improve the efficiency of arithmetic coding compression algorithms.[1] -There is a trade-off between the efficiency of element update and prefix sum calculation. In a flat array of n numbers, calculating prefix sum requires O(n) time, while in an array of prefix sums, updating elements requires O(n) time. Fenwick trees allow both operations to be performed in O(\log n) time. This is achieved by representing the numbers as a tree, where the value of each node is the sum of the numbers in that subtree. The tree structure allows operations to be performed using only O(\log n) node accesses.
Skew heap (or self-adjusting heap)
heap data structure. a heap data structure implemented as a binary tree. Skew heaps are advantageous because of their ability to merge more quickly than binary heaps. In contrast with binary heaps, there are no structural constraints, so there is no guarantee that the height of the tree is logarithmic. Only two conditions must be satisfied: The general heap order must be enforced Every operation (add, remove_min, merge) on two skew heaps must be done using a special skew heap merge. A skew heap is a self-adjusting form of a leftist heap which attempts to maintain balance by unconditionally swapping all nodes in the merge path when merging two heaps. (The merge operation is also used when adding and removing values.) With no structural constraints, it may seem that a skew heap would be horribly inefficient. However, amortized complexity analysis can be used to demonstrate that all operations on a skew heap can be done in O(log n).
Disjoint-set data structure (also called a union-find data structure or merge-find set)
multiway tree data structure. a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. It supports two useful operations: Find: Determine which subset a particular element is in. Find typically returns an item from this set that serves as its "representative"; by comparing the result of two Find operations, one can determine whether two elements are in the same subset. Union: Join two subsets into a single subset. The other important operation, MakeSet, which makes a set containing only a given element (a singleton), is generally trivial. With these three operations, many practical partitioning problems can be solved (see the Applications section). In order to define these operations more precisely, some way of representing the sets is needed. One common approach is to select a fixed element of each set, called its representative, to represent the set as a whole. Then, Find(x) returns the representative of the set that x belongs to, and Union takes two set representatives as its arguments.
(a,b)-tree
multiway tree data structure. a kind of balanced search tree. An (a,b)-tree has all of its leaves at the same depth, and all internal nodes except for the root have between a and b children, where a and b are integers such that 2 ≤ a ≤ (b+1)/2. The root has, if it is not a leaf, between 2 and b children.
K-ary tree (also sometimes known as a k-way tree, an N-ary tree, or an M-ary tree)
multiway tree data structure. a rooted tree in which each node has no more than k children. A binary tree is the special case where k=2.
Van Emde Boas tree (or Van Emde Boas priority queue, also known as a vEB tree)
multiway tree data structure. is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2m is the maximum number of elements that can be stored in the tree. The M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured. The vEB tree has good space efficiency when it contains a large number of elements, as discussed below. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.
FM-index
tree data structure where each tree node compares a bit slice of key values. a compressed full-text substring index based on the Burrows-Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini,[1] who describe it as an opportunistic data structure as it allows compression of the input text while still permitting fast substring queries. The name stands for Full-text index in Minute space.[2] It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as locate the position of each occurrence. Both the query time and storage space requirements are sublinear[clarification needed] with respect to the size of the input data. The original authors have devised improvements to their original approach and dubbed it "FM-Index version 2".[3] A further improvement, the alphabet-friendly FM-index, combines the use of compression boosting and wavelet trees [4] to significantly reduce the space usage for large alphabets. The FM-index has found use in, among other places, bioinformatics.[5]
Heaps
tree data structure. a specialized tree-based data structure that satisfies the heap property: If A is a parent node of B then the key (the value) of node A is ordered with respect to the key of node B with the same ordering applying across the heap. A heap can be classified further as either a "max heap" or a "min heap". In a max heap, the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node. In a min heap, the keys of parent nodes are less than or equal to those of the children and the lowest key is in the root node. Heaps are crucial in several efficient graph algorithms such as Dijkstra's algorithm, and in the sorting algorithm heapsort. A common implementation of a heap is the binary heap, in which the tree is a complete binary tree (see figure). -In a heap, the highest (or lowest) priority element is always stored at the root. A heap is not a sorted structure and can be regarded as partially ordered. As visible from the heap-diagram, there is no particular relationship among nodes on any given level, even among the siblings. When a heap is a complete binary tree, it has a smallest possible height—a heap with N nodes always has log N height. A heap is a useful data structure when you need to remove the object with the highest (or lowest) priority. -Note that, as shown in the graphic, there is no implied ordering between siblings or cousins and no implied sequence for an in-order traversal (as there would be in, e.g., a binary search tree). The heap relation mentioned above applies only between nodes and their parents, grandparents, etc. The maximum number of children each node can have depends on the type of heap, but in many types it is at most two, which is known as a binary heap. -The heap is one maximally efficient IMPLEMENTATION of an abstract data type called a priority queue, and in fact priority queues are often referred to as "heaps", regardless of how they may be implemented. -A heap data structure should not be confused with the heap which is a common name for the pool of memory from which dynamically allocated memory is allocated. The term was originally used only for the data structure. -Types Heap Binary heap Weak heap Binomial heap Fibonacci heap AF-heap Leonardo Heap 2-3 heap Soft heap Pairing heap Leftist heap Treap Beap Skew heap Ternary heap D-ary heap Brodal queue
Multiway trees
Ternary tree K-ary tree And-or tree (a,b)-tree Link/cut tree SPQR-tree Spaghetti stack Disjoint-set data structure Fusion tree Enfilade Exponential tree Fenwick tree Van Emde Boas tree Rose tree
arrays data structure
Array Bit array Bit field Bitboard Bitmap Circular buffer Control table Image Dope vector Dynamic array Gap buffer Hashed array tree Heightmap Lookup table Matrix Parallel array Sorted array Sparse array Sparse matrix Iliffe vector Variable-length array
Red-black tree
binary tree data structure. The leaf nodes of red-black trees do not contain data. These leaves need not be explicit in computer memory—a null child pointer can encode the fact that this child is a leaf—but it simplifies some algorithms for operating on red-black trees if the leaves really are explicit nodes. To save memory, sometimes a single sentinel node performs the role of all leaf nodes; all references from internal nodes to leaf nodes then point to the sentinel node. Red-black trees, like all binary search trees, allow efficient in-order traversal (that is: in the order Left-Root-Right) of their elements. The search-time results from the traversal from root to leaf, and therefore a balanced tree of n nodes, having the least possible tree height, results in O(log n) search time.
Integer
primitive data type. integral or fixed-precision values. a datum of integral data type, a data type which represents some finite subset of the mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware, including virtual machines, nearly always provide a way to represent a processor register or memory address as an integer. The most common representation of a positive integer is a string of bits, using the binary numeral system. see table in wikipedia.
Skip list*
List data structure. a data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, each skipping over fewer elements. Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than or equal to the element searched for. Via the linked hierarchy, these two elements link to elements of the next sparsest subsequence, where searching is continued until finally we are searching in the full sequence. The elements that are skipped over may be chosen probabilistically [2] or deterministically,[3] with the former being more common. -A schematic picture of the skip list data structure. Each box with an arrow represents a pointer and a row is a linked list giving a sparse subsequence; the numbered boxes at the bottom represent the ordered data sequence. Searching proceeds downwards from the sparsest subsequence at the top until consecutive elements bracketing the search element are found. -A skip list is built in layers. The bottom layer is an ordinary ordered linked list. Each higher layer acts as an "express lane" for the lists below, where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 or 1/4). On average, each element appears in 1/(1-p) lists, and the tallest element (usually a special head element at the front of the skip list) in all the lists, {\displaystyle \log _{1/p}n\,} \log _{1/p}n\, of them. A search for a target element begins at the head element in the top list, and proceeds horizontally until the current element is greater than or equal to the target. If the current element is equal to the target, it has been found. If the current element is greater than the target, or the search reaches the end of the linked list, the procedure is repeated after returning to the previous element and dropping down vertically to the next lower list. The expected number of steps in each linked list is at most 1/p, which can be seen by tracing the search path backwards from the target until reaching an element that appears in the next higher list or reaching the beginning of the current list. Therefore, the total expected cost of a search is (log _{1/p}n)/p, which is O(log n) when p is a constant. By choosing different values of p, it is possible to trade search costs against storage costs.
Order statistic tree
binary tree data structure. a variant of the binary search tree (or more generally, a B-tree[1]) that supports two additional operations beyond insertion, lookup and deletion: -Select(i) — find the i'th smallest element stored in the tree -Rank(x) - find the rank of element x in the tree, i.e. its index in the sorted list of elements of the tree -Both operations can be performed in O(log n) worst case time when a self-balancing tree is used as the base data structure.