Chapter 3

Why not use a method like the less() method that we used for sorting?

Equality plays a special role in symbol tables, so we also would need a method for testing equality. To avoid proliferation of methods that have essentially the same function, we adopt the built-in Java methods equals() and compareTo().

How does Java implement hashCode() for Integer, Double, and Long?

For Integer it just returns the 32-bit value. For Double and Long it returns the exclusive or of the first 32 bits with the second 32 bits of the standard machine representation of the number. These choices may not seem to be very random, but they do serve the purpose of spreading out the values.
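
The folding described above can be checked directly. The sketch below is only an illustration: the class name WrapperHashDemo and the helper foldBits() are made up for this example, and it assumes the standard JDK wrapper classes.

```java
// Illustrative sketch: reproduces the hashCode() formulas described above
// and compares them with what the JDK wrapper classes return.
final class WrapperHashDemo {
    // Hypothetical helper: XOR of the high 32 bits with the low 32 bits.
    static int foldBits(long bits) {
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        int i = 12345;
        long n = 0x0000000100000002L;   // high word 1, low word 2
        double d = 3.25;

        // Integer: the hash is the 32-bit value itself.
        System.out.println(Integer.valueOf(i).hashCode() == i);
        // Long: the two 32-bit halves are XORed together.
        System.out.println(Long.valueOf(n).hashCode() == foldBits(n));
        // Double: same folding, applied to the IEEE 754 bit pattern.
        System.out.println(Double.valueOf(d).hashCode()
                == foldBits(Double.doubleToLongBits(d)));
    }
}
```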

I've seen BSTs before, but not using recursion. What are the tradeoffs?

Generally, recursive implementations are a bit easier to verify for correctness; nonrecursive implementations are a bit more efficient. See Exercise 3.2.13 for an implementation of get(), the one case where you might notice the improved efficiency. If trees are unbalanced, the depth of the function-call stack could be a problem in a recursive implementation. Our primary reason for using recursion is to ease the transition to the balanced BST implementations of the next section, which definitely are easier to implement and debug with recursion.
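
For comparison, a nonrecursive search might look like the sketch below. It is a minimal illustration and not the book's solution to the exercise: the class name IterativeBST is hypothetical, and put() is written as a loop only so that the example is self-contained.

```java
// Minimal sketch of a BST whose get() and put() use loops instead of recursion,
// so the function-call stack stays flat even when the tree is unbalanced.
final class IterativeBST<Key extends Comparable<Key>, Value> {
    private Node root;

    private final class Node {
        final Key key;
        Value val;
        Node left, right;
        Node(Key key, Value val) { this.key = key; this.val = val; }
    }

    // Walk down from the root; return null on a search miss.
    Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return x.val;
            x = (cmp < 0) ? x.left : x.right;
        }
        return null;
    }

    // Walk down to the null link where the key belongs and attach a new node.
    void put(Key key, Value val) {
        if (root == null) { root = new Node(key, val); return; }
        Node x = root;
        while (true) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) { x.val = val; return; }
            if (cmp < 0) {
                if (x.left == null) { x.left = new Node(key, val); return; }
                x = x.left;
            } else {
                if (x.right == null) { x.right = new Node(key, val); return; }
                x = x.right;
            }
        }
    }
}
```

Inserting keys in sorted order produces a tree that is one long chain; the loops above handle that shape without deep call stacks, although each operation still takes time proportional to the chain length.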

Why not use BinarySearchST or RedBlackBST instead of SequentialSearchST in Algorithm 3.5?

Generally, we set parameters so as to make the number of keys hashing to each value small, and elementary symbol tables are generally better for the small tables. In certain situations, slight performance gains may be achieved with such hybrid methods, but such tuning is best left for experts.

Why not declare key[] as Object[] (instead of Comparable[]) in BinarySearchST before casting, in the same way that val[] is declared as Object?

Good question. If you do so, you will get a ClassCastException because keys need to be Comparable (to ensure that entries in key[] have a compareTo() method). Thus, declaring key[] as Comparable[] is required. Delving into the details of programming-language design to explain the reasons would take us somewhat off topic. We use precisely this idiom (and nothing more complicated) in any code that uses Comparable generics and arrays throughout this book.

Why not use an array of Key values to represent 2-, 3-, and 4-nodes with a single Node type?

Good question. That is precisely what we do for B-trees (see Chapter 6), where we allow many more keys per node. For the small nodes in 2-3 trees, the overhead for the array is too high a price to pay.

Why would I need FileIndex? Doesn't my operating system solve this problem?

If you are using an OS that meets your needs, continue to do so, by all means. As with many of our programs, FileIndex is intended to show you the basic underlying mechanisms of such applications and to suggest possibilities to you.

Presorting the table as discussed on page 385 seems like a good idea. Why relegate that to an exercise (see Exercise 3.1.12)?

Indeed, this may be the method of choice in some applications. But adding a slow insert method to a data structure designed for fast search "for convenience" is a performance trap, because an unsuspecting client might intermix searches and inserts in a huge table and experience quadratic performance. Such traps are all too common, so that "buyer beware" is certainly appropriate when using software developed by others, particularly when interfaces are too wide. This problem becomes acute when a large number of methods are included "for convenience" leaving performance traps throughout, while a client might expect efficient implementations of all methods. Java's ArrayList class is an example (see Exercise 3.5.27).

Is hashing faster than searching in red-black BSTs?

It depends on the type of the key, which determines the cost of computing hashCode() versus the cost of compareTo(). For typical key types and for Java default implementations, these costs are similar, so hashing will be significantly faster, since it uses only a constant number of operations. But it is important to remember that this question is moot if you need ordered operations, which are not efficiently supported in hash tables. See Section 3.5 for further discussion.

What if we need to associate multiple values with the same key? For example, if we use Date as a key in an application, wouldn't we have to process equal keys?

Maybe, maybe not. For example, you can't have two trains arrive at the station on the same track at the same time (but they could arrive on different tracks at the same time). There are two ways to handle the situation: use some other information to disambiguate or make the value a Queue of values having the same key. We consider applications in detail in Section 3.5.
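
The second option, a queue of values per key, can be sketched as follows. This is an assumption-laden illustration: MultiValueTable, add(), getAll(), and count() are hypothetical names, and java.util.TreeMap and java.util.ArrayDeque stand in for the book's own symbol-table and Queue classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.TreeMap;

// Sketch: a symbol table whose value is a queue of all values sharing a key.
// TreeMap and ArrayDeque are stand-ins, not the book's ST and Queue types.
final class MultiValueTable<Key extends Comparable<Key>, Value> {
    private final TreeMap<Key, Queue<Value>> st = new TreeMap<>();

    // Append val to the queue for key, creating the queue on first use.
    // Note: ArrayDeque rejects null values.
    void add(Key key, Value val) {
        st.computeIfAbsent(key, k -> new ArrayDeque<>()).add(val);
    }

    // All values for key, in insertion order (empty if the key is absent).
    Iterable<Value> getAll(Key key) {
        Queue<Value> q = st.get(key);
        if (q == null) return new ArrayDeque<Value>();
        return q;
    }

    // Number of values currently associated with key.
    int count(Key key) {
        Queue<Value> q = st.get(key);
        return (q == null) ? 0 : q.size();
    }
}
```

With this design, two trains arriving at 09:15 on different tracks become two entries in the queue stored under the single key 09:15.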

Why bother with equals()? Why not just use compareTo() throughout?

Not all data types lead to key values that are easy to compare, even though having a symbol table still might make sense. To take an extreme example, you may wish to use pictures or songs as keys. There is no natural way to compare them, but we can certainly test equality (with some work).

So, why not implement hash(x) by returning Math.abs(x.hashCode()) % M?

Nice try. Unfortunately, Math.abs() returns a negative result for the most negative int value, -2^31 (Integer.MIN_VALUE). For many typical calculations, this overflow presents no real problem, but for hashing it would leave you with a program that is likely to crash after a few billion inserts, an unsettling possibility. For example, s.hashCode() is -2^31 for the Java String value "polygenelubricants". Finding other strings that hash to this value (and to 0) has turned into an amusing algorithm-puzzle pastime.
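
The overflow is easy to reproduce. In the sketch below, AbsHashPitfall and absHash() are hypothetical names, and the table size 97 is an arbitrary choice for illustration.

```java
// Sketch: shows why Math.abs(x.hashCode()) % M can produce a negative index.
final class AbsHashPitfall {
    // The tempting but flawed hash function from the question above.
    static int absHash(Object x, int m) {
        return Math.abs(x.hashCode()) % m;
    }

    public static void main(String[] args) {
        // Integer.MIN_VALUE has no positive counterpart in 32 bits,
        // so Math.abs() returns it unchanged.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE);

        // A key whose hash code is Integer.MIN_VALUE gets a negative index
        // (for a table size that does not divide 2^31 evenly).
        Object key = Integer.valueOf(Integer.MIN_VALUE);
        System.out.println(absHash(key, 97) < 0);

        // The String cited above is reported to have this hash code.
        System.out.println("polygenelubricants".hashCode());
    }
}
```

Using a negative result as an array index throws an ArrayIndexOutOfBoundsException, which is the crash the answer warns about.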

Why not let the linear probing table get, say, three-quarters full?

No particular reason. You can choose any value of the load factor α, using Proposition M to estimate search costs. For α = 3/4, the average cost of search hits is 2.5 and search misses is 8.5, but if you let α grow to 7/8, the average cost of a search miss is 32.5, perhaps more than you want to pay. As α gets close to 1, the estimate in Proposition M becomes invalid, but you don't want your table to get that close to being full.
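
The numbers above can be reproduced with a few lines of code. This sketch assumes that Proposition M is the usual estimate for linear probing under uniform hashing, namely (1 + 1/(1-α))/2 probes for a search hit and (1 + 1/(1-α)^2)/2 for a search miss; the class and method names are hypothetical.

```java
// Sketch: average probe counts for linear probing at load factor alpha,
// assuming the standard estimates (valid only when alpha is not close to 1).
final class LinearProbingCost {
    // Estimated probes for a search hit: (1 + 1/(1 - alpha)) / 2.
    static double searchHit(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }

    // Estimated probes for a search miss or insert: (1 + 1/(1 - alpha)^2) / 2.
    static double searchMiss(double alpha) {
        double d = 1.0 - alpha;
        return 0.5 * (1.0 + 1.0 / (d * d));
    }

    public static void main(String[] args) {
        for (double alpha : new double[] { 0.5, 0.75, 0.875 }) {
            System.out.println(alpha + ": hit " + searchHit(alpha)
                    + ", miss " + searchMiss(alpha));
        }
    }
}
```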

Can a SET be null?

No. A SET can be empty (contain no objects), but not null. As with any Java data type, a variable of type SET can have the value null, but that just indicates that it does not reference any SET. The result of using new to create a SET is always an object that is not null.

Can a SET contain null?

No. As with symbol tables, keys are non-null objects.

If all my data is in memory, there is no real reason to use a filter, right?

Right. Filtering really shines in the case when you have no idea how much data to expect. Otherwise, it may be a useful way of thinking, but not a cure-all.

Why not have the dot() method in SparseVector take a SparseVector object as argument and return a SparseVector object?

That is a fine alternate design and a nice programming exercise that requires code that is a bit more intricate than for our design (see Exercise 3.5.16). For general matrix processing, it might be worthwhile to also add a SparseMatrix type.

Why not use an Item type that implements Comparable for symbol tables, in the same way as we did for priority queues in Section 2.4, instead of having separate keys and values?

That is also a reasonable option. These two approaches illustrate two different ways to associate information with keys—we can do so implicitly by building a data type that includes the key or explicitly by separating keys from values. For symbol tables, we have chosen to highlight the associative array abstraction. Note also that a client specifies just a key in search, not a key-value aggregation.

Maintaining the node count field in Node seems to require a lot of code. Is it really necessary? Why not maintain a single instance variable containing the number of nodes in the tree to implement the size() client method?

The rank() and select() methods need to have the size of the subtree rooted at each node. If you are not using these ordered operations, you can streamline the code by eliminating this field (see Exercise 3.2.12). Keeping the node count correct for all nodes is admittedly error-prone, but also a good check for debugging. You might also use a recursive method to implement size() for clients, but that would take linear time to count all the nodes and is a dangerous choice because you might experience poor performance in a client program, not realizing that such a simple operation is so expensive.
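
To make the tradeoff concrete, here is a stripped-down sketch of a BST of keys that maintains a subtree count in each node. It is not the book's implementation: CountingBST, add(), and slowSize() are hypothetical names, and values are omitted to keep the example short.

```java
// Sketch: each node stores the size of its subtree, so size() is constant
// time and rank() can skip whole subtrees. slowSize() shows the linear-time
// alternative that recounts every node on each call.
final class CountingBST<Key extends Comparable<Key>> {
    private Node root;

    private final class Node {
        final Key key;
        Node left, right;
        int n = 1;                       // number of nodes in this subtree
        Node(Key key) { this.key = key; }
    }

    int size() { return size(root); }
    private int size(Node x) { return (x == null) ? 0 : x.n; }

    void add(Key key) { root = add(root, key); }
    private Node add(Node x, Key key) {
        if (x == null) return new Node(key);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = add(x.left, key);
        else if (cmp > 0) x.right = add(x.right, key);
        x.n = 1 + size(x.left) + size(x.right);   // repair count on the way up
        return x;
    }

    // Number of keys strictly less than key.
    int rank(Key key) { return rank(root, key); }
    private int rank(Node x, Key key) {
        if (x == null) return 0;
        int cmp = key.compareTo(x.key);
        if (cmp < 0) return rank(x.left, key);
        if (cmp > 0) return 1 + size(x.left) + rank(x.right, key);
        return size(x.left);
    }

    // Linear-time count: visits every node.
    int slowSize() { return slowSize(root); }
    private int slowSize(Node x) {
        return (x == null) ? 0 : 1 + slowSize(x.left) + slowSize(x.right);
    }
}
```

Comparing size() with slowSize() after each operation is one way to use the count as the debugging check mentioned above.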

Why not let the 3-nodes lean either way and also allow 4-nodes in the trees?

Those are fine alternatives, used by many for decades. You can learn about several of these alternatives in the exercises. The left-leaning convention reduces the number of cases and therefore requires substantially less code.

Why not allow keys to take the value null?

We assume that Key is an Object because we use it to invoke compareTo() or equals(). But a call like a.compareTo(b) would cause a null pointer exception if a is null. By ruling out this possibility, we allow for simpler client code.

I've forgotten. Why don't we implement hash(x) by returning x.hashCode() % M?

We need a result between 0 and M-1, but in Java, the result of the % operator is negative whenever the left operand is negative, and hashCode() can return a negative value.
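
A short sketch of the difference follows. ModuloSign, naiveHash(), and maskedHash() are hypothetical names; the masked version clears the sign bit before taking the remainder.

```java
// Sketch: Java's % keeps the sign of the left operand, so a negative
// hashCode() gives a negative remainder unless the sign bit is cleared first.
final class ModuloSign {
    // May return a value in -(m-1) .. m-1.
    static int naiveHash(Object x, int m) {
        return x.hashCode() % m;
    }

    // Masks off the sign bit, so the result is always in 0 .. m-1.
    static int maskedHash(Object x, int m) {
        return (x.hashCode() & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        Object key = Integer.valueOf(-7);            // hashCode() is -7
        System.out.println(naiveHash(key, 3));       // negative remainder
        System.out.println(maskedHash(key, 3));      // valid table index
    }
}
```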

When we split a 4-node, we sometimes set the color of the right node to RED in rotateRight() and then immediately set it to BLACK in flipColors(). Isn't that wasteful?

Yes, and we also sometimes unnecessarily recolor the middle node. In the grand scheme of things, resetting a few extra bits is not in the same league with the improvement from linear to logarithmic that we get for all operations, but in performance-critical applications, you can put the code for rotateRight() and flipColors() inline and eliminate those extra tests. We use those methods for deletion, as well, and find them slightly easier to use, understand, and maintain by making sure that they preserve perfect black balance.

When using array resizing, the size of the table is always a power of 2. Isn't that a potential problem, because it only uses the least significant bits of hashCode()?

Yes, particularly with the default implementations. One way to address this problem is to first distribute the key values using a prime larger than M, as in the following example:

    private int hash(Key x)
    {
        int t = x.hashCode() & 0x7fffffff;
        if (lgM < 26) t = t % primes[lgM+5];
        return t % M;
    }

This code assumes that we maintain an instance variable lgM that is equal to lg M (by initializing to the appropriate value, incrementing when doubling, and decrementing when halving) and an array primes[] of the smallest prime greater than each power of 2 (see the table at right). The constant 5 is an arbitrary choice: we expect the first % to distribute the values equally among the values less than the prime and the second to map about five of those values to each value less than M. Note that the point is moot for large M.
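
A self-contained version of this idea might look like the sketch below. It makes several assumptions that are not in the text: the primes are computed by trial division at class-load time instead of being read from a table, lgM is passed as a parameter (expected to be between 0 and 30) instead of being kept as an instance variable, and the names PrimeMixHash, buildPrimes(), and isPrime() are hypothetical.

```java
// Sketch: two-step hash for a table of size M = 2^lgM. The first % uses a
// prime near 2^(lgM+5) to mix the bits; the second maps into 0 .. M-1.
final class PrimeMixHash {
    // PRIMES[k] is the smallest prime greater than 2^k, for k = 0 .. 30.
    private static final int[] PRIMES = buildPrimes();

    private static int[] buildPrimes() {
        int[] p = new int[31];
        for (int k = 0; k < 31; k++) {
            int candidate = (1 << k) + 1;
            while (!isPrime(candidate)) candidate++;
            p[k] = candidate;
        }
        return p;
    }

    // Trial division; adequate for the handful of values tested here.
    private static boolean isPrime(int n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Assumes 0 <= lgM <= 30.
    static int hash(Object x, int lgM) {
        int m = 1 << lgM;
        int t = x.hashCode() & 0x7fffffff;            // clear the sign bit
        if (lgM < 26) t = t % PRIMES[lgM + 5];        // mix using a prime
        return t % m;                                 // map into the table
    }
}
```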

I have data in a spreadsheet. Can I develop something like LookupCSV to search through it?

Your spreadsheet application probably has an option to export to a .csv file, so you can use LookupCSV directly.

