Big Data


Hadoop Configuring ReduceTasks

The # of MapTasks is determined by how many splits getSplits() returns
The # of ReduceTasks is determined by configuration rather than by the data
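A minimal sketch of how that configuration is typically set in the driver, assuming a Job instance named job (the count of 4 is an arbitrary example):

// The number of ReduceTasks is pure configuration, chosen by the client:
job.setNumReduceTasks(4);   // e.g., 4 reducers; 0 would make this a map-only job
// The number of MapTasks, by contrast, falls out of InputFormat.getSplits()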

MapReduce map() and reduce() Caveat

A true map() is always 1:1, while MapReduce's map() is 1:N for some N >= 0
A true reduce() is always M:1, while MapReduce's reduce() is M:N for some N >= 0
MapReduce's reduce() is performed once per key rather than once globally
MapReduce's map() and reduce() are both more like a flatMap()

Joins in MapReduce

Add the files for both sides of the join as inputs to the MapReduce job.
Key the map() output by the join condition and let MapReduce do the sorting.
Perform a key-scoped nested loop within your reduce() implementation.

TaskTrackers (Responsibilities in Hadoop MapReduce)

Advertises "slots" for running MapTasks and ReduceTasks, respectively
Receives and acknowledges task assignments from the JobTracker
Spawns a JVM for each task and captures the process output and exit code
Periodically updates the JobTracker with the statuses of its assigned tasks

DataNodes (HDFS)

Advertises local disk resources for storing blocks of files
Receives and acknowledges block assignments from the NameNode
Exposes an RPC interface that clients use to read and write blocks
Periodically updates the NameNode with the statuses of its assigned blocks

NodeManagers

Advertises resources (vCPU, memory) available for running containers
Receives and acknowledges assignments from the ResourceManager
Spawns each container and captures the process output and exit code
Periodically updates the ResourceManager with container statuses

Supplying reduce()'s input

After the shuffle, each ReduceTask has a single, sorted file of map() outputs
- If a ReduceTask has any output with a key, it has all outputs with that key
- If a ReduceTask has any output with a key, no other ReduceTask has that key
The ReduceTask iterates this sorted file, grouping each time the key changes
The keys and the values Iterator are deserialized lazily, requiring little memory

Supplying reduce()'s input (revisited)

After the shuffle, each ReduceTask has a single, sorted file of map() outputs
- If a ReduceTask has any output with a key, it will have all the outputs with that key
- If a ReduceTask has any output with a key, no other ReduceTask has that key
The ReduceTask iterates this sorted file, grouping each time the key changes
The keys and values are deserialized lazily, requiring little additional memory

Traditional Shared-Disk Architecture

Allows processors to access any disk in the system
Includes Network Attached Storage (NAS), like our NFS home directories

HDFS (Hadoop Distributed File System)

Almost identical to the Google File System
- NameNode = GFS Master
- DataNode = GFS Chunkserver
- Blocks = Chunks
Maintains the strict separation of metadata and data

MapReduce vs. GNU Parallel

Both were developed around the same time, in the early/mid 2000s
You can actually use GNU Parallel to parallelize across multiple machines
The essential difference is that GNU Parallel does not perform a "shuffle"
- GNU Parallel supports M Mappers, but only a single Reducer
- Unlike MapReduce, GNU Parallel often requires no additional infrastructure
- GNU Parallel will handle copying scripts and removing them when done
- GNU Parallel supports many other useful options, including retrying failures

Exploring the Split Files

Both job.split and job.splitmetainfo include information about the input splits
- job.splitmetainfo is for the JobTracker and job.split is shared by the MapTasks
These files use Hadoop's own serialization and thus require custom readers

Storing Results

By default, Hive writes results to HDFS and then fetches and prints them
For non-interactive workloads you will need to persist the results by either:

-- Create a new (managed) Hive table to store the results.
CREATE TABLE review_counts_by_stars AS
SELECT stars, COUNT(*) FROM reviews GROUP BY stars;

-- Write the results to a directory on HDFS that is external to Hive.
INSERT OVERWRITE DIRECTORY 'output'
SELECT stars, COUNT(*) FROM reviews GROUP BY stars;

Hadoop Mapper

Class rather than an Interface, with a sensible default implementation
- KEYIN and VALUEIN, rather than IN as in SimpleMapReduce
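A minimal sketch of a Mapper subclass (the class and field names are illustrative, not from the course materials):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// KEYIN/VALUEIN come from the InputFormat (here: byte offset and line of text);
// KEYOUT/VALUEOUT are whatever map() chooses to emit.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String token : value.toString().split("\\s+")) {
      if (token.isEmpty()) continue;
      word.set(token);
      context.write(word, ONE);  // map() may emit zero or more pairs per input record
    }
  }
}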

The Client Application (Hadoop Execution Model)

The client configures the Job and then submits it to the cluster over RPC
- The details are handled by the framework, hidden behind Job.submit()
The code within your single JAR file will be executed in two different places:
- Your main() method will be called on the machine where you run hadoop jar
- Your Mapper and Reducer classes will run wherever the framework decides

Supplying the map()'s input

The client uses an InputFormat to configure MapReduce's input
Job.submit() calls InputFormat.getSplits() on the client to divide up the work
The framework will then create one MapTask for each returned InputSplit
Within each MapTask, the framework creates a RecordReader for its split
- Uses this RecordReader to parse the Writable key/value pairs
- This logic is encapsulated within MapContext and ReduceContext

Velocity (The Three V's)

Data is constantly growing or changing - Processing must both respond quickly to changes and also scale predictably

Big Data

Data that can't be effectively processed using standard methods of the time

HDFS Blocks vs. MapReduce Splits

DataNodes store HDFS blocks, not splits
FileInputFormat attempts to keep the two aligned, but they're rarely identical
- TextInputFormat, in particular, "spills" across blocks looking for the final newline
Even node-local tasks will often need to read a few bytes from a remote DataNode
Jobs can additionally bound their split sizes or use CombineFileInputFormat

// Bound split size between 2MB and 256MB, if the file allows it.
FileInputFormat.setMinInputSplitSize(job, 2 * 1024 * 1024);
FileInputFormat.setMaxInputSplitSize(job, 256 * 1024 * 1024);

Assigning data to nodes

Different systems give this concept different names - shards, slices, partitions
The basic idea is similar to how you might "check in" at some large event:
- A coordinator routes each record (attendee) to one of several partitions (check-in lines)

Querying in MPP

Each node in an MPP system handles the part of the query that involves its own local data
The "coordinator" is responsible for splitting the query up, sending it to each node, and combining each node's individual results
Some systems have a dedicated coordinator role, while others allow any node to act as the coordinator for a given query

MPP's "Shared-Nothing" Architecture

Each processor is able to access only the disks that it specifically "owns"
Operations on one node no longer impact the other nodes in the system

Sort Merge Join

Either requires the inputs to already be sorted - a common occurrence with database indexes - or sorts them as a first step, if needed
The two inputs are then traversed in parallel, and the equivalent of a nested loop join scoped only to the current key is performed to produce all matching pairs

Mapper run() method

The framework finally calls back into your application code via the run() methods of your Mapper and Reducer classes, passing only the Context object - in this case the MapContext
The call to run() happens within the run() methods of MapTask and ReduceTask, respectively
Most Mappers will use the default run() method, but it is still technically part of your code and therefore yours to use
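The default run() is roughly the following loop (paraphrased from the Hadoop source):

// Roughly what Mapper's default run() does: call map() once per input key/value.
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    cleanup(context);
  }
}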

GFS

The GFS master tells the client which GFS chunkservers to write chunks to
- The master looks up the file in its chunk index, which locates the chunks on the correct chunkservers
The master has the complete global namespace
- The master is in charge of the control plane
- The client has access to the data plane after talking to the master
- Clients can't write or read new chunks without the master, but can access chunks whose chunk handles they already have
- That way the master isn't a single point of failure for the file system's data path
- Separation of data plane and control plane

Hadoop 2.0 & YARN

Hadoop 2.0 introduced YARN, short for "Yet Another Resource Negotiator"
The idea of YARN was once again to "do what Google did", this time with Borg
Rather than being MapReduce-specific, YARN provides general cluster management
YARN was not the only Apache project following in the footsteps of Borg:
- Mesos

Hive Compiler

Hive compiles your HiveQL to a directed acyclic graph (DAG) of zero or more MapReduce jobs, at least when the hive.execution.engine property is set to "mr" - short for MapReduce - as it is by default and on the csse584 course server.

Hive as a "Leaky Abstraction"

Hive does a pretty good job of abstraction, so it can be easy to forget that:
• Hive does not replace MapReduce. Hive is an abstraction on top of MapReduce.
• Hive does not replace YARN. Hive runs its jobs on YARN by way of MapReduce.
For a business analyst ("data scientist"), knowing HiveQL is likely sufficient.
A data (infrastructure) engineer will need to know about the inner workings.

ExecMapper and ExecReducer

Hive's special MapOperator is responsible for the SerDe parsing of Writables.
• Hive writes its output not via reduce() but using its own TerminalOperators.
• ExecMapper and ExecReducer extend the MapReduce framework classes.

ApplicationMaster

If YARN is a generic cluster manager, who handles the MapReduce specifics?
- Recall, the JobTracker was also responsible for locality and brokering the shuffle
- YARN can handle locality generically, in the form of Placement Constraints
- The MapReduce specifics are handled by the MapReduce ApplicationMaster
You can think of the ApplicationMaster as a JobTracker for only that one job

Web 2.0 / Participatory Web

Instead of content being created by a select few "webmasters", sites allowed users to publish their own content and social media was born (Huge shift in usage of the web)

Nested Loop Join

Iterates over every combination of rows from the two inputs, emitting each pair that satisfies the join condition
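A sketch in plain Java, joining two in-memory lists on an equality condition (all names here are illustrative):

import java.util.ArrayList;
import java.util.List;

public class NestedLoopJoin {
  // Joins rows of the form {key, payload} on equal keys.
  static List<String[]> join(List<String[]> left, List<String[]> right) {
    List<String[]> results = new ArrayList<>();
    for (String[] l : left) {             // outer loop over one input
      for (String[] r : right) {          // inner loop over every row of the other
        if (l[0].equals(r[0])) {          // the join condition
          results.add(new String[] { l[0], l[1], r[1] });
        }
      }
    }
    return results;                       // O(|left| * |right|) comparisons
  }
}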

The Reducer run() Method

Just like the Mapper, the MapReduce framework calls back into your code via the run() method of your Reducer, providing it with the Context, in this case an instance of ReduceContext
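Its default run() is essentially the same loop as the Mapper's, but advancing one key group at a time rather than one key/value pair; roughly:

// Roughly what Reducer's default run() does: call reduce() once per key group.
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
    }
  } finally {
    cleanup(context);
  }
}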

Caveats of Column Pruning

Just like with partitioning, column pruning is great, but it has some trade-offs:
• Only queries accessing only non-pruned columns benefit from the pruning.
• We're now storing multiple copies of the data for each non-pruned column.
• Whenever the original data is updated, the pruned copies must be updated also.
Wouldn't it be nice to be able to get the benefits of pruning without the trade-offs?

Hadoop MapReduce (Version <2.0)

Like the NameNode, a single JobTracker manages the MapReduce metadata
Also like the NameNode, the JobTracker handles only the metadata, no tasks.

JobTracker (Responsibilities in Hadoop MapReduce)

Maintains the global view of MapReduce jobs running on the cluster
Exposes an RPC interface for clients to submit MapReduce jobs
Assigns {Map,Reduce}Tasks to TaskTrackers and monitors task status
Responds to failures to ensure successful job completion, if possible

ResourceManager (Responsibilities in Hadoop YARN)

Maintains the global view of containers running on the cluster
Exposes an RPC interface for clients to submit YARN applications
Assigns containers to NodeManagers and monitors container status
Responds to failures to ensure successful job completion, if possible

NameNode (HDFS)

Maintains the global view of the filesystem namespace and metadata
Exposes an RPC interface that clients use to operate on the file system
Assigns blocks to DataNodes and monitors the status of those blocks
Responds to failures to ensure integrity of the file system, if possible

SimpleMapReduce MapContext Class

MapContext sorts and groups the map() outputs in memory using TreeMap

SimpleMapReduce Runtime

The MapReduce programming model is incredibly powerful and enables:
- Parallelization and distribution of the execution across a cluster of workers
- Efficient and transparent mechanisms for fault tolerance and error recovery

Sort Merge Join in MapReduce

MapReduce will handle the grouping - and sorting, although we don't really require that here - by key within a partition.
Your job is to handle the processing within each group by partitioning the values back into "left" and "right" and generating all possible combinations.

Hadoop InputFormat and OutputFormat

Mapper and Reducer handle data within the system, but not I/O
- You tell Hadoop how to read the inputs to the Mappers using an InputFormat
- You tell Hadoop how to write the output of the Reducers using an OutputFormat
TextInputFormat and TextOutputFormat are the respective default formats
- TextInputFormat uses linebreaks to identify records
- TextOutputFormat writes output records delimited (by default) by newlines
FileInputFormat is the most common, particularly on top of HDFS
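A sketch of the relevant driver calls, assuming Mapper/Reducer classes named WordCountMapper and IntSumReducer (illustrative names):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word-count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // These two are the defaults, shown here explicitly:
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // Where splits are read from and where each part-r-#### file is written:
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}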

Variety (The Three V's)

Most "big data" includes unstructured data
The goal is to extract some insight by considering this data in aggregate

SimpleMapReduce Job Class

Mostly Java boilerplate
Responsible for calling into the "business logic" in the Mapper and Reducer classes
Does little more than calling the map() and reduce() methods with the appropriate key/value pairs

Online Analytical Data Processing (OLAP)

Originally done by the Enterprise Data Warehouse
Businesses wanted a "global view" of their data for "business intelligence" purposes; the system aggregating that data became known as the Enterprise Data Warehouse

Online Transaction Processing (OLTP)

Originally done using "business machines"
Users input data using a "dumb terminal", rather than punch cards
The system processed the data and returned a response within seconds
SQL (Structured Query Language) remains the dominant query language used in OLTP

Handling reduce()'s output (revisited)

OutputFormat has no getSplits(), since output "splits" are part of the design:
- Each ReduceTask writes its output independently of the other ReduceTasks
- For FileOutputFormat, each part-r-#### file contains that ReduceTask's output
The blocks of part-r-#### are written to the DataNode local to the ReduceTask
HDFS then takes care of replicating the blocks to other DataNodes in the cluster

Caveats of Partitioning

Partitioning is a major feature of Hive, but you don't want to get carried away.
• As with most things related to performance, it's wise to measure and refine.
• Partitions are purely overhead whenever Hive can't perform partition pruning.
• Each partition places additional storage and query load on the Metastore.
• For efficient MapReduce, partitions should not be smaller than an HDFS block.
• Partitioning by multiple columns requires greater care with the above points.

Hadoop SequenceFile{Input,Output} Format

Provides efficient (de)serialization of Writable key/value pairs
SequenceFileInputFormat and SequenceFileOutputFormat expose this format.
This format is particularly useful for passing data between multiple jobs...

Building JARs for Hadoop

Recall that Java applications search their classpath for dependencies
MapReduce runs your code on multiple machines, making this a bit trickier
Several options for managing dependencies in distributed systems:
- Pre-emptively install every job's dependencies on every machine in the cluster
- Upload the dependencies as part of the job, like we saw in GNU Parallel
- Package your application as a self-contained "bundle", including dependencies
Apache Maven is used to package Java applications as JARs
- The Maven Assembly Plugin produces a "fat JAR" containing all dependencies

Downloading the JAR and XML

Recall that the TaskTracker spawns a new JVM for each {Map,Reduce}Task it runs
- Since this is Java, including your job.jar is just a matter of setting the classpath
- Hadoop's Configuration class is similarly classpath-aware and parses job.xml
- The input splits instruct each MapTask how to locate its own data
After a few layers of indirection, Task.run() is called within the task's JVM

The SerDe

Recall, Hadoop deals exclusively in key/value pairs of Writable objects.
Hive's data model is instead based on rows of strongly-typed column data.
Hive's Serializer/Deserializer ("SerDe") is responsible for bridging this gap by converting Writables to row data on input and row data back to Writables on output.

Hash Joins in MapReduce

Recall, a hash join builds an in-memory index of the small side of the join.
In the case of MapReduce, this hash table must exist within all MapTasks rather than following the typical InputSplit mechanism. This is done using the Hadoop Distributed Cache.
Unlike the sort merge join, the hash join is performed entirely within the Mapper, allowing the job's Reducer to be used for some other task, in this case summing counts by state.
The hash join is great, with the caveat that it requires every MapTask to commit memory to storing the hash table, which in some cases may be expensive or even impossible.
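A sketch of such a map-side hash join, assuming a small tab-separated users file was added in the driver with job.addCacheFile(...) using a "#users" symlink (the file layout and field positions are assumptions, not the course's exact code):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HashJoinMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final Map<String, String> stateByUser = new HashMap<>();
  private final Text state = new Text();
  private final IntWritable useful = new IntWritable();

  @Override
  protected void setup(Context context) throws IOException {
    // Build the in-memory hash table from the small side of the join,
    // made available to every MapTask via the Distributed Cache symlink.
    try (BufferedReader reader = new BufferedReader(new FileReader("users"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] fields = line.split("\t");      // assumed layout: user_id \t state
        stateByUser.put(fields[0], fields[1]);
      }
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // The large side streams through map(); assumed layout: user_id \t useful_count
    String[] fields = value.toString().split("\t");
    String joined = stateByUser.get(fields[0]);   // the O(1) hash probe
    if (joined != null) {
      state.set(joined);
      useful.set(Integer.parseInt(fields[1]));
      context.write(state, useful);               // the Reducer is then free to sum by state
    }
  }
}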

Handling map()'s output (revisited)

Recall, map() produces output by calling write() on the provided Context
For MapContext, this output is (eventually) written to files local to the MapTask
- We now know exactly what these files look like and even how to inspect them
Each MapTask is short-lived, so keeping outputs in memory is not sufficient
- We now know why this is - we need to free up precious slots on the TaskTracker

The Shuffle Handler

ReduceTasks across the cluster need access to map() output stored on each MapTask's local disk
Just like in HDFS, there is a separation between control plane and data plane:
- The JobTracker tells each ReduceTask about all of the MapTasks' locations
- Each TaskTracker exposes an HTTP service known as the ShuffleHandler
The ReduceTasks interact with these ShuffleHandlers to fetch their inputs
Fetching can start as soon as one MapTask completes, but reduce() must wait (to ensure the keys are in order)

Reducer Interface

The Reducer looks a lot like the Mapper, but its values are an Iterable rather than a single value
The Reducer again uses the Context to produce its output, since there may be many outputs
It is possible (although uncommon) to output a different key from its input key
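For example, a minimal summing Reducer (essentially the classic word-count reducer):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {   // values is an Iterable, not a single value
      sum += value.get();
    }
    result.set(sum);
    context.write(key, result);          // here the output key matches the input key
  }
}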

Sorting in MapReduce

Remember, MapReduce always sorts by key within each partition
- The obvious way to sort records globally is to use only a single ReduceTask
Alternatively, create your own custom Partitioner, or use Hadoop's TotalOrderPartitioner together with an InputSampler to pre-sample and partition the input - this allows multiple ReduceTasks while keeping data sorted within each output file and globally across files

Secondary Sort

Remember, the order of values in the Iterable passed to reduce() is undefined.
• Occasionally, an algorithm will require that these values be sorted as well.
• Hadoop's "secondary sort" provides the illusion of sorting other than by key:
Implement a custom Writable for a composite key that also includes the value.
Use Job.setPartitionerClass() to partition using only your "true key".
Use Job.setGroupingComparatorClass() to group using only your "true key".
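A sketch of the driver wiring, assuming a Job instance named job; CompositeKeyWritable, NaturalKeyPartitioner, NaturalKeyGroupingComparator, and CompositeKeyComparator are hypothetical classes you would write yourself:

// Composite key = (true key, value); sort uses both, partition/group use only the true key.
job.setMapOutputKeyClass(CompositeKeyWritable.class);               // hypothetical composite key
job.setPartitionerClass(NaturalKeyPartitioner.class);               // partition by the "true key" only
job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class); // group by the "true key" only
job.setSortComparatorClass(CompositeKeyComparator.class);           // order values within each group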

Essential Analytical Functionality

Selection, projection, aggregation, sorting, and join
All are essential for OLAP workloads

MapReduce on YARN

Submitting the MapReduce job involves just a little bit more work on YARN:
- The client allocates a new ApplicationMaster from the ResourceManager
- The client interacts with this ApplicationMaster as a "one-off" JobTracker
- The ApplicationMaster allocates additional containers as "one-off" TaskTrackers
This additional complexity is, of course, hidden behind Job.submit()

Hadoop MapReduce (>= Version 2.0)

TaskTrackers become NodeManagers and a ResourceManager is introduced
A NodeManager is basically a TaskTracker that can run any YARN container

Hadoop Partitioning & Sorting

The "within a partition" restriction is essential to MapReduce's ability to scale
- Each MapTask produces not a single output file, but one for each ReduceTask
- The routing to a ReduceTask is determined by the key of the map() output
- The records within each MapTask output file are sorted by key by the MapTask
- The framework again handles this for you; you just need to provide the key

Client Application (Revisited)

The client configures the Job and then submits it to the "cluster" over RPC
- We now know more precisely that this RPC goes specifically to the JobTracker
The code within your single JAR file will be executed in two different places:
- Your main() method will be called on the machine where you run hadoop jar
- Your Mapper and Reducer classes will be run by some TaskTracker

The Job Submission Process

The client, the JobTracker, and (some) TaskTrackers all need to know about a "job"
At its core, a Hadoop MapReduce job is made of three components:
- The JAR file containing your code to be run as MapTasks and ReduceTasks
- The various configuration options that your client set on its Job instance
- The input splits instructing each MapTask how to locate its own input data
Hadoop clearly leverages HDFS to share these details across the cluster

Handling reduce()'s output

The opposite of InputFormat... OutputFormat provides getRecordWriter()
OutputFormat has no getSplits(), since output "splits" are part of the design
- Each ReduceTask writes its output independently of the other ReduceTasks
- For FileOutputFormat, each part-r-#### file contains that ReduceTask's output
- OutputFormat can optionally support an OutputCommitter for atomic commits

Resource Requests in YARN

The power of YARN's generality also brings with it additional responsibility
- For example, you must think about vCPUs and memory rather than "slots"

Interfacing with the Framework

The primary - and in many simpler applications, only - interfaces between your code and the Hadoop framework are the MapContext and ReduceContext, both specialized subclasses of TaskInputOutputContext
There are separate implementations for Mapper and Reducer for the same reasons as in SimpleMapReduce

Hive Server 2 / Thrift Server

The purpose of the HiveServer is to allow non-JVM clients to execute HiveQL.
◦ It also allows JVM clients to be lightweight, such as the Beeline replacement CLI.
HiveServer2 runs as a shared server process.

Volume (The Three V's)

Think of it as "size in bytes"
Where to store all the bytes of data?
- (A little trickier when considering backups)

MapReduce - Reduce (Map-Only Jobs)

This is the one case in which MapReduce does not sort by key within each partition
A map-only job is a job with zero ReduceTasks, hence the "map-only" name
- The sorting in MapReduce exists to provide the guarantees around reduce()
- If reduce() will never be called, then there is no need to waste time sorting
Use job.setNumReduceTasks(0)
One common use of map-only jobs is large-scale conversion of file formats

SimpleMapReduce ReduceContext Class

Unlike MapContext, ReduceContext stores no state and uses little memory
Writing to a file instead only requires another implementation of Context

Hadoop Reducer

Unlike SimpleMapReduce, Hadoop allows the input and output to be of different types

Writable Interface

Unlike SimpleMapReduce, Hadoop does not use the standard Java types
Hadoop's Writables are designed specifically for efficient (de)serialization
- Rather than Integer, use IntWritable; rather than String, use Text
If needed, you can create your own classes that implement Writable
- readFields(DataInput in) ----> deserialize from input
- write(DataOutput out) ----> serialize to output
Unlike Java's primitive wrappers, Writables are mutable
- When you call write(), the value is copied (serialized) out of the Writable
- If your Mapper/Reducer ever needs to store a value, you must similarly copy it
- The benefit of this mutability is reduced object allocation due to reuse
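A minimal custom Writable, as an illustrative sketch:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Illustrative example: a mutable, reusable pair of ints.
public class PointWritable implements Writable {
  private int x;
  private int y;

  public void set(int x, int y) {        // mutator enables object reuse
    this.x = x;
    this.y = y;
  }

  @Override
  public void write(DataOutput out) throws IOException {    // serialize to output
    out.writeInt(x);
    out.writeInt(y);
  }

  @Override
  public void readFields(DataInput in) throws IOException { // deserialize from input
    x = in.readInt();
    y = in.readInt();
  }
}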

Hash Join

Uses a hash table to index the smaller table based on the key of the join condition
The inner loop of the nested loop join can be replaced with an expected O(1) hash lookup
Only works for joins based on equality, and works best when the hash table can be kept entirely in memory, although on-disk hash tables are possible

The Three V's of Big Data

Volume, velocity, and variety

Supplying map()'s input

We now know in much more detail exactly how Hadoop's input splits work:
- Job.submit() calls getSplits() and uploads the results to the staging directory
- The JobTracker uses job.splitmetainfo to assign MapTasks to TaskTrackers
- The JobTracker tells each MapTask where to look in job.split for its own split
From here on, the RecordReader works just like we talked about last time

Locality

When setting up a Hadoop cluster, it is your job to configure Rack Awareness
The JobTracker uses the locations in job.splitmetainfo to select a TaskTracker:
- On the same node as a DataNode storing the input split's blocks, if possible (strictly speaking, DataNodes store blocks, not splits)
- Otherwise, in the same rack as a DataNode storing the input split, if possible
- Finally, on any node in the cluster with an available MapTask slot, if possible
You can configure how long the JobTracker waits before "falling through"

Hive MetaStore

When you run CREATE TABLE in Hive, the definition is stored in the Metastore.
• The Metastore handles only metadata operations, which is an OLTP workload.

The Benefits of YARN

While the YARN architecture is certainly more complex, it has several benefits:
- The shared JobTracker is no longer a single point of failure for all jobs
- The ResourceManager does less work, and as a result can scale further
- MapReduce and non-MapReduce ApplicationMasters can share a cluster
- Any improvement to YARN benefits all frameworks, not just MapReduce

Design Decisions

Why not put all machines on fewer racks?
- Fewer racks would mean less network traffic overhead
- But if a whole rack goes down, you're in trouble
Why hard disks and not SSDs?
- SSDs cost more, and HDDs are pretty good at the sequential reads/writes that MapReduce relies on

YARN Manages "Containers"

YARN is not Docker
- Like Docker, however, YARN containers encapsulate programs and resources
You can think of a YARN container as a generalization of a {Map,Reduce}Task
A container contains:
- Program (JAR file)
- Resource requests (vCPU, memory)
- Other useful stuff (environment, files)

Combiner

You can think of a Combiner as a Reducer that runs within each MapTask
- You can set it using Job.setCombinerClass()
Important: the Combiner is an optimization and is not guaranteed to be executed
A Combiner is useful when there is significant repetition in the intermediate keys produced by each MapTask and the user-specified reduce function is commutative and associative
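For example, word count can reuse its summing Reducer as the Combiner (assuming a Job instance named job and the IntSumReducer sketched earlier):

// Pre-aggregate counts within each MapTask; the framework may or may not run it.
job.setCombinerClass(IntSumReducer.class);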

{Input,Output}Formats & Chaining

You could use Text{Input,Output}Format (e.g., with JSON) to pass data between jobs
Much better is SequenceFile, which natively understands Hadoop's Writables

table

a collection of data records (rows) with the same schema

Hive warehouse

a collection of databases

schema

defines the structure of a row in terms of columns

Hadoop's WritableComparable

designed to support efficient comparison
The keys in key/value pairs must be WritableComparable; the values need only be Writable
Standard types like IntWritable and Text implement WritableComparable
Hadoop's RawComparator supports comparison without deserialization

Massively Parallel Processing (MPP)

enables fast execution of the most complex queries operating on large amounts of data
Multiple compute nodes handle all query processing leading up to the final result aggregation, with each core of each node running the same compiled query segments on portions of the entire data
Started with Teradata in 1988
Redshift is AWS's MPP offering

column

has a name and datatype

Optimizer

it is the job of the Optimizer to find the best way to implement a HiveQL query
Hive's optimizer takes a well-studied approach to database query planning:
- Parse the query into a logical plan, independent of the execution environment
- Apply general query optimization techniques to the query's logical plan
- Convert the logical plan into a physical plan targeting a specific environment
Hive specifically uses the Apache Calcite library to optimize its logical plan
◦ The general technique used by Calcite is that of cost-based optimization
The physical plan is distributed via the Hadoop Distributed Cache
Use EXPLAIN to dump the plan the optimizer came up with

Distinguishing Language

map() and reduce() are the methods of the MapReduce programming model
Mapper and Reducer are the classes you implement in Hadoop MapReduce
MapTask and ReduceTask are the units of execution in the Hadoop runtime

Mapper Interface

map() output will be sorted by key, so KEYOUT must be comparable

Hadoop "The Shuffle" & Merge

map() outputs with the same key may be produced by multiple MapTasks
- These map() outputs must be "shuffled" to the appropriate ReduceTask
- Each ReduceTask fetches its partition's map() output from every MapTask
- The ReduceTask efficiently sorts (via merge) these outputs into a single file
The only time two nodes talk to each other is during the "shuffle"
- Similar to MPP's "shared-nothing" design
Again, the hard distributed systems problems are solved by the framework

Handling map()'s output

map() produces output by calling write() on the provided Context
For MapContext, this output is (eventually) written to files local to the MapTask
Hadoop uses a format similar to SequenceFile to efficiently store this output
Each MapTask is short-lived, so keeping outputs in memory is not sufficient

Database

namespaced collection of tables

CompositeInputFormat

performs a merge join prior to the Mapper
- Downside: uses its own join expression language

Java's Reflection API

provides an incredibly powerful mechanism for interacting with Java classes - rather than instances of those classes - at runtime

The Staging Directory

serves as the coordination space for MapReduce jobs
The job.jar file is the exact JAR file that our client submitted
The job.xml is, less obviously, our instance of Job, just serialized as XML
The *.jhist and *_conf.xml relate to the History Server and are out of scope

Sort Merge Join in Hadoop

we'll need to split the computation into two Hadoop jobs - one to perform the join of the states and useful counts, and a second to sum up the counts by state
The most important part of the Job definition is adding files from both "sides" of the join with a common key.
The Reducer is where the join part of the sort merge join happens. We first iterate all values and add them to local buffers, one for each side. In this case, we're interested only in the "state" and "useful" fields. Afterwards, we use nested loops to generate all possible combinations of the buffered values.
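A sketch of such a join Reducer, assuming the Mappers tag each value with "L:" or "R:" to mark which side it came from (the tagging scheme and field contents are assumptions, not the course's exact code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SortMergeJoinReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Buffer the two sides of the join for this key.
    List<String> left = new ArrayList<>();
    List<String> right = new ArrayList<>();
    for (Text value : values) {
      String tagged = value.toString();
      if (tagged.startsWith("L:")) {
        left.add(tagged.substring(2));     // e.g., the "state" field
      } else {
        right.add(tagged.substring(2));    // e.g., the "useful" field
      }
    }
    // Key-scoped nested loop: emit every combination of buffered values.
    for (String l : left) {
      for (String r : right) {
        context.write(key, new Text(l + "\t" + r));
      }
    }
  }
}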

