Big Data
Hadoop Configuring ReduceTasks
# of MapTasks is determined by how many splits getSplits() returns
# of ReduceTasks is determined by configuration rather than data
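A minimal sketch of that configuration, assuming a driver with a Job instance named job:

// The ReduceTask count is pure configuration, set in the driver:
job.setNumReduceTasks(8);
// Equivalent, using the underlying Hadoop 2.x property name:
job.getConfiguration().setInt("mapreduce.job.reduces", 8);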
MapReduce map() and reduce() Caveat
A true map() is always 1:1, while MapReduce's map() is 1:N for some N >= 0
A true reduce() is always M:1, while MapReduce's is M:N for some N >= 0
MapReduce's reduce() is performed for each key rather than once globally
MapReduce's map() and reduce() are both more like a flatMap()
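To make the 1:N behavior concrete, here is a hypothetical word-count Mapper (an illustrative sketch, not from the original notes): a blank line produces zero outputs, while a line with N tokens produces N.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // A single call to map() may call write() zero or many times, like flatMap().
    for (String token : line.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }
}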
Joins in MapReduce
Add the files for both sides of the join as inputs to the MapReduce job. Key the map() output by the join condition and let MapReduce do the sorting. Perform a key-scoped nested loop within your reduce() implementation.
TaskTrackers (Responsibilities in Hadoop MapReduce)
Advertises "slots" for running MapTasks and ReduceTasks, respectively Receives and acknowledges task assignments from the JobTracker Spawns a JVM for each task and captures process output and exit code Periodically updates the JobTracker with the statuses of assigned tasks
DataNodes (HDFS)
Advertises local disk resources for storing blocks of files
Receives and acknowledges block assignments from the NameNode
Exposes an RPC interface that clients use to read and write blocks
Periodically updates the NameNode with the statuses of assigned blocks
NodeManagers
Advertises resources (vCPU, memory) available for running containers
Receives and acknowledges assignments from the ResourceManager
Spawns each container and captures process output and exit code
Periodically updates the ResourceManager with container statuses
Supplying reduce()'s input
After the shuffle, each ReduceTask has a single, sorted file of map() outputs
- If a ReduceTask has any output with a key, it will have all outputs with that key
- If a ReduceTask has any output with a key, no other ReduceTask has that key
The ReduceTask iterates this sorted file, grouping each time the key changes
The keys and the values Iterator are deserialized lazily, requiring little memory
Supplying reduce()'s input (revisited)
After the shuffle, each ReduceTask has a single, sorted file of map() outputs
- If a ReduceTask has any output with a key, it will have all the outputs with that key
- If a ReduceTask has any output with a key, no other ReduceTask has that key
The ReduceTask iterates this sorted file, grouping each time the key changes
The keys and values are deserialized lazily, requiring little additional memory
Traditional Shared-Disk Architecture
Allows processors to access any disk in the system
Includes Network Attached Storage (NAS), like our NFS home directories
HDFS (Hadoop Distributed File System)
Almost identical to the Google File System
- NameNode = GFS Master
- DataNode = GFS Chunkserver
- Blocks = Chunks
Maintains the strict separation of metadata and data
MapReduce vs. GNU Parallel
Both developed around the same time in the early/mid 2000s
You can actually use GNU Parallel to parallelize across multiple machines
The essential difference is that GNU Parallel does not perform a "shuffle"
- GNU Parallel supports M Mappers, but only a single Reducer
- Unlike MapReduce, GNU Parallel often requires no additional infrastructure
- GNU Parallel will handle copying scripts and removing them when done
- GNU Parallel supports many other useful options, including retrying failures
Exploring the Split Files
Both job.split and job.splitmetainfo include information about the input splits
- job.splitmetainfo is for the JobTracker and job.split is shared by the MapTasks
These files use Hadoop's own serialization and thus require custom readers
Storing Results
By default, Hive writes results to HDFS and then fetches and prints the results
For non-interactive workloads, you will need to persist the results in one of two ways:
-- Create a new (managed) Hive table to store the results.
CREATE TABLE review_counts_by_stars AS
SELECT stars, COUNT(*) FROM reviews GROUP BY stars;
-- Write the results to a directory on HDFS that is external to Hive.
INSERT OVERWRITE DIRECTORY 'output'
SELECT stars, COUNT(*) FROM reviews GROUP BY stars;
Hadoop Mapper
A class rather than an interface, with a sensible default implementation
- KEYIN and VALUEIN, rather than IN as in SimpleMapReduce
The Client Application (Hadoop Execution Model)
The client configures the Job and then submits it to the cluster over RPC
- The details are handled by the framework and hidden behind Job.submit()
The code within your single JAR file will be executed in two different places:
- Your main() method will be called on the machine where you run hadoop jar
- Your Mapper and Reducer classes will run wherever the framework decides
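A minimal driver sketch of that split, reusing the TokenizingMapper and SummingReducer sketched elsewhere in these notes (both hypothetical classes): main() runs where you type hadoop jar, while the Mapper and Reducer run on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);   // identifies the JAR to ship to the cluster
    job.setMapperClass(TokenizingMapper.class);
    job.setReducerClass(SummingReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);  // submits over RPC and waits
  }
}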
Supplying the map()'s input
Client uses InputFormat to configure MapReduce's input
Job.submit() calls InputFormat.getSplits() on the client to divide up the work
Framework will then create one MapTask for each returned InputSplit
Within each MapTask, the framework creates a RecordReader for its split
- Uses this RecordReader to parse the Writable key/value pairs
- Logic encapsulated within MapContext and ReduceContext
Velocity (The Three V's)
Data is constantly growing or changing - Processing must both respond quickly to changes and also scale predictably
Big Data
Data that can't be effectively processed using standard methods of the time
HDFS Blocks vs. MapReduce Splits
DataNode stores HDFS blocks, not splits
FileInputFormat attempts to keep the two aligned, but they're rarely identical
- TextInputFormat, in particular, "spills" across blocks looking for the final newline
Even local tasks will often need to read a few bytes from a remote DataNode
Jobs can additionally bound their split sizes or use CombineFileInputFormat
// Bound split size between 2MB and 256MB, if the file allows it.
FileInputFormat.setMinInputSplitSize(job, 2 * 1024 * 1024);
FileInputFormat.setMaxInputSplitSize(job, 256 * 1024 * 1024);
Assigning data to nodes
Different systems give this concept different names - shards, slices, partitions The basic idea is similar to how you might "check in" at some large event: - Coordinator + different partitions
Querying in MPP
Each node in an MPP system is responsible for handling the part of the query that involves its own local data
The "coordinator" is responsible for splitting the query up, sending the pieces to each node, and combining the nodes' individual results
Some systems have a dedicated coordinator role, while others allow any node to act as the coordinator for a given query
MPP's "Shared-Nothing" Architecture
Each processor is able to access only the disks that it specifically "owns"
Operations on one node no longer impact the other nodes in the system
Sort Merge Join
Either requires that the inputs already be sorted - a common occurrence in database indexes - or sorts them as a first step, if needed
The two inputs are then traversed in parallel, and the equivalent of a nested loop join scoped only to the current key is performed to produce all matching pairs
Mapper run() method
The framework finally calls back into your application code via the run() methods of your Mapper and Reducer classes, passing only the Context object - in this case the MapContext
The call to run() happens within the run() methods of MapTask and ReduceTask, respectively
Most Mappers will use the default run() method, but it is still technically part of your code and therefore yours to override
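For reference, the default Mapper.run() is essentially the following loop over the MapContext (a simplified sketch of Hadoop's implementation):

// Inside Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>, simplified:
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    cleanup(context);
  }
}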
GFS
The GFS master tells the client which GFS chunkservers to write chunks to
- The master looks up the file in its chunk index, which locates the chunks on the correct GFS chunkservers
The master has the complete global namespace
- The master is in charge of the control plane
- The client has access to the data plane after talking to the master
- Clients can't write or read new chunks on their own, but can access chunks whose chunk handles they have already stored
- That way the master isn't a single point of failure for the file system
- Separation of data plane and control plane
Hadoop 2.0 & YARN
Hadoop 2.0 introduced YARN, short for "Yet Another Resource Negotiator"
The idea of YARN was once again to "do what Google did", this time with Borg
Rather than being MapReduce-specific, YARN provides general cluster management
YARN was not the only Apache project following in the footsteps of Borg:
- Mesos
Hive Compiler
Hive compiles your HiveQL to a directed acyclic graph (DAG) of zero or more MapReduce jobs, at least when the hive.execution.engine property is set to "mr" - short for MapReduce - as it is by default and on the csse584 course server.
Hive as a "Leaky Abstraction"
Hive does a pretty good job of abstraction, so it can be easy to forget that:
• For a business analyst ("data scientist"), knowing HiveQL is likely sufficient.
• A data (infrastructure) engineer will need to know about the inner workings.
Hive does not replace MapReduce. Hive is an abstraction on top of MapReduce.
Hive does not replace YARN. Hive runs its jobs on YARN by way of MapReduce.
ExecMapper and ExecReducer
Hive's special MapOperator is responsible for the SerDe parsing of Writables.
• Hive writes its output not via reduce() but using its own TerminalOperators.
• ExecMapper and ExecReducer extend the MapReduce framework classes.
ApplicationMaster
If YARN is a generic cluster manager, who handles the MapReduce specifics? - Recall, the JobTracker was also responsible for locality and brokering the shuffle - YARN can handle locality generally well, in the form of Placement Constraints - The MapReduce specifics are handled by the MapReduce ApplicationMaster You can think of the ApplicationMaster as a JobTracker for only that job
Web 2.0 / Participatory Web
Instead of content being created by a select few "webmasters", sites allowed users to publish their own content and social media was born (Huge shift in usage of the web)
Nested Loop Join
Iterates over every combination of rows from the two inputs, checking the join condition for each pair
The Reducer run() Method
Just like the Mapper, the MapReduce framework calls back into your code via the run() method of your Reducer, providing it with the Context, in this case an instance of ReduceContext
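The default Reducer.run() looks much the same, except that it advances one key at a time and hands reduce() an Iterable of that key's values (again a simplified sketch of Hadoop's implementation):

// Inside Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>, simplified:
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
    }
  } finally {
    cleanup(context);
  }
}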
Caveats of Column Pruning
Just like with partitioning, column pruning is great, but has some trade-offs:
• Wouldn't it be nice to be able to get benefits of pruning without the trade-offs?
Only queries accessing only non-pruned columns benefit from the pruning.
We're now storing multiple copies of the data for each non-pruned column.
Whenever the original data is updated, the pruned copies must be updated also.
Hadoop MapReduce (Version <2.0)
Like the NameNode, a single JobTracker manages the MapReduce metadata Also like the NameNode, the JobTracker handles only the metadata, no tasks.
JobTracker (Responsibilities in Hadoop MapReduce)
Maintains the global view of MapReduce jobs running on the cluster
Exposes an RPC interface for clients to submit MapReduce jobs
Assigns {Map,Reduce}Tasks to TaskTrackers and monitors task status
Responds to failures to ensure successful job completion, if possible
ResourceManager (Responsibilities in Hadoop YARN)
Maintains the global view of containers running on the cluster
Exposes an RPC interface for clients to submit YARN applications
Assigns containers to NodeManagers and monitors container status
Responds to failures to ensure successful job completion, if possible
NameNode (HDFS)
Maintains the global view of the filesystem namespace and metadata
Exposes an RPC interface that clients use to operate on the file system
Assigns blocks to DataNodes and monitors the status of those blocks
Responds to failures to ensure integrity of the file system, if possible
SimpleMapReduce MapContext Class
MapContext sorts and groups the map() outputs in memory using TreeMap
SimpleMapReduce Runtime
MapReduce programming model is incredibly powerful and enables: - Parallelization and distribution of the execution across a cluster of workers - Efficient and transparent mechanisms for fault tolerance and error recovery
Sort Merge Join in MapReduce
MapReduce will handle the grouping - and sorting, although we don't really require that here - by key within a partition. Your job is to handle the processing within each group by partitioning values back into "left" and "right" and generating all possible combinations
Hadoop InputFormat and OutputFormat
Mapper and Reducer handle data within the system, but not I/O
- You can tell Hadoop how to read the inputs to the Mappers using InputFormat
- You tell Hadoop how to write the output of the Reducers using OutputFormat
TextInputFormat and TextOutputFormat are the respective default formats
- TextInputFormat uses linebreaks to identify records
- TextOutputFormat writes output records delimited (by default) by newlines
FileInputFormat is the most common, particularly on top of HDFS
Variety (The Three V's)
Most "big data" includes unstructured data Goal is to extract some insight by considering this data in aggregate
SimpleMapReduce Job Class
Mostly Java boilerplate
Responsible for calling into the "business logic" in the Mapper and Reducer classes
Does little more than calling the map() and reduce() methods with the appropriate key/value pairs
Online Analytical Data Processing (OLAP)
Originally done by the Enterprise Data Warehouse
For businesses to get a "global view" of their data for "business intelligence" purposes; this aggregation became known as the enterprise data warehouse
Online Transaction Processing (OLTP)
Originally done using "business machines"
Users input data using a "dumb terminal", rather than punch cards
The system processed the data and returned a response within seconds
SQL (Structured Query Language) remains the dominant query language used in OLTP
Handling reduce()'s output (revisited)
OutputFormat has no getSplits(), since output "splits" are part of the design:
- Each ReduceTask writes its output independently of the other ReduceTasks
- For FileOutputFormat, each "part-r-####" file contains that ReduceTask's output
The blocks of part-r-#### are written to the DataNode local to the ReduceTask
HDFS then takes care of replicating the blocks to other DataNodes in the cluster
Caveats of Partitioning
Partitioning is a major feature of Hive, but you don't want to get carried away.
• As with most things related to performance, it's wise to measure and refine.
Partitions are purely overhead whenever Hive can't perform partition pruning.
Each partition places additional storage and query load on the Metastore.
For efficient MapReduce, partitions should not be smaller than an HDFS block.
Partitioning by multiple columns requires greater care with the above points.
Hadoop SequenceFile{Input,Output} Format
Provides efficient (de)serialization of Writable key/value pairs
SequenceFileInputFormat and SequenceFileOutputFormat expose this format.
This format is particularly useful for passing data between multiple jobs...
Building JARs for Hadoop
Recall that Java applications search their classpath for dependencies
MapReduce runs your code on multiple machines, making this a bit trickier
Several options for managing dependencies in distributed systems:
- Pre-emptively install every job's dependencies on every machine in the cluster
- Upload the dependencies as part of the job, like we saw in GNU Parallel
- Package your application as a self-contained "bundle", including dependencies
- Apache Maven is used to package Java applications as JARs
- The Maven Assembly Plugin produces a "fat JAR" containing all dependencies
Downloading the JAR and XML
Recall that the TaskTracker spawns a new JVM for each {Map,Reduce}Task it runs
- Since this is Java, including your job.jar is just a matter of setting the classpath
- Hadoop's Configuration class is similarly classpath-aware and parses job.xml
- The input splits instruct each MapTask how to locate its own data
After a few layers of indirection, Task.run() is called within the task's JVM
The SerDe
Recall, Hadoop deals exclusively in key/value pairs of Writable objects. Hive's data model is instead based on rows of strongly-typed column data. Hive's Serializer/Deserializer ("SerDe") is responsible for bridging this gap by converting Writables to row data on input and row data back to Writables on output
Hash Joins in MapReduce
Recall, a hash join builds an in-memory index of the small side of the join.
In the case of MapReduce, this hash table must exist within all MapTasks rather than following the typical InputSplit mechanism. This is done using the Hadoop Distributed Cache.
Unlike the sort merge join, the hash join is performed entirely within the Mapper, allowing the job's Reducer to be used for some other task, in this case summing counts by state.
The hash join is great, with the caveat that it requires every MapTask to commit memory to storing the hash table, which in some cases may be expensive or even impossible.
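A sketch of the map side of such a hash join. It assumes the driver registered the small file with job.addCacheFile(new URI("/data/small.csv#small")), so the file is symlinked into the task's working directory as "small"; the CSV layout and field positions are hypothetical.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HashJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Map<String, String> smallSide = new HashMap<>();

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Build the in-memory hash table from the Distributed Cache file.
    try (BufferedReader reader = new BufferedReader(new FileReader("small"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] fields = line.split(",", 2);   // join key, remaining columns
        if (fields.length == 2) {
          smallSide.put(fields[0], fields[1]);
        }
      }
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split(",", 2);
    if (fields.length < 2) {
      return;
    }
    String match = smallSide.get(fields[0]);    // expected O(1) probe
    if (match != null) {
      context.write(new Text(fields[0]), new Text(fields[1] + "," + match));
    }
  }
}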
Handling map()'s output (revisited)
Recall, map() produces output by calling write() on the provided Context For MapContext, this output is (eventually) written to files local to the MapTask - We now know exactly what these files look like and even how to inspect them Each MapTask is short-lived, so keeping outputs in memory is not sufficient - We now know why this is - we need to free up precious slots on the TaskTracker
The Shuffle Handler
ReduceTasks across the cluster need access to map() output on the MapTasks' local disks
Just like in HDFS, there is a separation between control plane and data plane:
- The JobTracker tells each ReduceTask about all of the MapTasks' locations
- Each TaskTracker exposes the HTTP service known as the ShuffleHandler
The ReduceTasks interact with these ShuffleHandlers to fetch their inputs
Fetching can start as soon as one MapTask completes, but reduce() must wait (to ensure all keys are present and in order)
Reducer Interface
Reducer looks a lot like the Mapper, but its values are an Iterable rather than a single value
Reducer again uses Context to produce its output, since there may be many outputs
Possible (although uncommon) to output a different key from its input key
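A minimal Reducer sketch (hypothetical, pairing with the word-count Mapper sketched earlier) that sums the counts for each key:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SummingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {   // all values for this key, in this partition
      sum += value.get();
    }
    context.write(key, new IntWritable(sum));  // here the output key matches the input key
  }
}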
Sorting in MapReduce
Remember, MapReduce always sorts by key within each partition
- The obvious way to sort records globally is to use only a single ReduceTask
To keep multiple ReduceTasks and still sort globally, you can write a custom Partitioner, or use Hadoop's TotalOrderPartitioner with InputSampler to preprocess and partition the input so data is sorted within each output file and globally across files
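A driver-level sketch of the multi-reducer approach, assuming Text keys, that the number of ReduceTasks has already been set, and that the job's input keys match its map output keys (InputSampler samples input keys); the classes live in org.apache.hadoop.mapreduce.lib.partition:

// Sample the input, write partition boundaries, then partition by total order.
Path partitionFile = new Path("_partitions");
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
InputSampler.writePartitionFile(job,
    new InputSampler.RandomSampler<Text, Text>(0.01, 10000));  // ~1% sample, at most 10000 keys
job.setPartitionerClass(TotalOrderPartitioner.class);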
Secondary Sort
Remember, the order of values in the Iterable passed to reduce() is undefined.
• Occasionally, an algorithm will require that these values be sorted as well.
• Hadoop's "secondary sort" provides the illusion of sorting other than by key:
Implement a custom Writable for a composite key that also includes the value.
Use Job.setPartitionerClass() to partition using only your "true key".
Use Job.setGroupingComparatorClass() to group using only your "true key".
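A configuration sketch of that recipe; CompositeKey, TrueKeyPartitioner, and TrueKeyGroupingComparator are hypothetical classes you would write yourself:

job.setMapOutputKeyClass(CompositeKey.class);       // WritableComparable of (true key, value)
job.setPartitionerClass(TrueKeyPartitioner.class);  // partitions on the true key only
job.setGroupingComparatorClass(TrueKeyGroupingComparator.class);  // groups on the true key only
// CompositeKey.compareTo() orders by (true key, value), so values arrive at
// reduce() sorted within each true-key group.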
Essential Analytical Functionality
Selection
Projection
Aggregation
Sorting
Join
All essential for OLAP workloads
MapReduce on YARN
Submitting the MapReduce job involves just a little bit more work on YARN: - The client allocates a new ApplicationMaster from the ResourceManager - The client interacts with this ApplicationMaster as a "one-off" JobTracker - The ApplicationMaster allocates additional containers as "one-off" TaskTrackers This additional complexity is, of course, hidden behind Job.submit()
Hadoop MapReduce (>= Version 2.0)
TaskTrackers become NodeManagers and a ResourceManager is introduced A NodeManager is basically a TaskTracker that can run any YARN container
Hadoop Partitioning & Sorting
The "within a partition" restriction is essential to MapReduce's ability to scale - Each MapTask produces not a single output file, but one for each ReduceTask - The routing to a ReduceTask is determined by the key of the map() output - The records within each MapTask output file are sorted by key by the MapTask - The framework again handles this for you, you just need to provide the key
Client Application (Revisited)
The client configures the Job and then submits it to the "cluster" over RPC
- We now know more precisely that this RPC is specifically to the JobTracker
The code within your single JAR file will be executed in two different places:
- Your main() method will be called on the machine where you run hadoop jar
- Your Mapper and Reducer classes will be run by some TaskTracker
The Job Submission Process
The client, JobTracker, and (some) TaskTrackers all need to know about a "job"
At its core, a Hadoop MapReduce job is made of three components:
- The JAR file containing your code to be run as MapTasks and ReduceTasks
- The various configuration options that your client set on its Job instance
- The input splits instructing each MapTask how to locate its own input data
Hadoop clearly leverages HDFS to share these details across the cluster
Handling reduce()'s output
The opposite of InputFormat... OutputFormat provides getRecordWriter()
OutputFormat has no getSplits(), since output "splits" are part of the design
- Each ReduceTask writes its output independently of the other ReduceTasks
- For FileOutputFormat, each "part-r-####" file contains that ReduceTask's output
- OutputFormat can optionally support an OutputCommitter for atomic commit
Resource Requests in YARN
The power of YARN's generality also brings with it additional responsibility - For example, you must think about vCPUs and memory rather than "slots"
Interfacing with the Framework
The primary - and in many simpler applications, only - interfaces between your code and the Hadoop framework are the MapContext and ReduceContext, both specialized subclasses of TaskInputOutputContext Separate implementations for Mapper and Reducer for the same reasons as in SimpleMapReduce
Hive Server 2 / Thrift Server
The purpose of the HiveServer is to allow non-JVM clients to execute HiveQL. ◦ It also allows JVM clients to be lightweight, such as the Beeline replacement CLI. •HiveServer2 runs as a shared server process.
Volume (The Three V's)
Think of it as "size in bytes" Where to store all the bytes of data? - (A little trickier when considering backups)
MapReduce - Reduce (Map-Only Jobs)
This is the one case in which MapReduce does not sort by key within each partition
A map-only job is a job with zero ReduceTasks, hence the "map-only" name
- The sorting in MapReduce exists to provide the guarantees around reduce()
- If reduce() will never be called, then there is no need to waste time sorting
Use job.setNumReduceTasks(0)
One common use of map-only jobs is large-scale conversions of file formats
SimpleMapReduce ReduceContext Class
Unlike MapContext, ReduceContext stores no state, and uses little memory Writing to a file instead only requires another implementation of Context
Hadoop Reducer
Unlike SimpleMapReduce, Hadoop allows input and output of different types
Writable Interface
Unlike SimpleMapReduce, Hadoop does not use standard Java types
Hadoop's Writables are designed specifically for efficient (de)serialization
Rather than Integer, use IntWritable. Rather than String, use Text
If needed, you can create your own classes that implement Writable
- readFields(DataInput in) ----> deserialize input
- write(DataOutput out) ----> serialize to output
Unlike Java's primitive wrappers, Writable is mutable
When you call write(), the value is copied (serialized) out of the Writable
If your Mapper/Reducer ever needs to store a value, you must similarly copy it
The benefit of this mutability is reduced object allocation due to reuse
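A minimal custom Writable sketch (a hypothetical PointWritable, not part of Hadoop), showing write() serializing out and readFields() deserializing in, with the instance reused across records:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PointWritable implements Writable {
  private int x;
  private int y;

  public void set(int x, int y) { this.x = x; this.y = y; }  // mutate for reuse

  @Override
  public void write(DataOutput out) throws IOException {     // serialize to output
    out.writeInt(x);
    out.writeInt(y);
  }

  @Override
  public void readFields(DataInput in) throws IOException {  // deserialize input
    x = in.readInt();
    y = in.readInt();
  }
}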
Hash Join
Uses a hash table to index the smaller table based on the key of the join condition
The inner loop of the nested loop join can be replaced with an expected O(1) hash lookup
Only works for joins based on equality and works best when the hash table can be kept entirely in memory, although on-disk hash tables are possible
The Three V's of Big Data
Volume, velocity, and variety
Supplying map()'s input
We now know in much more detail exactly how Hadoop's input splits work: - Job.submit() calls getSplits() and uploads the results to the staging directory - The JobTracker uses job.splitmetainfo to assign MapTasks to TaskTrackers - The JobTracker tells each MapTask where to look in job.split for its own split From here on, the RecordReader works just like we talked about last time
Locality
When setting up a Hadoop cluster, it is your job to configure Rack Awareness
The JobTracker uses the locations in job.splitmetainfo to select a TaskTracker (strictly speaking, a DataNode stores blocks rather than splits):
- On the same node as a DataNode that is storing the input split, if possible
- Otherwise, in the same rack as a DataNode storing the input split, if possible
- Finally, on any node in the cluster with an available MapTask slot, if possible
You can configure how long the JobTracker waits before "falling through"
Hive MetaStore
When you run CREATE TABLE in Hive the definition is stored in the Metastore. •The Metastore handles only metadata operations, which is an OLTP workload
The Benefits of YARN
While the YARN architecture is certainly more complex, it has several benefits:
- The shared JobTracker is no longer a single point of failure for all jobs
- The ResourceManager does less work, and as a result can scale further
- MapReduce and non-MapReduce ApplicationMasters can share a cluster
- Any improvement to YARN benefits all frameworks, not just MapReduce
Design Decisions
Why not put all machines on fewer racks?
- Fewer racks mean less network traffic overhead, but if a rack goes down, you're in trouble
Why hard disks and not SSDs?
- SSDs cost more, and HDDs are pretty good at the sequential reads and writes that MapReduce needs
YARN Manages "Containers"
YARN is not Docker
- Like Docker, however, YARN containers encapsulate programs and resources
You can think of a YARN container as a generalization of a {Map,Reduce}Task
A container contains:
- Program (JAR file)
- Resource requests (vCPU, memory)
- Other useful stuff (environment, files)
Combiner
You can think of a Combiner as a Reducer that runs within each MapTask
- You can set it using Job.setCombinerClass()
Important: the Combiner is an optimization and is not guaranteed to be executed
A Combiner is useful when there is significant repetition in the intermediate keys produced by each MapTask and the user-specified reduce function is commutative and associative
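Setting a Combiner is one line in the driver; here the SummingReducer sketched earlier is reused, which is safe because addition is commutative and associative:

// The framework may run this zero, one, or many times per MapTask.
job.setCombinerClass(SummingReducer.class);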
{Input,Output}Formats & Chaining
You could use Text{Input,Output}Format (e.g., as JSON) to pass data between jobs
Much better is SequenceFile, which natively understands Hadoop's Writables
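A sketch of chaining two jobs through SequenceFiles (the intermediate path name is arbitrary; the formats live in org.apache.hadoop.mapreduce.lib.{input,output}):

// job1 writes its Writables as a SequenceFile; job2 reads them back directly.
Path intermediate = new Path("intermediate");
job1.setOutputFormatClass(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputPath(job1, intermediate);
job2.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.addInputPath(job2, intermediate);
// job2's Mapper input key/value types must match job1's output key/value classes.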
table
a collection of data records (rows) with the same schema
Hive warehouse
a collection of databases
schema
defines the structure of a row in terms of columns
Hadoop's WritableComparable
designed to support efficient comparison The keys in key/value pairs must be WritableComparable, the values Writable Standard types like IntWritable and Text implement WritableComparable Hadoop's RawComparator supports comparison without deserialization
Massively Parallel Processing (MPP)
enables fast execution of the most complex queries operating on large amounts of data
Multiple compute nodes handle all query processing leading up to final result aggregation, with each core of each node running the same compiled query segments on portions of the entire data
Started with Teradata in 1988
Redshift is AWS's MPP offering
column
has a name and datatype
Optimizer
It is the job of the Optimizer to find the best way to implement a HiveQL query.
• Hive's optimizer takes a well-studied approach to database query planning:
• Hive specifically uses the Apache Calcite library to optimize its logical plan.
◦ The general technique used by Calcite is that of cost-based optimization.
Parse the query into a logical plan, independent of execution environment.
Apply general query optimization techniques to the query's logical plan.
Convert the logical plan into a physical plan targeting a specific environment.
The physical plan is distributed via the Hadoop Distributed Cache.
Use EXPLAIN to dump the plan the optimizer came up with.
Distinguishing Language
map() and reduce() are the methods of the MapReduce programming model Mapper and Reducer are the classes you implement in Hadoop MapReduce MapTask and ReduceTask are the units of execution in the Hadoop runtime
Mapper Interface
map() output will be sorted by key, so KEYOUT must be comparable
Hadoop "The Shuffle" & Merge
map() outputs with the same key may be produced by multiple MapTasks - These map() outputs must be "shuffled" to the appropriate ReduceTask - Each ReduceTask fetches its partition's map() output from every MapTask - The ReduceTask efficiently sorts (via merge) these outputs into a single file Only time two nodes talk to each other is during the "shuffle" - Similar to MPP "Shared-nothing" design Again, the hard distributed systems problems are solved by the framework
Handling map()'s output
map() produces output by calling write() on the provided Context For MapContext, this output is (eventually) written to files local to the MapTask Hadoop uses a format similar to SequenceFile to efficiently store this output Each MapTask is short-lived, so keeping outputs in memory is not sufficient
Database
namespaced collection of tables
CompositeInputFormat
performs a merge join prior to the Mapper
Downside: it uses its own (join expression) language
Java's Reflection API
provides an incredibly powerful mechanism for interacting with Java classes - rather than instances of those classes - at runtime
The Staging Directory
serves as the coordination space for MapReduce jobs
The job.jar file is the exact JAR file that our client submitted
The job.xml is, less obviously, our instance of Job serialized as XML
The *.jhist and *_conf.xml relate to the History Server and are out of scope
Sort Merge Join in Hadoop
we'll need to split the computation into two Hadoop jobs - one to perform the join of the states and useful counts, and a second to sum up the counts by state
The most important part of the Job definition is adding files from both "sides" of the join with a common key.
The Reducer is where the join part of the sort merge join happens. We first iterate all values and add them to local buffers, one for each side. In this case, we're interested only in the "state" and "useful" fields. Afterwards, we use nested loops to generate all possible combinations of the buffered values.
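A sketch of that Reducer, assuming map() tagged each value with an "L" or "R" prefix and a tab; the buffered fields in the actual job would be "state" and "useful":

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SortMergeJoinReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    List<String> left = new ArrayList<>();
    List<String> right = new ArrayList<>();
    for (Text value : values) {                 // first pass: split values back by side
      String[] parts = value.toString().split("\t", 2);
      if (parts.length < 2) {
        continue;
      }
      if ("L".equals(parts[0])) {
        left.add(parts[1]);
      } else {
        right.add(parts[1]);
      }
    }
    for (String l : left) {                     // nested loop scoped to this one key
      for (String r : right) {
        context.write(key, new Text(l + "\t" + r));
      }
    }
  }
}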