COP 4521 Final Exam

Message Passing

Message passing combines communication and synchronization:
• The sender specifies the message and a destination - a process, a port, a set of processes, ...
• The receiver specifies message variables and a source - the source may or may not be explicitly identified
• Message transfer may be:
  - asynchronous: send operations never block
  - buffered: sender may block if the buffer is full
  - synchronous: sender and receiver must both be ready
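
A minimal sketch of buffered message passing using Python's standard multiprocessing module (the queue plays the role of a bounded channel; all names here are illustrative, not from the course materials):

from multiprocessing import Process, Queue

def producer(q):
    q.put("hello")          # send: blocks only if the bounded buffer is full
    q.put(None)             # sentinel marking the end of the stream

def consumer(q):
    while True:
        msg = q.get()       # receive: blocks until a message is available
        if msg is None:
            break
        print("received:", msg)

if __name__ == "__main__":
    q = Queue(maxsize=2)    # bounded buffer -> "buffered" message transfer
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start(); p2.start()
    p1.join(); p2.join()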

Starvation

A process starves when it is forced to wait indefinitely for a resource while other processes repeatedly lock and unlock that same resource. The no-starvation property requires that no waiting process can be starved in this way.

Observed Speedup

Observed speedup of a code which has been parallelized, defined as:

    speedup = wall-clock time of serial execution / wall-clock time of parallel execution

One of the simplest and most widely used indicators for a parallel program's performance.
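
A rough sketch of measuring it in Python (the timing helper and names are illustrative, not from the course materials):

import time

def wall_clock(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def observed_speedup(t_serial, t_parallel):
    # serial wall-clock time divided by parallel wall-clock time
    return t_serial / t_parallel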

Race Condition

Occurs whenever the outcome of the execution depends on the particular order in which instructions are executed.
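
A small Python sketch of a race condition (names are illustrative): two threads repeatedly read the shared counter, compute a new value, and write it back. Because a thread switch can occur between the read and the write-back, updates can be lost, so the final count often comes out below 200000, depending on scheduling.

import threading

count = 0                        # shared variable

def add_one(x):
    return x + 1

def work():
    global count
    for _ in range(100_000):
        # read count, compute, then write back; a thread switch in between
        # can overwrite another thread's update
        count = add_one(count)

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)                     # often less than 200000 when updates are lost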

Single Data (SISD)

Only one data stream is being used as input during any one clock cycle

Single Instruction (SISD)

Only one instruction stream is being acted on by the CPU during any one clock cycle

What is Parallel Computing?

Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem. A problem is broken into discrete parts that can be solved concurrently.

Communications

Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as through a shared memory bus or over a network; however, the actual event of data exchange is commonly referred to as communications regardless of the method employed.

Scope of communications

Point-to-point scoping involves two tasks with one task acting as the sender/producer of data, and the other acting as the receiver/consumer. Collective scoping involves data sharing between more than two tasks, which are often specified as being members in a common group, or collective.

Scalability

Refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate increase in parallel speedup with the addition of more resources. Factors that contribute to scalability include:
• Hardware - particularly memory-CPU bandwidths and network communication properties
• Application algorithm
• Parallel overhead related
• Characteristics of your specific application

Massively Parallel

Refers to the hardware that comprises a given parallel system - having many processing elements. The meaning of "many" keeps increasing, but currently, the largest parallel computers are comprised of processing elements numbering in the hundreds of thousands to millions.

CPU

The CPU is the brain of a computer. Its cycles are essentially the time the CPU takes to execute one processor operation (one tick of a cycle corresponds to one bytecode instruction).

Critical Section

When two or more processes share data, the critical section is the section of code that updates the shared data.

Synchronization

The coordination of parallel tasks in real time, very often associated with communications. Often implemented by establishing a synchronization point within an application where a task may not proceed further until another task(s) reaches the same or logically equivalent point. Synchronization usually involves waiting by at least one task, and can therefore cause a parallel application's wall clock execution time to increase.

Distributed Computing

• Distributed computing is the key to the influx of Big Data processing we've seen in recent years.
• It is the technique of splitting an enormous task (e.g. aggregate 100 billion records), which no single computer is capable of practically executing on its own, into many smaller tasks, each of which can fit into a single commodity machine.
• You split your huge task into many smaller ones, have them execute on many machines in parallel, aggregate the data appropriately, and you have solved your initial problem.
• This approach again enables you to scale horizontally - when you have a bigger task, simply include more nodes in the calculation.

Distributed File System

• Distributed file systems can be thought of as distributed data stores.
• As a concept, they are the same thing.
• The difference is that distributed file systems allow files to be accessed using the same interfaces and semantics as local files, not through a custom API.
• For example, Cassandra requires its CQL interface, whereas Hadoop's HDFS can be used like an ordinary file system.

Distributed Messaging

• Messaging systems provide a central place for storage and propagation of messages/events inside your overall system.
• They allow you to decouple your application logic from directly talking with your other systems.
• Simply put, a messaging platform works in the following way: a message is broadcast from the application which potentially created it (called a producer), goes into the platform, and is read by potentially multiple applications which are interested in it (called consumers).

Partitioning

• Splits a server into multiple smaller servers, called shards.
• Each shard holds a different set of records, defined by a business rule.
• Shards should be defined with uniformity in mind.
• Issue: load balancing can be really hard to define and predict.
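
A minimal sketch of one possible business rule: assigning records to shards with a stable hash. The shard count and key names are illustrative assumptions, not from the course materials.

import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(key: str) -> int:
    """Map a record key to one of NUM_SHARDS shards using a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("customer-42"))  # the same key always lands on the same shard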

Amdahl's Law

• States that potential program speedup is defined by the fraction of code (P) that can be parallelized:

    speedup = 1 / (1 - P)

• If none of the code can be parallelized, P = 0 and the speedup = 1 (no speedup).
• If all of the code is parallelized, P = 1 and the speedup is infinite (in theory).
• If 50% of the code can be parallelized, maximum speedup = 2, meaning the code will run twice as fast.
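
A quick sketch of the formula in Python. The general n-processor form, speedup = 1 / ((P / n) + (1 - P)), approaches 1 / (1 - P) as n grows; the function name is illustrative.

def amdahl_speedup(p, n=float("inf")):
    """Maximum speedup for parallel fraction p on n processors (Amdahl's Law)."""
    if n == float("inf"):
        return float("inf") if p >= 1.0 else 1.0 / (1.0 - p)
    return 1.0 / ((p / n) + (1.0 - p))

print(amdahl_speedup(0.5))         # 2.0  -> at most twice as fast
print(amdahl_speedup(0.95))        # 20.0 -> the 20x ceiling mentioned later in this set
print(amdahl_speedup(0.95, n=8))   # ~5.9 with only 8 processors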

Node

A standalone "computer in a box". Usually comprised of multiple CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked together to comprise a supercomputer.

Single Instruction Multiple Data (SIMD)

A type of parallel computer. Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing. Synchronous (lockstep) and deterministic execution. Two varieties: Processor Arrays and Vector Pipelines. Most modern computers, particularly those with graphics processor units (GPUs), employ SIMD instructions and execution units.

Multiple Instruction Multiple Data (MIMD)

A type of parallel computer. Execution can be synchronous or asynchronous, deterministic or non-deterministic. Currently, the most common type of parallel computer - most modern supercomputers fall into this category. Note: many MIMD architectures also include SIMD execution sub-components.

Multiple Instruction Single Data (MISD)

A type of parallel computer. Few (if any) actual examples of this class of parallel computer have ever existed. Some conceivable uses might be: multiple frequency filters operating on a single signal stream; multiple cryptography algorithms attempting to crack a single coded message.

Single Instruction (SIMD)

All processing units execute the same instruction at any given clock cycle

Deadlock

A deadlock is a situation in which a process is forced to wait forever for an event that will never occur. The no-deadlock property requires that this situation never arises.

Pipelining

Breaking a task into steps performed by different processor units, with inputs streaming through, much like an assembly line; a type of parallel computing.

Busy-Waiting

Busy-waiting is primitive but effective: processes atomically set and test shared variables.
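
A minimal sketch of busy-waiting on a shared flag in Python (names are illustrative; the spin loop burns CPU cycles while it waits):

import threading

ready = False          # shared flag set by one thread, tested by another
data = None

def producer():
    global ready, data
    data = 42
    ready = True       # "set" the shared variable

def consumer():
    while not ready:   # "test" the shared variable in a spin loop
        pass           # busy-wait: keep checking until the flag flips
    print("got", data)

t1 = threading.Thread(target=consumer)
t2 = threading.Thread(target=producer)
t1.start(); t2.start()
t1.join(); t2.join()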

What is the difference between concurrency and parallelism?

Concurrency is the task of running and managing multiple computations at the same time. Parallelism is when multiple sets of computations are executed at the same time in completely different processes.

Hybrid Model

Currently, a common example of a hybrid model is the combination of the message passing model (MPI) with the threads model (OpenMP). Threads perform computationally intensive kernels using local, on-node data. Communications between processes on different nodes occur over the network using MPI. This hybrid model lends itself well to the most popular hardware environment of clustered multi/many-core machines.

What is the difference between deadlock and starvation?

Deadlock is when processes are blocked forever because each is waiting for an event (such as the release of a resource held by another) that will never occur. Starvation is when a process never gets access to a shared resource because other processes repeatedly acquire it first.

Synchronization Techniques

Different approaches are roughly equivalent in expressive power and can be used to implement each other.

Multiple Data (SIMD)

Each processing unit can operate on a different data element

Multiple Instruction (MISD)

Each processing unit operates on the data independently via separate instruction streams.

Multiple Instruction (MIMD)

Every processor may be executing a different instruction stream

Multiple Data (MIMD)

Every processor may be working with a different data stream

Flynn's Classical Taxonomy

Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified along the two independent dimensions of Instruction Stream and Data Stream. Each of these dimensions can have only one of two possible states: Single or Multiple.

Shared Memory

From a strictly hardware point of view, describes a computer architecture where all processors have direct (usually bus based) access to common physical memory. In a programming sense, it describes a model where parallel tasks all have the same "picture" of memory and can directly address and access the same logical memory locations regardless of where the physical memory actually exists.

Thread

Each process can run multiple threads within its context. A thread is a sequence of coded computer instructions. Each thread has its own stack, but it also has access to the process's memory and context information. When we run a process, such as Python.exe, it executes the code within its main thread. The main thread can start up multiple threads, and a process can also start up multiple subprocesses.

Threads Model

Is a type of shared memory programming. In the threads model of parallel programming, a single "heavy weight" process can have multiple "light weight", concurrent execution paths.
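
A small sketch of the threads model with Python's standard threading module: one process, several light-weight execution paths that share the process's memory (names are illustrative).

import threading

results = [0] * 4                    # shared memory visible to every thread

def worker(i):
    results[i] = i * i               # each thread fills in its own slot

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)                       # [0, 1, 4, 9]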

Data locality

Keeping data local to the process that works on it conserves memory accesses, cache refreshes and bus traffic that occurs when multiple processes use the same data. Unfortunately, controlling data locality is hard to understand and may be beyond the control of the average user.

Latency vs. Bandwidth

Latency is the time it takes to send a minimal message from point A to point B. Bandwidth is the amount of data that can be communicated per unit of time.

MPMD

MPMD (Multiple Program Multiple Data) is actually a "high level" programming model that can be built upon any combination of the previously mentioned parallel programming models. All tasks may use different data. MPMD applications are not as common as SPMD applications, but may be better suited for certain types of problems, particularly those that lend themselves better to functional decomposition than domain decomposition.

Single Data (MISD)

A single data stream is fed into multiple processing units.

Von Neumann Architecture

- Also known as a "stored-program computer" - both program instructions and data are kept in electronic memory. Differs from earlier computers which were programmed through "hard wiring".
- Comprised of four main components: Memory, Control Unit, Arithmetic Logic Unit, Input/Output.
- Read/write, random access memory is used to store both program instructions and data.
- Program instructions are coded data which tell the computer to do something.
- Data is simply information to be used by the program.
- The Control Unit fetches instructions/data from memory, decodes the instructions and then sequentially coordinates operations to accomplish the programmed task.
- The Arithmetic Logic Unit performs basic arithmetic operations.
- Input/Output is the interface to the human operator.

Three characteristics of the Data Parallel Model

1) Address space is treated globally.
2) Most of the parallel work focuses on performing operations on a data set.
3) The data set is typically organized into a common structure, such as an array or cube.
4) A set of tasks work collectively on the same data structure; however, each task works on a different partition of the same data structure.
5) Tasks perform the same operation on their partition of work, for example, "add 4 to every array element".
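
A sketch of the "add 4 to every array element" example from item 5, partitioning the data set across worker processes with Python's multiprocessing module (the partitioning scheme and names are illustrative):

from multiprocessing import Pool

def add_four(chunk):
    return [x + 4 for x in chunk]            # same operation on each partition

if __name__ == "__main__":
    data = list(range(16))
    chunks = [data[i::4] for i in range(4)]  # partition the data set 4 ways
    with Pool(4) as pool:
        parts = pool.map(add_four, chunks)   # each task works on its own partition
    print(parts)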

Cores and Processes

A CPU can have multiple cores. Each core can run multiple processes. A process is essentially the application/the instructions of the programs we run on our computer. Each process has its own memory space. Each Python process has its own Python interpreter, memory space, and GIL (Global Interpreter Lock).

Semaphores

A boolean variable with two indivisible (atomic) hardware operations possible on it.
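
A small sketch using Python's threading.Semaphore as a binary semaphore guarding a shared counter; acquire and release correspond to the two indivisible operations (often called P and V). Names are illustrative.

import threading

sem = threading.Semaphore(1)   # binary semaphore: 1 = free, 0 = taken
count = 0

def work():
    global count
    for _ in range(100_000):
        sem.acquire()          # P / wait: indivisible "test and take"
        count += 1
        sem.release()          # V / signal: indivisible "give back"

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)                   # always 200000: the updates are serialized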

Condition Variable

A container for a set of threads waiting for some condition to be met. It provides operations wait and signal similar to those in a semaphore.
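
A sketch with Python's threading.Condition, whose wait and notify play the roles of wait and signal (names are illustrative):

import threading

cond = threading.Condition()
ready = False

def waiter():
    with cond:
        while not ready:       # re-check the condition after every wake-up
            cond.wait()        # release the lock and sleep until signalled
        print("condition met")

def signaller():
    global ready
    with cond:
        ready = True
        cond.notify()          # wake one waiting thread

t1 = threading.Thread(target=waiter)
t2 = threading.Thread(target=signaller)
t1.start(); t2.start()
t1.join(); t2.join()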

What is a distributed system?

A distributed system is a group of computers that work together to appear as a single computer to the end-user.

Why use a distributed system?

A distributed system is used when horizontal scaling is required, to reduce system latency, and to increase fault tolerance.

Task

A logically discrete section of computational work. A task is typically a program or program-like set of instructions that is executed by a processor. A parallel program consists of multiple tasks running on multiple processors.

Monitors

A monitor provides synchronization between threads through mutual exclusion and condition variables.
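
A sketch of a monitor-style class in Python, assuming the standard threading module: a lock provides the mutual exclusion and condition variables let threads wait inside the monitor. The bounded buffer below is a classic illustration, not something specified by the course.

import threading
from collections import deque

class BoundedBuffer:
    """Monitor-style bounded buffer: one lock plus two condition variables."""
    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def put(self, item):
        with self.lock:                        # mutual exclusion
            while len(self.items) >= self.capacity:
                self.not_full.wait()           # wait inside the monitor
            self.items.append(item)
            self.not_empty.notify()            # signal a waiting consumer

    def get(self):
        with self.lock:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()             # signal a waiting producer
            return item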

Single Instruction Single Data (SISD)

A serial (non-parallel) computer. Deterministic execution. This is the oldest type of computer. Examples: older generation mainframes, minicomputers, workstations and single processor/core PCs.

Communication and Synchronization

In approaches based on shared variables, processes communicate indirectly. Explicit synchronization mechanisms are needed. In message passing approaches, communication and synchronization are combined. Communication may be either synchronous or asynchronous.

Distributed Memory

In hardware, refers to network based memory access for physical memory that is not common. As a programming model, tasks can only logically "see" local machine memory and must use communications to access memory on other machines where other tasks are executing.

Granularity

In parallel computing, granularity is a qualitative measure of the ratio of computation to communication. Coarse: relatively large amounts of computational work are done between communication events Fine: relatively small amounts of computational work are done between communication events

Communication overhead

Inter-task communication virtually always implies overhead. Communications frequently require some type of synchronization between tasks, which can result in tasks spending time "waiting" instead of doing work.

SPMD

SPMD (Single Program Multiple Data) is actually a "high level" programming model that can be built upon any combination of the previously mentioned parallel programming models. SPMD programs usually have the necessary logic programmed into them to allow different tasks to branch or conditionally execute only those parts of the program they are designed to execute. That is, tasks do not necessarily have to execute the entire program - perhaps only a portion of it.
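
A sketch of SPMD assuming the mpi4py package and an MPI launcher are available: every process runs the same program and branches on its rank (the filename and messages are illustrative).

# Run with, e.g.: mpirun -n 4 python spmd_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # only rank 0 executes this branch of the single program
    for dest in range(1, comm.Get_size()):
        comm.send(f"task for rank {dest}", dest=dest)
else:
    # every other rank executes only this part
    work = comm.recv(source=0)
    print(f"rank {rank} received: {work}")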

What are safety and liveness?

Safety is the requirement that nothing bad happens: concurrent processes must not corrupt shared data; it ensures consistency. Liveness is the requirement that something good eventually happens: processes must not "starve" when properly coordinated; it ensures progress.

What is scalability?

Scalability is a parallel system's ability to demonstrate a proportionate increase in parallel speed up with the addition of more resources.

Symmetric Multi-Processor (SMP)

Shared memory hardware architecture where multiple processors share a single address space and have equal access to all resources.

Embarrassingly Parallel

Solving many similar, but independent tasks simultaneously; little to no need for coordination between the tasks.
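
A minimal sketch of an embarrassingly parallel workload with Python's concurrent.futures: each input is processed independently, with no coordination between tasks (the simulate function is an illustrative stand-in).

from concurrent.futures import ProcessPoolExecutor

def simulate(seed):
    # stand-in for an independent task, e.g. one trial of a simulation
    return (seed * 2654435761) % 2**32

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate, range(8)))  # tasks run independently
    print(results)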

Shared Memory

Some systems allow processes to share physical memory, so that data written to a given location by one process can be read by another.

Synchronous vs. asynchronous communications

Synchronous communications require some type of "handshaking" between tasks that are sharing data; they are often referred to as blocking communications, since other work must wait until the communications have completed. Asynchronous communications allow tasks to transfer data independently from one another; they are often referred to as non-blocking communications, since work can be done while the communications take place.

What is the CAP Theorem?

The CAP Theorem concerns three guarantees of a distributed data store: consistency (what you read and write sequentially is what is expected), availability (the whole system does not die; every non-failing node always returns a response), and partition tolerance (the system continues to function and uphold its consistency and availability guarantees in spite of network partitions). The theorem states that a system cannot provide all three at once; when a network partition occurs, it must trade off consistency against availability.

Efficiency of communications

With the Message Passing Model, one MPI implementation may be faster on a given hardware platform than another. Asynchronous communication operations can also improve overall program performance.

Parallel Overhead

The amount of time required to coordinate parallel tasks, as opposed to doing useful work. Parallel overhead can include factors such as: task start-up time, synchronizations, data communications, software overhead imposed by parallel languages, libraries, the operating system, etc., and task termination time.

Three characteristics of the Distributed Memory/Message Passing Model

This model demonstrates the following characteristics: A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines. Tasks exchange data through communications by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process; for example, a send operation must have a matching receive operation.

CPU / Socket / Processor / Core

This varies, depending upon who you talk to. In the past, a CPU (Central Processing Unit) was a singular execution component for a computer. Then, multiple CPUs were incorporated into a node. Then, individual CPUs were subdivided into multiple "cores", each being a unique execution unit. CPUs with multiple cores are sometimes called "sockets" - vendor dependent. The result is a node with multiple CPUs, each containing multiple cores.

Mutual Exclusion

Under no circumstances can two processes be in their critical sections (for the same data structure) at the same time.
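
A sketch with a threading.Lock enforcing mutual exclusion on the critical section from the race-condition example earlier in this set (names are illustrative):

import threading

lock = threading.Lock()
count = 0

def work():
    global count
    for _ in range(100_000):
        with lock:          # at most one thread is in the critical section
            count += 1

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)                # always 200000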

Why use Parallel Computing?

Use parallel computing to save time and money, solve larger more complex problems, provide concurrency, and make better use of underlying parallel hardware.

Supercomputing / High Performance Computing (HPC)

Using the world's fastest and largest computers to solve large problems.

What are Vertical and Horizontal Scaling?

Vertical scaling is increasing the performance of a system by upgrading the capabilities of that same system. Horizontal scaling is adding more computers instead of increasing the capabilities of the existing system.

Why do we need synchronization mechanisms?

We need synchronization because communications frequently require some kind of synchronization between tasks; it is also a critical design consideration for most parallel programs.

Visibility of communications

With the Message Passing Model, communications are explicit and generally quite visible and under the control of the programmer. With the Data Parallel Model, communications often occur transparently to the programmer, particularly on distributed memory architectures.

What does Amdahl's Law tell us about adding more processors to solve a problem?

The serial fraction of the code caps the achievable speedup. For example, if 95% of the code can be parallelized (P = 0.95), you can never achieve better than 1 / (1 - 0.95) = 20x speedup, no matter how many processors you add.

