Intro & Parallel Computing
Amdahl's Law (Sequential)
'a' (alpha) is the fraction of runtime a sequential program spends on non-parallelizable computation segments, so the maximum speedup is S = 1/a
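Worked example (illustrative numbers, not from the cards): if a = 0.25, i.e. 25% of the runtime is inherently sequential, then S = 1/0.25 = 4 at most, no matter how many processors are added.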
Disadvantages of Distributed Memory
- Programmers are responsible for communication between processors - Hard to map existing data structures to this memory organization - Non-uniform memory access times
Properties of Distributed Systems
- Heterogeneity - Openness - Scalability - Transparency - Concurrency - Continuous availability - Independent failures
Disadvantages of Shared Memory
- Lack of scalability between memory and CPUs - Programmer is responsible for synchronization that ensures correct access to shared memory
Four categories of computing systems
- SISD - SIMD - MISD - MIMD
Advantages of Distributed Memory
- Scalable - Processor has quick access to its own memory - Cost effective
Hybrid Distributed-Shared Memory
- Combines shared & distributed memory architectures - Each node is a shared-memory machine (and/or GPU) - Network communication is required to move data between nodes
Advantages of Shared Memory
- User friendly programming perspective to memory - Fast/uniform data sharing between tasks
Parallel computing resources are typically....
- a single computer w/ multiple processors/cores - any number of computers connected by a network
Why use parallel computing?
- saves time & money - solve larger/complex problems - provides concurrency
For speedup = 1/((P/N)+(1-P)), as N approaches infinity, P/N = ??
0
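Worked example (illustrative numbers): with P = 0.95 and N = 1000, speedup = 1/((0.95/1000) + 0.05) ≈ 19.6, already close to the ceiling of 1/(1-P) = 20 that is reached as P/N approaches 0.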
What 2 elements characterize a distributed system?
1. Composed of multiple independent components 2. Components seen as a single entity by users
3 Parallel programming approaches
1. Data parallelism 2. Process parallelism 3. Farmer-and-worker model
Hadoop Components
1. Distributed File System (HDFS) 2. MapReduce Framework
3 Major milestones that led to cloud computing
1. Mainframes 2. Clusters 3. Grids
MapReduce Data Flow
1. Mappers read input from HDFS 2. Output is partitioned by key and sent to Reducers 3. Reducers sort input 4. Reduce output is written to HDFS
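A minimal in-memory Python sketch of this flow (function names and the list-based shuffle are illustrative stand-ins for HDFS and Hadoop's partitioner, not the real APIs):

    from collections import defaultdict

    def mapper(line):
        # Map: emit a (word, 1) pair for every word in an input line
        for word in line.split():
            yield (word, 1)

    def run_job(lines, num_reducers=2):
        # Shuffle: partition mapper output by key, one bucket per reducer
        partitions = [defaultdict(list) for _ in range(num_reducers)]
        for line in lines:
            for key, value in mapper(line):
                partitions[hash(key) % num_reducers][key].append(value)
        # Reduce: each reducer sorts its keys and sums the values per key
        counts = {}
        for part in partitions:
            for key in sorted(part):
                counts[key] = sum(part[key])
        return counts

    print(run_job(["the cat sat", "the cat ran"]))
    # counts: the=2, cat=2, sat=1, ran=1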
Mainframes
1st example of large computing facilities that leverage multiple CPUs
Distributed system
A collection of independent computers that appear as a single system to users
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file system (default 3x replication) designed to manage large amounts of data at high speeds.
Heterogeneity
A system made of many different kinds of components (hardware, networks, operating systems, CPUs)
Virtualization
Abstracts - Hardware - Runtime environments - Storage - Networking
Grid
Aggregation of clusters
Shared memory MIMD systems
All PEs are connected to a single global memory and they all have access to it
Distributed memory MIMD systems
All PEs have a local memory
MapReduce
Programming model developed by Google that processes data with Map() and Reduce() functions
How does parallel programming work?
By dividing a sequential program into small chunks so each processor can work on separate chunks of the problem
Parallel Computing
Calculations of large problems are divided into smaller parts and executed simultaneously on different processors
What is a low-cost alternative to mainframes/super-computers?
Clusters
What should happen when Silicon-based processor chips reach their physical limits?
Connect multiple processors working in coordination with each other => parallel computing
Service orientation
Core reference model for cloud systems that enables rapid, low-cost development of flexible, interoperable, and evolving applications
Is the Word Count Execution example data parallelism or task parallelism?
Data Parallelism
Are the following tasks Data Parallelism or Task Parallelism? def mapper(line): for word in line.split(): output(word, 1) def reducer(key, values): output(key, sum(values))
Data Parallelism (same functions every time, different data)
Data parallelism
Data partitioned into several blocks which are processed in parallel
What can happen as a result of race conditions?
Deadlocks & livelocks
Clouds
Deployed in large datacenters w/ virtually unlimited resources
Quality of Service (QoS)
Functional & nonfunctional attributes used to evaluate a service
What should be done to prevent race conditions?
Implement locks/semaphores/monitors to ensure serial access
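A small Python sketch of the idea (the shared counter is an illustrative example, not from the cards): without the lock the two threads race on counter and the final value can vary; with it, access to the critical region is serialized.

    import threading

    counter = 0
    lock = threading.Lock()

    def add_many(n):
        global counter
        for _ in range(n):
            with lock:          # critical region: only one thread at a time
                counter += 1    # read-modify-write is now effectively atomic

    threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)  # always 200000 with the lock; without it, the result can vary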
Web 2.0
Interactive, dynamic web pages
Coarse-grain parallelism
Large blocks of code executed in parallel
Threads
Lightweight subtasks
Is the speed-up of applications with fine-grain parallelism lower or higher than that of coarse-grained parallelism?
Lower
Cluster
Machines connected over a network and managed by a server; cheaper than a mainframe
Same Program Multiple Data (SPMD)
Multiple copies of the same program run concurrently, each on a different data block
Non-Uniform Memory Access (NUMA)
Multiple groups of memory shared by multiple CPUs (linked via a bus interconnect); access time depends on which memory group is accessed
Shared memory
Multiple processors operate independently, but share the same memory resources
Multiple-instruction, single-data (MISD) systems
Multiprocessor machine that executes different instructions on different PEs for the same data set
Multiple-instruction, multiple-data (MIMD) systems
Multiprocessor machine that executes multiple instructions on multiple data sets. - Shared-memory & Distributed-memory
Single-instruction, multiple-data (SIMD) systems
Multiprocessor machine that executes the same instruction on all the CPUs for different data streams
Bit level parallelism
The number of bits processed per clock cycle, often called the word size, increased gradually from 4-bit to 8-bit, 16-bit, 32-bit, and then 64-bit
Uniform Memory Access (UMA)
One group of memory shared by multiple CPUs
Serial computing uses ______ processor, but parallel computing uses ________ processors.
One; multiple
Mutual Exclusion
Only 1 process can access shared data in a critical region at a time
Apache Hadoop
Open-source framework that processes large data sets
Semaphore - wait() or down() value
P()
Amdahl's Law - If none of the code can be parallelized, then P=__ and speedup=__
P=0, speedup=1
Amdahl's Law - If all the code can be parallelized, then P=__ and speedup=__
P=1, speedup=infinite
Manjrasoft Aneka
PaaS that supports app development & runtime environments
Amdahl's Law (Parallel)
Potential program speedup is defined by the fraction of code (P) that can be parallelized; the maximum speedup is S = 1/(1-P)
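Worked example (illustrative numbers): if P = 0.90, the maximum speedup is S = 1/(1-0.90) = 10, no matter how many processors are used.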
Livelock
Process states constantly change in regard to each other, so none progress
Atomic operation
Process that completes without interruption
Parallel Processing
Processing multiple tasks simultaneously on multiple processors
Parallel Programming
Programming a multiprocessor system to use the divide-and-conquer technique
Why use bit level paralleism?
Reduces the number of instructions required to process large operands & improves performance
Distributed Memory
Each processor has its own local memory; a communication network is required to exchange data between processors
If using a personal computer and a deadlock occurs, how can you escape it?
Restart the computer
Gustafson's Law Formula
S(N) = N - a(N-1)
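Worked example (illustrative numbers): with N = 10 processors and a non-parallelizable fraction a = 0.1, S(10) = 10 - 0.1(10 - 1) = 9.1, i.e. nearly linear scaled speedup.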
speed-up formula
S(N) = T(1)/T(N); the maximum achievable speedup is S = 1/alpha (alpha = non-parallelizable fraction)
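Worked example (illustrative numbers): if the sequential run takes T(1) = 100 s and the run on 8 processors takes T(8) = 20 s, then S(8) = 100/20 = 5; with a serial fraction alpha = 0.1, S can never exceed 1/alpha = 10.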
Force.com & Salesforce.com
SaaS solution for social enterprise apps & customer relationship management
Google AppEngine
Scalable runtime environment for web apps
Gustafson's Law
Scaled speed-up with N parallel processes
In what order are machine instructions processed for SISD systems?
Sequentially
Semaphore
Shared counter that can be used to implement locks
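A minimal Python sketch (threading.Semaphore exposes the classic P/V operations as acquire/release; the example task is illustrative):

    import threading

    sem = threading.Semaphore(1)   # initialized to 1 => behaves like a mutual-exclusion lock

    def critical_section(task_id):
        sem.acquire()              # P() / wait() / down(): decrement, block if the counter is 0
        try:
            print(f"task {task_id} inside the critical region")
        finally:
            sem.release()          # V() / signal() / up(): increment, wake a waiting task

    threads = [threading.Thread(target=critical_section, args=(i,)) for i in range(3)]
    for t in threads: t.start()
    for t in threads: t.join()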
Hardware Virtualization
Simulates hardware interface & allows different software stacks to coexist on the same hardware
Fine-grain parallelism
Small blocks of code executed in parallel without the need to communicate or sync with other threads/processes
Web Services (WS)
Software components accessible over HTTP
In Aneka, developers can choose different abstractions to design their app such as:
Tasks, distributed threads, & map-reduce
What is the primary purpose of distributed systems?
To share & better utilize resources
T/F: A semaphore should be initialized to a non-zero value (ex: 1)
True
T/F: In Shared memory, changes in a memory location effected by one processor are visible to all other processors
True
T/F: In a time-shared or multi-processing system the exact instruction execution order cannot be predicted.
True
What have shared memory machines historically been classified as, based upon memory access times?
UMA & NUMA
Single-instruction, single-data (SISD) systems
Uniprocessor machine that executes a single instruction on a single data stream
Semaphore - signal() or up() value
V()
Microsoft Azure roles
- Web role - Worker role - VM role
Deadlock
When 2 or more competing actions each wait for the other to finish, so neither ever does
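An illustrative Python sketch of the classic pattern (lock names are hypothetical): each task holds one lock and waits forever for the other.

    import threading

    lock_a, lock_b = threading.Lock(), threading.Lock()

    def task1():
        with lock_a:        # holds A ...
            with lock_b:    # ... and waits for B
                pass

    def task2():
        with lock_b:        # holds B ...
            with lock_a:    # ... and waits for A
                pass

    # Started together, task1 and task2 can each grab their first lock and then
    # block forever waiting for the other's lock: a deadlock. A common fix is to
    # always acquire locks in a single global order.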
Race Conditions
When multiple processes access shared data w/o access control and the final result depends on execution order
Web Service Description Language (WSDL)
XML language that defines characteristics of a service
Simple Object Access Protocol (SOAP)
XML language that defines how to invoke a Web service method and collect the result
Are there limits to the scalability of parallelism?
Yes
T(N)
execution time of the parallel computation using N processors
T(1)
execution time of sequential computation
The larger the Parallel portion, the ___________ the speedup
higher
In Parallel computing, increased problem size results in _____________
increased performance
A Service is...
loosely coupled, reusable, programming language independent, location transparent
speed-up S
measures effectiveness of parallelization
speedup=1 means....
no speedup
Software has traditionally been written for _____________
serial computation
Introducing the number of processors performing the parallel fraction of work, the relationship for speedup is modeled by:
speedup = 1/((P/N)+S), where P = parallel fraction, N = number of processors, S = serial fraction
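Worked example (illustrative numbers): with P = 0.90, S = 0.10, and N = 4, speedup = 1/((0.90/4) + 0.10) = 1/0.325 ≈ 3.1; increasing N to 16 only raises it to 1/0.15625 = 6.4.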
Amdahl's Law - If 50% of the code can be parallelized, then maximum speedup=__
speedup=2, so it will run twice as fast
Processes
Subtasks of a parallel program