Parallel Programming Exam 1
When running an MPI program with 10 processes that call MPI_Gather using the default communicator, how many processes will receive the data?
1
When running an MPI program with 10 processes that call MPI_Scatter using the default communicator where the source process scatters an array of 40 elements in total, how many elements does each destination process receive?
4
When running an MPI program with 10 processes that call MPI_Bcast using the default communicator where the source process sends an array of 5 elements, how many elements does each destination process receive?
5
When running an MPI program with 10 processes that call MPI_Reduce using the default communicator where each source process sends an array of 5 elements, how many elements does the destination process receive?
5
When running an MPI program with 10 processes that call MPI_Gather using the default communicator where each source process sends an array of 5 elements, how many elements does the destination process receive?
50
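
To make the element counts above concrete, here is a minimal sketch (not exam material) assuming 10 ranks and root rank 0; the buffer names and the use of MPI_SUM are illustrative.

    #include <mpi.h>

    int main(int argc, char* argv[])
    {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);  /* 10 in the questions above */

      double part[5], all[50], scat[40], piece[4], red[5];
      for (int i = 0; i < 5; i++) part[i] = rank;
      for (int i = 0; i < 40; i++) scat[i] = i;

      /* Gather: every rank sends 5 elements; only the root receives 5 * 10 = 50. */
      MPI_Gather(part, 5, MPI_DOUBLE, all, 5, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      /* Scatter: the root splits 40 elements; every rank receives 40 / 10 = 4. */
      MPI_Scatter(scat, 4, MPI_DOUBLE, piece, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      /* Bcast: every rank ends up with the same 5 elements the root sent. */
      MPI_Bcast(part, 5, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      /* Reduce: element-wise, so the root receives 5 elements, not 50. */
      MPI_Reduce(part, red, 5, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

      MPI_Finalize();
      return 0;
    }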
A mutex guarantees fairness whereas a lock does not.
False
A speedup below 1.0 implies a parallelism (i.e., correctness) bug.
False
Acquiring a lock by one thread before accessing a shared memory location prevents other threads from being able to access the same shared memory location, even if the other threads do not acquire a lock.
False
Assigning an equal number of array elements to each thread guarantees that there will be load balance.
False
Barriers and locks perform the same synchronization operation.
False
Data races involve at least two write operations.
False
Embarrassingly parallel code may include barriers.
False
Every parallel program requires explicit synchronization.
False
Finding a maximal independent set is NP-hard.
False
GPUs are good at exploiting task parallelism.
False
Graphs always have at least as many edges as they have vertices.
False
In all collective communication MPI functions discussed in class, all ranks are both senders and receivers.
False
In the call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD), each process contributes 2 elements to the reduction.
False
In the call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD), process b is the destination.
False
In the call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD), the result is written into the "c" array.
False
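
For reference, the C prototype of MPI_Reduce with the arguments of the call above labeled (a, b, and c refer to the call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD)):

    int MPI_Reduce(const void*  sendbuf,   /* "a": each rank's input values      */
                   void*        recvbuf,   /* "b": the result, valid on the root */
                   int          count,     /* "c": elements contributed per rank */
                   MPI_Datatype datatype,  /* MPI_DOUBLE                         */
                   MPI_Op       op,        /* MPI_MAX, i.e., not a sum           */
                   int          root,      /* 2: the destination rank            */
                   MPI_Comm     comm);     /* MPI_COMM_WORLD                     */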
In the compressed adjacency-list representation (CSR), one of the arrays has exactly as many elements as there are vertices in the graph.
False
Indeterminacy is a form of synchronization.
False
Indeterminacy is a guaranteed indication of a parallel programming bug.
False
It is possible to have a data race on a private variable (assuming no pointer variables).
False
It is possible to have a data race on a private variable.
False
MPI programs cannot suffer from indeterminacy.
False
MPI programs must be run with at least two processes.
False
MPI_Allgather performs one-to-many communication.
False
MPI_Allreduce performs one-to-many communication.
False
MPI_Gather performs one-to-many communication.
False
MPI_Recv may return before the message has actually been received.
False
MPI_Reduce implies a barrier.
False
MPI_Reduce performs one-to-many communication.
False
MPI_Send performs one-to-many communication.
False
MPI_Ssend may return before the receiver has started receiving the message.
False
Most programs can automatically be efficiently parallelized.
False
Mutexes and locks guarantee fairness.
False
Programs running on pure distributed-memory systems can suffer from data races.
False
Reductions always compute a sum.
False
Reductions are embarrassingly parallel.
False
Returning from an MPI_Reduce call by any process implies that the process receiving the reduced result has already reached its MPI_Reduce call.
False
Shared-memory systems require explicit communication to transfer data between threads.
False
Task parallelism tends to scale more than data parallelism.
False
The adjacency matrix of a graph typically requires O(n) memory, where n is the number of vertices.
False
The call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD) performs a sum reduction.
False
The receive buffer size parameter in MPI_Recv calls specifies the exact length of the message to be received (in number of elements).
False
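
A short sketch of why this is false (a fragment assumed to run between MPI_Init and MPI_Finalize; the source rank and tag are illustrative): the count passed to MPI_Recv is only the capacity of the buffer, and the actual length comes from the status.

    MPI_Status status;
    double buf[100];                                /* room for up to 100 elements */
    MPI_Recv(buf, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);

    int received;
    MPI_Get_count(&status, MPI_DOUBLE, &received);  /* actual count: 0 to 100      */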
The serial fractal code from the projects contains a loop-carried data dependence in the innermost loop (that iterates over the columns).
False
The spanning tree of a graph is always unique.
False
Two adjacent vertices can be part of the same MIS.
False
When joining a thread, that thread is immediately terminated.
False
When protecting a critical section with a lock, the threads are guaranteed to enter the critical section in the order in which they first tried to acquire the lock.
False
After which of the following MPI calls does the sender know that the receiver has started receiving the message? Check all that apply.
MPI_Ssend
Efficiency Formula
Speedup / number of threads (or processes)
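
A small worked example with made-up runtimes:

    double serial_time   = 12.0;                       /* seconds on 1 thread        */
    double parallel_time =  2.0;                       /* seconds on 8 threads       */
    int    threads       =  8;

    double speedup    = serial_time / parallel_time;   /* 12.0 / 2.0 = 6.0           */
    double efficiency = speedup / threads;             /* 6.0 / 8 = 0.75 (i.e., 75%) */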
A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element decreases with increasing array indices.
True
A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element increases with increasing array indices.
True
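
A minimal sketch of a cyclic (round-robin) distribution; a, n, and process() are placeholders, and rank/size come from MPI_Comm_rank/MPI_Comm_size (the same idea works with thread IDs):

    /* Rank r handles elements r, r + size, r + 2*size, ..., so cheap and
       expensive elements end up interleaved across all ranks. */
    for (int i = rank; i < n; i += size) {
      process(a[i]);
    }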
A disconnected graph can contain cycles.
True
A lock is a synchronization primitive.
True
A maximal independent set (MIS) can contain all the vertices of a graph.
True
A path through a graph can contain more elements than the total number of vertices in the graph.
True
A single call to MPI_Reduce by each process suffices to reduce local histograms with many buckets into a global histogram.
True
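
Sketch (a fragment; the bucket count and types are illustrative): because the reduction is element-wise over the whole array, one call combines every bucket.

    #define BUCKETS 256                   /* illustrative number of buckets */
    long local[BUCKETS], global[BUCKETS];
    /* ... each rank fills local[] from its share of the data ... */
    MPI_Reduce(local, global, BUCKETS, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);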
All collective communication MPI functions we have discussed in the lectures imply some synchronization.
True
Data parallelism is a good match for SIMD machines.
True
Data races involve at least two threads.
True
Deadlock is a parallelism bug.
True
Distributed-memory systems generally scale to more CPU cores than shared-memory systems.
True
Embarrassingly parallel programs can suffer from load imbalance.
True
Frequency scaling was replaced by core scaling due to power density concerns.
True
If each process calls MPI_Reduce k > 1 times with the same destination, operator, and communicator, the reductions are matched based on the order in which they are called.
True
In MPI_Gather, every process must pass a parameter for the destination buffer, even processes that will not receive the result of the gather.
True
In MPI_Gather, rank 0 always contributes the first chunk of the result.
True
In the compressed adjacency-list representation (CSR), one of the arrays has exactly as many elements as there are edges in the graph.
True
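
A small CSR sketch (array names are illustrative) for a directed graph with 4 vertices and 5 edges (0->1, 0->2, 1->2, 2->3, 3->0): the offset array has n + 1 entries, one more than the number of vertices, while the neighbor array has exactly one entry per edge.

    int nindex[] = {0, 2, 3, 4, 5};   /* n + 1 = 5 offsets; vertex v's neighbors are   */
    int nlist[]  = {1, 2, 2, 3, 0};   /* nlist[nindex[v]] ... nlist[nindex[v + 1] - 1] */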
In this course, we should always call MPI_Init with two NULL parameters.
True
Indeterminacy can result in garbled program output.
True
It is possible to construct a dense graph that contains only three edges.
True
Load imbalance is a situation where some threads/processes execute longer than others.
True
Luby's MIS algorithm is likely to finish in O(log n) parallel steps, where n is the number of vertices.
True
MPI_Aint denotes an integer type that is large enough to hold an address.
True
MPI_Allgather can be emulated with MPI_Gather followed by MPI_Bcast.
True
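
Sketch of the emulation (a fragment; assumes 10 ranks contributing 5 doubles each); the single MPI_Allgather call is typically faster than the two-step version.

    double mine[5], everyone[5 * 10];

    /* one call: every rank receives the concatenation of all contributions */
    MPI_Allgather(mine, 5, MPI_DOUBLE, everyone, 5, MPI_DOUBLE, MPI_COMM_WORLD);

    /* emulation: gather onto rank 0, then broadcast the full array */
    MPI_Gather(mine, 5, MPI_DOUBLE, everyone, 5, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(everyone, 5 * 10, MPI_DOUBLE, 0, MPI_COMM_WORLD);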
MPI_Allgather has a similar communication pattern (sends and receives of messages) as MPI_Allreduce.
True
MPI_Allgather implies a barrier.
True
MPI_Allreduce is typically faster than calling MPI_Reduce followed by MPI_Bcast.
True
MPI_Bcast performs one-to-many communication.
True
MPI_Gather may be non-blocking on more than one of the involved processes.
True
MPI_Reduce has a similar communication pattern (sends and receives of messages) as MPI_Gather.
True
MPI_Scatter performs one-to-many communication.
True
Matrix-vector multiplication is embarrassingly parallel.
True
Mutexes and locks are the same kind of synchronization primitive.
True
Parallelism can be used to speed up a computation or to reduce its energy consumption.
True
Programs with data races may produce indeterminate output.
True
Reduction operations can be implemented using a reduction tree.
True
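
A minimal sketch of the reduction-tree pattern (serial illustration over p partial results; in each round, element i absorbs element i + stride, so there are about log2(p) rounds):

    void tree_reduce(double* partial, int p)
    {
      for (int stride = 1; stride < p; stride *= 2) {
        for (int i = 0; i + stride < p; i += 2 * stride) {
          partial[i] += partial[i + stride];   /* combine pairs at this tree level */
        }
      }
      /* partial[0] now holds the overall result */
    }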
Reduction operations tend to be more frequent in parallel programs than in serial programs.
True
SPMD typically includes task parallelism.
True
Sending one long message in MPI is typically more efficient than sending multiple short messages with the same total amount of data.
True
The MPI_Gather function concatenates the data from all involved processes by rank order.
True
The adjacency list of a vertex can be empty.
True
The busy-waiting code from the slides contains a data race.
True
The communication patterns (one-to-one, one-to-many, many-to-one, or many-to-many) of MPI_Scatter and MPI_Bcast are identical.
True
Undirected graphs always have a symmetric adjacency matrix.
True
When a thread attempts to acquire a lock that is already taken, it is blocked until it obtains the lock.
True
When parallelized, the Collatz code from the projects is likely to suffer from load imbalance.
True