In an MPI program with 6 processes, what is the smallest rank that any process will have?
0
How many vertices are in the MIS of a complete graph with 4 vertices?
1
When running an MPI program with 6 processes that call MPI_Gather using the default communicator, how many processes will receive the data?
1
When running an MPI program with 6 processes that call MPI_Bcast using the default communicator where the source process sends an array of 12 elements, how many elements does each destination process receive?
12
When running an MPI program with 6 processes that call MPI_Reduce using the default communicator where each source process sends an array of 12 elements, how many elements does the destination process receive?
12
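A minimal C sketch of the two element counts above, assuming the program is launched with 6 processes: MPI_Bcast delivers all 12 elements to every destination process, and MPI_Reduce combines the six local 12-element arrays element-wise into a single 12-element result at the root.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double data[12] = {0};          /* 12 elements on every process */
  if (rank == 0) for (int i = 0; i < 12; i++) data[i] = i;

  /* every destination process receives all 12 elements */
  MPI_Bcast(data, 12, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  double sum[12];                 /* the root ends up with 12 reduced elements */
  MPI_Reduce(data, sum, 12, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0) printf("sum[0] = %f\n", sum[0]);
  MPI_Finalize();
  return 0;
}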
Assuming a serial runtime of 100s, a parallel runtime of 20s on ten cores, and a fixed overhead (cf. Slide Ch03.47), what is the expected runtime in seconds with twenty cores (do not include any units in the answer)?
15
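A worked path to the answer above, assuming the fixed-overhead model T(p) = T_serial/p + overhead from the referenced slide:

overhead = 20 - 100/10 = 10 s
T(20) = 100/20 + 10 = 5 + 10 = 15 s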
When running an MPI program with 6 processes that call MPI_Scatter using the default communicator where the source process scatters an array of 12 elements, how many elements does each destination process receive?
2
According to Amdahl's law, what is the upper bound on the achievable speedup when 5% of the code is not parallelized?
20
What is the speedup when 20% of the code is not parallelized and the rest of the code is perfectly parallelized (achieves linear speedup) and executed on six cores?
3
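A worked application of Amdahl's law for the answer above, with serial fraction f = 0.2 and p = 6 cores (letting p grow without bound gives the 1/f upper bounds asked about elsewhere in this set):

speedup = 1 / (f + (1 - f)/p) = 1 / (0.2 + 0.8/6) = 1 / 0.3333... = 3
upper bound = 1 / f   (e.g., 1/0.05 = 20 and 1/0.2 = 5)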
Given a parallel runtime of 10s on five cores and a serial runtime of 30s, what is the efficiency in percent (use a whole number and do not include the "%" symbol in the answer)?
30/(10*5) = 0.6 → 60
Given a parallel runtime of 10s on five cores and a serial runtime of 30s, what is the speedup?
30/10 = 3
Assuming a serial runtime of 100s, a parallel runtime of 20s on ten cores, and a fixed overhead, what is the expected efficiency in percent with 15 cores (use a whole number and do not include the "%" symbol in the answer)?
40
Given a parallel runtime of 10s on five cores and a serial runtime of 30s, what is the runtime in seconds on ten cores assuming the same efficiency (do not include any units in the answer)?
5
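The speedup, efficiency, and scaling answers above follow from speedup = T_serial / T_parallel and efficiency = speedup / cores:

speedup = 30 / 10 = 3
efficiency = 3 / 5 = 0.6 → 60
same efficiency on 10 cores: speedup = 0.6 * 10 = 6, so runtime = 30 / 6 = 5 s
with the fixed-overhead model: T(15) = 100/15 + 10 ≈ 16.7 s, speedup = 100/16.7 = 6, efficiency = 6/15 = 40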
In an MPI program with 6 processes, what is the largest rank that any process will have?
5
According to Amdahl's law, what is the upper bound on the achievable speedup when 20% of the code is not parallelized?
5
Assuming a serial runtime of 100s, a parallel runtime of 20s on ten cores, and a fixed overhead, what is the expected speedup with 15 cores?
6
When running an MPI program with 6 processes that call MPI_Scatter using the default communicator, how many processes will receive a chunk of the data?
6
When running an MPI program with 6 processes that call MPI_Gather using the default communicator where each source process sends an array of 12 elements, how many elements does the destination process receive?
72
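A hedged C sketch of the MPI_Scatter and MPI_Gather counts above, assuming 6 processes: the scatter hands each process 12/6 = 2 elements, and the gather collects 12 elements from each of the 6 processes into a 72-element array at the root.

#include <mpi.h>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* assumed to be 6 */

  int src[12], chunk[2];                  /* 12 / 6 = 2 elements per process */
  if (rank == 0) for (int i = 0; i < 12; i++) src[i] = i;
  MPI_Scatter(src, 12 / size, MPI_INT, chunk, 12 / size, MPI_INT, 0, MPI_COMM_WORLD);

  int local[12], all[72];                 /* 6 * 12 = 72 elements at the root */
  for (int i = 0; i < 12; i++) local[i] = rank;
  MPI_Gather(local, 12, MPI_INT, all, 12, MPI_INT, 0, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}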
A speedup below 1.0 implies a parallelism bug.
False
Acquiring a lock by one thread before accessing a shared memory location prevents other threads from being able to access the same shared memory location, even if the other threads do not acquire a lock.
False
All data races involve at least two write operations.
False
All reductions compute a single sum.
False
Assigning the same number of array elements to each thread guarantees that there will be no load imbalance.
False
Embarrassingly parallel programs can suffer from load imbalance.
True
Every graph data structure is also a tree data structure.
False
Every parallel program requires explicit synchronization.
False
Finding a maximal independent set is NP-hard.
False
Graphs always have at least as many edges as they have vertices.
False
In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 6, MPI_COMM_WORLD), each process contributes 6 elements to the reduction.
False
In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 6, MPI_COMM_WORLD), process 0 is the destination.
False
In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 6, MPI_COMM_WORLD), the result is written into the "a" array.
False
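For the three MPI_Reduce statements above, a hedged sketch of how the arguments in that call line up (the names a, z, and n are taken from the question; n is chosen as 12 here only to make the sketch compile):

#include <mpi.h>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);
  const int n = 12;
  double a[12], z[12];
  for (int i = 0; i < n; i++) a[i] = i;

  /* MPI_Reduce(sendbuf, recvbuf, count, datatype, op, root, comm):
     every rank contributes n elements from a; the element-wise sums are
     written into z, but only on the root, which is rank 6 here (so the
     program needs at least 7 processes for rank 6 to exist). */
  MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 6, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}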
In the compressed adjacency-list representation, one of the arrays has exactly as many elements as there are vertices in the graph.
False
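A minimal sketch of the compressed adjacency-list (CSR) layout behind the statement above, with assumed array names: the offsets array has n + 1 entries (one more than the number of vertices) and the neighbor array has one entry per edge.

/* CSR layout for a directed graph with n = 4 vertices and m = 5 edges:
   0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 0                                   */
int offset[5]   = {0, 2, 3, 4, 5};   /* n + 1 = 5 entries, not n            */
int neighbor[5] = {1, 2, 3, 3, 0};   /* m = 5 entries, one per edge         */
/* the adjacency list of vertex v is neighbor[offset[v]] .. neighbor[offset[v + 1] - 1] */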
Indeterminacy is a guaranteed indication of a parallel programming bug.
False
It is possible to have a data race on a private variable.
False
Load imbalance is a form of synchronization.
False
MPI programs have to be run with more than one process.
False
MPI_Allgather performs many-to-one communication.
False
MPI_Allreduce can be emulated with MPI_Reduce followed by MPI_Scatter.
False
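The statement above is false because a broadcast, not a scatter, is what is needed after the reduction. A hedged sketch of the MPI_Reduce + MPI_Bcast pair that does emulate MPI_Allreduce (array contents are made up):

#include <mpi.h>

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);
  const int n = 4;
  double a[4] = {1.0, 2.0, 3.0, 4.0}, z[4];

  /* same effect as MPI_Allreduce(a, z, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD): */
  MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);  /* reduce to rank 0  */
  MPI_Bcast(z, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);               /* broadcast to all  */

  MPI_Finalize();
  return 0;
}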
MPI_Allreduce performs many-to-one communication.
False
MPI_Bcast performs many-to-one communication.
False
MPI_Gather implies a barrier.
False
MPI_Recv may return before the message has actually been received.
False
MPI_Recv performs many-to-one communication.
False
MPI_Scatter performs many-to-one communication.
False
MPI_Send performs many-to-one communication.
False
MPI_Ssend performs many-to-one communication.
False
Most programs can automatically be efficiently parallelized.
False
Multidimensional C/C++ arrays are stored column by column in main memory.
False
Programs running on shared-memory systems cannot suffer from data races.
False
Reduction operations must be implemented using a reduction tree.
False
Returning from an MPI_Gather call by any process implies that the process receiving the gathered result has already reached its MPI_Gather call.
False
Shared-memory systems scale better to large CPU counts than distributed-memory systems.
False
Task parallelism is a good match for SIMD machines.
False
Task parallelism tends to scale much more than data parallelism.
False
The MPI_Scatter function concatenates the data from all involved processes.
False
The receive buffer size parameter in MPI_Recv calls specifies the exact length of the message to be received (in number of elements).
False
The spanning tree of a graph is always unique.
False
Two adjacent vertices can be part of the same MIS.
False
When joining a thread, that thread is killed.
False
When protecting a critical section with a lock, the threads are guaranteed to enter the critical section in the order in which they first tried to acquire the lock.
False
A barrier is a synchronization primitive.
True
A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element decreases with increasing array indices.
True
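A common way to express the cyclic distribution mentioned above is sketched below (the per-element work is an assumed placeholder):

/* Rank r handles elements r, r + nprocs, r + 2*nprocs, ...  If the work per
   element shrinks with increasing index, every rank gets a mix of cheap and
   expensive elements, which balances the load. */
void process_cyclic(double* a, int n, int rank, int nprocs)
{
  for (int i = rank; i < n; i += nprocs) {
    a[i] *= 2.0;   /* placeholder for the real per-element work */
  }
}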
A disconnected graph can contain cycles.
True
A maximal independent set (MIS) can contain all the vertices of a graph.
True
A path through a graph can contain more elements than the total number of vertices in the graph.
True
A single call to MPI_Reduce by each process suffices to reduce local histograms with many buckets into a global histogram.
True
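A hedged sketch of the histogram reduction mentioned above (the bucket count and the local data are made up): each process passes its entire local histogram to one MPI_Reduce call, which sums the arrays element-wise into the global histogram at the root.

#include <mpi.h>
#include <stdio.h>

#define BUCKETS 16

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int local[BUCKETS] = {0}, global[BUCKETS];
  local[rank % BUCKETS] = 1;   /* stand-in for the real local histogram */

  /* one call reduces all buckets at once (element-wise sum) */
  MPI_Reduce(local, global, BUCKETS, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0) printf("bucket 0 holds %d\n", global[0]);
  MPI_Finalize();
  return 0;
}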
Data races always involve at least two threads.
True
Data races are a form of indeterminacy.
True
Deadlock is a parallelism bug.
True
Distributed-memory systems generally scale to more CPU cores than shared-memory systems.
True
Distributed-memory systems require explicit communication to transfer data between compute nodes.
True
Frequency scaling was replaced by core scaling due to power density concerns.
True
GPUs are good at exploiting data parallelism.
True
In MPI_Gather, every process has to pass a parameter for the destination buffer, even processes that will not receive the result of the gather.
True
In MPI_Gather, rank 0 always contributes the first chunk of the result.
True
Indeterminacy can result in garbled program output.
True
It is possible to construct a dense graph that contains only a single edge.
True
Load imbalance is a situation where some threads/processes execute longer than others.
True
Luby's MIS algorithm is likely to finish in O(log n) parallel steps, where n is the number of vertices.
True
MPI programs can suffer from indeterminacy.
True
MPI_Allgather is typically faster than calling MPI_Gather followed by MPI_Bcast.
True
MPI_Allreduce implies a barrier.
True
MPI_Gather performs many-to-one communication.
True
MPI_Reduce has a similar communication pattern (sends and receives of messages) as MPI_Gather.
True
MPI_Reduce may be non-blocking on more than one of the involved processes.
True
MPI_Reduce performs many-to-one communication.
True
MPI_Send may return before the message has actually been sent.
True
MPI_Ssend implies some synchronization.
True
Mutexes and locks are the same kind of synchronization primitive.
True
Reduction operations tend to be more frequent in parallel programs than in serial programs.
True
SPMD typically includes task parallelism.
True
Sending one long message in MPI is typically more efficient than sending multiple short messages with the same total length.
True
The MPI_Barrier call requires a parameter (to be passed to the function).
True
The adjacency list of a vertex can be empty.
True
The adjacency matrix of a graph typically requires O(n^2) memory, where n is the number of vertices.
True
The collatz code from the project is likely to suffer from load imbalance.
True
The communication patterns (one-to-one, one-to-many, many-to-one, or many-to-many) of MPI_Gather and MPI_Reduce are identical.
True
Undirected graphs always have a symmetric adjacency matrix.
True
When a thread attempts to acquire a lock that is already taken, it is blocked until it obtains the lock.
True
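A minimal pthreads sketch of the blocking behavior described in the last statement (the shared counter is an assumed example): the second thread to call pthread_mutex_lock blocks until the first one unlocks, and pthread_join waits for a thread rather than killing it.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void* worker(void* arg)
{
  (void)arg;
  pthread_mutex_lock(&lock);     /* blocks if another thread holds the lock */
  counter++;                     /* critical section                        */
  pthread_mutex_unlock(&lock);
  return NULL;
}

int main(void)
{
  pthread_t t[2];
  for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, worker, NULL);
  for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);  /* waits; does not kill */
  printf("counter = %ld\n", counter);
  return 0;
}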