Parallel Programming, Homework (1-3) Review

The communication patterns (one-to-one, one-to-many, many-to-one, or many-to-many) of MPI_Scatter and MPI_Reduce are identical.

False

The default communicator that consists of all the processes created when an MPI program is started is called MPI_WORLD_COMM.

False

When running an MPI program with 10 processes that call MPI_Scatter using the default communicator, how many processes will receive a chunk of the data?

10

In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 8, MPI_COMM_WORLD), each process contributes 8 elements to the reduction.

False

In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 8, MPI_COMM_WORLD), process 0 is the destination.

False

In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_SUM, 8, MPI_COMM_WORLD), the result is written into the "a" array.

False

In the compressed adjacency-list representation, one of the arrays has exactly as many elements as there are vertices in the graph.

False
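The array sizes behind this answer can be sketched in C. This is a minimal illustration, assuming the common layout with an index array of n + 1 offsets and a neighbor array with one entry per stored edge. The names `CsrGraph`, `nindex`, `nlist`, `csr_degree`, and `csr_example` are hypothetical and not taken from the course code.

```c
#include <assert.h>

/* Compressed adjacency list (CSR-style) for a graph with n vertices and
   m stored edge entries. All names here are illustrative assumptions. */
typedef struct {
    int n;             /* number of vertices */
    int m;             /* number of stored edge entries */
    const int *nindex; /* n + 1 offsets into nlist */
    const int *nlist;  /* m neighbor IDs */
} CsrGraph;

/* Number of neighbors of vertex v: difference of consecutive offsets. */
int csr_degree(const CsrGraph *g, int v)
{
    assert(v >= 0 && v < g->n);
    return g->nindex[v + 1] - g->nindex[v];
}

/* Example: undirected triangle 0-1-2 plus an isolated vertex 3.
   Each undirected edge is stored in both directions, so m = 6. */
static const int example_nindex[5] = {0, 2, 4, 6, 6};
static const int example_nlist[6]  = {1, 2, 0, 2, 0, 1};

CsrGraph csr_example(void)
{
    CsrGraph g = {4, 6, example_nindex, example_nlist};
    return g;
}
```

In this example the index array has 5 entries for 4 vertices and the neighbor array has 6 entries, so neither length equals the vertex count.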

Indeterminacy is a guaranteed indication of the presence of a parallel programming bug.

False

It is possible to have a data race on a private variable.

False

Load imbalance is a form of synchronization.

False

MPI programs have to be run with more than one process.

False

MPI_Allgather performs many-to-one communication.

False

MPI_Allreduce performs many-to-one communication.

False

MPI_Bcast performs many-to-one communication.

False

MPI_Recv may return before the message has actually been received.

False

MPI_Recv performs many-to-one communication.

False

MPI_Scatter performs many-to-one communication.

False

MPI_Send implies some synchronization.

False

MPI_Send performs many-to-one communication.

False

MPI_Sendrecv performs many-to-one communication.

False

MPI_Ssend performs many-to-one communication.

False

Most programs can be automatically and efficiently parallelized.

False

Mutexes enforce an order among the threads.

False

Pure distributed-memory programs can suffer from data races.

False

Returning from an MPI_Gather call by any process implies that the process receiving the gathered result has already reached its MPI_Gather call.

False

Task parallelism is a good match for SIMD machines.

False

Task parallelism tends to scale much better than data parallelism.

False

The receive buffer size parameter in MPI_Recv calls specifies the length of the message to be received (in number of elements).

False

The spanning tree of a graph is always unique.

False

Two adjacent vertices can be part of the same MIS.

False

When joining a thread, that thread is killed.

False

You have to use the "private" keyword to declare a private variable in pthreads.

False

According to Amdahl's law, what is the upper bound on the achievable speedup when 50% of the code is not parallelized?

2
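The bound follows directly from Amdahl's law. As a short worked derivation, assuming $s$ denotes the non-parallelized fraction of the runtime and $p$ the number of cores (symbols chosen here for illustration):

```latex
S(p) = \frac{1}{s + \frac{1 - s}{p}}, \qquad
\lim_{p \to \infty} S(p) = \frac{1}{s}, \qquad
s = 0.5 \;\Rightarrow\; S_{\max} = \frac{1}{0.5} = 2
```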

Given a parallel runtime of 20s on eight cores and a serial runtime of 80s, what is the speedup?

4

Reduction operations have to be implemented using a tree structure.

False

When protecting a critical section with a lock, the threads are guaranteed to enter the critical section in the order in which they called the Lock function.

False

A barrier is a synchronization primitive.

True

In an MPI program with 8 processes, what is the smallest rank that any process will have?

0

When running an MPI program with 10 processes that call MPI_Gather using the default communicator, how many processes will receive the data?

1

Mutexes and locks are the same kind of synchronization primitive.

True

Assuming a serial runtime of 60s, a parallel runtime of 12s on six cores, and a fixed overhead, what is the expected efficiency in percent with ten cores (do not include the "%" symbol in the answer)?

75

How many vertices are in the MIS of a complete graph with 7 vertices total?

1
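One way to see this answer is a sequential greedy sketch. It is an illustration only, not Luby's parallel algorithm, and it assumes an undirected graph stored as a symmetric adjacency matrix. The names `greedy_mis`, `make_complete`, and `MAX_V` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_V 16

/* Greedy maximal independent set on an adjacency matrix.
   Visits vertices in index order; a vertex joins the set only if none of
   its earlier neighbors has already joined. Marks members in in_set and
   returns the size of the set. */
int greedy_mis(int n, bool adj[MAX_V][MAX_V], bool in_set[MAX_V])
{
    assert(n >= 0 && n <= MAX_V);
    int size = 0;
    for (int v = 0; v < n; v++) {
        bool blocked = false;
        for (int u = 0; u < v; u++) {
            if (adj[v][u] && in_set[u]) {
                blocked = true;
                break;
            }
        }
        in_set[v] = !blocked;
        if (in_set[v])
            size++;
    }
    return size;
}

/* Fill adj with the complete graph on n vertices (every pair is adjacent). */
void make_complete(int n, bool adj[MAX_V][MAX_V])
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            adj[i][j] = (i != j);
}
```

On a complete graph the first vertex blocks every other vertex, so the set has exactly one member. On a graph without edges nothing is ever blocked, so the set contains all vertices.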

Given a parallel runtime of 20s on eight cores and a serial runtime of 80s, what is the runtime in seconds on 16 cores assuming the same efficiency (do not include any units in the answer)?

10

If the input buffer of an MPI_Reduce has 10 elements in each process and the reduction operator is MPI_PROD, how many elements will the output buffer have?

10
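The reduction is element-wise, which is why the output has as many elements as each input buffer. The following is a serial emulation in plain C, not an MPI call. It only mimics what a reduction with MPI_PROD computes, and the name `emulate_reduce_prod` and the rank-major input layout are assumptions made for this sketch.

```c
#include <assert.h>

/* Serial emulation of an element-wise product reduction.
   This is NOT an MPI call; it mimics the result of a reduction with
   MPI_PROD: out[i] is the product of element i over all ranks. */
void emulate_reduce_prod(int nprocs, int count,
                         const double *in, /* nprocs * count values, rank-major */
                         double *out)      /* count values */
{
    assert(nprocs > 0 && count >= 0);
    for (int i = 0; i < count; i++) {
        double acc = 1.0;
        for (int r = 0; r < nprocs; r++)
            acc *= in[r * count + i];
        out[i] = acc;
    }
}
```

With 2 emulated ranks holding {1, 2, 3} and {4, 5, 6}, the output is the 3-element array {4, 10, 18}: one result per element position, regardless of the number of ranks.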

Graphs must have at least one edge.

False

When running an MPI program with 10 processes that call MPI_Gather using the default communicator, where each source process sends an array of 10 elements, how many elements does the destination process receive?

100

Starting with a single thread and assuming that each running thread creates one additional thread in each step (i.e., there will be 2 threads at the end of the first step), how many threads will be active after five steps?

32

According to Amdahl's law, what is the upper bound on the achievable speedup when 20% of the code is not parallelized?

5

What is the speedup when 10% of the code is not parallelized and the rest of the code is perfectly parallelized and executed on 9 cores?

5
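The calculation can be checked with a small C function. This is a minimal sketch of Amdahl's law, assuming the parallel part scales perfectly; the name `amdahl_speedup` is hypothetical.

```c
#include <assert.h>

/* Amdahl's law: speedup on `cores` cores when `serial_fraction` of the
   runtime cannot be parallelized and the rest scales perfectly. */
double amdahl_speedup(double serial_fraction, int cores)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    assert(cores > 0);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}
```

For a serial fraction of 0.1 on 9 cores the denominator is 0.1 + 0.9/9 = 0.2, which gives a speedup of 5.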

Given a parallel runtime of 20s on eight cores and a serial runtime of 80s, what is the efficiency in percent (do not include the "%" symbol in the answer)?

50
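The speedup and efficiency questions all use the same two definitions. A minimal C sketch, with hypothetical function names, might look like this:

```c
#include <assert.h>

/* Speedup: serial runtime divided by parallel runtime. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Efficiency in percent: speedup per core, times 100. */
double efficiency_percent(double t_serial, double t_parallel, int cores)
{
    assert(cores > 0);
    return 100.0 * speedup(t_serial, t_parallel) / cores;
}

/* Runtime on a different core count if the efficiency stays the same. */
double runtime_at_efficiency(double t_serial, double eff_percent, int cores)
{
    assert(eff_percent > 0.0 && cores > 0);
    return t_serial / (eff_percent / 100.0 * cores);
}
```

With a serial runtime of 80s and a parallel runtime of 20s on eight cores, this gives a speedup of 4 and an efficiency of 50%. At the same efficiency, 16 cores yield a speedup of 8 and a runtime of 10s.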

In an MPI program with 8 processes, what is the largest rank that any process will have?

7

Assuming a serial runtime of 60s, a parallel runtime of 12s on six cores, and a fixed overhead, what is the expected speedup with ten cores?

7.5

Assuming a serial runtime of 60s, a parallel runtime of 12s on six cores, and a fixed overhead (called "Toverhead" in the slides), what is the expected runtime in seconds with ten cores (do not include any units in the answer)?

8
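The fixed-overhead questions can be worked through in C. This sketch assumes the model t_parallel(p) = t_serial / p + Toverhead, which is consistent with the answers in this set but is not spelled out in it. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed model: t_parallel(p) = t_serial / p + t_overhead, where
   t_overhead does not depend on the core count p. */

/* Overhead implied by one measured parallel runtime. */
double fixed_overhead(double t_serial, double t_parallel, int cores)
{
    assert(cores > 0);
    return t_parallel - t_serial / cores;
}

/* Predicted runtime on a different core count under the same overhead. */
double predicted_runtime(double t_serial, double t_overhead, int cores)
{
    assert(cores > 0);
    return t_serial / cores + t_overhead;
}
```

With a serial runtime of 60s and 12s on six cores, the overhead is 12 - 60/6 = 2s. The predicted runtime on ten cores is 60/10 + 2 = 8s, the speedup is 60/8 = 7.5, and the efficiency is 7.5/10 = 75%.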

If a pthreads program calls pthread_create 8 times, how many total program threads will be running?

9
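The count includes the main thread, which exists before any call to pthread_create. A minimal pthreads sketch of that bookkeeping follows; `count_me`, `count_program_threads`, and `NUM_CREATED` are hypothetical names, and the counter is protected by a mutex so the increments do not race.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_CREATED 8

static int total_threads;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread function: each created thread counts itself once. */
static void *count_me(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&count_lock);
    total_threads++;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Creates NUM_CREATED threads, joins them, and returns the total number of
   program threads that existed (created threads plus the main thread). */
int count_program_threads(void)
{
    pthread_t handles[NUM_CREATED];
    total_threads = 1; /* the calling (main) thread */
    for (int i = 0; i < NUM_CREATED; i++) {
        int rc = pthread_create(&handles[i], NULL, count_me, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_CREATED; i++)
        pthread_join(handles[i], NULL);
    return total_threads;
}
```

Joining waits for each created thread to finish; it does not kill it. Depending on the toolchain, the program may need to be built with the `-pthread` flag.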

Acquiring a lock by one thread before accessing a shared memory location guarantees that no other thread will access the same shared memory location while the lock is being held.

False

All reductions compute a sum.

False

Busy waiting always results in worse performance than using a pthread mutex.

False

Calling pthread_create(&handle, NULL, abc, (void *)thread) creates a thread called "abc".

False

Data parallelism requires separate program statements for each data item.

False

Every graph data structure is also a tree data structure.

False

Finding a maximal independent set is NP-hard.

False

Global variables are private variables in pthreads.

False

Graphs always have at least as many edges as vertices.

False

A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element decreases with increasing array indices.

True
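The effect can be illustrated with a toy cost model in C. The sketch assumes that element i of an n-element array costs n - i units of work, which is an assumption made only for this example. The function names are hypothetical.

```c
#include <assert.h>

/* Toy cost model: work of element i decreases with increasing index. */
static long work_of(int i, int n)
{
    return (long)(n - i);
}

/* Total work of thread `rank` under a block distribution
   (contiguous chunks; assumes n is divisible by nthreads). */
long block_work(int rank, int nthreads, int n)
{
    assert(nthreads > 0 && n % nthreads == 0);
    int chunk = n / nthreads;
    long total = 0;
    for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
        total += work_of(i, n);
    return total;
}

/* Total work of thread `rank` under a cyclic distribution
   (elements rank, rank + nthreads, rank + 2 * nthreads, ...). */
long cyclic_work(int rank, int nthreads, int n)
{
    assert(nthreads > 0);
    long total = 0;
    for (int i = rank; i < n; i += nthreads)
        total += work_of(i, n);
    return total;
}
```

For 8 elements and 2 threads, the block distribution assigns 26 and 10 units of work, while the cyclic distribution assigns 20 and 16, which is much closer to balanced.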

A data race is a form of indeterminacy.

True

A disconnected graph can contain cycles.

True

A maximal independent set (MIS) can contain all the vertices of a graph.

True

A path through a graph can contain more elements than the total number of vertices in the graph.

True

A single call to MPI_Reduce by each process suffices to reduce local histograms with many buckets into a global histogram.

True

Busy waiting can cause data races.

True

Calling pthread_create(&handle, NULL, abc, (void *)thread) records information about a thread in "handle".

True

Distributed-memory systems require explicit communication to transfer data between compute nodes.

True

Distributed-memory systems scale to larger numbers of CPU cores than shared-memory systems.

True

Embarrassingly parallel programs can suffer from load imbalance.

True

Frequency scaling was replaced by core scaling due to power density concerns.

True

GPUs exploit data parallelism.

True

In MPI_Gather, every process has to pass a parameter for the destination buffer, even processes that will not receive the result of the gather.

True

Indeterminacy can result in garbled program output.

True

It is possible to construct a dense graph that contains only a single edge.

True

Load imbalance is a situation where some threads/processes execute longer than others.

True

Luby's MIS algorithm is likely to finish in O(log n) parallel steps, where n is the number of vertices.

True

MPI programs can suffer from indeterminacy.

True

MPI_Allgather implies a barrier.

True

MPI_Allgather is typically faster than calling MPI_Gather followed by MPI_Bcast.

True

MPI_Allreduce can be emulated with MPI_Reduce followed by MPI_Bcast.

True

MPI_Gather performs many-to-one communication.

True

MPI_Reduce has a communication pattern (sends and receives of messages) similar to that of MPI_Gather.

True

MPI_Reduce performs many-to-one communication.

True

MPI_Send may return before the message has actually been sent.

True

Multidimensional C/C++ arrays are stored row by row in main memory.

True
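Row-major storage means element a[i][j] of an array with COLS columns sits i * COLS + j elements after the start. A small C sketch that compares this formula against the actual addresses follows; the names `row_major_offset`, `layout_is_row_major`, `ROWS`, and `COLS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Offset (in elements) of a[i][j] from the start of a ROWS x COLS array
   under row-major storage. */
size_t row_major_offset(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return i * COLS + j;
}

/* Returns 1 if the actual layout of a 2D array matches the formula. */
int layout_is_row_major(void)
{
    int a[ROWS][COLS];
    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            /* Byte distance from the start of the array, in units of int. */
            size_t off = (size_t)((char *)&a[i][j] - (char *)a) / sizeof(int);
            if (off != row_major_offset(i, j))
                return 0;
        }
    }
    return 1;
}
```

This is why iterating with the column index in the inner loop touches consecutive memory locations.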

Reduction operations are more frequent in parallel programs than in serial programs.

True

SPMD typically includes task parallelism.

True

The adjacency list of a vertex can be empty.

True

The adjacency matrix of a sparse graph requires O(n^2) memory, where n is the number of vertices.

True

The fractal code from the project is likely to suffer from load imbalance.

True

The tree-based reduction approach leads to load imbalance.

True
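The imbalance shows up when counting how many participants do work in each step. The following is a serial emulation of a tree-based sum, not a threaded or MPI implementation. It assumes the number of values is a power of two, and the names `tree_reduce` and `MAX_STEPS` are hypothetical.

```c
#include <assert.h>

#define MAX_STEPS 32

/* Serial emulation of a tree-based sum reduction over n values
   (n must be a power of two). In each step, "thread" i adds the value of
   its partner i + stride; active[s] records how many threads perform an
   addition in step s. The sum ends up in vals[0]. Returns the step count. */
int tree_reduce(double *vals, int n, int active[MAX_STEPS])
{
    assert(n > 0 && (n & (n - 1)) == 0);
    int steps = 0;
    for (int stride = 1; stride < n; stride *= 2) {
        int busy = 0;
        for (int i = 0; i + stride < n; i += 2 * stride) {
            vals[i] += vals[i + stride];
            busy++;
        }
        active[steps++] = busy;
    }
    return steps;
}
```

For 8 values the reduction takes 3 steps with 4, 2, and 1 active participants, so most participants are idle in the later steps while participant 0 works in every step.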

Undirected graphs always have a symmetric adjacency matrix.

True

When a thread attempts to acquire a lock that is already taken, it is blocked until it obtains the lock.

True

When oversubscribing threads, not all threads are simultaneously executing.

True

When parallelizing the "range" loop of the collatz code, a cyclic distribution of the loop iterations to processes is advisable.

True

