Outline of Material for Test #2
Parallel Collective Communication Primitives
- Be able to define the One-to-All, All-to-All, scatter, gather,
and All-to-All Personalized communication
primitives. (In other words, be able to reconstruct Figures
4.1, 4.8, 4.14, and 4.16 from the text.)
- One-to-All Broadcast
- Be able to describe and analyze the Hypercube One-to-All
broadcast algorithm.
- Be prepared to answer questions about the algorithms as
listed in program form in Algorithms 4.1 and 4.2 from the text.
- Understand how the One-to-All broadcast algorithm can be
used as the basis for performing the Single-Node
Accumulation operation.
- Understand and be prepared to discuss relabeling processor IDs with XOR to achieve the generalized algorithm (broadcast from an arbitrary source node)
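The XOR-relabeling trick can be sanity-checked with a short simulation. The sketch below is my own illustration (the helper name `one_to_all_broadcast` is hypothetical, not from the text): it relabels so the source acts as node 0, then flips one bit per step, highest dimension first, in the spirit of Algorithm 4.1.

```python
def one_to_all_broadcast(p, source):
    """Simulate one-to-all broadcast on a p-node hypercube (p a power of 2).

    Nodes are relabeled with XOR so the source acts as node 0; partners
    are then found by flipping one bit per step.  Returns the number of
    communication steps taken (log2 p).
    """
    d = p.bit_length() - 1           # dimension of the hypercube
    assert 1 << d == p, "p must be a power of two"
    has_msg = {0}                    # relabeled labels holding the message
    steps = 0
    for k in range(d - 1, -1, -1):   # highest dimension first
        # every current holder sends to its partner across dimension k
        has_msg |= {label ^ (1 << k) for label in has_msg}
        steps += 1
    # undo the relabeling: physical id = relabeled label XOR source
    physical = {label ^ source for label in has_msg}
    assert physical == set(range(p)) # everyone received the message
    return steps

print(one_to_all_broadcast(8, source=5))   # 3 steps on an 8-node cube
```

Note how the set of holders doubles each step, which is why the algorithm finishes in log p steps regardless of which node is the source.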
- All-to-All Broadcast
- Understand the All-to-All broadcast algorithms for the
ring and Hypercube topologies. In particular, pay attention to the
message size at each step of the algorithm.
- Be prepared to answer questions about the algorithms as
listed in program form in Algorithms 4.4, 4.5, and 4.6 from the handout.
- Be able to describe how the All-to-All broadcast can be
used as a basis for performing the Reduction or
Multinode Accumulation operations.
- Know the difference between an All-to-All reduction and an All-Reduce
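The doubling message sizes on the hypercube version can be seen directly in a simulation. This is my own sketch (the function name is hypothetical): in step k every node exchanges its entire accumulated set with its partner across dimension k, so the block exchanged grows as m, 2m, 4m, ...

```python
def all_to_all_broadcast_hypercube(p):
    """Simulate all-to-all broadcast on a p-node hypercube (p = 2^d).

    In step k each node exchanges its accumulated set of messages with
    its partner across dimension k, so the message size doubles every
    step.  Returns the per-step message sizes (in units of the original
    message size m).
    """
    d = p.bit_length() - 1
    assert 1 << d == p, "p must be a power of two"
    held = {node: {node} for node in range(p)}   # each node starts with its own message
    sizes = []
    for k in range(d):
        sizes.append(len(held[0]))               # block size exchanged this step
        held = {node: held[node] | held[node ^ (1 << k)] for node in range(p)}
    assert all(held[n] == set(range(p)) for n in range(p))
    return sizes

print(all_to_all_broadcast_hypercube(8))   # [1, 2, 4]: sizes double each step
```

Contrast this with the ring algorithm, where each of the p - 1 steps forwards a single block of size m, so the message size stays constant instead of doubling.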
- One-to-All Personalized Communication
- Understand how this algorithm is a variation of the
One-to-All broadcast except with shrinking message
sizes.
- Be able to derive and analyze this algorithm on a
Hypercube network by generalizing the
One-to-All algorithm results.
- All-to-All Personalized Communications
- Be able to describe and analyze the Hypercube algorithm.
- Be able to describe and analyze the E-cube routing algorithm.
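E-cube routing is short enough to sketch directly. In this illustration of mine (the function name is hypothetical), the route corrects the bits of src XOR dst one dimension at a time in a fixed order, lowest dimension first, so every source/destination pair gets a unique deterministic path whose length is their Hamming distance.

```python
def ecube_route(src, dst):
    """E-cube routing on a hypercube: correct the differing bits of
    src XOR dst in a fixed order (lowest dimension first).  Returns
    the list of nodes visited, starting at src and ending at dst."""
    path = [src]
    cur = src
    diff = src ^ dst
    k = 0
    while diff:
        if diff & 1:            # bit k differs: traverse dimension k
            cur ^= (1 << k)
            path.append(cur)
        diff >>= 1
        k += 1
    return path

print(ecube_route(0b000, 0b110))   # [0, 2, 6]: dimensions 1, then 2
```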
- Be able to generalize the ts + m*tw performance model to take into account network bisection width.
- Know Tables 4.1 and 4.2 for the algorithms we have covered.
- Be able to derive timing measurements for any of the above algorithms!
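For practice with the derivations, the hypercube timing expressions can be encoded and evaluated numerically. The function names below are mine, and the formulas are the standard hypercube expressions in the ts + m*tw model; check them against Table 4.1 in the text before relying on them.

```python
from math import log2

def t_one_to_all_bcast(ts, tw, m, p):
    """One-to-all broadcast on a hypercube: log p steps, size m each."""
    return (ts + tw * m) * log2(p)

def t_all_to_all_bcast(ts, tw, m, p):
    """All-to-all broadcast on a hypercube: log p startups, m*(p-1) data."""
    return ts * log2(p) + tw * m * (p - 1)

def t_scatter(ts, tw, m, p):
    """One-to-all personalized (scatter): log p startups, halving sizes
    that sum to m*(p-1)."""
    return ts * log2(p) + tw * m * (p - 1)

def t_all_to_all_pers(ts, tw, m, p):
    """All-to-all personalized on a hypercube (pairwise exchange)."""
    return (ts + tw * m) * (p - 1)

# Example: ts = 10, tw = 1, m = 4, p = 8
print(t_one_to_all_bcast(10, 1, 4, 8))   # (10 + 4) * log2(8) = 42.0
```

Plugging numbers into these closed forms is a quick way to check a timing derivation done by hand.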
- Study the Homework Assignments!
Performance and Scalability of Parallel Systems
- Be able to define Speedup, Efficiency, Cost, Cost Optimal, Problem Size,
Overhead Function, and Isoefficiency Function.
- Be able to determine if an algorithm is Cost Optimal.
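These definitions reduce to three one-line formulas, illustrated here on the classic add-n-numbers example (the helper name and the idealized unit-cost model Tp = n/p + 2 log p are assumptions for illustration, not the text's notation):

```python
from math import log2

def metrics(t_serial, t_parallel, p):
    """Speedup S = T1/Tp, efficiency E = S/p, cost C = p*Tp.
    An algorithm is cost-optimal when its cost grows at the same
    rate as the serial time, i.e. E stays bounded away from zero."""
    speedup = t_serial / t_parallel
    efficiency = speedup / p
    cost = p * t_parallel
    return speedup, efficiency, cost

# Adding n = 1024 numbers on p = 32 processors (idealized unit-time model):
n, p = 1024, 32
t1 = n - 1                      # serial additions
tp = n / p + 2 * log2(p)        # local additions plus reduction steps
S, E, C = metrics(t1, tp, p)
print(round(S, 1), round(E, 2))
```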
- Be able to determine if an algorithm has an Isoefficiency
function.
- Be able to derive the overhead function and use this function
to determine the existence of an Isoefficiency function. Be
able to show that the overhead function is related to the
isoefficiency function when one exists.
- Understand why there is a lower bound on the isoefficiency
function as described in section 5.4.4 from text.
- What is the degree of concurrency of a parallel algorithm?
- What is the relationship between degree of concurrency and the
isoefficiency function? See section 5.4.5 from text.
- What are the sources of parallel overhead? ANS: Communication
time, Load Imbalance, and additional computations performed by
the parallel algorithm.
- What is scaled speedup? How do we measure it?
- What are memory-constrained and time-constrained scaled speedup? How do we measure each?
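As one concrete instance, memory-constrained scaled speedup grows the problem linearly with the machine (fixed memory per processor) and measures speedup on the scaled problem. This sketch is mine, reusing the idealized add-n-numbers model (T1 = W, Tp = W/p + 2 log p); the function name and per-processor size n0 are assumptions for illustration:

```python
from math import log2

def scaled_speedup_memory(n0, p):
    """Memory-constrained scaled speedup for the add-n-numbers example:
    the problem grows with the machine (W = n0 * p, fixed memory per
    processor) and speedup is measured on the scaled problem,
    S = T1(W) / Tp(W, p)."""
    W = n0 * p
    return W / (W / p + 2 * log2(p))

# 64 numbers per processor on a 16-processor machine:
print(round(scaled_speedup_memory(64, 16), 1))
```

Time-constrained scaled speedup is measured analogously, but W is instead grown to the largest size solvable in a fixed parallel time budget.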
- Study the Homework Assignments!