Outline of Material for Test #1
Parallel Programming Platforms
- Flynn's Taxonomy (SISD, SIMD, MIMD)
- Understand discussion in section 2.3.1
- Also understand SPMD (Single Program Multiple Data)
- Memory Access Architectures
Understand:
- Shared Memory Multiprocessor System [Figure 2.5]
- single address space
- uniform memory access (UMA)/ nonuniform memory access (NUMA)
- distributed shared memory system
- Message Passing Multicomputer (Distributed Memory)
- Network Models
- Be able to discuss the implementation, cost, and expected execution times of the following switching networks
- crossbar switch
- bus
- Omega network
- Be prepared to identify and discuss blocking in an Omega network (Figure 2.13)
- Be prepared to draw a small omega network configuration
- Review homework!
- Be able to discuss static network topologies
- Star Connected
- Completely Connected
- Linear Arrays, Meshes, and k-d Meshes
- Hypercube
- Tree and Fat Tree Networks
- Be able to compute Diameter, Bisection Width, and Cost for Linear, Ring, Mesh, Torus, Tree, and Hypercube networks. (A small computational sketch follows this list.)
- Understand how to construct a d-dimensional
hypercube recursively.
- Be able to define and compute Diameter,
Connectivity, Bisection Width, Bisection Bandwidth,
and Cost for any arbitrary network.
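A minimal self-check sketch (plain C; the helper name and the choice of p are mine) that evaluates the standard formulas for a ring, a square 2-D mesh without wraparound, and a hypercube, assuming p is both a power of two and a perfect square:

    #include <math.h>
    #include <stdio.h>

    /* Print diameter, bisection width, and cost (link count) for a few
       static topologies with p nodes, using the standard formulas.
       Assumes p is a power of two and a perfect square (e.g., p = 16). */
    static void metrics(int p) {
        int d = (int)round(log2(p));   /* hypercube dimension, p = 2^d */
        int q = (int)round(sqrt(p));   /* mesh side length, p = q*q    */

        printf("ring:      diameter=%d bisection=%d cost=%d\n",
               p / 2, 2, p);
        printf("2-D mesh:  diameter=%d bisection=%d cost=%d\n",
               2 * (q - 1), q, 2 * (p - q));
        printf("hypercube: diameter=%d bisection=%d cost=%d\n",
               d, p / 2, p * d / 2);
    }

    int main(void) { metrics(16); return 0; }

Compile with -lm and cross-check the printed values against the table in the text.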
- Communication Cost Models for Static Interconnection
Networks
Know and Understand:
- communication latency
- Communication times for
- Store and Forward Routing
- Cut-Through Routing
- Using the simplified cost model (and the rationale for the simplification; see the summary after this list)
- startup time, ts
- per-word transfer time, tw
- hop time, th
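A quick summary of the communication times, with m the message size in words and l the number of links traversed (verify these against the text):

    Store-and-forward: tcomm = ts + (m*tw + th)*l, usually approximated
                       as ts + m*l*tw since th is small
    Cut-through:       tcomm = ts + l*th + m*tw
    Simplified model:  tcomm = ts + m*tw

The simplified model drops the distance-dependent terms on the grounds that th is tiny on modern networks and, with cut-through routing, the l*th term is dominated by the startup and per-word terms.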
- Routing Mechanisms for Interconnection Networks
- Be able to define:
- minimal/nonminimal routing
- deterministic routing
- adaptive routing
- dimension-ordered routing
- E-cube routing
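A minimal sketch of E-cube routing on a d-dimensional hypercube (function and variable names are mine): at each node, forward the message along the lowest-order dimension in which the current address and the destination address differ.

    #include <stdio.h>

    /* E-cube next hop: from node cur, move toward dst by flipping the
       least significant bit position in which the two labels differ. */
    static int ecube_next_hop(int cur, int dst) {
        int diff = cur ^ dst;       /* dimensions still to correct      */
        if (diff == 0) return cur;  /* already at the destination       */
        int dim = diff & -diff;     /* lowest set bit = next dimension  */
        return cur ^ dim;           /* one hop along that dimension     */
    }

    int main(void) {
        /* Trace 2 -> 7 in a 3-D hypercube: 010 -> 011 -> 111. */
        for (int node = 2; node != 7; node = ecube_next_hop(node, 7))
            printf("%d -> %d\n", node, ecube_next_hop(node, 7));
        return 0;
    }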
- Understand Mapping and its impact on performance
- Be able to define and discuss congestion, dilation, and expansion. How does each of these measures of a graph mapping impact communication performance?
- PRAM Models
- Be able to discuss the PRAM machine architecture and
assumptions
- Understand Types of PRAM models:
Combining CRCW, Priority CRCW, Arbitrary CRCW, Common CRCW, CREW, and EREW.
- Know Brent's Theorem! (The statement is recalled after this list.)
- Review Homework problems!
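A reminder of the statement of Brent's Theorem (check it against your notes): if an algorithm performs m operations in total and finishes in t parallel steps when as many PRAM processors as needed are available, then p processors can execute it in time

    Tp <= t + (m - t)/p,

often quoted in the weaker form Tp <= m/p + t.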
- LogP Model
- Be able to define the Model Parameters, L, o, g, and
P.
- Understand the discussion of estimating running time
of the broadcast under the LogP model as discussed in
class and the LogP paper.
- Be able to use the LogP model to compute running
times of programs.
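A minimal simulation sketch of the greedy single-item broadcast under LogP (my own code; the parameter values are invented): every informed processor starts a new send as early as it can, consecutive sends from one processor are g apart, and a message becomes usable at the receiver L + 2o after the send begins.

    #include <stdio.h>

    #define L 4   /* latency                               */
    #define O 2   /* per-message send/receive overhead, o  */
    #define G 3   /* gap between consecutive sends, g >= o */

    /* Earliest time by which all P processors (P <= 64) hold the item,
       under greedy scheduling: repeatedly let the informed processor
       that can transmit soonest send to a new processor. */
    static int logp_broadcast_time(int P) {
        int ready[64];          /* ready[i]: when informed proc i can send */
        int n = 1, done = 0;    /* the root holds the item at time 0      */
        ready[0] = 0;
        while (n < P) {
            int s = 0;          /* informed processor that is free soonest */
            for (int i = 1; i < n; i++)
                if (ready[i] < ready[s]) s = i;
            int t = ready[s];             /* its send starts at t          */
            int arrive = t + O + L + O;   /* receiver holds the item then  */
            ready[s] = t + G;             /* sender free again after g     */
            ready[n] = arrive;            /* new proc may send right away  */
            n++;
            if (arrive > done) done = arrive;
        }
        return done;
    }

    int main(void) {
        for (int p = 2; p <= 8; p++)
            printf("P=%d: broadcast done at t=%d\n", p, logp_broadcast_time(p));
        return 0;
    }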
- BSP Model
- Be able to discuss the BSP execution model. (What is
the BSP execution model and what does it consist of?)
- Be able to use the cost model for BSP [w + hg + l] to estimate running times of BSP algorithms. (A small sketch follows this list.)
- What is an h-relation? A 1-relation? A p-relation?
- What is a superstep in the BSP model?
- What role does the barrier operation play in the BSP model?
- Be able to discuss an optimal broadcast algorithm using an unbalanced tree structure in the BSP model.
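A minimal sketch (the g, l, and workload values are invented) of estimating a BSP running time by summing the per-superstep cost w + h*g + l, where w is the maximum local work in the superstep and h is the largest number of words any one processor sends or receives:

    #include <stdio.h>

    #define BSP_G 4    /* cost per word communicated, g   */
    #define BSP_L 50   /* barrier/synchronization cost, l */

    struct superstep { int w, h; };  /* max local work, max h-relation */

    /* Total estimated time: sum over supersteps of (w + h*g + l). */
    static int bsp_cost(const struct superstep *s, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += s[i].w + s[i].h * BSP_G + BSP_L;
        return total;
    }

    int main(void) {
        /* A made-up three-superstep algorithm. */
        struct superstep prog[] = { {100, 10}, {40, 25}, {200, 0} };
        printf("estimated time = %d\n", bsp_cost(prog, 3));
        return 0;
    }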
MPI and Message Passing
- Be prepared to describe or discuss the role or function of the following MPI primitives (a minimal program using several of them follows this list):
- MPI_Init()
- MPI_Finalize()
- MPI_COMM_WORLD
- MPI_Comm_size()
- MPI_Comm_rank()
- MPI_Send()
- MPI_Recv()
- MPI_Sendrecv()
- MPI_Isend()
- MPI_Irecv()
- MPI_Test()
- MPI_Wait()
- MPI_Request_free()
- MPI_Barrier()
- MPI_Bcast()
- MPI_Reduce()
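A minimal refresher program (my own sketch) tying several of these primitives together: rank 0 sends an integer to rank 1, which receives and prints it. Compile with mpicc and run with at least two processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);                  /* start MPI            */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* number of processes  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank  */

        if (rank == 0 && size > 1) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }

        MPI_Barrier(MPI_COMM_WORLD);             /* synchronize everyone */
        MPI_Finalize();                          /* shut down MPI        */
        return 0;
    }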
- Be able to discuss how deadlock is possible when using MPI_Send() and MPI_Recv() as discussed in class. (A small illustration follows.)
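For instance, this sketch (run with exactly two ranks; the buffer size is arbitrary) deadlocks whenever MPI_Send() blocks until a matching receive is posted, because both ranks sit in the send and neither reaches its receive; reversing the send/receive order on one rank, or using MPI_Sendrecv(), avoids it.

    #include <mpi.h>

    int main(int argc, char **argv) {
        enum { N = 1 << 20 };        /* large enough to defeat buffering */
        static int sbuf[N], rbuf[N];
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int partner = 1 - rank;      /* 0 <-> 1 */
        /* Both ranks send first: may block forever without buffering. */
        MPI_Send(sbuf, N, MPI_INT, partner, 0, MPI_COMM_WORLD);
        MPI_Recv(rbuf, N, MPI_INT, partner, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* Fix: reverse send/recv order on one rank, or MPI_Sendrecv(). */
        MPI_Finalize();
        return 0;
    }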
- Be able to discuss buffering in message passing protocols.
- Review Homework!
Principles of Parallel Algorithm Design
- Algorithm Decomposition
- Be able to define Tasks and Task Dependency Graphs
- Be able to discuss Task Interactions
- granularity (fine or coarse grained)
- degree of concurrency (what does it measure)
- maximum degree of concurrency
- average degree of concurrency
- What is a Critical Path? Why is it important to the analysis of parallel algorithm performance? (A small worked sketch follows this list.)
- What is a task interaction graph? How is it different from a
task dependency graph? What do we use a task interaction graph for?
- Processes, Mapping, and Processors: Understand the role each of these plays in the design of parallel algorithms.
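A minimal worked sketch (the task graph and costs are made up) computing the critical path length and the average degree of concurrency (total work divided by critical path length) for a small task dependency graph listed in topological order:

    #include <stdio.h>

    #define T 6   /* number of tasks */

    /* work[i]: cost of task i; dep[i][j] nonzero means j depends on i. */
    static const int work[T] = {4, 2, 3, 1, 5, 2};
    static const int dep[T][T] = {
        /* edges: 0->1, 0->2, 1->3, 2->3, 3->4, 3->5 */
        [0] = {0, 1, 1, 0, 0, 0},
        [1] = {0, 0, 0, 1, 0, 0},
        [2] = {0, 0, 0, 1, 0, 0},
        [3] = {0, 0, 0, 0, 1, 1},
    };

    int main(void) {
        int finish[T], total = 0, cp = 0;
        for (int j = 0; j < T; j++) {
            int start = 0;           /* latest finish among predecessors */
            for (int i = 0; i < j; i++)
                if (dep[i][j] && finish[i] > start) start = finish[i];
            finish[j] = start + work[j];
            total += work[j];
            if (finish[j] > cp) cp = finish[j];
        }
        printf("total work = %d, critical path length = %d\n", total, cp);
        printf("average degree of concurrency = %.2f\n", (double)total / cp);
        return 0;
    }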
- Decomposition Techniques
- Recursive Decomposition
- What is recursive decomposition?
- Well suited to divide-and-conquer algorithms (such as quicksort)
- Data Decomposition
- Be able to define data decomposition.
- Be able to discuss: mapping data partitions to task partitions, partitioning input data vs. output data, and the owner-computes rule. (A small sketch follows.)
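For instance, a minimal sketch (names mine) of a one-dimensional block data decomposition: each task owns a contiguous range of the output, and under the owner-computes rule it performs exactly the work that writes those elements.

    #include <stdio.h>

    /* Task `rank` of p owns output indices [lo, hi) of n elements. */
    static void block_range(int n, int p, int rank, int *lo, int *hi) {
        *lo = (int)((long)n * rank / p);
        *hi = (int)((long)n * (rank + 1) / p);
    }

    int main(void) {
        int lo, hi, n = 10, p = 3;
        for (int r = 0; r < p; r++) {
            block_range(n, p, r, &lo, &hi);
            printf("task %d owns [%d, %d)\n", r, lo, hi);
        }
        return 0;
    }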
- Exploratory Decomposition
- What is Exploratory Decomposition?
- When would you apply Exploratory Decomposition?
- What types of problems are well suited for exploratory decomposition?
- Speculative Decomposition
- What is Speculative Decomposition?
- How is Speculative Decomposition different from Exploratory Decomposition?
- When would you apply speculative decomposition?
- Hybrid Decompositions
- What are hybrid decompositions?
- Be able to discuss an example of hybrid decompositions used to design a parallel algorithm.
- Characteristics of Tasks and Interactions
- Be able to define and discuss dynamic task generation. What problems does this present for parallel programs? How do we deal with these problems?
- How does the size of data associated with tasks affect our decisions regarding parallel algorithm implementation?
- How can we take advantage of static task interactions to improve performance? How is this question related to one-way vs. two-way (one-sided vs. two-sided) communication?
- What is the difference between regular and irregular interactions? What is the difference between irregular and dynamic interactions?
- What is the impact of read-only access to data vs. read-write access?
- What do we mean when we say an interaction is two-way or two-sided?
- What are some static mapping techniques? (e.g., array distribution schemes: block, cyclic, block-cyclic) Why are there so many mappings? (A small sketch follows this list.)
- What is graph partitioning? When is it important for parallel algorithm design?
- What are hierarchical mappings?
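A minimal sketch (constants invented) of who owns array element i under the three classic static array distributions; printing a few rows makes the load-balance trade-offs easy to see:

    #include <stdio.h>

    #define P 4   /* number of processes                    */
    #define B 3   /* block size for the block-cyclic scheme */

    /* Owner of element i of an n-element array under each mapping. */
    static int owner_block(int i, int n)  { return (int)((long)i * P / n); }
    static int owner_cyclic(int i)        { return i % P; }
    static int owner_block_cyclic(int i)  { return (i / B) % P; }

    int main(void) {
        int n = 24;
        printf(" i  block  cyclic  block-cyclic\n");
        for (int i = 0; i < n; i++)
            printf("%2d  %5d  %6d  %12d\n", i,
                   owner_block(i, n), owner_cyclic(i), owner_block_cyclic(i));
        return 0;
    }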
- Schemes for Dynamic Mapping
- What is the tension between centralized and distributed schemes?
- What is self-scheduling and chunk scheduling?
- Methods for Containing Interaction Overheads
Be able to define and discuss the eight methods for containing
interaction overheads in parallel programs.
- Maximizing Data Locality
- Minimizing Volume of Data Exchange
- Minimizing Frequency of Interactions
- Minimizing Contention and Hot Spots
- Overlapping Computations with Interactions (sketched after this list)
- Replicating Data or Computations
- Using Optimized Collective Interaction Operations
- Overlapping Interactions with Other Interactions
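A minimal sketch of the overlap idea using nonblocking MPI (run with two ranks; the dummy computation stands in for real work): rank 1 posts MPI_Irecv(), computes on data that does not depend on the incoming message, and calls MPI_Wait() only when the message is actually needed.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, msg = 0;
        double acc = 0.0;
        MPI_Request req;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            msg = 7;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            for (int i = 1; i <= 1000000; i++)  /* independent local work,  */
                acc += 1.0 / i;                 /* overlapped with transfer */
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* block only when needed   */
            printf("rank 1: msg=%d acc=%f\n", msg, acc);
        }
        MPI_Finalize();
        return 0;
    }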
- Parallel Algorithm Models
Be able to define and compare/contrast the different parallel algorithm models as discussed in section 3.6.
- The Data-Parallel model
- The Task-Graph model
- The Work Pool Model
- The Master-Slave Model
- The Pipeline or Producer-Consumer model
- Review Homework!