Outline of Material for Test #1
Parallel Programming Platforms
- Flynn's Taxonomy (SISD, SIMD, MIMD)
- Understand discussion in section 2.3.1
- Also understand SPMD (Single Program Multiple Data)
- Memory Access Architectures
Understand:
- Shared Memory Multiprocessor System [Figure 2.5]
- single address space
- uniform memory access (UMA)/ nonuniform memory access (NUMA)
- distributed shared memory system
- Message Passing Multicomputer (Distributed Memory)
- Network Models
- Be able to discuss the implementation, cost, and expected execution times of the following switching networks
- crossbar switch
- bus
- Omega network
- Be prepared to identify and discuss blocking in an Omega network (Figure 2.13)
- Be prepared to draw a small omega network configuration
- Review homework!
- Be able to discuss static network topologies
- Star Connected
- Completely Connected
- Linear Arrays, Meshes, and k-d Meshes
- Hypercube
- Tree and Fat Tree Networks
- Be able to compute Diameter, Bisection Width, and Cost for Linear, Ring, Mesh, Torus, Tree, and Hypercube networks. (A small computational sketch follows this list.)
- Understand how to construct a d-dimensional
hypercube recursively.
- Be able to define and compute Diameter,
Connectivity, Bisection Width, Bisection Bandwidth,
and Cost for any arbitrary network.
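A minimal self-check sketch (plain C; the helper name and the choice of p are mine) that evaluates the standard formulas for a ring, a square 2-D mesh without wraparound, and a hypercube, assuming p is both a power of two and a perfect square:

    #include <math.h>
    #include <stdio.h>

    /* Print diameter, bisection width, and cost (link count) for a few
       static topologies with p nodes, using the standard formulas.
       Assumes p is a power of two and a perfect square (e.g., p = 16). */
    static void metrics(int p) {
        int d = (int)round(log2(p));   /* hypercube dimension, p = 2^d */
        int q = (int)round(sqrt(p));   /* mesh side length, p = q*q    */

        printf("ring:      diameter=%d bisection=%d cost=%d\n",
               p / 2, 2, p);
        printf("2-D mesh:  diameter=%d bisection=%d cost=%d\n",
               2 * (q - 1), q, 2 * (p - q));
        printf("hypercube: diameter=%d bisection=%d cost=%d\n",
               d, p / 2, p * d / 2);
    }

    int main(void) { metrics(16); return 0; }

Compile with -lm and cross-check the printed values against the table in the text.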
- Communication Cost Models for Static Interconnection
Networks
Know and Understand:
- communication latency
- Communication times for
- Store and Forward Routing
- Cut-Through Routing
- Using the simplified cost model (and the rationale for the simplification; see the summary after this list)
- startup time, ts
- per-word transfer time, tw
- hop time, th
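A quick summary of the communication times, with m the message size in words and l the number of links traversed (verify these against the text):

    Store-and-forward: tcomm = ts + (m*tw + th)*l, usually approximated
                       as ts + m*l*tw since th is small
    Cut-through:       tcomm = ts + l*th + m*tw
    Simplified model:  tcomm = ts + m*tw

The simplified model drops the distance-dependent terms on the grounds that th is tiny on modern networks and, with cut-through routing, the l*th term is dominated by the startup and per-word terms.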
- Routing Mechanisms for Interconnection Networks
- Be able to define:
- minimal/nonminimal routing
- deterministic routing
- adaptive routing
- dimension-ordered routing
- E-cube routing
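A minimal sketch of E-cube routing on a d-dimensional hypercube (function and variable names are mine): at each node, forward the message along the lowest-order dimension in which the current address and the destination address differ.

    #include <stdio.h>

    /* E-cube next hop: from node cur, move toward dst by flipping the
       least significant bit position in which the two labels differ. */
    static int ecube_next_hop(int cur, int dst) {
        int diff = cur ^ dst;       /* dimensions still to correct      */
        if (diff == 0) return cur;  /* already at the destination       */
        int dim = diff & -diff;     /* lowest set bit = next dimension  */
        return cur ^ dim;           /* one hop along that dimension     */
    }

    int main(void) {
        /* Trace 2 -> 7 in a 3-D hypercube: 010 -> 011 -> 111. */
        for (int node = 2; node != 7; node = ecube_next_hop(node, 7))
            printf("%d -> %d\n", node, ecube_next_hop(node, 7));
        return 0;
    }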
- Understand Mapping and its impact on performance
- Be able to define and discuss congestion, dilation, and expansion. How does each of these measures of a graph mapping impact communication performance?
- PRAM Models
- Be able to discuss the PRAM machine architecture and
assumptions
- Understand Types of PRAM models:
Combining CRCW, Priority CRCW, Arbitrary CRCW, Common CRCW, CREW, and EREW.
- Know Brent's Theorem! (The statement is recalled after this list.)
- Review Homework problems!
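A reminder of the statement of Brent's Theorem (check it against your notes): if an algorithm performs m operations in total and finishes in t parallel steps when as many PRAM processors as needed are available, then p processors can execute it in time

    Tp <= t + (m - t)/p,

often quoted in the weaker form Tp <= m/p + t.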
- LogP Model
- Be able to define the Model Parameters, L, o, g, and
P.
- Understand the discussion of estimating running time
of the broadcast under the LogP model as discussed in
class and the LogP paper.
- Be able to use the LogP model to compute running
times of programs.
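A minimal simulation sketch of the greedy single-item broadcast under LogP (my own code; the parameter values are invented): every informed processor starts a new send as early as it can, consecutive sends from one processor are g apart, and a message becomes usable at the receiver L + 2o after the send begins.

    #include <stdio.h>

    #define L 4   /* latency                               */
    #define O 2   /* per-message send/receive overhead, o  */
    #define G 3   /* gap between consecutive sends, g >= o */

    /* Earliest time by which all P processors (P <= 64) hold the item,
       under greedy scheduling: repeatedly let the informed processor
       that can transmit soonest send to a new processor. */
    static int logp_broadcast_time(int P) {
        int ready[64];          /* ready[i]: when informed proc i can send */
        int n = 1, done = 0;    /* the root holds the item at time 0      */
        ready[0] = 0;
        while (n < P) {
            int s = 0;          /* informed processor that is free soonest */
            for (int i = 1; i < n; i++)
                if (ready[i] < ready[s]) s = i;
            int t = ready[s];             /* its send starts at t          */
            int arrive = t + O + L + O;   /* receiver holds the item then  */
            ready[s] = t + G;             /* sender free again after g     */
            ready[n] = arrive;            /* new proc may send right away  */
            n++;
            if (arrive > done) done = arrive;
        }
        return done;
    }

    int main(void) {
        for (int p = 2; p <= 8; p++)
            printf("P=%d: broadcast done at t=%d\n", p, logp_broadcast_time(p));
        return 0;
    }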
- BSP Model
- Be able to discuss the BSP execution model. (What is
the BSP execution model and what does it consist of?)
- Be able to use the cost model for BSP [w + hg + l] to estimate running times of BSP algorithms. (A small sketch follows this list.)
- What is an h-relation? A 1-relation? A p-relation?
- What is a superstep in the BSP model?
- What role does the barrier operation play in the BSP model?
- Be able to discuss an optimal broadcast algorithm using an unbalanced tree structure in the BSP model.
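A minimal sketch (the g, l, and workload values are invented) of estimating a BSP running time by summing the per-superstep cost w + h*g + l, where w is the maximum local work in the superstep and h is the largest number of words any one processor sends or receives:

    #include <stdio.h>

    #define BSP_G 4    /* cost per word communicated, g   */
    #define BSP_L 50   /* barrier/synchronization cost, l */

    struct superstep { int w, h; };  /* max local work, max h-relation */

    /* Total estimated time: sum over supersteps of (w + h*g + l). */
    static int bsp_cost(const struct superstep *s, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += s[i].w + s[i].h * BSP_G + BSP_L;
        return total;
    }

    int main(void) {
        /* A made-up three-superstep algorithm. */
        struct superstep prog[] = { {100, 10}, {40, 25}, {200, 0} };
        printf("estimated time = %d\n", bsp_cost(prog, 3));
        return 0;
    }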
MPI and Message Passing
- Be prepared to describe or discuss the role or function of the following MPI primitives (a minimal program using several of them follows this list):
- MPI_Init()
- MPI_Finalize()
- MPI_COMM_WORLD
- MPI_Comm_size()
- MPI_Comm_rank()
- MPI_Send()
- MPI_Recv()
- MPI_Sendrecv()
- MPI_Isend()
- MPI_Irecv()
- MPI_Test()
- MPI_Wait()
- MPI_Request_free()
- MPI_Barrier()
- MPI_Bcast()
- MPI_Reduce()
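A minimal refresher program (my own sketch) tying several of these primitives together: rank 0 sends an integer to rank 1, which receives and prints it. Compile with mpicc and run with at least two processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);                  /* start MPI            */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* number of processes  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank  */

        if (rank == 0 && size > 1) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }

        MPI_Barrier(MPI_COMM_WORLD);             /* synchronize everyone */
        MPI_Finalize();                          /* shut down MPI        */
        return 0;
    }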
- Be able to discuss how deadlock is possible when using MPI_Send() and MPI_Recv() as discussed in class. (A small illustration follows.)
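For instance, this sketch (run with exactly two ranks; the buffer size is arbitrary) deadlocks whenever MPI_Send() blocks until a matching receive is posted, because both ranks sit in the send and neither reaches its receive; reversing the send/receive order on one rank, or using MPI_Sendrecv(), avoids it.

    #include <mpi.h>

    int main(int argc, char **argv) {
        enum { N = 1 << 20 };        /* large enough to defeat buffering */
        static int sbuf[N], rbuf[N];
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int partner = 1 - rank;      /* 0 <-> 1 */
        /* Both ranks send first: may block forever without buffering. */
        MPI_Send(sbuf, N, MPI_INT, partner, 0, MPI_COMM_WORLD);
        MPI_Recv(rbuf, N, MPI_INT, partner, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* Fix: reverse send/recv order on one rank, or MPI_Sendrecv(). */
        MPI_Finalize();
        return 0;
    }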
- Be able to discuss buffering in message passing protocols.
- Review Homework!
Principles of Parallel Algorithm Design
- Algorithm Decomposition
- Be able to define Tasks and Task Dependency Graphs
- Be able to discuss Task Interactions
- granularity (fine or coarse grained)
- degree of concurrency (what does it measure)
- maximum degree of concurrency
- average degree of concurrency
- What is a Critical Path? Why is it important to the analysis of parallel algorithm performance? (A small worked sketch follows this list.)
- What is a task interaction graph? How is it different from a
task dependency graph? What do we use a task interaction graph for?
- Processes, Mapping, and Processors: Understand the role each of these plays in the design of parallel algorithms.
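A minimal worked sketch (the task graph and costs are made up) computing the critical path length and the average degree of concurrency (total work divided by critical path length) for a small task dependency graph listed in topological order:

    #include <stdio.h>

    #define T 6   /* number of tasks */

    /* work[i]: cost of task i; dep[i][j] nonzero means j depends on i. */
    static const int work[T] = {4, 2, 3, 1, 5, 2};
    static const int dep[T][T] = {
        /* edges: 0->1, 0->2, 1->3, 2->3, 3->4, 3->5 */
        [0] = {0, 1, 1, 0, 0, 0},
        [1] = {0, 0, 0, 1, 0, 0},
        [2] = {0, 0, 0, 1, 0, 0},
        [3] = {0, 0, 0, 0, 1, 1},
    };

    int main(void) {
        int finish[T], total = 0, cp = 0;
        for (int j = 0; j < T; j++) {
            int start = 0;           /* latest finish among predecessors */
            for (int i = 0; i < j; i++)
                if (dep[i][j] && finish[i] > start) start = finish[i];
            finish[j] = start + work[j];
            total += work[j];
            if (finish[j] > cp) cp = finish[j];
        }
        printf("total work = %d, critical path length = %d\n", total, cp);
        printf("average degree of concurrency = %.2f\n", (double)total / cp);
        return 0;
    }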
- Decomposition Techniques
- Recursive Decomposition
- What is recursive decomposition?
- Well suited to divide-and-conquer algorithms (such as quicksort)
- Data Decomposition
- Be able to define data decomposition.
- Be able to discuss: mapping data partitions to task partitions, partitioning input data vs. output data, and the owner-computes rule. (A small sketch follows.)
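For instance, a minimal sketch (names mine) of a one-dimensional block data decomposition: each task owns a contiguous range of the output, and under the owner-computes rule it performs exactly the work that writes those elements.

    #include <stdio.h>

    /* Task `rank` of p owns output indices [lo, hi) of n elements. */
    static void block_range(int n, int p, int rank, int *lo, int *hi) {
        *lo = (int)((long)n * rank / p);
        *hi = (int)((long)n * (rank + 1) / p);
    }

    int main(void) {
        int lo, hi, n = 10, p = 3;
        for (int r = 0; r < p; r++) {
            block_range(n, p, r, &lo, &hi);
            printf("task %d owns [%d, %d)\n", r, lo, hi);
        }
        return 0;
    }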
- Exploratory Decomposition
- What is Exploratory Decomposition?
- When would you apply Exploratory Decomposition?
- What types of problems are well suited for exploratory decomposition?
- Speculative Decomposition
- What is Speculative Decomposition?
- How is Speculative Decomposition different from Exploratory Decomposition?
- When would you apply speculative decomposition?
- Hybrid Decompositions
- What are hybrid decompositions?
- Be able to discuss an example of hybrid decompositions used to design a parallel algorithm.
- Characteristics of Tasks and Interactions
- Be able to define and discuss dynamic task generation. What problems does this present for parallel programs? How do we deal with these problems?
- How does the size of data associated with tasks affect our decisions regarding parallel algorithm implementation?
- How can we take advantage of static task interactions to improve performance? How is this question related to one-way vs. two-way (one-sided vs. two-sided) communication?
- What is the difference between regular and irregular interactions? What is the difference between irregular and dynamic interactions?
- What is the impact of read-only access to data vs. read-write access?
- What do we mean when we say an interaction is two-way or two-sided?
- What are some static mapping techniques? (e.g., array distribution schemes: block, cyclic, block-cyclic) Why are there so many mappings? (A small sketch follows this list.)
- What is graph partitioning? When is it important for parallel algorithm design?
- What are hierarchical mappings?
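A minimal sketch (constants invented) of who owns array element i under the three classic static array distributions; printing a few rows makes the load-balance trade-offs easy to see:

    #include <stdio.h>

    #define P 4   /* number of processes                    */
    #define B 3   /* block size for the block-cyclic scheme */

    /* Owner of element i of an n-element array under each mapping. */
    static int owner_block(int i, int n)  { return (int)((long)i * P / n); }
    static int owner_cyclic(int i)        { return i % P; }
    static int owner_block_cyclic(int i)  { return (i / B) % P; }

    int main(void) {
        int n = 24;
        printf(" i  block  cyclic  block-cyclic\n");
        for (int i = 0; i < n; i++)
            printf("%2d  %5d  %6d  %12d\n", i,
                   owner_block(i, n), owner_cyclic(i), owner_block_cyclic(i));
        return 0;
    }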
- Schemes for Dynamic Mapping
- What is the tension between centralized and distributed schemes?
- What is self-scheduling and chunk scheduling?
- Methods for Containing Interaction Overheads
Be able to define and discuss the eight methods for containing
interaction overheads in parallel programs.
- Maximizing Data Locality
- Minimizing Volume of Data Exchange
- Minimizing Frequency of Interactions
- Minimizing Contention and Hot Spots
- Overlapping Computations with Interactions (sketched after this list)
- Replicating Data or Computations
- Using Optimized Collective Interaction Operations
- Overlapping Interactions with Other Interactions
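A minimal sketch of the overlap idea using nonblocking MPI (run with two ranks; the dummy computation stands in for real work): rank 1 posts MPI_Irecv(), computes on data that does not depend on the incoming message, and calls MPI_Wait() only when the message is actually needed.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, msg = 0;
        double acc = 0.0;
        MPI_Request req;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            msg = 7;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            for (int i = 1; i <= 1000000; i++)  /* independent local work,  */
                acc += 1.0 / i;                 /* overlapped with transfer */
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* block only when needed   */
            printf("rank 1: msg=%d acc=%f\n", msg, acc);
        }
        MPI_Finalize();
        return 0;
    }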
- Parallel Algorithm Models
Be able to define and compare/contrast the different parallel algorithm models as discussed in section 3.6.
- The Data-Parallel model
- The Task-Graph model
- The Work Pool Model
- The Master-Slave Model
- The Pipeline or Producer-Consumer model
- Review Homework!