Outline of Material for Test #1
Parallel Programming Platforms
- Flynn's Taxonomy (SISD, SIMD, MIMD)
- Understand discussion in section 2.3.1
- Also understand SPMD (Single Program Multiple Data)
- Memory Access Architectures
Understand:
- Shared Memory Multiprocessor System [Figure 2.5]
- single address space
- uniform memory access (UMA)/ nonuniform memory access (NUMA)
- distributed shared memory system
- Message Passing Multicomputer (Distributed Memory)
- Network Models
- Be able to discuss the implementation, cost, and expected execution times of the following switching networks
- crossbar switch
- bus
- Omega network
- Be prepared to identify and discuss blocking in an Omega network (Figure 2.13)
- Be prepared to draw a small omega network configuration
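  A quick self-check on sizes (my own arithmetic, not from the text): an Omega
  network connecting p = 8 inputs to 8 outputs has log2(8) = 3 stages of
  8/2 = 4 two-by-two switches, 12 switches in all; an 8 x 8 crossbar needs
  8 * 8 = 64 crosspoints; a bus needs only one shared link but serializes all
  traffic.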
- Review homework!
- Be able to discuss static network topologies
- Star Connected
- Completely Connected
- Linear Arrays, Meshes, and k-d Meshes
- Hypercube
- Tree and Fat Tree Networks
- Be able to compute Diameter, Bisection Width, and
Cost for Linear, Ring, Mesh, Torus, Tree, and
Hypercube networks.
- Understand how to construct a d-dimensional
hypercube recursively.
- Be able to define and compute Diameter,
Connectivity, Bisection Width, Bisection Bandwidth,
and Cost for any arbitrary network.
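  A quick worked example (standard formulas; double-check against the text's
  table of network measures): a 3-dimensional hypercube has p = 8 nodes, so
  its diameter is log2(8) = 3, its bisection width is 8/2 = 4, and its cost
  (link count) is (8 * 3)/2 = 12. An 8-node ring, by contrast, has diameter 4,
  bisection width 2, and cost 8.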
- Communication Cost Models for Static Interconnection
Networks
Know and Understand:
- communication latency
- Communication times for
- Store and Forward Routing
- Cut-Through Routing
- Using the simplified cost model (and the rationale for the simplification)
- startup time, ts
- Per-word transfer time tw
- hop time th
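  Worked forms of these cost models in the outline's notation (verify against
  the text's derivation), for a message of m words traversing l links:
      store-and-forward:  tcomm = ts + (m*tw + th)*l, usually simplified to ts + l*m*tw
      cut-through:        tcomm = ts + l*th + m*tw
      simplified model:   tcomm = ts + m*tw  (drop the th term; the per-hop time
                                              is small and congestion is ignored)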
- Understand Mapping and its impact on performance
- Be able to define and discuss congestion, dilation, and expansion. How does each of these measures of graph mappings impact communication performance?
- PRAM Models
- Be able to discuss the PRAM machine architecture and
assumptions
- Understand Types of PRAM models:
Combining CRCW, Priority CRCW, Arbitrary CRCW,
Common CRCW, CREW, and EREW.
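  A small worked example of the write-conflict rules (my own numbers): if
  processors 2 and 5 both write to the same cell, with values 10 and 20, then
  Common CRCW requires the written values to be equal (so this write is
  illegal), Arbitrary CRCW keeps one of them (either 10 or 20), Priority CRCW
  keeps the value from the higher-priority (typically lower-numbered)
  processor, i.e. 10, and Combining (Sum) CRCW stores 10 + 20 = 30. CREW
  forbids the concurrent write entirely, and EREW forbids concurrent reads as
  well.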
- Review Homework problems!
MPI and Message Passing
- Be prepared to describe or discuss the role or function of the following MPI primitives (a usage sketch appears after this group of calls)
- MPI_Init()
- What Arguments are passed to this routine?
- What is the purpose of this call?
- MPI_Finalize()
- MPI_COMM_WORLD
- What are communicators and what role does this identifier play in the creation and use of communicators in MPI?
- What calls are related to communicators in the MPI library? What do these calls do?
- MPI_Comm_size()
- MPI_Comm_rank()
- MPI_Comm_dup()
- MPI_Comm_split()
- MPI_Send()
- What is passed to this call and what purpose do these arguments serve?
- MPI_Recv()
- What arguments are passed to this call? What arguments are different from MPI_Send()? Why are there differences?
- MPI_Sendrecv()
- What purpose does this call serve?
- What are the arguments that this call takes?
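  A minimal sketch (my own example, not from the course materials) showing the
  blocking primitives above in context; the tag and data values are arbitrary:

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char *argv[]) {
          int rank, size, value = 42, tag = 0;
          MPI_Status status;

          MPI_Init(&argc, &argv);                  /* must be the first MPI call; takes &argc, &argv */
          MPI_Comm_size(MPI_COMM_WORLD, &size);    /* number of processes in the communicator */
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id, 0..size-1 */

          if (rank == 0 && size > 1) {
              MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
          } else if (rank == 1) {
              /* unlike MPI_Send, MPI_Recv takes a status; source/tag may also be
                 MPI_ANY_SOURCE / MPI_ANY_TAG */
              MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
              printf("rank 1 received %d\n", value);
          }

          MPI_Finalize();                          /* must be the last MPI call */
          return 0;
      }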
- MPI non-blocking operations (a usage sketch follows this sub-list)
- Be able to describe how to perform non-blocking operations in MPI.
- Why do we need non-blocking communication operations?
- MPI_Isend()
- How is the Isend different from send? What does it do that is different? How are the arguments different?
- MPI_Irecv()
- How is the Irecv different from the recv? What does it do that is different? How are the arguments different?
- MPI_Test()
- What does this call do? Why would it be useful?
- MPI_Wait()
- What does this call do? How is it different from a MPI_Test()?
- MPI_Request_free()
- What is a MPI_Request? Why would you want to free it?
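  A minimal non-blocking sketch (my own example; assumes the rank setup from
  the sketch above and exactly two processes). Posting the receives before the
  sends and waiting afterward avoids the blocking-send deadlock and lets
  computation overlap communication:

      int other = 1 - rank, sendval = rank, recvval;
      MPI_Request reqs[2];
      MPI_Status  stats[2];

      MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);  /* returns immediately */
      MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);  /* buffer not reusable yet */

      /* ...do useful computation here, overlapped with the communication... */

      MPI_Waitall(2, reqs, stats);   /* or poll with MPI_Test(&reqs[i], &flag, &stats[i]) */
      /* after the wait completes, recvval is valid and sendval may be reused */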
- MPI collective Communications
- What are collective communications in MPI?
- Can we perform a collective communication over a subset of the processes? If so, how do we do that?
- Do collective communications use tags? Why or why not?
- Know the following collective calls: what they do and what arguments they take (a usage sketch follows this list).
- MPI_Barrier()
- MPI_Bcast()
- MPI_Reduce()
- MPI_Allreduce()
How is this different from MPI_Reduce()?
- MPI_Gather()
- MPI_Scatter()
What chapter 4 communication algorithm is this implementing?
- MPI_Allgather()
What chapter 4 communication algorithm is this implementing?
- MPI_Alltoall()
What chapter 4 communication algorithm is this implementing?
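  A minimal sketch (my own example) of the most common collectives; note that
  every process in the communicator must make the matching call and that no
  tags are used:

      int rank, p, root = 0;
      int value = 0, local, sum;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &p);

      if (rank == root) value = 17;                           /* data only the root has */
      MPI_Bcast(&value, 1, MPI_INT, root, MPI_COMM_WORLD);    /* one-to-all broadcast */

      local = rank;
      MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, root, MPI_COMM_WORLD);   /* result at root only */
      MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);      /* result on every rank */
      MPI_Barrier(MPI_COMM_WORLD);                            /* synchronize all processes */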
Be able to discuss how deadlock is possible when using MPI_Send()
and MPI_Recv() as discussed in class.
Be able to discuss buffering in message passing protocols.
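  A sketch of the deadlock hazard (my own example): if every rank calls a
  blocking MPI_Send() before its MPI_Recv(), and the messages exceed whatever
  buffering the implementation provides, no send can complete and all ranks
  wait forever:

      /* every rank executes this -- unsafe, correctness depends on buffering */
      MPI_Send(sendbuf, n, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
      MPI_Recv(recvbuf, n, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);

      /* safe alternatives: reverse the order on one rank, use MPI_Sendrecv(),
         or use MPI_Isend()/MPI_Irecv() followed by MPI_Wait() */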
Review Homework!
Parallel Collective Communication Primitives
- Be able to define One-to-All, All-to-All, scatter, gather
and All-to-All Personalized communication
primitives. (In other words, be able to reconstruct Figures
4.1, 4.8, 4.14, and 4.16 from the text)
- One-to-All Broadcast
- Be able to describe and analyze the Hypercube One-to-All
broadcast algorithm.
- Be prepared to answer questions about the algorithms as
listed in program form in Algorithms 4.1 and 4.2 from the text.
- Understand how the One-to-All broadcast algorithm can be
used as a basis for performing the Single-Node
Accumulation.
- Understand and be prepared to discuss the use of relabeling processor IDs using XOR to achieve the generalized algorithm
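  A timing sketch worth memorizing (standard result; check it against the
  text's derivation): the hypercube One-to-All broadcast of an m-word message
  takes log p steps, each costing ts + m*tw, so T = (ts + m*tw) * log p. The
  Single-Node Accumulation is the same pattern run in reverse (with a local
  reduction at each step), so it has the same communication cost.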
- All-to-All Broadcast
- Understand the All-to-All broadcast algorithms for the
ring and Hypercube topology. In particular, pay attention to message
sizes at each step of the algorithm.
- Be prepared to answer questions about the algorithms as
listed in program form in Algorithms 4.4, 4.5, and 4.6 from the handout.
- Be able to describe how the All-to-All broadcast can be
used as a basis for performing the Reduction or
Multinode Accumulation operations.
- Know the difference between an All-to-All reduction and an All-Reduce
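  Timing sketches worth memorizing (standard results; verify against the
  text): on a p-node ring the All-to-All broadcast takes p - 1 steps with
  messages of size m, so T = (ts + m*tw)(p - 1); on a hypercube the message
  size doubles at each of the log p steps (m, 2m, 4m, ...), giving
  T = ts*log p + m*tw*(p - 1). The m*tw*(p - 1) term is the same on both
  topologies, since every node must receive (p - 1)*m words either way.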
- One-to-All Personalized Communication
- Understand how this algorithm is a variation of the
One-to-All broadcast except with shrinking message
sizes.
- Be able to derive and analyze this algorithm on a Hypercube network by generalizing the One-to-All (scatter, gather) algorithm results.
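  A timing sketch (standard result): in the hypercube scatter, the root starts
  with p messages of m words each and forwards half of what it holds at every
  one of the log p steps (p*m/2 words, then p*m/4, ...), giving
  T = ts*log p + m*tw*(p - 1). The gather is the same pattern in reverse with
  the same cost.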
- All-to-All Personalized Communications
- Be able to describe and analyze the Hypercube algorithm.
- Be able to describe and analyze the E-cube routing algorithm.
- Be able to generalize the ts + m tw performance model to take into account network bisection width.
- Be able to describe how the all-to-all personalized algorithm relates to matrix transpose.
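  Timing sketches (standard results; verify against the text): the log p-step
  hypercube algorithm moves p/2 consolidated messages of m words in every
  step, so T = (ts + (p/2)*m*tw) * log p, while the optimal algorithm pairs
  the processors for p - 1 exchange steps of m words each, routed without
  contention via E-cube routing, so T = (ts + m*tw)(p - 1). The operation is
  equivalent to transposing a p x p "matrix" of messages in which processor i
  initially holds row i and must end up holding column i.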
- Prefix-Sum algorithm
- Know what a prefix-sum is and how it is used
- Be able to describe the hypercube algorithm
- Be able to estimate running time using the ts + m tw communication cost model
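  A small worked example (my own numbers): for inputs 3, 1, 4, 1 on processors
  0 through 3, the prefix sums are 3, 4, 8, 9. The hypercube algorithm uses
  the same log p pairwise-exchange steps as the All-to-All broadcast, but each
  processor keeps a separate buffer for its own result and only folds in
  values arriving from lower-numbered processors, so for single-word operands
  the cost is roughly (ts + tw) * log p.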
- Communication optimizations
- Be able to determine if an algorithm is optimal based on bisection bandwidth arguments.
- Be able to discuss and analyze the optimizations of communication algorithms as described in section 4.7
- Know Tables 4.1 and 4.2 for the algorithms we have covered.
- Be able to derive timing measurements for any of the above algorithms!
- Study the Homework Assignments!
Principles of Parallel Algorithm Design
- Algorithm Decomposition
- Be able to define Tasks and Task Dependency Graphs
- Be able to discuss Task Interactions
- granularity (fine or coarse grained)
- degree of concurrency (what does it measure?)
- maximum degree of concurrency
- average degree of concurrency
- What is a Critical Path? Why is it important to the analysis of parallel algorithm performance?
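  A quick worked example (my own numbers): if a task dependency graph contains
  10 unit-weight tasks and its longest (critical) path contains 4 of them,
  then no schedule can finish in fewer than 4 steps regardless of processor
  count, and the average degree of concurrency is total work / critical path
  length = 10/4 = 2.5.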
- What is a task interaction graph? How is it different from a
task dependency graph? What do we use a task interaction graph for?
- Processes, Mapping, and Processors: Understand the role each of these plays in the design of parallel algorithms.
- Decomposition Techniques
- Recursive Decomposition
- What is recursive decomposition?
- Well suited to divide-and-conquer algorithms (such as quicksort)
- Data Decomposition
- Be able to define data decomposition.
- Be able to discuss: mapping data partitions to task partitions, partitioning input data vs. output data, and the owner-computes rule.
- Exploratory Decomposition
- What is Exploratory Decomposition?
- When would you apply Exploratory Decomposition?
- What types of problems are well suited for exploratory decomposition?
- Speculative Decomposition
- What is Speculative Decomposition?
- How is Speculative Decomposition different from Exploratory Decomposition?
- When would you apply speculative decomposition?
- Hybrid Decompositions
- What are hybrid decompositions?
- Be able to discuss an example of hybrid decompositions used to design a parallel algorithm.
- Characteristics of Tasks and Interactions
- Be able to define and discuss dynamic task generation. What problems does this present for parallel programs? How do we deal with these problems?
- How does the size of data associated with tasks affect our decisions regarding parallel algorithm implementation?
- How can we take advantage of static task interactions to improve performance? How is this question related to one-way vs. two-way (one-sided vs. two-sided) communication?
- What is the difference between regular and irregular interactions? What is the difference between irregular and dynamic interactions?
- What is the impact of read-only access to data vs. read-write access?
- What do we mean when we say an interaction is two-way or two-sided?
- What are some static mapping techniques? (e.g., array distribution schemes: block, cyclic, block-cyclic) Why are there so many mappings? (A small example follows below.)
- What is graph partitioning? When is it important for parallel algorithm design?
- What are hierarchical Mappings?
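  A small example of why there are several array distributions (my own
  numbers): for 16 array elements on 4 processes, a block distribution gives
  P0 elements 0-3, a cyclic distribution gives P0 elements 0, 4, 8, 12, and a
  block-cyclic distribution with block size 2 gives P0 elements 0-1 and 8-9.
  Block keeps neighboring elements together (good locality), cyclic balances
  load when the work per element is uneven, and block-cyclic is a compromise
  between the two.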
- Schemes for Dynamic Mapping
- What is the tension between centralized and distributed schemes?
- What are self-scheduling and chunk scheduling?
- Methods for Containing Interaction Overheads
Be able to define and discuss the eight methods for containing
interaction overheads in parallel programs.
- Maximizing Data Locality
- Minimizing Volume of Data Exchange
- Minimize Frequency of Interactions
- Minimizing Contention and Hot Spots
- Overlapping Computations with Interactions
- Replicating Data or Computations
- Using Optimized Collective Interaction Operations
- Overlapping Interactions with Other Interactions
- Parallel Algorithm Models
Be able to define and compare/contrast the different parallel algorithm models as discussed in section 3.6.
- The Data-Parallel model
- The Task-Graph model
- The Work Pool Model
- The Master-Slave Model
- The Pipeline or Producer-Consumer model
- Review Homework!