Skip to content

20. HW architectures

Mar 22, 2021

Symmetric multiprocessing

Symmetric multiprocessing (SMP) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally, reserving none for special purposes. Most multiprocessor systems today use an SMP architecture.

img

Crossbar architecture

In 1970s: IBM built mainframe computers realizing it is a bottleneck → how to improve? Crossbar architecture with switch at each intersection.

img

  • uses source routing to navigate and to exchange data
  • 4 processors can access 4 banks of data concurrently
  • this system does not scale well due to hardware cost

Hardware cost of crossbar:

\(HW\$ = c_1 \cdot P + c_2 \cdot \text{Memory} + c_3 \cdot \Omega(P^2)\)

Hardware cost for parallel:

\(HW\$ = c_1 \cdot P + c_2 \cdot \text{Memory} + c_3 \cdot \text{wire} = \Theta(P)\)

Another idea: cut crossbars by half

img

\(HW\$ = c_1 \cdot P + c_2 \cdot \text{Memory} + c_3 \cdot \Omega(\frac{P^2}{2})\)

Asymptotically not better but from a practical standpoint this is a significant saving.

Linear processor array

Next suggestion: make a linear processor array (also called systolic array).

\(HW\$ = c_1 \cdot P + c_2 \cdot \text{Memory} + c_3 \cdot \text{Wires}\)

img

Assume we have a PRAM \(P=N\), \(W = P \cdot T_P\). Simulate sequential processor doing DO-ALL. How does P get data from 1? Need:

  • \(P-1\) steps to read
  • 1 step to compute, and
  • \(P-1\) steps to write

Simulation time: \(T = (P-1 \cdot T_P)\)

Simulation work: \(W = \Theta(P^2) \cdot T_P\)

This is not a good algorithm for this architecture, but the algorithm itself is still good. This architecture may be suitable from specialized problems, not for general purpose problems. One suitable application is zero-time sorter.