Main Goal of HPC: Saving wall clock time

Heterogeneous Computer Systems

  • Also known as Accelerated Computing
  • Utilize two or more categories of processors for computing
    • Designed to enhance performance and energy efficiency
  • Specialized hardware called accelerators:
    • Execute specific tasks faster than general-purpose CPUs
    • Typically consist of thousands of simple processors
    • Examples include GPUs, FPGAs, Google’s TPU, NPUs, etc.

General Architecture

  1. Copy input data from CPU memory to GPU memory.
  2. Load and execute GPU code.
  3. Copy the result from GPU memory back to CPU memory.
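
The three steps above can be sketched as a toy model. This is only an illustration of the data flow: `ToyDevice` and its methods are hypothetical stand-ins for a GPU runtime, not a real library API, and the "kernel" runs sequentially here.

```python
# Toy model of the host/device offload flow.
# `ToyDevice` is a hypothetical stand-in for a GPU runtime, not a real API.

class ToyDevice:
    """Pretend accelerator with its own memory, separate from host memory."""

    def __init__(self):
        self.memory = {}  # device-side buffers, keyed by name

    def copy_from_host(self, name, host_data):
        # Step 1: copy input data from CPU (host) memory to device memory.
        self.memory[name] = list(host_data)

    def launch(self, kernel, src, dst):
        # Step 2: load and execute device code on device-resident data.
        # A real GPU would apply the kernel to many elements in parallel.
        self.memory[dst] = [kernel(x) for x in self.memory[src]]

    def copy_to_host(self, name):
        # Step 3: copy the result from device memory back to host memory.
        return list(self.memory[name])


def square(x):
    return x * x


def offload_square(host_input):
    device = ToyDevice()
    device.copy_from_host("in", host_input)
    device.launch(square, "in", "out")
    return device.copy_to_host("out")


print(offload_square([1, 2, 3, 4]))
```

Steps 1 and 3 are pure data movement, so offloading tends to pay off only when the work done in step 2 is large relative to the amount of data copied.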

Supercomputers

  • No universal definition
  • Generally, systems ranked in the Top500 list
    • Top500: A list of the 500 fastest computer systems in the world (excluding distributed systems)
    • Benchmark: High-Performance LINPACK (HPL)

Cluster vs. Mainframe

Cluster

  • Multiple computers connected via a high-speed network (e.g., Ethernet, InfiniBand)
  • Each computer is called a node
  • Most supercomputers use this architecture

Mainframe

  • A high-speed computer with large memory and processing capacity
  • Capable of processing billions of transactions in real time
  • Used for commercial databases and transaction servers
    • Offers resilience, security, and agility

Parallel Processing vs. Distributed Processing

Parallel Processing

  • Utilizes multiple connected processors simultaneously

Distributed Processing

  • Uses geographically distributed computer systems connected via a network
  • Examples include cloud computing, edge computing, and SaaS

Wall Clock Time

Speedup

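  • Speedup = (wall clock time on 1 processor) / (wall clock time on P processors)
  • R: Fraction of the work that cannot be parallelized
  • P: Number of processors
  • Amdahl’s law: Speedup = 1 / (R + (1 - R) / P)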
  • Program: Increase the portion of the work that can run in parallel (reduce R).
  • Hardware: Increase the performance of a single processor, since it determines how fast the serial portion runs.
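
A minimal sketch of Amdahl's law in its standard form, Speedup = 1 / (R + (1 - R) / P), may help show why the serial portion matters so much. The function name and the value R = 0.10 are illustrative assumptions, not taken from the notes.

```python
def amdahl_speedup(r, p):
    """Speedup predicted by Amdahl's law.

    r: fraction of the work that cannot be parallelized (0 <= r <= 1)
    p: number of processors (p >= 1)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (r + (1.0 - r) / p)


# Illustrative value only: 10% of the work is serial.
r = 0.10
for p in (1, 10, 100, 1000):
    print(f"P = {p:5d}  speedup = {amdahl_speedup(r, p):6.2f}")

# As P grows, the speedup approaches 1 / r and never exceeds it.
print(f"upper bound = {1.0 / r:.2f}")
```

With R = 0.10, ten processors give a speedup of only about 5.3, and no number of processors can exceed 10. Adding processors alone stops helping, so reducing R or speeding up the serial part is the only way past that bound.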

Efficiency

  • Efficiency = Speedup / P
  • If Efficiency = 50%, the processors are, on average, idle for half of the execution time

Ideally:

  • Speedup = P
  • Efficiency = 100%
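
As a small worked example, speedup and efficiency can be computed from two measured wall clock times, using the usual definitions Speedup = T(1) / T(P) and Efficiency = Speedup / P. The timings below are hypothetical and chosen only for illustration.

```python
def speedup(t_serial, t_parallel):
    """Wall clock time on one processor divided by the time on P processors."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Speedup divided by the number of processors (1.0 means 100%)."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical timings, chosen only for illustration.
t1 = 100.0   # seconds on 1 processor
t10 = 20.0   # seconds on 10 processors

s = speedup(t1, t10)
e = efficiency(t1, t10, 10)
print(f"speedup = {s:.1f}, efficiency = {e:.0%}")
```

Here 10 processors deliver a speedup of 5, so efficiency is 50%. The ideal case would need the run to finish in 10 seconds.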

Hardware Factors Affecting Supercomputer Performance

Computing speed

  1. Parallelism
    • Number of processors used
  2. Performance of a single processor (clock frequency)
    • How many instructions can be processed per second
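
The two factors multiply. A rough sketch, assuming every processor completes a fixed number of instructions per clock cycle; the `instructions_per_cycle` parameter and all numbers are illustrative assumptions, and real sustained rates are lower than this peak:

```python
def peak_instructions_per_second(num_processors, clock_hz, instructions_per_cycle=1):
    """Theoretical peak rate: processors x clock frequency x instructions per cycle."""
    return num_processors * clock_hz * instructions_per_cycle


# Hypothetical machine: 1,000 processors at 2 GHz, 4 instructions per cycle.
peak = peak_instructions_per_second(1_000, 2_000_000_000, 4)
print(f"peak = {peak:.2e} instructions per second")

# Doubling either the processor count or the clock doubles the peak.
print(peak_instructions_per_second(2_000, 2_000_000_000, 4) / peak)
print(peak_instructions_per_second(1_000, 4_000_000_000, 4) / peak)
```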

Data transfer speed

  1. Between memory and processor
    • Memory bandwidth & latency
    • e.g., RAM ↔ CPU
  2. Between computing units
    • Interconnection network bandwidth & latency
    • e.g., CPU ↔ GPU
  3. Between storage and computing units
    • I/O bandwidth & latency
    • e.g., NAND flash storage ↔ CPU
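
Bandwidth and latency affect transfer time differently. A common first-order model is time = latency + size / bandwidth; the sketch below uses that model with hypothetical link parameters that do not describe any specific hardware.

```python
def transfer_time(size_bytes, bandwidth_bytes_per_s, latency_s):
    """First-order model: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical link: 10 GB/s bandwidth, 2 microseconds latency.
BANDWIDTH = 10e9
LATENCY = 2e-6

for size in (64, 64 * 1024, 64 * 1024 * 1024):
    t = transfer_time(size, BANDWIDTH, LATENCY)
    latency_share = LATENCY / t
    print(f"{size:>10d} bytes: {t * 1e6:10.2f} us, latency share = {latency_share:.0%}")
```

Under this model, small transfers are dominated by latency and large transfers by bandwidth, which is why both figures are listed for each kind of link.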