Main Goal of HPC: Saving wall clock time
Heterogeneous Computer Systems
- Also known as Accelerated Computing
- Utilize two or more categories of processors for computing (e.g., CPUs together with GPUs)
- Designed to enhance performance and energy efficiency
- Specialized hardware called accelerators:
  - Run specific tasks faster than general-purpose CPUs
  - Typically consist of thousands of simple processors
  - Examples include GPUs, FPGAs, Google’s TPU, NPUs, etc.
General Architecture
- Copy input data from CPU memory to GPU memory.
- Load and execute GPU code.
- Copy the result from GPU memory back to CPU memory.
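The three steps above can be sketched without any GPU at all. The following is only a simulation in plain Python: `FakeDevice`, `copy_to_device`, `launch`, and `copy_to_host` are hypothetical names invented for illustration and do not correspond to any real GPU API. The point is that host and device memory are separate, so data must be copied explicitly in both directions.

```python
# Simulated host/device flow; no real GPU or GPU library is involved.
class FakeDevice:
    """Hypothetical stand-in for an accelerator with its own memory."""

    def __init__(self):
        # Device-side buffers, kept separate from host variables.
        self.memory = {}

    def copy_to_device(self, name, host_data):
        # Step 1: copy input data from CPU (host) memory to device memory.
        self.memory[name] = list(host_data)

    def launch(self, kernel, src, dst):
        # Step 2: load and execute device code on every element of src.
        self.memory[dst] = [kernel(x) for x in self.memory[src]]

    def copy_to_host(self, name):
        # Step 3: copy the result from device memory back to CPU memory.
        return list(self.memory[name])


def square(x):
    """Toy 'kernel' applied to each element."""
    return x * x


def offload_square(host_input):
    """Run the copy-in, execute, copy-out sequence on the simulated device."""
    device = FakeDevice()
    device.copy_to_device("in", host_input)
    device.launch(square, "in", "out")
    return device.copy_to_host("out")


if __name__ == "__main__":
    print(offload_square([1, 2, 3, 4]))
```

In a real system, steps 1 and 3 cross the CPU-GPU interconnect, which is why the amount of data moved matters as much as the speed of the computation itself.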
Supercomputers
- No universal definition
- Generally, systems ranked in the Top500 list
  - Top500: A list of the 500 fastest computer systems in the world (excluding distributed systems)
  - Benchmark: High-Performance LINPACK (HPL)
Cluster vs. Mainframe
Cluster
- Multiple computers connected via a high-speed network (e.g., Ethernet, InfiniBand)
- Each computer is called a node
- Most supercomputers use this architecture
Mainframe
- A high-speed computer with large memory and processing capacity
- Capable of processing billions of transactions in real time
- Used for commercial databases and transaction servers
- Offers resilience, security, and agility
Parallel Processing vs. Distributed Processing
Parallel Processing
- Utilizes multiple connected processors simultaneously
Distributed Processing
- Uses geographically distributed computer systems connected via a network
- Examples include cloud computing, edge computing, and SaaS
Wall Clock Time
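Speedup
- Speedup = (wall clock time on 1 processor) / (wall clock time on P processors)
- R: Fraction of the work that cannot be parallelized (0 ≤ R ≤ 1)
- P: Number of processors
- Amdahl’s law: Speedup = 1 / (R + (1 - R) / P), which approaches 1 / R as P grows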
- Program: Increasing the parallel portion of tasks is important.
- Hardware: Increasing the performance of a single processor is important (to handle serial tasks).
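A short numeric sketch may help show why both points matter. It assumes Amdahl's law in its usual form, 1 / (R + (1 - R) / P), with R and P as defined above. The function names `amdahl_speedup` and `max_speedup` are chosen here for illustration.

```python
def amdahl_speedup(r, p):
    """Speedup predicted by Amdahl's law.

    r: fraction of the work that cannot be parallelized (0 <= r <= 1)
    p: number of processors (p >= 1)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (r + (1.0 - r) / p)


def max_speedup(r):
    """Upper bound on speedup as p grows without limit (1 / r)."""
    return float("inf") if r == 0 else 1.0 / r


if __name__ == "__main__":
    # Compare several serial fractions across increasing processor counts.
    for r in (0.5, 0.1, 0.01):
        row = [round(amdahl_speedup(r, p), 2) for p in (1, 8, 64, 1024)]
        print(f"R={r}: speedups={row}, limit={max_speedup(r)}")
```

Because the speedup is capped at 1 / R no matter how many processors are added, reducing R (the program side) raises the cap, while a faster single processor (the hardware side) shortens the serial part that the cap comes from.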
Efficiency
- Efficiency = Speedup / P
- If Efficiency = 50%, the processors are, on average, idle for half of the execution time
Ideally:
- Speedup = P
- Efficiency = 100%
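As a rough illustration, speedup and efficiency can be computed from two measured wall clock times. The sketch below assumes the common definitions, speedup = T(1) / T(P) and efficiency = speedup / P. The timings in the example are made-up values, not measurements.

```python
def speedup(t_serial, t_parallel):
    """Ratio of the wall clock time on 1 processor to the time on P processors."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Speedup divided by the number of processors (1.0 means 100%)."""
    return speedup(t_serial, t_parallel) / p


if __name__ == "__main__":
    # Hypothetical timings: 100 s on 1 processor, 25 s on 8 processors.
    t1, tp, p = 100.0, 25.0, 8
    print("speedup:", speedup(t1, tp))
    print("efficiency:", efficiency(t1, tp, p))
```

With these made-up numbers the speedup is 4 on 8 processors, so the efficiency is 50%, well short of the ideal case where the speedup equals P.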
Hardware Factors Affecting Supercomputer Performance
Computing speed
- Parallelism
- Number of processors used
- Performance of a single processor (clock frequency)
- How many instructions can be processed per second
Data transfer speed
- Between memory and processor
  - Memory bandwidth & latency
  - e.g. RAM ↔ CPU
- Between computing units
  - Interconnection network bandwidth & latency
  - e.g. CPU ↔ GPU
- Between storage and computing units
  - I/O bandwidth & latency
  - e.g. NAND ↔ CPU
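Bandwidth and latency affect transfers in different ways, which a simple first-order model can show: time = latency + size / bandwidth. This is a simplified sketch, not a description of any particular link. The function name `transfer_time` and the latency and bandwidth figures are assumptions chosen only for illustration.

```python
def transfer_time(num_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate of the time to move num_bytes over one link."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical link: 1 microsecond latency, 10 GB/s bandwidth.
    latency = 1e-6
    bandwidth = 10e9

    small = transfer_time(64, latency, bandwidth)       # latency dominates
    large = transfer_time(10**9, latency, bandwidth)    # bandwidth dominates

    print(f"64 B transfer: {small:.3e} s")
    print(f"1 GB transfer: {large:.3e} s")
```

Under this model, many small transfers are limited by latency and a few large transfers are limited by bandwidth, so both figures matter for each of the links listed above.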