SHPC 01 - Trends

Dec 31, 2024 · 2 min read

Main Goal of HPC: Saving wall clock time
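Wall clock time is the elapsed real time a user waits for a result, which can differ from the CPU time spent computing. A minimal Python sketch of that difference follows; `measure` and `wait_for_data` are hypothetical names, and the sleep is only a stand-in for waiting on I/O or communication.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

def wait_for_data():
    # Stand-in for waiting on I/O or a network transfer: almost no CPU work.
    time.sleep(0.05)

wall, cpu = measure(wait_for_data)
print(f"wall clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

The wall clock figure includes the waiting, while the CPU figure does not, which is why HPC work targets wall clock time.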

Heterogeneous Computer Systems

Also known as Accelerated Computing
Use two or more kinds of processors for computing
- Designed to enhance performance and energy efficiency
Specialized hardware called accelerators:
- Run specific tasks faster than general-purpose CPUs
- Typically consist of thousands of simple processors
- Examples include GPUs, FPGAs, Google’s TPU, NPUs, etc.

General Architecture

1. Copy input data from CPU memory to GPU memory.
2. Load and execute GPU code.
3. Copy the result from GPU memory back to CPU memory.
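The three steps above can be sketched in plain Python. This is only a stand-in for the offload pattern: no real GPU or GPU library is involved, and `kernel_square` and `offload` are hypothetical names chosen for illustration.

```python
# Pure-Python stand-in for the CPU-to-GPU offload pattern (no real GPU used).

def kernel_square(device_in):
    """Stand-in for GPU code: squares every element held in 'device' memory."""
    return [x * x for x in device_in]

def offload(host_input):
    # Step 1: copy input data from CPU (host) memory to GPU (device) memory.
    device_in = list(host_input)
    # Step 2: load and execute the GPU code on the device-side copy.
    device_out = kernel_square(device_in)
    # Step 3: copy the result from GPU memory back to CPU memory.
    host_output = list(device_out)
    return host_output

result = offload([1, 2, 3, 4])
print(result)  # [1, 4, 9, 16]
```

On real hardware, steps 1 and 3 cross the CPU-GPU interconnect, so their cost counts toward wall clock time along with the computation itself.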

Supercomputers

No universal definition
Generally, systems ranked in the Top500 list
- Top500: A list of the 500 fastest computer systems in the world (excluding distributed systems)
- Benchmark: High-Performance LINPACK (HPL)

Cluster vs. Mainframe

Cluster

Multiple computers connected via a high-speed network (e.g., Ethernet, InfiniBand)
Each computer is called a node
Most supercomputers use this architecture

Mainframe

A high-speed computer with large memory and processing capacity
Capable of processing billions of transactions in real time
Used for commercial databases and transaction servers
- Offers resilience, security, and agility

Parallel Processing vs. Distributed Processing

Parallel Processing

Utilizes multiple connected processors simultaneously

Distributed Processing

Uses geographically distributed computer systems connected via a network
Examples include cloud computing, edge computing, and SaaS

Wall Clock Time

Speedup

R: Fraction of the work that cannot be parallelized (the serial fraction)
P: Number of processors
Amdahl’s law: $\text{Speedup} = \frac{1}{R + \frac{1 - R}{P}}$
Program: Increasing the parallel portion of tasks is important.
Hardware: Increasing the performance of a single processor is important (to handle serial tasks).
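A short Python sketch of Amdahl's law, using the symbols R and P defined above. The function name `amdahl_speedup` and the sample value R = 0.1 are assumptions made for illustration.

```python
def amdahl_speedup(r, p):
    """Speedup predicted by Amdahl's law.

    r: fraction of the work that cannot be parallelized (0 <= r <= 1)
    p: number of processors (p >= 1)
    """
    if not 0 <= r <= 1:
        raise ValueError("r must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (r + (1.0 - r) / p)

# With 10% serial work, adding processors gives diminishing returns.
for p in (1, 10, 100, 1000):
    print(p, round(amdahl_speedup(0.1, p), 2))
# 1 1.0
# 10 5.26
# 100 9.17
# 1000 9.91
```

As P grows, the speedup approaches 1/R (here 10) and never exceeds it. That ceiling is why both shrinking the serial fraction and speeding up the single processor that runs it matter.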

Efficiency

$\text{Efficiency} = \frac{\text{Speedup}}{P} \times 100\%$

If Efficiency = 50%, the processors are, on average, idle for half of the execution time

Ideally:

Speedup = P
Efficiency = 100%
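Speedup and efficiency can also be computed from measured wall clock times. The sketch below assumes the usual definition of speedup as serial time divided by parallel time; the function names and the timings are hypothetical, not measurements from these notes.

```python
def speedup(t_serial, t_parallel):
    """Speedup as the ratio of serial to parallel wall clock time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency in percent: speedup divided by processor count p."""
    return speedup(t_serial, t_parallel) / p * 100.0

# Hypothetical timings: 120 s on 1 processor, 20 s on 8 processors.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(s, e)  # 6.0 75.0
```

In the ideal case the parallel run takes exactly 1/P of the serial time, which gives Speedup = P and Efficiency = 100%.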

Hardware Factors Affecting Supercomputer Performance

Computing speed

Parallelism
- Number of processors used
Performance of a single processor (clock frequency)
- How many instructions can be processed per second
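The two factors above multiply. As a rough upper-bound sketch, peak instruction throughput is the number of processors times the clock frequency times the instructions completed per cycle. The parameter `instr_per_cycle` is an assumption not stated in these notes, and the sample numbers are illustrative.

```python
def peak_instructions_per_second(processors, clock_hz, instr_per_cycle=1.0):
    """Upper-bound estimate: every processor completes instr_per_cycle
    instructions on every clock tick, with no stalls or idle time."""
    return processors * clock_hz * instr_per_cycle

base = peak_instructions_per_second(processors=4, clock_hz=2.0e9)
more_parallel = peak_instructions_per_second(processors=8, clock_hz=2.0e9)
faster_clock = peak_instructions_per_second(processors=4, clock_hz=4.0e9)
print(base, more_parallel, faster_clock)
```

Doubling the processor count and doubling the clock frequency raise this peak by the same factor, although only the faster clock also helps the serial part of a program.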

Data transfer speed

Between memory and processor
- Memory bandwidth & latency
- e.g. RAM ↔ CPU
Between computing units
- Interconnection network bandwidth & latency
- e.g. CPU ↔ GPU
Between storage and computing units
- I/O bandwidth & latency
- e.g. NAND ↔ CPU
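Bandwidth and latency affect transfers differently. A common first-order estimate, used here as an assumption rather than something stated in these notes, is transfer time = latency + size / bandwidth. The link parameters below are illustrative, not measurements of any real system.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate of the time to move size_bytes over a link."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

LATENCY = 1e-6     # 1 microsecond (illustrative)
BANDWIDTH = 10e9   # 10 GB/s (illustrative)

small = transfer_time(8, LATENCY, BANDWIDTH)     # one 8-byte value
large = transfer_time(1e9, LATENCY, BANDWIDTH)   # a 1 GB buffer

print(f"8 B:  {small:.3e} s, latency share {LATENCY / small:.1%}")
print(f"1 GB: {large:.3e} s, latency share {LATENCY / large:.1%}")
```

Under this model, small transfers are dominated by latency and large transfers by bandwidth, so sending many tiny messages costs far more than sending the same data in one batch.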
