SHPC 04 - Processes and Threads

Programming Languages

A programming language is a formal language in which computer programs are written.
The level of abstraction from the details of the underlying computer in a high-level programming language is higher than a low-level programming language
- Abstraction은 컴퓨터공학의 기본원리
The definition of a particular programming language consists of both syntax (how the various symbols of the language are combined) and semantics (the meaning of the language constructs)
- The syntax and semantics of a programming language are typically defined in its specification

High-level programming languages (Abstraction level: high)

Python, Java, C, C++, FORTRAN, Scheme, ML, etc.
Close to natural languages
More understandable than a low-level language
Makes the process of developing programs simpler and easier

Low-level programming languages (Abstraction level: low)

Assembly language
Represents machine instructions symbolically
- There exists an assembly instruction that corresponds to a machine instruction, but not vice versa
A program called an assembler is used to translate assembly language instructions into the target computer’s machine code instructions

Compilers and Compilation Process

A compiler is a program that automatically translates another program from some programming language to machine code

Typical Compilation Phases

Preprocessing : Marcro 처리 (#~)
Compilation : object code/machine instruction/기계어로 변환
Linker : Library 링킹 (include ~). 라이브러리 함수를 복사&붙여넣기 한 뒤, object code에서 비어있는 라이브러리 함수의 주소를 채움

Interpreters and Interpretation Process

인터프리터 : 컴파일과 실행을 동시에 하는 프로그램으로, Parser와 Evaluator를 내장하고 있다.
- Parser: 소스 코드를 특정한 중간표현으로 변환한 다음, 프로그램의 structural parse를 생성한다
  - Structural parse : 프로그램의 구조적 표현으로, 일반적으로 프로그램의 문법에 따라 트리 형태(구문 트리(Syntax Tree))로 표현한다.
    - 구문 트리에서 노드는 프로그램의 요소(예: 변수, 연산자, 함수 호출 등)를 나타내고, 트리의 구조는 프로그램의 계층적 구성(예: 제어 흐름, 중첩된 표현식 등)을 나타낸다.
  - 파서는 인터프리터 언어가 아니더라도 의미론적 검증과 구문 분석을 위해 모든 언어에 필수적인 요소이다.
- Evaluator: 중간 표현을 한 줄씩 읽으며 기계어로 변환하면서 실행한다.
  - 실행 중 function call을 만나면 symbol table 혹은 네임스페이스를 확인한다.
    - 파서가 모든 함수와 변수에 대한 정보를 심볼 테이블(Symbo Table) 또는 네임스페이스에 저장해 둔다.
예시) Python, Java bytecode interpreter
- 수업에는 나오지 않았지만, 사실상 파이썬이나 JVM의 경우에는 인터프리터 내부에 컴파일러가 있는 하이브리드 언어이다. 예를 들어 파이썬의 경우, 1) 파서를 통한 구문 분석과 의미론적 검증, 구문 트리 생성 2) 컴파일러가 구문 트리를 받아 바이트코드로 변환 3) 바이트코드를 evaluator가 한줄씩 읽으면서 동시에 기계어로 변환해 실행 의 과정을 통해 실행되는 하이브리드형 언어이다.
- 하이브리드 언어는 개발의 용이성을 유지하면서 실행 속도를 빠르게 하기 위해 사용된다. 구문트리를 실행하는 것보다 바이트코드를 실행하는 것이 빠르기 때문에 속도가 더 빨라진다.

Compilation vs Interpretation

일반적으로 인터프리터보다 컴파일러가 더 빠르다.

Functions

Similar to mathematical functions but much less strict
- A function may return different values each time it is called with the same argument values and may have side effects
Side effect – modification of the state of the system
- Mathematical function과 다른 점에 해당
- Modifying the contents of memory locations, writing data to a display or file, reading some data from other side-effecting functions, etc.

Program Translation Scheme

Simple scheme → Better scheme(using linker)

과거에는 라이브러리까지 전부 넣어서 모놀리틱하게 프로그램을 작성했다
Problems
- Efficiency - small change requires complete recompilation
- Modularity - hard to share common functions (e.g., printf)
Solution: Linker 도입

Running the Executable Program

프로그램이 동작하는 과정

The OS loader loads the application program into the main memory and initialize execution environment
Control is transferred to the starting point of the program machine code
- Execution starts
After finishing the execution, control returns to the OS
- 커널 코드의 instruction 주소가 PC에 담기는 것
- Control transfer == PC가 바뀌는 것
  - if문을 예로 생각해 봤을 때, jump가 일어나는 것도 control transfer

참고: PC란?

PC(Program counter) : 시작 instruction 주소

Control unit은 이 pc를 fetch하는것부터 시작

Clock에 따라 하나의 사이클당 아래를 하나씩 수행

fetch → decode → execution → memory → write back

fetch가 끝나면 pc를 +4 증가시켜 다시 fetch

Processor Registers

A processor register is a quickly accessible location built directly into the processor or CPU
- Registers usually consist of a small amount of fast storage (memory)
- They are used to store and manipulate data during the execution of instructions
- A register may hold an instruction, a memory address, or any kind of data like a bit sequence or individual characters
Registers are typically addressed by mechanisms other than that of the main memory
- add r1,r2,r3
Modern processors most often have 32-bit or 64-bit registers
- Referred to as the 32-bit processors and 64-bit processors
- The size or width of the processor’s registers and the amount of data the processor can handle in a single operation

CISC vs RISC

CISC

Complex instruction set computer
- low-level operation 여러개로 실행 가능한 명령어들의 집합
- E.g., a load from memory, an arithmetic operation, and a memory store
하나의 instruction이 여러 단계의 작은 Operation으로 나눠질 수 있으며, 메모리 지정 방식(addressing)이 다양하다
- 컴파일러가 optimal sequence를 찾기 어려운 원인이 됨
- E.g.,
- A=B+C // Variable: named memory location
- MOV A, B // Copy the contents of B in A
- ADD A, C // Add the contents of A and C, and store the result in A
Intel x86, VAX11, and IBM System 360
RISC 가 아닌 것들을 모두 포함하는 umbrella term이다

RISC

Reduced Instruction Set Computer
Instruction 길이가 통일되어 있으며 load와 store 명령어로 메모리 접근을 분리함
- Instruction sequence는 길어지지만 컴파일러와 하드웨어가 간단해지고, 예측 가능해진다.
- E.g., A=B+C // Variable: named memory location
- LDR R2,B //Load the contents of B in R2
- LDR R2,C //Load the contents of C in R3
- ADD R1,R2,R3 // Add the contents of R1 and R3, and store the result in R1
- STR R1, A // Store the contents of R1 in A
모든 instruction이 1 클락사이클만에 실행된다
ARM, DEC Alpha, IBM PowerPC

CISC vs RISC

Process

A process is an instance of a computer program that is being executed
- A stream of instructions being executed
- Abstraction used by the operating system
A process consists of:
- Registers
- Memory (code, data, stack, heap, etc.)
- I/O status (open file tables, etc.)
- Signal management information
Local variable은 stack에..~

main이 호출
stack frame할당 (local data, register data 등이 저장됨)
stack pointer는 main의 stack frame 끝
foo가 호출
stack frame 할당
stack pointer는 foo의 stack frame 끝
foo가 끝남
stack frame 해제, stack pointer도 내려감
마찬가지로 main이 끝날 때도

Stack을 사용하는 이유

재귀호출 때문
data 영역에 할당하면 z가 글로벌하게 하나뿐. 재귀를 했을 때 z가 덮어씌워짐

Supervisor Mode vs User Mode

Modern processors provides two different modes of execution
Supervisor (kernel) mode
- All instructions can be executed in supervisor mode
- (also known as protected mode, system mode, monitor mode, or privileged mode)
- For the operating system kernel
User mode
- The processor is allowed to execute only a subset of the instructions
- For all other software (including the remaining part of the operating system) than the kernel Example:
I/O instructions are privileged instructions
- An application needs to request an I/O service to the operating system to perform I/O operations
- system call을 통해 이러한 request를 하게 됨

System call

유저모드에서 돌아가는 프로그램이 OS에게 커널모드에서 실행 가능한 명령어를 실행하도록 요청하는 방법
- 일반적으로 trap(exception/fault)으로 구현된다
- A trap instruction invoked by the program triggers a trap, resulting in a switch to kernel mode
- 트랩을 처리하기 위해 커널이 컨트롤을 넘겨받아 커널의 핸들링 서비스가 실행되고, 이후 컨트롤이 프로그램에게 되돌아간다
  - 커널 : 부팅되는 순간에 부트로더에 의해 메모리에 로드되고 종료되는 순간까지 항상 메모리에 존재하는 OS의 프로그램 영역으로, supervisor mode로 실행된다.

Uniprogramming vs. Multiprogramming

Uniprogramming

Only one process at a time
- DOS(Disk Operating System)
Poor resource utilization

Multiprogramming

Multiple processes at a time
- Modern operating systems, such as Windows, Unix, Linux, etc.
Increases resource utilization
타임쉐어링 등을 통해 여러개가 동시에 돌아가는 것처럼 보이게 할 수도 있고, 실제로 CPU가 여러개일 수 있음

Virtual Memory

각각의 프로세스에 대해 실제 물리적 메모리(DRAM)보다 더 큰 용량의 메모리를 사용하는 것처럼 보이도록 하는 abstraction
Logical address vs Physical address
Logical address
- 할당받은 가상 주소
Physical address
- 물리적 메모리 내의 주소
Logical address를 런타임에 OS가 물리 메모리의 physical address로 매핑해 준다
- MMU(memory management unit)라는 하드웨어가 담당
- 매핑 정보는 물리 메모리의 page table에 담김
- Demand paging: 필요할 때만 메인 메모리에 페이지를 올리는 방식(swap in & out)
- 실제로는 logical address space가 더 크기 때문에, 실제 메모리에는 일부만 올리게 된다 (swap in & out)
- 단점: Demand paging으로 인해 page fault가 발생할 경우에는 약간의 overhead가 발생한다.
  - VM 설정을 하지 않도록 설정을 끌 수도 있음
  - 이 기능을 끈다는건 페이징(일부만 메모리에 올리는 기능)이 안된다는 것 뿐, 메모리 매핑은 그대로 한다. VM을 아예 없애는 것과는 다름. 보통 메인 메모리에 프로그램이 사용하는 데이터가 충분히 올라갈 때 사용한다.

Address Space of a Process

Process별로 private한 address space가 존재한다
- 프로세스는 다른 프로세스의 상태에 영향을 미칠 수 없다
- Memory protection의 기능을 수행하는 것
Kernel address space와 user address space로 분리되어 시스템 콜이 발생하면 커널 어드레스 스페이스의 코드가 실행된다
- User address space에는 스택과 힙, 데이터(uninitialized, initialized), 코드 등이 있다
- Shared library
  - 예를 들어 동일한 라이브러리를 사용하는 프로그램이 여러 개 있다고 한다면 물리적 메모리에 같은 라이브러리를 올려두고 매핑을 해당 주소로 해서 메모리를 아낄 수 있다.

Communication between processes

두 개의 방법이 존재

공동으로 참조하는 메인 메모리의 영역이 있으면 해당 영역에 대한 읽기/쓰기를 통해 프로세스 간 통신이 가능
Interprocess communication(IPC)
- OS에서 지정해 둔 통신 방법
- Shared address space를 사용하는 것이 아님
- 비싸다

Concurrency

컴퓨터 시스템에는 여러 개의 프로세스들이 돌아가고 있다
- 커널 프로세스는 시스템 코드를 돌리고 있을 것이고
- 유저 프로세스는 유저 코드를 돌리고 있을 것
이러한 프로세스들이 ‘concurrent’하게 돌아가기 때문에 동시에 돌아간다고 착각하게 됨
- 한 프로세스의 명령어와 다른 프로세스의 명령어들이 interleaving되는 것
- Context switching으로 구현됨
  - 프로세스를 내리고 올릴 때 context를 save 및 recover해주는 것
  - Context: 프로그램을 실행할 때 필요한 정보들
    - e.g. 레지스터 내의 내용
  - OS에 의해 software적으로 이뤄지므로 overhead 존재
- Context switching 간격이 매우 짧아 동시에 돌아가는 것처럼 보임

Context switch

CPU는 여러 프로세스를 넘나들며 실행
- OS의 CPU scheduler가 이를 담당함
이 과정에서 각 프로세스의 context를 save 및 restore해주는 것을 context switch라고 함
- Context: 프로그램을 실행할 때 필요한 정보들
- e.g. 레지스터 내의 내용
이를 위해 OS는 현재 프로세스(p)에서 다른 프로세스(q)로 control을 넘기기 전에 p의 context를 save하고 q의 context를 restore해야 한다
- Control이 넘어간다 == PC가 변경됨

Process State Transitions

프로그램은 다음과 같이 state를 가질 수 있음
- Running: CPU를 사용 중
- Ready: CPU를 기다리는 중
- Blocked: Event를 기다리고 있음(e.g. I/O)
OS가 이 전환을 담당함
- 예를 들어 I/O를 호출했다면 해당 프로세스는 Blocked로 바꾸고, I/O 처리할 수 있는 커널코드를 running으로 변경, 끝나면 다시 Block 된 프로세스를 Running 시킴

Preemptive vs Cooperative

Preemptive multitasking

대부분의 시스템이 사용한다
OS가 프로세스별로 일정 시간을 부여하여 해당 시간이 끝나면 CPU를 내놓도록 뺏는 것

Cooperative multitasking

거의 사용되지 않는다
Task가 리소스를 다 사용하면 CPU 등의 리소스를 내놓도록 직접적으로 프로그래밍 되어 있는 것
CPU를 잡으면 별로 놓고싶어하지 않기 때문

Threads

Thread of control
- Fetch/Decode/Execute loop 하나하나라고 생각하면 된다.
- 스케쥴링이 가능한 가장 작은 단위
  - OS를 위한 abstraction
- 구성
  - 코드
  - 레지스터
  - 스택
  - thread-local data
- User level thread vs Kernel level thread
보통 process 내에 thread가 존재
- 하나의 프로세스 내에 여러 쓰레드가 존재
- stack을 제외한 code, data, OS 자원(open files, signals …)을 공유
프로세스가 할당받은 virtual address space의 일부를 공유 (스택을 제외하고 모두 공유)
프로세스보다 컨텍스트 스위칭 비용이 낮음

Thread 간의 통신

프로세스의 memory 중 쓰레드끼리 공유하는 데이터가 있어서 그곳을 통해 통신함

Thread Libary

유저 레벨 쓰레드를 쉽게 사용하기 위한 라이브러리
- 유저레벨 라이브러리는 커널과 관계없는 쓰레드를 생성
  - 유저레벨에서만 이용하며, 커널은 쓰레드의 존재를 모름
- 커널레벨 라이브러리는 OS가 직접 관리하는 쓰레드와 관련됨
  - 커널이 쓰레드 간의 전환이나 쓰레드 생성을 관리함
  - 운영체제가 스레드 관리와 관련된 데이터 구조(스레드 제어 블록, TCB 등)를 커널 영역에 유지하고 관리
  - 스레드를 생성하거나 관리하기 위해 호출된 API 함수들이 대부분 시스템 콜을 발생시킴
POSIX threads(Pthreads)가 대표적 (커널레벨 라이브러리)

Linux Schedulers

Completely Fair Scheduler (CFS) 를 사용중
- 쓰레드와 프로세스의 구분이 없다
  - 리눅스에서는 쓰레드가 가장 작은 기본단위기에 (쓰레드가 하나도 없는) 프로세스이든 쓰레드든 다 같이 스케쥴링한다
- 프로세서 타임(CPU time)을 공정하게 분배한다
- Run-queue를 각 프로세서가 유지
  - Run-queue: Ready 상태의 프로세스를 담고 있다
- Nice values
  - CFS에서 이용하는 프로세스 간의 상대적 weight를 부르는 말
  - Lower nice value가 higher weight, 즉 higher priority를 의미

Time Slice

CFS에서 사용하는 단위
프로세스가 분배받은 CPU time을 의미
프로세스의 nice value에 따라 분배하며, nice value가 작을수록 더 많이 분배
- 즉 weight에 비례해 분배

Virtual Runtime

특정 프로세스에게 제공된 CPU time을 의미
- 그러나 실제 runtime이 아니라 nice value를 반영해서 계산한 가상 시간
짧을수록 processor를 더 많이 필요로 한다고 해석함
Nice value가 적을수록 실제로 사용한 것보다 덜 사용한 것처럼 계산함
- 즉 weight에 반비례하게 계산함

Run Queue

프로세서별로 레드블랙 트리를 이용해 Ready 프로세스들을 관리함
- 레드블랙 트리
  - 각 노드의 루트까지의 길이가 2배 이상 차이나지 않음
  - Balanced되어 있기에 O(logn)만에 조작이 가능함
Virtual runtime으로 노드를 ordering
- 가장 작은 것이 leftmost node가 되게끔 한다
- (가장 높은 것은 rightmost)
스케쥴러는 CPU를 줄 때 leftmost node를 골라서 준다

CFS Algorithm

각 스케쥴링 틱마다 스케쥴링을 수행
다음과 같은 일을 틱마다 수행함
- Running 프로세스의 time slice를 1 감소시킴
  - 0에 도달하면 flag를 설정
- Running 프로세스의 virtual runtime을 계산함
- Flag를 확인함
  - Flag가 설정되어 있으면 CPU를 preempt하여 run-queue에 집어넣음
  - 새로운 Left-most node 프로세스에게 CPU를 넘김

SMP Scheduling

Symmetric Multiprocessing
여러 프로세서가 존재할 때 사용하는 스케쥴링 방식
우선 CFS는 프로세서 하나당 하나씩 돌아가며 스케쥴링
Run-queue 로드밸런싱을 통해 프로세서 별로 프로세스를 분배함 (이 과정을 SMP scheduling이라 부름)

Scheduling Domain

로드밸런싱이 이뤄지는 단위 (그룹)
- 작업 부하가 커널에 의해 균형을 유지해야 하는 프로세서들의 집합
- 스케줄링 정책을 공유하고, 어떻게 밸런싱을 할지에 대한 정책도 공유
계층적으로 조직
- 최상위 스케줄링 도메인은 시스템 내 모든 프로세서들의 집합을 포함함
- 하위 레벨에서 상위 레벨로 이동하면서 로드밸런싱을 수행
  - 하위 레벨에서 코어 간 작업을 분배했다면, 상위 레벨에서 CPU 간 작업 분배하는 식

Run-queue Balancing

리밸런싱 틱마다 로드밸런싱을 수행
- Push migration
로우레벨에서 하이레벨로 계층을 올라가며 스케쥴링 도메인이 unbalanced되어있나 확인함
- 해당 도메인에서 가장 바쁜 run-queue를 확인
- 해당 런 큐에서 다른 런 큐로 프로세스를 마이그레이션함

Push migration

OS가 틱마다 프로세스를 옮겨주는 것

Pull migration

프로세서의 런 큐가 완전히 빈 경우, 자진해서 프로세스를 끌어오는 것
리눅스는 두 개의 방식을 모두 구현하고 있음
- Linux runs its load balancing algorithm every 200 milliseconds (push migration) or whenever the run-queue for a processor is empty (pull migration)

Negative Aspect of Process Migration

캐시 문제!
- 각각의 프로세서에 private cache가 붙어있는데, 이것이 cold 상태임
- 마이그레이션된 프로세스가 사용하는 데이터를 다시 cache에 업로드해야 하는 단점

ಣ ᴄᴏᴍᴘᴜᴛᴇʀ ᴄɪᴛʏ ഒ

Explorer

SHPC 04 - Processes and Threads

Programming Languages

High-level programming languages (Abstraction level: high)

Low-level programming languages (Abstraction level: low)

Compilers and Compilation Process

Typical Compilation Phases

Interpreters and Interpretation Process

Compilation vs Interpretation

Functions

Program Translation Scheme

Simple scheme → Better scheme(using linker)

Running the Executable Program

Processor Registers

CISC vs RISC

CISC

RISC

CISC vs RISC

Process

Stack을 사용하는 이유

Supervisor Mode vs User Mode

System call

Uniprogramming vs. Multiprogramming

Uniprogramming

Multiprogramming

Virtual Memory

Address Space of a Process

Communication between processes

Concurrency

Context switch

Process State Transitions

Preemptive vs Cooperative

Preemptive multitasking

Cooperative multitasking

Threads

Thread 간의 통신

Thread Libary

Linux Schedulers

Time Slice

Virtual Runtime

Run Queue

CFS Algorithm

SMP Scheduling

Scheduling Domain

Run-queue Balancing

Push migration

Pull migration

Negative Aspect of Process Migration

Graph View

Table of Contents

Backlinks