
12, 13. Virtualization

호프 2023. 12. 7. 00:21

OS: Operating System

What is OS

 

Operating System (OS)

  • A piece of SW that manages and virtualizes hardware for applications
    • indirection layer between applications and hardware
    • provides a high-level interface to applications while interacting with hardware devices through low-level interfaces
    • runs privileged instructions to interact with HW devices
  • Applications can only execute unprivileged instructions
    • They "trap" into the OS through system calls or faults
  • A giant interrupt handler (HW interrupts, SW interrupts, system calls)
  • manages shared resources
  • OS needs to be privileged

 

Dual-Mode Operation

Dual-Mode Operation of HW

  • Kernel mode: can run privileged instructions
  • User mode: can only run non-privileged instructions
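
A minimal sketch of dual-mode operation, assuming a toy CPU with a made-up instruction set (`ToyCPU`, `Trap`, and the instruction names are hypothetical, not taken from any real architecture). The same privileged instruction runs in kernel mode but traps in user mode.

```python
class Trap(Exception):
    """Raised when user-mode code attempts a privileged instruction."""


# Hypothetical instruction names, chosen only for illustration.
PRIVILEGED = {"halt", "set_timer", "io_write"}


class ToyCPU:
    def __init__(self):
        self.mode = "user"
        self.executed = []

    def execute(self, instr):
        # Hardware check: privileged instructions require kernel mode.
        if instr in PRIVILEGED and self.mode != "kernel":
            raise Trap(instr)
        self.executed.append(instr)


cpu = ToyCPU()
cpu.execute("add")            # unprivileged: runs in user mode
try:
    cpu.execute("io_write")   # privileged: traps in user mode
    trapped = False
except Trap:
    trapped = True

cpu.mode = "kernel"
cpu.execute("io_write")       # the same instruction runs in kernel mode
```

In a real system the switch to kernel mode happens as part of the trap itself, not by assigning to a field as this sketch does.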

 

Interrupt

Interrupt

  • A mechanism for coordination between concurrently operating units of a computer system, used to respond to specific conditions within the computer
  • Results in a transfer of the flow of control, forced by HW
  • Hardware Interrupt: I/O devices, Timer
  • Software Interrupt: Exception, System call

Hardware interrupts are mainly caused by hardware events or conditions, while software interrupts are raised explicitly by a program or the operating system to make a specific request.

 

Handling Interrupts

  • Incoming interrupts are disabled while an interrupt is being processed, to prevent a lost interrupt
  • The interrupt architecture must save the address of the interrupted instruction
  • The interrupt service routine takes control and handles the interrupt
    • interrupt vector: contains the addresses of all the service routines
  • If the interrupt routine modifies process state
    • save the current state of the CPU on the system stack
    • restore the state before returning
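
The handling steps above can be sketched as a toy machine. This is an illustration under assumptions: `ToyMachine`, its fields, and the two service routines are hypothetical, and a `pending` list stands in for whatever mechanism real hardware uses to hold an interrupt that arrives while interrupts are disabled.

```python
class ToyMachine:
    def __init__(self):
        self.pc = 100                    # address of the interrupted instruction
        self.registers = {"r0": 7}
        self.interrupts_enabled = True
        self.pending = []                # interrupts held while disabled
        self.stack = []
        self.log = []
        # Interrupt vector: interrupt number -> service routine.
        self.vector = {0: self.timer_isr, 1: self.keyboard_isr}

    def timer_isr(self):
        self.registers["r0"] = 0         # the routine modifies CPU state
        self.log.append("timer")
        self.interrupt(1)                # a keyboard interrupt arrives mid-handler

    def keyboard_isr(self):
        self.log.append("keyboard")

    def interrupt(self, number):
        if not self.interrupts_enabled:
            self.pending.append(number)  # held for later, not lost
            return
        self.interrupts_enabled = False
        # Save the return address and CPU state on the system stack.
        self.stack.append((self.pc, dict(self.registers)))
        self.vector[number]()            # dispatch through the interrupt vector
        # Restore the saved state before returning.
        self.pc, self.registers = self.stack.pop()
        self.interrupts_enabled = True
        if self.pending:
            self.interrupt(self.pending.pop(0))


machine = ToyMachine()
machine.interrupt(0)
```

After the call, both routines have run in order, and the interrupted program sees its original `pc` and registers again.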

Virtualization

Virtualization

Virtualization

  • Adding another level of indirection to run OS on an abstraction of Hardware
  • Virtual Machine (Guest OS = VM)
    • OS that runs on virtualized hardware resources
    • Managed by another software (VMM/Hypervisor)
  • Virtual Machine Monitor (VMM = Hypervisor)
    • Software that creates and manages the execution of virtual machines
    • Runs on bare-metal hardware, i.e. directly on the actual hardware

 

Virtualization in Cloud Computing

  • You don't need to own the hardware
    • Resources are rented from a cloud
  • Various providers allow creating virtual servers
    • Choose the OS and software each instance will have
    • Chosen OS will run on a large server farm
    • initiate or shut down within minutes
  • pay only for what you used

 

Benefits of Virtualization

Server Consolidation

  • Large organizations and datacenters operate many services
    • administrative best practice: run at most one service per machine
    • leads to low utilization, lots of machines, high power bills, etc.
  • Instead, run one service per virtual machine and consolidate many VMs onto each physical machine
    • leads to better utilization, easier management
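
A rough worked example of consolidation, with made-up numbers: ten services that each use 10% of a machine, and an assumed cap of 60% load per physical machine. The `consolidate` helper is a hypothetical first-fit placement, not how any particular scheduler works.

```python
def consolidate(loads, cap):
    """First-fit placement: put each VM on the first machine with room."""
    machines = []
    for load in loads:
        for machine in machines:
            if sum(machine) + load <= cap:
                machine.append(load)
                break
        else:
            machines.append([load])
    return machines


loads = [10] * 10                  # hypothetical: ten services at 10% CPU each
before = len(loads)                # one service per physical machine
placement = consolidate(loads, cap=60)
after = len(placement)
avg_utilization = sum(loads) / after
```

With these numbers, ten machines at 10% utilization become two machines at an average of 50%.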

Other Benefits

  • Isolation
    • fault / performance / software isolation
  • Encapsulation
    • operate on VM by operating on file
    • can use preconfigured VM images
    • VM snapshots, clones
  • Portability
    • independent of physical hardware
    • enables migration of live, running VMs
    • VMs can be migrated without application down-time
  • Interposition
    • all guest actions go through the monitor (VMM)
    • monitor can inspect, modify, deny operations

 

How Virtualization works

How Virtualization works

  • An OS normally runs in privileged mode, whereas a guest OS in a VM runs in user mode
  • Most instructions are executed directly by the hardware without the hypervisor intervening
  • Resource management is handled by the hypervisor
  • In a VM, privileged instructions are "trapped" by the hypervisor and emulated
    • 👉 Trap-and-Emulate

Protection Rings

  • Only Ring 0 can execute privileged instructions
  • more privileged rings can access memory of less privileged ones (Ring 0 can access Ring 1 memory)
  • calling across rings can only happen with hardware enforcement

 

Trap-and-Emulate

  • guest OS cannot directly manipulate hardware with sensitive instructions
    • hand off sensitive operations to the hypervisor
    • hypervisor emulates the effect of those operations
  • Performance implications
    • non-sensitive instruction: almost no overhead
    • sensitive instruction: large overhead

sensitive instruction: an instruction that can modify, or whose behavior depends on, the hardware configuration

privileged instruction: an instruction that traps when executed in user mode (it runs only in kernel mode)
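
A minimal sketch of trap-and-emulate, assuming a toy instruction set. `ToyHypervisor`, `hardware_execute`, and the instruction names are hypothetical. The guest always runs deprivileged: ordinary instructions run natively, privileged ones trap, and the hypervisor applies their effect to the VM's virtual state instead of the real hardware.

```python
class Trap(Exception):
    def __init__(self, instr, arg):
        super().__init__(instr)
        self.instr, self.arg = instr, arg


# Hypothetical privileged instructions of the toy machine.
PRIVILEGED = {"set_page_table", "disable_interrupts"}


def hardware_execute(instr, arg, mode):
    """Toy hardware: privileged instructions trap outside kernel mode."""
    if instr in PRIVILEGED and mode != "kernel":
        raise Trap(instr, arg)
    return "native"


class ToyHypervisor:
    def __init__(self):
        # Virtual hardware state the hypervisor keeps for one VM.
        self.vstate = {"page_table": None, "interrupts_enabled": True}
        self.traps = 0

    def run_guest(self, program):
        for instr, arg in program:
            try:
                hardware_execute(instr, arg, mode="user")  # guest is deprivileged
            except Trap as trap:
                self.traps += 1
                self.emulate(trap.instr, trap.arg)

    def emulate(self, instr, arg):
        # Apply the instruction's effect to virtual state, not real hardware.
        if instr == "set_page_table":
            self.vstate["page_table"] = arg
        elif instr == "disable_interrupts":
            self.vstate["interrupts_enabled"] = False


hv = ToyHypervisor()
hv.run_guest([
    ("add", None),
    ("set_page_table", 0x1000),
    ("add", None),
    ("disable_interrupts", None),
])
```

Only two of the four instructions trap, which is where the overhead difference between non-sensitive and sensitive instructions comes from.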

 

System Call with Virtualization

System call without Virtualization

System call with Virtualization

 

What if trap-and-emulate is not acceptable?

Requirements of Virtualization

  • A machine is virtualizable if its sensitive instructions are a subset of its privileged instructions
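
The requirement is a plain subset test. A small sketch, using hypothetical instruction names for two toy machines (none of them are real opcodes):

```python
def is_virtualizable(sensitive, privileged):
    """True if every sensitive instruction is also privileged, so it traps."""
    return sensitive <= privileged


def untrapped(sensitive, privileged):
    """Sensitive instructions that would run in user mode without trapping."""
    return sensitive - privileged


# Hypothetical instruction sets for two toy machines.
privileged = {"load_page_table", "halt", "io_out"}
machine_a_sensitive = {"load_page_table", "io_out"}
machine_b_sensitive = {"load_page_table", "io_out", "read_flags"}

a_ok = is_virtualizable(machine_a_sensitive, privileged)
b_ok = is_virtualizable(machine_b_sensitive, privileged)
b_holes = untrapped(machine_b_sensitive, privileged)
```

Machine B fails the test because `read_flags` is sensitive but would never trap, so a hypervisor relying on traps would never see it.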

x86 is Not Virtualizable

  • Not all sensitive instructions are privileged
    • Some sensitive instructions don't trap (non-privileged) but behave differently in kernel and user mode
    • There are 17 sensitive, non-privileged instructions
  • If a deprivileged guest kernel attempts to run those instructions, no trap is generated, so the VMM has no way of knowing, for example, that it shouldn't deliver interrupts to the guest

Some sensitive instructions are not treated as privileged and can be executed without privilege. In that case the instruction is not trapped, and the hypervisor does not know when the guest OS executes it.

The guest OS can execute such a non-privileged sensitive instruction while deprivileged, but doing so can produce unexpected results.
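
A toy illustration of the problem, loosely modeled on the commonly cited x86 case where restoring the flags register in user mode silently leaves the interrupt flag unchanged. `ToyCPU`, `cli_like`, and `popf_like` are hypothetical names, and the behavior is simplified.

```python
class ToyCPU:
    def __init__(self, mode):
        self.mode = mode
        self.interrupt_flag = True
        self.traps = []

    def cli_like(self):
        """Privileged: traps in user mode, so a VMM could emulate it."""
        if self.mode == "user":
            self.traps.append("cli_like")
            return
        self.interrupt_flag = False

    def popf_like(self, new_flag):
        """Sensitive but not privileged: no trap in user mode."""
        if self.mode == "kernel":
            self.interrupt_flag = new_flag
        # In user mode the change is silently dropped and nothing traps.


native = ToyCPU("kernel")
native.popf_like(False)      # a kernel on real hardware: interrupts now off

guest = ToyCPU("user")       # a deprivileged guest kernel
guest.cli_like()             # traps, so the VMM finds out
guest.popf_like(False)       # no trap, no effect, and the VMM is never told
```

The guest kernel believes interrupts are disabled after `popf_like`, but neither the hardware state nor the VMM reflects that.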

Solutions to x86

Emulate

  • Interpret each instruction, super slow (e.g. Virtual PC on Mac)
  • Guest code is traversed, instruction classes are mapped to routines that emulate them on the target architecture

Binary translation

  • The hypervisor dynamically rewrites non-virtualizable instructions so that they invoke the hypervisor (e.g. VMware)
  • Pros
    • No need to modify guest OS
    • Reasonable performance, since the majority of the instructions still run at close to native speed
  • Cons
    • Implementing the hypervisor can get tricky
    • Performance is not as good as para-virtualization or hardware-assisted virtualization
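
A minimal sketch of the rewriting idea, assuming guest code is a list of instruction names. `NON_VIRTUALIZABLE`, `translate_block`, and `Translator` are hypothetical, and the per-address cache is an assumption added here to show why repeated code does not pay the translation cost again.

```python
# Hypothetical names for instructions that do not trap but must be intercepted.
NON_VIRTUALIZABLE = {"popf_like", "sgdt_like"}


def translate_block(block):
    """Rewrite a block so problematic instructions call into the hypervisor."""
    translated = []
    for instr in block:
        if instr in NON_VIRTUALIZABLE:
            translated.append(("call_hypervisor", instr))
        else:
            translated.append(("native", instr))   # left to run as-is
    return translated


class Translator:
    def __init__(self):
        self.cache = {}          # guest address -> translated block
        self.translations = 0

    def get(self, addr, block):
        if addr not in self.cache:
            self.cache[addr] = translate_block(block)
            self.translations += 1
        return self.cache[addr]


translator = Translator()
guest_block = ["add", "popf_like", "mov"]
first = translator.get(0x400, guest_block)
second = translator.get(0x400, guest_block)   # served from the cache
```

Most instructions pass through unchanged, which matches the point above that the majority of guest code still runs at close to native speed.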

Para-virtualization

  • Modify the guest OS to avoid non-virtualizable instructions (e.g. Xen)
  • The guest OS works with the hypervisor and has some exposure to the hardware
  • Better performance, but the guest OS must be modified

Full Virtualization: no guest OS modification

  • tricky and has performance overhead

 

Hardware-assisted Virtualization

  • Add new hardware assistance (e.g. Intel VT-x)
  • VT(Virtualization Technology) extends the original x86 architecture to eliminate holes that make virtualization hard
  • Operating modes
    • VMX root: fully privileged, intended for VMM(Hypervisor) (VM Exit Mode)
    • VMX non-root: not fully privileged, intended for guest software (VM Entry Mode)

VMX: Virtual Machine Extensions = Intel's x86 virtualization technology

 

Intel VT-x

  • VM Entry (VMM -> guest)
    • Enters VMX non-root operation, loads guest state and exit criteria from VMCS
    • VMLAUNCH: used on initial entry
    • VMRESUME: used on subsequent entries
  • VM Exit (guest -> VMM)
    • Enters VMX root operation
    • Saves guest state in VMCS and loads VMM state from VMCS

VMCS: Virtual Machine Control Structure
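
The entry/exit cycle can be sketched as a loop. This is a toy model, not the real VMCS layout or instruction semantics: the `VMCS` class, its fields, and the `run` function are hypothetical, and the guest is just a list of instruction names, some of which are configured to cause an exit.

```python
class VMCS:
    """Toy stand-in for the control structure: guest state plus exit criteria."""

    def __init__(self, guest_pc, exit_on):
        self.guest_state = {"pc": guest_pc}
        self.exit_on = exit_on       # instructions that force a VM exit
        self.launched = False


def vm_entry(vmcs):
    # VMLAUNCH on the first entry, VMRESUME on every later one.
    name = "VMRESUME" if vmcs.launched else "VMLAUNCH"
    vmcs.launched = True
    return name


def run(vmcs, guest_program):
    log = []
    pc = vmcs.guest_state["pc"]
    while pc < len(guest_program):
        log.append(vm_entry(vmcs))                 # VMM -> guest
        # Guest runs in non-root mode until it hits an exit condition.
        while pc < len(guest_program) and guest_program[pc] not in vmcs.exit_on:
            pc += 1
        if pc < len(guest_program):
            vmcs.guest_state["pc"] = pc            # VM exit saves guest state
            log.append("VMEXIT:" + guest_program[pc])
            pc += 1                                # VMM emulates it, then moves on
    vmcs.guest_state["pc"] = pc
    return log


vmcs = VMCS(guest_pc=0, exit_on={"cpuid", "hlt"})
log = run(vmcs, ["add", "cpuid", "add", "hlt"])
```

The log shows one `VMLAUNCH`, then a `VMRESUME` for each re-entry after the VMM has handled an exit.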

 

Benefits: VT helps improve VMMs

  • VT reduces guest OS dependency
    • eliminates need for binary patching/translation
    • facilitates support for legacy OS
  • VT improves robustness
    • eliminates need for complex SW techniques
    • simpler and smaller VMMs & trusted-computing base
  • VT improves performance
    • reduces unnecessary transitions between guest and VMM

Hypervisor

VMM (Hypervisor)

Virtual Machine Monitor (VMM = Hypervisor)

  • Software that supports VM
  • VMM determines how to map virtual resources to physical resources
  • Physical resources may be time-shared, partitioned, or emulated in software
  • VMM is much smaller than a traditional OS

VMM Overhead

  • VMM overhead depends on the workload
  • User-level processor-bound programs: essentially zero virtualization overhead
    • run at native speed since the OS is rarely invoked (e.g. SPEC benchmarks: arithmetic operations)
  • I/O-intensive (OS-intensive) workloads: high virtualization overhead
    • execute many system calls and privileged instructions
    • if the workload is also I/O-bound, processor utilization is low since it is waiting for I/O
      • the cost of processor virtualization can then be hidden, so the observed virtualization overhead is low
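
A back-of-the-envelope model of why overhead depends on the workload, assuming each trap into the VMM adds a fixed cost. All numbers are invented for illustration, and `slowdown` is a hypothetical helper.

```python
def slowdown(native_us, traps, trap_cost_us):
    """Virtualized run time divided by native run time."""
    return (native_us + traps * trap_cost_us) / native_us


NATIVE_US = 10_000_000       # assumed: 10 s of native execution
TRAP_COST_US = 2             # assumed: 2 microseconds per trap

# Processor-bound program: the OS is rarely invoked, so few traps.
cpu_bound = slowdown(NATIVE_US, traps=1_000, trap_cost_us=TRAP_COST_US)

# OS-intensive program: many system calls and privileged instructions.
os_intensive = slowdown(NATIVE_US, traps=2_000_000, trap_cost_us=TRAP_COST_US)
```

Under these assumptions the processor-bound program slows down by about 0.02%, while the OS-intensive one slows down by 40%. The model ignores the I/O-bound case, where time spent waiting on devices can hide the extra processor cost.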

 

Hypervisor Types

 

Type 1: Hypervisor runs directly on hardware

  • guest OS traps on privileged instructions
  • provides its own device drivers and services
  • ex. Xen, Microsoft Hyper-V

Type 2: Hypervisor runs on OS

  • type 2 hypervisor does binary translation
  • leverages device drivers and services of a host OS
  • ex. VMware Workstation, KVM, QEMU, Parallels

 

Xen

Xen

  • Para-virtualization
    • required less than 2% of the total lines of code to be modified
    • Pros: better performance on x86, simplifications in VM implementation
    • Cons: must modify the guest OS (but not its applications)
  • Reduce the privilege of the OS
    • hypervisor runs with full privilege (ring 0), OS runs in ring 1, applications in ring 3
    • Xen must intercept interrupts and convert them to events posted to a region shared with the OS
    • Expose real resource availability so the OS can optimize its behavior (the guest OS is given information about the actual HW resources)

Domain 0

  • Run the VMM management at user level
  • Given special access to control interface for platform management
  • Has back-end device drivers

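Domain 0: a special virtual machine that runs on top of the Xen hypervisor, mainly used to manage and control the other virtual machines.

Ring 0: one of the CPU privilege levels in the x86 architecture, also called kernel mode. The kernel runs at this level and can access all privileged instructions and hardware resources.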

 

Hypercalls and Events

  • Hypercalls
    • Synchronous calls from a domain to Xen
    • Allows domains to perform privileged operation via software traps
    • Similar to system call
  • Events
    • Asynchronous notifications from Xen to domains
    • Replace device interrupts

More about Xen

  • Performance overhead of only 2-5%
  • Most popular example of para-virtualization
  • Available as open source but owned by Citrix
    • modified version of Xen powers Amazon EC2
    • Widely used by web hosting companies
  • Many security benefits
    • physical resource sharing with performance isolation across OS instances
    • hypervisor can isolate/contain OS security vulnerabilities
    • hypervisor has smaller attack surface: simpler API, less overall code than traditional OS

 

KVM and QEMU

Kernel-Based Virtual Machine (KVM)

  • Full (hardware-assisted) virtualization
    • designed based on Intel VT (or AMD-V)
    • doesn't require binary translation & modification to guest OS
  • Very active in the Linux community, easy to use, offers full control, easy to migrate
  • Almost the default VM solution for GNU/Linux distributions including Red Hat, Debian, Ubuntu, etc.

QEMU: Device Emulation

  • QEMU is a userspace process (runs at user level)
  • Creates one thread for each virtual CPU in the guest
  • Basic IO devices are also emulated by QEMU
    • dynamically translates guest instructions to host instructions
  • Device access from guest is trapped by KVM -> KVM passes control to QEMU to handle IO -> QEMU injects interrupts from devices through KVM
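
The I/O path in the last bullet can be traced with two toy objects. `ToyKvm` and `ToyQemu` are hypothetical stand-ins that only record the order of steps. They do not use the real KVM or QEMU interfaces.

```python
class ToyQemu:
    """Userspace device emulation: here, a disk backed by a dict."""

    def __init__(self):
        self.disk = {}

    def handle_io(self, op, sector, data=None):
        if op == "write":
            self.disk[sector] = data
        # The emulated device signals completion with an interrupt.
        return ("irq", "disk_done", sector)


class ToyKvm:
    def __init__(self, qemu):
        self.qemu = qemu
        self.trace = []
        self.injected = []

    def guest_device_access(self, op, sector, data=None):
        self.trace.append("trap_to_kvm")          # guest access is trapped
        self.trace.append("pass_to_qemu")         # control goes to userspace
        irq = self.qemu.handle_io(op, sector, data)
        self.trace.append("inject_interrupt")     # interrupt goes back via KVM
        self.injected.append(irq)


qemu = ToyQemu()
kvm = ToyKvm(qemu)
kvm.guest_device_access("write", 7, b"hello")
```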

 

Virtual Machine in AWS

  • Amazon EC2 used to be based on Xen
  • AWS Nitro: hypervisor based on KVM
    • Virtualization method of the current EC2 generation
    • Near-metal performance
  • AWS also offers actual "bare-metal" machines: with no virtualization, access directly to HW
  • Different VM types are represented by AMIs
    • PV: para-virtual (Linux only)
    • HVM: hardware virtual machine (Linux and Windows)