If you just want to download the libvirt application development guide, click here.
How to build the documentation
libvirt broken documentation
The libvirt developer documentation link is broken (i.e. HTTP 404). But I need the development guide for my advanced OS course, so I downloaded the repository and built the documentation from source. If you want to do the same (instead of downloading the PDF I rendered above), you can execute the following instructions:
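For what it's worth, here's roughly the shape of what I ran. The repository location and the Publican tooling are from memory, so treat both as assumptions that may have drifted:

```sh
# Assumed repo URL and toolchain; check the libvirt project pages if this 404s too.
git clone https://gitlab.com/libvirt/libvirt-appdev-guide.git
cd libvirt-appdev-guide
# The guide is (or at least was) a Publican book; this renders the English PDF.
publican build --langs=en-US --formats=pdf
```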
Project 1 was released last evening at 08:59 PM PST, and this morning I decided to start on the project by reading through the overview and getting the lay of the land. For this project, we’ll need to deliver two operating system components: a scheduler and a memory coordinator (not even sure what that means exactly).
So what I’m doing as part of this post is just taking a snapshot of the questions I have and the topics I do not understand, topics that I’ll probably understand in much more depth as the project progresses. More often than not, I’m dismissive of all the work I put in over the semester, and this post is one way to honor the time and commitment.
Overall, this project’s difficulty sits in the right place — not too hard but not too easy. The sweet spot for Deliberate Practice.
Questions I have
What algorithm should I implement for my scheduler?
What algorithms fit the needs of this project?
What the heck is a memory coordinator?
Why do we have a memory coordinator? What’s its purpose?
How do you measure the success of a memory coordinator?
How do I use the libvirt library?
What is QEMU?
Where does the scheduler sit in relation to the operating system?
How will I get the hypervisor to invoke my scheduler versus another scheduler?
Project Requirements
You need to implement two separate C programs, one for the vCPU scheduler (vcpu_scheduler.c) and another for the memory coordinator (memory_coordinator.c).
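Since both programs are written in C against libvirt, here's a minimal sketch I put together to sanity-check the library before touching the scheduler. It assumes the local QEMU/KVM hypervisor at qemu:///system; it connects, lists the active domains (VMs), and prints each one's vCPU count and memory (compile with -lvirt):

```c
#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

int main(void)
{
    /* Connect to the local QEMU/KVM hypervisor. */
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (conn == NULL) {
        fprintf(stderr, "failed to connect to the hypervisor\n");
        return 1;
    }

    /* Fetch every active (running) domain, i.e. each VM. */
    virDomainPtr *domains = NULL;
    int n = virConnectListAllDomains(conn, &domains,
                                     VIR_CONNECT_LIST_DOMAINS_ACTIVE);

    for (int i = 0; i < n; i++) {
        virDomainInfo info;
        if (virDomainGetInfo(domains[i], &info) == 0)
            printf("%s: %u vCPU(s), %lu KiB memory\n",
                   virDomainGetName(domains[i]),
                   info.nrVirtCpu, info.memory);
        virDomainFree(domains[i]);
    }

    free(domains);
    virConnectClose(conn);
    return 0;
}
```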
Unlike the SPIN OS structure, the exokernel approaches extensibility by decoupling authorization to the hardware from its actual use, using something called secure bindings. Library OSes request secure bindings, and the exokernel returns an encrypted key that grants that particular service access to specific resources.
One other way that the exokernel manages these underlying OS services is via CPU scheduling. The exokernel maintains a linear vector of “time slots”, allotting each OS service a certain amount of time to run on the CPU.
The exokernel also introduces the concept of an STLB (software translation lookaside buffer). The STLB improves performance: during each process context switch, the exokernel copies the hardware TLB into a software TLB structure, and when the process runs again, the exokernel copies the software TLB back into hardware, eliminating the need for a TLB flush.
Exokernel Approach to Extensibility
Summary
The library OS requests access to a specific hardware resource. If access is granted, the exokernel returns an encrypted key that the library OS presents on future requests.
Examples of Candidate Resources
Summary
An OS library will perform some action, sending the exokernel its encrypted key, and once the exokernel accepts that request, the data processing is cheap. Take packet processing: the OS library requests to install predicates for packet filtering; once the predicates are installed, the exokernel invokes those rules on its behalf.
Implementing Secure Bindings
Summary
There are three mechanisms for implementing secure bindings. The first is hardware mechanisms, like TLB entries. The second is software, like caching the hardware TLB inside the OS, avoiding the cost of flushing the TLB during a context switch. And the third is downloading code into the kernel, a feature analogous to SPIN’s approach with logical protection domains. I don’t really understand these concepts quite yet: it’s all a bit abstract, for now.
Default Core Services in Exokernel
Summary
For memory management, the library OS handles the page fault (delivered as an upcall through its registered handler) by presenting a mapping (with its secure binding) to the exokernel; the exokernel then installs the mapping into the hardware TLB, a step that requires privileged access.
Secure Binding
Summary
As the instructor put it, this is all a bit dicey. Both SPIN and the exokernel grant the library OS privileges purely for performance, but how do we guarantee that the insertion of code into the kernel is done … securely? Safely? I’ll find out soon, probably over the next couple of videos.
Memory Management using Software TLB
Software TLB
Summary
In the exokernel, when a process is switched out, its hardware TLB is copied into an STLB (software TLB), and when that process is switched back in, the exokernel preloads its software TLB into the hardware. Of course, if there’s a TLB miss, then the standard page fault workflow happens.
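Here's how I picture the STLB mechanics, as a hedged C sketch; the entry layout, the entry count, and the fixed four library OSes are all invented for illustration:

```c
#include <string.h>

#define TLB_ENTRIES 64
#define NUM_LIB_OS   4   /* hypothetical number of library OSes */

/* Invented entry layout; real TLB entries are architecture-specific. */
struct tlb_entry { unsigned vpn, pfn, flags; };

struct tlb_entry hw_tlb[TLB_ENTRIES];            /* stand-in for the hardware TLB */
struct tlb_entry stlb[NUM_LIB_OS][TLB_ENTRIES];  /* one software TLB per library OS */

/* On a context switch, snapshot the outgoing OS's translations into its
   STLB, then preload the incoming OS's snapshot -- no TLB flush needed. */
void context_switch(int from, int to)
{
    memcpy(stlb[from], hw_tlb, sizeof hw_tlb);
    memcpy(hw_tlb, stlb[to], sizeof hw_tlb);
}
```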
Default Core Services in Exokernel (continued)
Default Core Services continued
Summary
There is a linear vector of “time slots”, each OS service reserving a slot for its service. If the OS service runs longer than the allotted quantum, the exokernel will punish it, reducing its time slot in the next run. The time allowed for saving the context is also bounded.
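A tiny sketch of how I imagine that slot vector working; the slot count and library OS ids below are made up:

```c
#include <stdio.h>

#define NSLOTS 8

int main(void)
{
    /* Each slot holds the id of the library OS that reserved it. */
    int slots[NSLOTS] = { 0, 0, 1, 2, 1, 0, 2, 2 };

    /* One pass over the vector: each quantum, the exokernel hands the
       CPU to whichever library OS owns the current slot. */
    for (int t = 0; t < NSLOTS; t++)
        printf("quantum %d -> run library OS %d\n", t, slots[t]);
    return 0;
}
```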
Revocation of Resources
Summary
The exokernel can revoke or reclaim resources from a library OS by sending an upcall (passing a repossession vector) to the library OS, informing it that it may need to stash away its resources (on disk).
Putting it all together
Summary
While a library OS’s thread is running, external events are kicked up to the running library OS so it can determine what to do next.
Exokernel Data Structures
Summary
The exokernel maintains a data structure for each running library OS, including a software TLB that gets loaded during a context switch.
Performance Results of Spin and Exokernel
Summary
Absolute numbers are meaningless: it’s the trends that count.
The concept of a border crossing pops up over and over again. This is a new term I had never heard prior to this class. The term is almost synonymous with a context switch, but it is subtly different in the sense that a context switch (from one process to another, or one thread to another) can occur without a border crossing, i.e. without changing the underlying hardware address space.
SPIN attempts to enforce protection at the compiler level, by using a restrictive language called Modula-3. Unlike the C language, where you can cast a pointer to whatever data structure you like, Modula-3 enforces type safety, only allowing the developer to cast a pointer to specific types of data structures that they have already specified earlier in the code.
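To make the contrast concrete, here's the kind of arbitrary pointer cast that C happily permits and Modula-3's type system forbids (the struct names are mine):

```c
#include <stdio.h>

struct inode  { int ino; };
struct socket { int fd; };

int main(void)
{
    struct inode node = { 42 };

    /* C lets us reinterpret one structure as a completely different
       one -- exactly the freedom Modula-3 takes away. */
    struct socket *sock = (struct socket *)&node;
    printf("bogus fd: %d\n", sock->fd);
    return 0;
}
```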
SPIN offers extensibility by allowing different types of OS services to co-exist with one another.
But what are the trade-offs with SPIN, when compared with the microkernel and the exokernel?
It appears that SPIN would be more performant than a microkernel due to having no border crossings, while maintaining flexibility (multiple types of OS services that cater to application processes) and security (via logical protection domains) with Modula-3, allowing OS service library code to co-locate with kernel code.
Introduction
Customizing OS with SPIN
Summary
SPIN and Exokernel take two different paths to achieving extensibility. These designs overcome the issues of the microkernel, which compromises on performance due to border crossings, and of the monolithic structure, which does not lend itself to extensibility.
What we are shooting for in OS Structure
Summary
We want flexibility similar to a microkernel-based approach but also the protection and performance of a monolithic structure. We want the best of both worlds: performance, protection, flexibility.
Approaches to Extensibility
Damn, this is a super long video (8 minutes, compared to the other one to two minute videos prior)
Capability based
Hydra OS (1981)
Summary
Hydra did not fully achieve its goal of extensibility
Micro Kernel Based
Summary
Performance took a back seat, since the focus was on extensibility and portability. Microkernel-based designs got bad press because of those twin goals.
SPIN approach to extensibility
Summary
By co-locating the kernel and extensions in the same hardware address space, invoking an extension is as cheap as a procedure call. SPIN does this by depending on a strongly typed language to provide safety.
Logical Protection Domains
Summary
Using a programming language called Modula-3, which doesn’t appear to be popular in practice, we can enforce protection at the logical level. This programming language, unlike C, restricts casting pointers to arbitrary data structures, only allowing a cast to a particular data type.
Spin mechanisms for protection domains
Summary
The secret sauce of protection and performance is the set of mechanisms for creating (i.e. exposing entry points), resolving (i.e. leveraging other entry points), and combining logical protection domains.
Customized OS with Spin
Another example of SPIN os customization
Summary
There can be multiple file systems (written in Modula-3), each file system catering to its callers, and each file system using the same underlying hardware address space. And they can share modules with one another, like the networking entry point.
Example Extensions
Summary
Examples of Unix servers implementing their OS on SPIN, as well as a video server / display client built on top of SPIN.
Quiz: Border Crossings
Quiz: Least likely for border crossing
Summary
Microkernels and SPIN offer performance since they limit border crossings. In SPIN, logical protection domains do not require border crossings.
SPIN Mechanisms for Events
SPIN classifies three types of event handling: one-to-one, one-to-many, and many-to-one
Summary
To handle events (like packet arrival) we can have a 1:1 mapping, a 1:N mapping, or an N:1 mapping. For 1:1, an example would be an ICMP packet arriving and the single ICMP handler running. In a 1:N mapping, the “IP packet arrived” event fires and signals several other event handlers, like ICMP, UDP, or TCP. Finally, there is N:1; an example of this is when an Ethernet and an ATM packet event arrive but both funnel into the IP handler.
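A hedged C sketch of these mappings (SPIN itself is written in Modula-3, and the handler names and dispatch logic here are invented):

```c
#include <stdio.h>

typedef void (*handler_t)(void);

void icmp_handler(void) { printf("handling ICMP\n"); }
void udp_handler(void)  { printf("handling UDP\n");  }
void tcp_handler(void)  { printf("handling TCP\n");  }
void ip_handler(void)   { printf("handling IP\n");   }

/* 1:N -- one "IP packet arrived" event fans out to several handlers. */
void ip_packet_arrived(void)
{
    handler_t handlers[] = { icmp_handler, udp_handler, tcp_handler };
    for (int i = 0; i < 3; i++)
        handlers[i]();
}

/* N:1 -- distinct Ethernet and ATM arrival events funnel into one handler. */
void ethernet_packet_arrived(void) { ip_handler(); }
void atm_packet_arrived(void)      { ip_handler(); }

int main(void)
{
    ip_packet_arrived();        /* 1:N */
    ethernet_packet_arrived();  /* N:1 */
    atm_packet_arrived();
    return 0;
}
```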
Default Core Services in SPIN
Summary
SPIN offers core services like memory management, CPU scheduling, etc. SPIN provides header files that OS programmers need to implement. Remember: these implementations talk to each other through well-defined interfaces, offering protection, but are also performant because there are no border crossings.
Default Core Services in SPIN (Continued)
Summary
SPIN provides the primitives, i.e. the interface function definitions; the semantics are up to the extension itself. SPIN makes sure extensions get time on the scheduler.
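Sketching what such an interface might look like as a C-style header; SPIN's scheduling abstraction is the strand, but the exact signatures below are my assumptions, not SPIN's real (Modula-3) interface:

```c
/* Placeholder thread type; each extension defines what a strand means. */
typedef struct strand strand_t;

/* CPU-scheduling primitives whose semantics each extension supplies. */
void block(strand_t *s);       /* strand waits, e.g. on I/O */
void unblock(strand_t *s);     /* the awaited event occurred */
void checkpoint(strand_t *s);  /* save state when preempted */
void resume(strand_t *s);      /* restore state and continue */
```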
For the advanced operating systems course, we students are required to sign up for (at least) two research papers that we summarize. Although two papers doesn’t sound like much, many former students of the course suggested ignoring the other research papers and instead focusing on watching the lectures (which do cover the papers, sort of), banging out the software project, and (sadly) performing rote memorization based off of previous exams (I’ll need to get my Anki set up for this, stat).
Reviews on OMSCentral related to reading papers
I gave up on trying to read all the papers, electing to focus on projects and memorize the lectures instead. It worked out ok for me, and I averaged 18 hours a week on this course anyway. YMMV.
The downside to me is the number of readings was, well, high. Somewhere around 45 papers are to be read during the semester, and most of them are foundational but also from an earlier time in computing. I enjoy newer research papers with tech I can relate to, so for me in general these were less enjoyable and more difficult to digest. Still worthwhile, I think, but more of a slog.
The sheer number of papers assigned could easily push the time commitment past 30 hours a week if a student were to read them all, but as most have noted, you can largely skip reading them. Even skipping papers, the lectures are long and are difficult to get through. The content is broad and I think it tries to cover too much ground.
Papers I’m going to read
So although I’ll probably skim most of the research papers, I’ll dig into the papers below in depth. Why did I choose these papers? Well, some of the other papers (on networking) played into my strengths, categories that I’m somewhat well versed in. As a result, I chose papers grouped under categories that I not only find interesting (distributed systems and failure + recovery mechanisms) but that I’m also unfamiliar with.
Liu, Kreitz, van Renesse, Hickey, Hayden, Birman, Constable, “Building Reliable High-Performance Communication Systems from Components”, 17th ACM Symposium on Operating Systems Principles, OS Review, Volume 33, Number 5, Dec. 1999.
D. Porter, O. Hofmann, C. Rossbach, A. Benn, E. Witchel, “Operating System Transactions”, SOSP ’09.
Stretch goal
Lamport, L., “Time, Clocks, and the Ordering of Events in a Distributed System”, Communications of the ACM, 21(7), pp. 558-565, July 1978.
I’m really struggling to intimately understand the virtually indexed, physically tagged (VIPT) cache concept. From a high level, I get that VIPT is an optimization technique, parallelizing the translation of virtual addresses to physical addresses and the cache lookup. This seems impossible at first since the cache tag depends on the physical frame number (PFN).
So in order to accomplish this, we use the upper bits of the virtual address (the VPN) to search the TLB and the least significant bits (the page offset) to index the cache. This makes sense. But what I don’t quite fully understand is why we have to limit the size of the cache in order to avoid aliasing (i.e. different virtual addresses that end up mapping to the same physical frame).
Thinking out loud here … I suppose that if the cache index bits bleed past the page offset into the VPN bits, then we actually do not avoid the aliasing problem, the issue of having more than one virtual address map to the same physical frame.
Ok, I think I got it.
But what’s the catch?
You can only have a very small cache, its size equal to the page size multiplied by the set associativity. This means that the only way to increase the size of the cache is by increasing the associativity, which ultimately increases the lookup time as well, since the cache has to compare the tag across multiple ways.
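To make the constraint concrete, a tiny sketch with hypothetical (but typical) numbers; with 4 KiB pages and 8 ways, the ceiling works out to 32 KiB, which matches common L1 data cache configurations:

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical but typical numbers: 4 KiB pages, 8-way set associative. */
    unsigned page_size = 4096;  /* bytes -> 12 page-offset bits */
    unsigned ways      = 8;

    /* To stay alias-free, the cache's index + line-offset bits must fit
       inside the page offset, so: capacity <= page size * associativity. */
    unsigned max_bytes = page_size * ways;
    printf("largest alias-free VIPT cache: %u KiB\n", max_bytes / 1024);  /* 32 */
    return 0;
}
```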
In the context of an operating system, what does structure even mean and why is it important? Structure determines how the operating system serves applications on top of the underlying hardware and how it balances the following qualities: protection, performance, flexibility, scalability, agility, and responsiveness.
To obtain the above qualities, several designs exist, including monolithic and microkernel. In a monolithic OS, there’s a division between applications running in user space and the operating system’s core services (e.g. file systems, memory, scheduling). Taking this division a step further is the microkernel design, which separates the core services from the kernel, the core services all running in user space. There are trade-offs between each approach. Monolithic operating systems improve performance by reducing the number of transitions between user and privileged mode, a benefit that microkernel operating systems sacrifice. However, microkernel operating systems offer greater extensibility, allowing applications to select which OS core services cater to their needs.
Introduction
Summary
Will study the structure of the OS, whatever that means. To this end, we’ll learn about the SPIN and Exokernel approaches, two terms I’m unfamiliar with, and then microkernels.
Quiz: OS System Services
Summary
I got most of the services, apart from inter-process communication (IPC) and thread scheduling. I named the others, including access to I/O devices, access to the network, and memory management (protection, sharing, demand paging).
OS Structure
Summary
Structure determines how OS serves applications and the underlying hardware
Goals of OS Structure
Summary
There are six goals of OS structure: protection (of users from themselves and from each other, and of course of the OS itself), performance (the time taken for the OS to carry out its services), flexibility (not one size fits all; the OS should be able to adapt to different applications), scalability (essentially, if the hardware improves, so should the speed of the operating system), agility (how quickly the OS can respond to a changing environment), and responsiveness (does the OS react to events, like I/O, quickly).
Commercial OS
Summary
Apparently I’ll learn that, somehow, system designers can have their cake and eat it too. That is, they can achieve performance without really sacrificing protection and flexibility. How? Who knows, though I should find out soon.
Monolithic Structure
Summary
Monolithic OS – division between applications and OS core services (e.g. file system, memory addressing, scheduling)
Why do we call it a monolithic structure? Because the big blob (see photo) contains all the services (e.g. file system, network access, memory management, scheduling) offered by the operating system
DOS-like Structure
Summary
At a glance, the DOS-like structure seems indistinguishable from monolithic. But how is it different?
Quiz: DOS-like Structure Pros and Cons
Summary
Seems like DOS allows applications to run in the same address space as the OS, so no trapping is necessary: just a procedure call. The trade-off here is a lack of protection; the OS can get corrupted by a running application.
DOS-like Structure (continued)
Summary
The DOS-like structure and its lack of protection are unacceptable for a general purpose OS. On the other hand, the monolithic structure reduces performance while offering protection. But we also need to consider how we can tailor the OS to the needs of different applications: via customization.
Opportunities for customization
Summary
Learned a little more about why customization might be important. I thought that page fault handling should be the same for every process, but there’s a missed opportunity there. Since the OS sort of bets on processes not requiring their entire address space, this bet might be expensive if the page fault handler (along with the eviction algorithm) runs unnecessarily. So perhaps the OS can learn more about the process and inform the cache replacement policy.
Microkernel based OS Structure
Summary
Microkernel OS – OS services run in user mode too, indistinguishable from user space apps.
Why the hell do we want to overcomplicate the OS design by introducing a microkernel-based structure? What’s the gain here? Extensibility! Previously, application 1 and application 2 would access the same file system, memory manager, or scheduler; there’s a lack of extensibility there. Also, each of those servers (e.g. file system, memory management, CPU scheduler) runs in user space and still requires some sort of IPC to talk to the others, which means the servers talk through the microkernel via system calls. I’ll soon find out, but I think this is expensive; that is, we are sacrificing performance. Will find out shortly in the next lesson.
Downside to the Microkernel
Summary
The biggest downside of the microkernel approach is the potential performance hit. For an application to talk to the file system (or other servers), a lot of IPC is needed. Data flowing between services requires IPC calls to marshal data in and out between applications and servers, and between servers as well.
Why performance loss
Summary
Memory locality changes since we need to context switch, dramatically impeding performance. The same goes for copying data between user space and system space. This is unlike the monolithic structure, in which the services can share data easily with one another, although a little less flexibly.
What do we want?
Summary
There are trade-offs between performance, extensibility, and protection.
Is it possible to get the OS to exhibit all three desirable characteristics (performance, extensibility, and protection) at the same time? Apparently it is, but I’ll have to wait until the next module.
How can the OS make use of a larger, slower device to transparently provide the illusion of a large virtual address space?
Overview
Why try to create a large virtual address space, a virtual address space that is larger than the amount of physical memory installed on the system? Well, the illusion of a larger virtual address space serves as a useful abstraction for application developers: they just request more memory and simply get it — thanks to the operating system.
(Arpaci-Dusseau and Arpaci-Dusseau, 2018) – Swap Space
To this end, the OS creates a memory hierarchy. At the top sits the fastest cache (L1, L2, L3) and at the bottom sit hard disks (as much as 10,000× slower). And the speed of each layer is proportional to its cost: fast cache is expensive and slow storage is cheap.
Essentially, the OS reserves portions of the disk as swap, the location that memory is paged in from (i.e. disk to memory) and out to (memory to disk). How does the OS determine which pages currently live in memory and which pages live on disk? By looking at the page table entry metadata. More specifically, if the present bit is set, then the page lives in a physical frame in memory. Otherwise, it lives on disk.
In the scenario where the page lives on disk, the page fault handler must be invoked. With the present bit set to 0, the page fault handler will find the page in swap (the disk address is also stored in the metadata of the page table entry), page the memory into a physical frame (one that was free or freed up by the page replacement policy), set the present bit to 1, and finally re-run the instruction that generated the page fault (i.e. the memory access that found the present bit set to 0).
21.1 Swap Space
Summary
The swap space is orders of magnitude larger than physical memory, allowing the OS to present a large amount of memory to processes and program developers.
21.2 The Present Bit
Summary
The present bit is a flag that tells the translation lookaside buffer (TLB), whether hardware- or OS software-managed, whether the page itself lives in physical memory or has been swapped out to disk. If the bit/flag is set to 0, then we must invoke the page fault handler. Also, the TLB locates the page table based off of the page table register, but this raises another question for me: whose responsibility is it to set the page table register? Probably the OS, since the OS is responsible for the per-process page table. The OS is a beast!
21.3 The Page Fault
Summary
On a TLB miss, the PTE gets checked, and if the PTE’s present bit is set to 0, the page fault handler runs. The page fault handler will check the PTE’s metadata for the location/index of where the page exists on disk (or whatever the underlying swap mechanism is). Then, the page fault handler will re-run the faulting instruction, and the TLB (hardware or software) will update its cache. During this page fault workflow, the process is put into a blocked state so other processes can run: concurrency for the win!
21.4 What if Memory is Full
Summary
Need to invoke the page replacement policy (like an LRU)
21.5 Page Fault Control Flow
Summary
When a page fault occurs, the hardware (TLB) and the software (OS) must carry out a sequence of actions, depending on the scenario. Focusing on the OS: if a page fault occurs, the OS must find a free physical frame, populate the frame with the right contents, update the PTE (setting the valid bit to 1 and updating the page frame number), and finally retry the instruction.
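Here's a hedged C sketch of that OS-side control flow; the pte_t layout and the helper functions are invented stand-ins, not a real kernel API:

```c
#include <stdio.h>

/* Simplified, hypothetical structures -- not a real kernel's. */
typedef struct {
    int present;    /* 1 if the page is resident in a physical frame */
    int pfn;        /* physical frame number, valid when present */
    int swap_slot;  /* disk location when swapped out */
} pte_t;

/* Stub helpers standing in for real OS machinery. */
static int allocate_free_frame(void) { return -1; }  /* pretend memory is full */
static int evict_frame(void)         { return 7;  }  /* replacement policy picks a victim */
static void disk_read(int slot, int pfn)
{
    printf("paging swap slot %d into frame %d\n", slot, pfn);
}

/* The OS-side control flow from the summary above. */
static void handle_page_fault(pte_t *pte)
{
    int pfn = allocate_free_frame();
    if (pfn < 0)
        pfn = evict_frame();   /* no free frame: run the replacement policy */
    disk_read(pte->swap_slot, pfn);
    pte->pfn = pfn;            /* point the PTE at the new frame */
    pte->present = 1;          /* future accesses hit without faulting */
    /* ...and then the hardware retries the faulting instruction. */
}

int main(void)
{
    pte_t pte = { .present = 0, .pfn = -1, .swap_slot = 42 };
    handle_page_fault(&pte);
    printf("present=%d pfn=%d\n", pte.present, pte.pfn);
    return 0;
}
```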
21.6 When replacements really occur
Summary
To optimize the address space and ensure there are always free pages available, the swap (or page) daemon runs in the background, freeing up memory for running processes and for the OS.
In my other blog post on memory segmentation, I talked about dividing the process’s virtual address space into segments: code, heap, stack. Memory segmentation is just one approach to memory virtualization; another approach is paging. Whereas segmentation divides the address space into segments, paging chops up the space into fixed-size pieces.
Each piece is divided into a virtual page number (VPN) and an offset; the number of bits that make up the offset is determined by the size of the page, and the number of pages dictates the number of bits in the virtual page number.
To support the translation from virtual address to physical address, we need a page table, a per-process data structure that maps virtual page numbers to physical frame numbers (PFN).
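To make the VPN/offset split concrete, a small sketch with assumed numbers (32-bit addresses, 4 KiB pages):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Illustrative numbers: 32-bit addresses with 4 KiB pages, so the
       low 12 bits are the offset and the upper 20 bits are the VPN. */
    uint32_t vaddr  = 0x0040321Au;
    uint32_t offset = vaddr & 0xFFFu;  /* 12 offset bits: log2(4096) */
    uint32_t vpn    = vaddr >> 12;     /* remaining bits form the VPN */

    printf("vaddr=0x%08X -> VPN=0x%05X, offset=0x%03X\n", vaddr, vpn, offset);
    return 0;
}
```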