Now that I finished writing the vCPU scheduler for project 1, I’m moving on to the second part of the project called the “memory coordinator” and here I’m posting a similar blog post to a snapshot of my understanding of project 1 , the motivation being that I take for granted what I learned throughout graduate school and rarely celebrate these little victories.
This post will focus on my unknowns and my knowledge gaps as it relates to the memory coordinator that we have to write in C using libvrt. According to project requirements, the memory coordinator should:
dynamically change the memory of each virtual machine, which will indirectly trigger balloon driver.
Questions I have
As I chip away at the project, more questions will inevitably pop up and when they do, I’ll capture them (but in a separate blog post). So here’s my baseline understanding of what a memory coordinator:
What are the relevant memory statistics that should be collected?
Will the resident set size (RSS) be relevant to the decision making algorithm? Or it is irrelevant?
What is the upper and lower bounds of memory consumption that triggers the ballooning driver to page memory out or page memory in?
Will the balloon driver only trigger for virtual machines that are memory constrained?
Does the hypervisor’s memory footprint matter (I’m guessing yes, but to what extent)?
As system designers, our goal is to design a “black box” system that create an illusion that our users have full and independent access to the underlying hardware of the system. This is merely an abstraction since we are building multi-tenant systems with many applications and many virtual guest machines running on a single piece of hardware, all at the same time.
To this end, we build what is called a “hypervisor” (code that runs directly on the physical hardware), the software supporting multiple guest machines that run on of it. The guest operating system either be virtualized guest operating system (that has no clue it’s a virtual guest and so that underlying OS binary is the same as it is if you were to install it on a physical server) or be para virtualized operating system (that is aware of the fact that it is virtualized, similar to how the “hosts” in Westworld gain awareness that they are in fact robots)
Quiz: Virtualization
Summary
the concept of virtualization is prolific. We see it in the 60s and 70s when IBM invented the VM/370. We also see it in cloud computing and modern data centers.
Platform virtualization
Summary
As aspiring operating system designers, we want to be able to build the “black box” in which the applications ride on top of, the black box being an illusion of an entire independent hardware system, when really it is not.
Utility Company
Summary
End users resource usage is bursty and we want to amortize the cost of the shared hardware resources. End users have access to large available resources
Hypervisors
Hypervisors
Summary
Inside the black box, there are two types: native and hosted. Native is bare metal, the hypervisors running directly on the hardware. Hosted, on the other hand, runs as a user application on the Host operating system
Connecting the dots
Summary
Concepts of virtualization date back as far as the 70s, when IBM first invented it with IBM VM 370. Fast following was microkernels, extensibility of OS and SIMOS (late (0s, and then most recently, Xen + VMware (in the 2000s). Now we are looking at virtualizing the data center.
Full Virtualization
Full virtualization
Summary
With full virtualization, the underlying OS binaries are untouched, no changes required of the OS code itself. To make this work, hypervisor needs to employ some strategies for some system calls that silently fail
Para Virtualization
Summary
Paravirtualization can directly address some of the issues (like silently failing calls) that happens with full virtualization. But OS needs to be modified ; at the same time, can take advantage of optimizations like page coloring.
Quiz: What percentage of guest OS code may need modification
Quiz – Guest OS changes < 2%
Summary
Less than 2% of code needs modification to support paravirtualization, very minuscule (proof of construction, using Xen)
Para Virtualization (continued)
Summary
With paravirtualization, not many code changes are needed and almost sounds like a no brainer (to me)
I’m really struggling to intimately understanding virtually index privately tagged concept. From a high level, I get that VIPT is an optimization technique, parallelizing both the translation of virtual addresses to physical and the cache look up of the physical address. This seems impossible at first since the cache depends on the physical frame number (PFN).
So in order to accomplish this, we’ll use the some bits of the VPN to search in the TLB and then the least significant bits when searching the cache. This makes sense. But what I don’t quite fully don’t understand is why we have to limit the size of the cache in order to avoid aliasing (i.e. different virtual addresses that end up mapping to the same physical frame).
Thinking out loud here … I suppose that if the index of the physical bits bleed into the virtual address, then we actually do not avoid the aliasing problem, the issue of having more than one virtual address map to the same physical frame.
Ok, I think I got it.
But what’s the catch?
You only can have a very small cache, the size equal to the page size multiplied by the set associativity. This means that the only way to increase the size of the cache is by increasing the associativity which ultimately increases the look up time as well since the cache has to look the tag up across multiple cache lines.
In my other blog post on memory segmentation, I talked about diving the process’s virtual address space into segments: code, heap, stack. Memory segmentation is just one approach to memory virtualization and another approach is paging. Whereas segmentation divides the address space into segments, paging chops up the space into fixed-sized pieces.
Each piece is divided into a virtual private number (VPN) and the offset; the number of bits that make up the offset is determined by the size of the page itself and the number of pages dictate the number of bits of the virtual private number.
To support the translation between virtual address to physical address, we need a page table, a per process data structure that maps the virtual address to the physical frame number (PFN).