On my iPad this morning, I doodled the above figure to help me better understand how I should be calling the function virDomainPinVcpu (as part of project 1 for my advanced operating systems course). Two of the function’s parameters confused me a bit: a pointer to the CPU map (i.e. a bitmap) and the length of the map itself, in bytes.
In order to generate the CPU map, we need a couple of pieces of information: the number of physical CPUs and the number of virtual CPUs (for a guest virtual machine). This information can be obtained by calling virNodeGetInfo and virDomainGetVcpus, respectively.
First, you need to get the number of bytes needed to represent the physical CPUs. Let’s say our hypervisor runs 8 physical CPUs. In this case, we’d need a single byte: bit 0 represents CPU#0, bit 1 represents CPU#1 … bit 7 represents CPU#7. If we had 16 physical CPUs, then we’d need 2 bytes. Let’s call this physical_cpu_ptr.
Then, each virtual CPU will contain a mapping to its own unique physical_cpu_ptr.
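The byte-count rule above can be sketched in plain C without linking against libvirt. The helper names (cpu_map_len, cpu_map_set, cpu_map_is_set, cpu_map_allow_all) are made up for illustration; as far as I can tell, libvirt ships the macros VIR_CPU_MAPLEN and VIR_USE_CPU that do the same arithmetic, so treat this as a sketch of the idea rather than the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed to hold one bit per physical CPU, rounded up to a whole byte. */
int cpu_map_len(int num_pcpus)
{
    assert(num_pcpus > 0);
    return (num_pcpus + 7) / 8;
}

/* Mark physical CPU `cpu` as allowed: byte cpu/8, bit cpu%8. */
void cpu_map_set(unsigned char *map, int cpu)
{
    assert(map != NULL && cpu >= 0);
    map[cpu / 8] |= (unsigned char)(1u << (cpu % 8));
}

/* Returns 1 if physical CPU `cpu` is allowed in the map, 0 otherwise. */
int cpu_map_is_set(const unsigned char *map, int cpu)
{
    assert(map != NULL && cpu >= 0);
    return (map[cpu / 8] >> (cpu % 8)) & 1;
}

/* Clear the map, then allow every one of the first `num_pcpus` CPUs. */
void cpu_map_allow_all(unsigned char *map, int num_pcpus)
{
    memset(map, 0, (size_t)cpu_map_len(num_pcpus));
    for (int cpu = 0; cpu < num_pcpus; cpu++)
        cpu_map_set(map, cpu);
}
```

With 4 physical CPUs, cpu_map_allow_all leaves a single byte with only the low four bits set. That buffer and cpu_map_len(4) would be the last two arguments to virDomainPinVcpu, assuming the signature is (domain, vcpu, cpumap, maplen) as I read it in the documentation.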
Example
Below is an output from within GDB. In this example, my physical hypervisor has 4 CPUs and each virtual machine runs 2 virtual CPUs. Based on what I mentioned above, that means the virtual machine’s bitmap contains 2 bytes: the first byte is for virtual CPU#1 and the second byte is for virtual CPU#2. Each virtual CPU can run on any one of the four physical CPUs, which is why only the first four bits of each byte are set.
At the park, ran around in circles while holding Elliott in my arms … that sort of counts as exercise, right?
Music
Nothing at all
Graduate School
Migrated from my VirtualBox environment to DigitalOcean. My local Ubuntu instance kept crashing, with the KVM service unable to start up the guest operating systems. The instances would just hang for no reason and instead of wasting time troubleshooting, I’d rather focus on the code itself
Wrestled with libvirt’s documentation and finally was able to collect CPU statistics using the API (see snippet below). The documentation sometimes says you need to pass in a struct when it means a pointer (and vice versa)
root@aos-kvm-ubuntu-18:/tmp# ./vcpu-scheduler
Num domains: 2
Active Domain IDs:
2
1
State: 1, nrVirtCpu: 2, cpuTime=311380000000
Domain CPU Stats:
cpuTime: 162140000000
cpuTime: 144030000000
State: 1, nrVirtCpu: 2, cpuTime=268990000000
Domain CPU Stats:
cpuTime: 142960000000
cpuTime: 119900000000
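To make sense of the numbers above: a minimal sketch, assuming the cpuTime counters are cumulative nanoseconds (which is how I read the libvirt documentation). The function names and the idea of sampling twice are my own illustration, not part of the program that produced the output.

```c
#include <assert.h>

/* Convert a cumulative cpuTime counter from nanoseconds to seconds. */
double cpu_time_seconds(unsigned long long cpu_time_ns)
{
    return (double)cpu_time_ns / 1e9;
}

/*
 * Percent of one physical CPU consumed between two samples of the same
 * counter taken interval_ns nanoseconds apart.
 */
double cpu_usage_percent(unsigned long long prev_ns,
                         unsigned long long curr_ns,
                         unsigned long long interval_ns)
{
    assert(interval_ns > 0 && curr_ns >= prev_ns);
    return 100.0 * (double)(curr_ns - prev_ns) / (double)interval_ns;
}
```

Under that assumption, the first domain's total of 311380000000 works out to roughly 311 seconds of CPU time. A scheduler would likely care more about the difference between two samples than about the raw totals.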
Work
Firmed up my design document at work, addressing other engineers’ feedback that was sprinkled in the Quip document
E-mailed Asians@ so that they include my upcoming event in the newsletter
Organization
Using my ScanSnap IX1500, I scanned all my loose documents (e.g. tax statements, W-2 copies, etc.) and then shredded them, clearing up space on my desk and just overall getting a little more organized
Friends and Family
Elliott getting stuck in 3 tier shelf
Popped into Wells Fargo Bank to transfer earnest money to the escrow company. I had originally tried calling the bank at around 04:00 AM (when their support center opened) to have them both enable wire transfers on my account and bump up the transfer limit but quickly realized that sending tens of thousands of dollars should probably be done in person at the branch
Unexpectedly watched Elliott for about 3 hours yesterday. Jess was already running behind for the inspection for our new home located in Renton and Elliott was still napping. So instead of waking her up, Jess left her with me in the middle of the work day and now I truly understand the plight of parents all around the world, parents who need to work from home while taking care of young children: it’s impossible.
Scheduled follow up veterinarian appointments for both Metric and Mushroom. Metric developed a little fissure in her ear, looking as if a tick made a home and then evacuated. And on the back of Mushroom’s neck, there are maybe 5 or 6 whitehead-looking bumps located in the place where I had applied her flea medication a couple weeks ago
Today
Organization
Call Wells Fargo (again) and see if they can bump the limit on my wire transfer since I want to avoid going to a Wells Fargo branch due to COVID-19
Migrate straggling sticky notes that are sitting on my desk and process them into “Writing Ideas” or “Inbox” in OmniFocus
Graduate School
Finish memory virtualization series
Write a few more lines of C code to get a better sense of how I’m going to write the scheduler (all a bit fuzzy right now). Not sure how the scheduler is going to integrate with KVM and not sure what algorithms I’ll select and implement
Work
Meet with the principal network engineer to close out open questions on my design document
Attend weekly operations meeting and discuss events that popped up throughout the week
Family
Carry out morning routine: walk with Elliott and Jess and the dogs, blend up a delicious strawberry and blueberry and banana smoothie (thanks to Jess, who picked up the necessary ingredients from Trader Joe’s yesterday), feed the dogs their raw food
Reply to loan broker and see if they can lock in the 2.875% rate for our mortgage
Hold a discussion with Shuk (our realtor) and finalize which issues that popped up during the home inspection need to get fixed before we move forward to the next step
What are you grateful for?
Spending 1 on 1 time with little Elliott yesterday (despite needing to watch her last minute during a work day). I cannot really explain it but I think I’m developing some sort of paternal love for my daughter, feelings that I’ve never experienced before, not for any human and not for any of my beloved pets. I cannot really describe the emotion … but she’s able to put a wrinkle in my nose just by sitting there and shooting me a smile.
Feelings
After 3 hours of watching Elliott yesterday, I was wiped out, taxed physically and emotionally. That little 11-month-old crawls faster than a rattlesnake and I’m having a hard time keeping up with her. She puts everything in her mouth, from dog hair to shreds of paper (but I would like to call out a victory from yesterday, a proud dad moment, when I was able to free a piece of paper from her mouth before she swallowed it: dad goals).
I learned that with the L3 microkernel approach, each OS service runs in its own address space and is indistinguishable from the end-user applications running in user land. Because the services run in user land, it seems intuitive that this would kill performance due to border crossings (not just context switching, but also address space switching and inter-process communication). It turns out that this performance loss has been debunked: border crossings do not have to cost 900 cycles, and the number can be dropped to around 100 cycles if the microkernel, unlike Mach, does not shy away from writing code that is platform dependent. In other words, the expensive border crossing was a result of a heavy code base, code with portability as one of its main goals.
L02d: The L3 Microkernel Approach
Summary
Learned why you might not need to flush the TLB on a context switch when the TLB entries carry an address space tag.
Introduction
Summary
The focus of this lesson is evaluating L3, a microkernel-based design with a contrarian viewpoint
Microkernel-Based OS Structure
Summary
Each of the OS services runs in its own address space, and the services are indistinguishable from applications running in user space
Potentials for Performance Loss
Summary
The primary performance killer is border crossings between user space and privileged space, a crossing required for user applications as well as operating system services. That’s the explicit cost. There’s an implicit cost of procedure calls as well: loss of locality
L3 Microkernel
Summary
L3 Microkernel argues that the OS structure is sound and that the real focus needs to be efficient implementation, since it’s possible to share a hardware address space while still offering protection domains for the OS services. How does that work? I will find out soon
Strikes against the Microkernel
Summary
Three explicit strikes against providing application level service in a microkernel structure (i.e. border crossings, address space switches, and thread switches and IPC requiring the kernel). One implicit cost: memory subsystem and loss of locality
Debunking User Kernel Border Crossing Myth
Summary
SPIN and Exokernel used Mach as the basis for decrying microkernel-based design. In reality, the border crossing cost is much cheaper: 123 processor cycles versus 900 cycles (in Mach)
Address Space Switches
Summary
For an address space switch, do you need to flush the TLB? It depends. I learned that with address space tags (basically a flag for a process ID), you do not have to. But I also thought that if you used a virtually indexed, physically tagged cache you don’t have to either.
Address Space Switches with AS Tagged TLB
Summary
With address space tags (discussed in previous section), you do not need to flush the TLB during a context switch. This is because the hardware will now check two fields: the tag and the AS (i.e. the process ID). If and only if those two attributes match do we have a TLB hit.
Liedtke’s Suggestions for Avoiding a TLB Flush
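A toy model of that two-field check, written as a minimal sketch: the struct layout, the 4-entry size, and the name tlb_lookup are all assumptions for illustration, and real hardware does this comparison in parallel rather than in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4 /* tiny on purpose; an assumption for the sketch */

struct tlb_entry {
    bool     valid;
    uint16_t asid; /* address space tag (think: process ID) */
    uint32_t vpn;  /* virtual page number, the usual TLB tag */
    uint32_t pfn;  /* physical frame number the page maps to */
};

/* Hit only if BOTH the virtual page number and the address space tag match. */
bool tlb_lookup(const struct tlb_entry *tlb, uint16_t current_asid,
                uint32_t vpn, uint32_t *pfn)
{
    assert(tlb != NULL && pfn != NULL);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == current_asid) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false; /* miss: the page table would be walked instead */
}
```

Two processes can keep entries for the same virtual page in the TLB at the same time. After a context switch only current_asid changes, so the old entries simply stop matching and nothing has to be flushed.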
Summary
Basically leverage the hardware’s capabilities, like segment registers. By defining a protection domain, the hardware will check whether or not the virtual address being translated falls between the lower and upper bound defined in the segment registers, the base and bound registers. Still don’t understand how/why we can use multiple protection domains.
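The base and bound check can be sketched in a few lines. This is only a model: the struct, the function name, and the choice of an exclusive upper bound are my assumptions, not how any particular processor defines its segment registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models one pair of segment registers; the range is [base, bound). */
struct segment {
    uint32_t base;
    uint32_t bound;
};

/* True if the virtual address falls inside the protection domain. */
bool in_protection_domain(const struct segment *seg, uint32_t vaddr)
{
    assert(seg != NULL && seg->base <= seg->bound);
    return vaddr >= seg->base && vaddr < seg->bound;
}
```

One way to picture multiple protection domains: give each small domain its own non-overlapping range of the same hardware address space, for example [0x0000, 0x4000) and [0x4000, 0x8000). Switching domains then means loading a different base and bound pair, and the translations already sitting in the TLB stay valid.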
Large Protection Domains
Summary
If a protection domain is large enough to occupy the entire hardware address space, then an address space switch has explicit and implicit costs. Explicit in the sense that a TLB flush may take 800+ CPU cycles. Implicit in the sense that we lose locality
Upshot for Address space switching?
Summary
With a small address space, context switching is no problem (in terms of expensive costs). A large address space becomes problematic, not only because of flushing the TLB but, more costly still, because of the loss of locality, i.e. not having a warm cache
Thread switches and IPC
Summary
The third myth that L3 Microkernel debunks is the expensive cost of thread switching. It does so by construction (not entirely sure what this means or how this is proved)
Memory Effects
Summary
Implicit costs can be debunked by putting protection domains in the same hardware address space, but this requires that we use segment registers to protect processes from one another. For large protection domains, the costs cannot be avoided
Reasons for Mach’s expensive border crossing
Summary
Because Mach’s focus was on portability, there’s code bloat, which results in less locality and longer latency for border crossings. In short, Mach’s memory footprint (due to code bloat and portability) is the main culprit, not the actual philosophy of the microkernel
Met with my therapist, who I see every week (except last week since I had to cancel the session due to being on call)
Shared how I enjoy mentoring other people not just on technology but on the human element of our work. That’s the good stuff.
He thinks that my preference for having an organized mind stems from survival techniques that I developed as a young child
Music
Practiced singing major and minor scale from memory (no instrument leading me with any tones), a skill I picked up from my guitar instructor
Graduate School
Generated the libvirt documentation from source code
Installed libvirt-dev, enabling me to compile the first executable binary with the -lvirt flag passed to gcc
Compiled program using example source code from documentation that I built
Watched second part of “memory virtualization” lecture and learned that shadow page tables map virtual pages of the guest operating system directly to machine page numbers
Work
Met with someone from AWS Networking, the two of us chatting about the new feature that we are going to launch in Q1 2021
Debugged a crash with the fuzzer (frustrating because I’m unable to reproduce the crash and neither can the principal engineer on the team)
Friends and Family
Bathed Elliott last night, the bath running longer than usual (30 minutes instead of 10)
Feeling very pressured and nervous around moving to Renton, so much needs to get done to make it happen (e.g. pack all of our belongings, hire a moving company, clean up the existing house that we rent). Need to continue taking deep breaths and chip away at each task, slowly, one by one
Today
Organization
Call Wells Fargo (again) and see if they can bump the limit on my wire transfer since I want to avoid going to a Wells Fargo branch due to COVID-19
Migrate straggling sticky notes that are sitting on my desk and process them into “Writing Ideas” or “Inbox” in OmniFocus
Graduate School
Wrap up memory virtualization series
Write a few more lines of C code to get a better sense of how I’m going to write the scheduler (all a bit fuzzy right now). Not sure how the scheduler is going to integrate with KVM and not sure what algorithms I’ll select and implement
Work
E-mail Asians@ so that they include my upcoming event in the newsletter
Family
Carry out morning routine: walk with Elliott and Jess and the dogs, blend up a delicious strawberry and blueberry and banana smoothie (thanks to Jess, who picked up the necessary ingredients from Trader Joe’s yesterday), feed the dogs their raw food
Schedule follow up veterinarian appointments for both Metric and Mushroom
What are you grateful for?
To be in a position where I can (and have been for the last 4 years) attend therapy, thanks to my insurance covering a large portion of the bill. Everyone should be able to afford health care.
Oatmeal breakfast that Jess whipped up
Also grateful for a delicious oatmeal breakfast (above) that Jess cooked.
Feelings
Same as yesterday: Simultaneously excited and nervous about buying and moving into a new home
If you just want to download the libvirt application development guide, click here.
How to build the documentation
libvirt broken documentation
The libvirt developer documentation link is broken (i.e. HTTP 404). But I need the development guide for my advanced OS course so I downloaded the repository and built the documentation from source. If you want to do the same (instead of downloading the PDF I rendered above) you can execute the following instructions:
YouTube’s recommendation engine suggested that I watch a video called “The Cult of Dan Lok”. Mind you, I had never even heard of Dan Lok but my intuition led me to believe that he runs some sort of pyramid scheme. Surprise surprise: he does.
Anyways, in the video below, a student of Dan Lok describes how he dumped $26,000 into an “exclusive” program and how in that program, at every step of the way, Dan Lok (or people working directly for him) upsold a new program, a new promise from rags to riches.
I seriously don’t understand why and how people fall for this sort of crap. Don’t people understand that there’s no quick and easy fix for life? And anybody who is selling you that promise is probably full of shit?
I get livid and upset that people — like Dan Lok — can take advantage of people all over the world. Granted, I understand that these victims are consenting adults but come on.
E-mailed the singing instructor that I’ve been seeing for the last couple years, informing her that lately I’ve been too busy and had to shift around my priorities, now that I’ve stepped into fatherhood. I sorely miss singing and felt that the activity brought a breath of fresh air into my life. Maybe I can continue and maybe I can do one off lessons: that’s always an option.
Graduate School
Started working on project 1 by ensuring that I can launch the virtual machines inside of my VirtualBox environment. Ran into a slew of issues that I’ve documented and will publish on this blog
Work
Presented my design document for a new feature/service that AWS will be offering in the future. I had to shake off my nervousness, a feeling I get despite how well prepared I am and despite the number of years I’ve spent practicing and polishing my public speaking skills
Started debugging a crash discovered by our fuzzer. I never dealt directly with the fuzzer so this is a great learning opportunity to not only fix a problem but also understand more deeply what exactly the fuzzer is doing
Friends and Family
Excited to design and decorate my new home office
Bathed Elliott last night. She only lasted about 5 minutes (about 1/2 to 1/3 of the time we usually take for a bath) since she was so sleepy, despite her clocking in a one and a half hour nap, an hour longer than her other naps. Maybe she’s going through some sort of growth spurt? Maybe she’s sleeping better because I hung up curtains in her room that shield her from the setting sun?
Video chatted with Martin, the two of us discussing software and architecture design for an authentication system he is working on. Nice that I can share my thoughts around trade offs, trade offs that I’ve picked up from both working at Amazon over the years and from graduate school. For example, we talked about the trade offs of caching and how caching is not free: you need to tackle cache consistency and cache coherency.
Panicked panicked panicked. The offer that we put in on the house the day before has been accepted and my wife and I are officially pending on a new house located in Renton. Although I’m nervous and scared and will miss North Seattle, I know that this relocation is the right step for our family. Elliott needs more space and seeing her crawl around the living room — over and over and over again — reaffirms my decision. Not only that, but I can finally build myself a real work from home office, one that feels warm and one that I can call my own.
Today
Organization
Plan day and week out by reviewing OmniFocus forecast events
Process e-mail inbox down to zero
Migrate sticky notes (written down while walking dogs in the morning) into writing tracker and OmniFocus
Graduate School
Begin second series of lectures for advanced operating systems, lectures on “Memory Virtualization” (exciting stuff, I think)
Work
Revisit the open comments from design review and follow up with AWS Networking teams
Family
Check work calendar and check if I can perform the home inspection at 2:00 PM on Thursday
What are you grateful for?
Despite the fact that we’re in the midst of a pandemic, despite the massive layoffs in America and the 10% unemployment rate, I’m fortunate enough to be in a position to have earned and saved enough money to buy a house. I feel both very blessed and also guilty at the same time. I acknowledge my hard work and perseverance but also acknowledge that I could not have done this on my own: so many people have helped me along the way in my life. I must continue to return the favor.
Feelings
Simultaneously excited and nervous about buying and moving into a new home
If you are executing uvt-simplestreams-libvirt you’ll need to execute the command with sudo and exercise patience (i.e. be okay with waiting 3 minutes while the program runs without outputting any informational message to the standard output)
No logging to standard output/error
I had to exercise some patience when executing the command uvt-simplestreams-libvirt sync, the command that downloads OS images to the box. Basically, the command takes several minutes to complete and does not print any informational messages while running, leaving you wondering if any forward progress is being made.
Tip #1 – Run with sudo
If you do not run the command with sudo, the program will download images but then fail when writing to the socket.
Tip #2 – Verify images have been downloaded
Once you have downloaded the images, you can list all the images by using the query subcommand.
Project 1 was released last evening at 08:59 PM PST and this morning, I decided to start on the project by reading through the overview and getting the lay of the land. For this project, we’ll need to deliver two operating system components: a scheduler and a memory coordinator (not even sure what that means exactly).
So what I’m doing as part of this post is just taking a snapshot of the questions I have and topics I do not understand, topics that I’ll probably understand in much more depth as the project progresses. More often than not, I am dismissive of all the work I put in over the semester and this post is one way to honor the time and commitment.
Overall, this project’s difficulty sits in the right place — not too hard but not too easy. The sweet spot for Deliberate Practice.
Questions I have
What algorithm should I implement for my scheduler?
What algorithms fit the needs for this project?
What the heck is a memory coordinator?
Why do we have a memory coordinator? What’s its purpose?
How do you measure the success of a memory coordinator?
How do I use the libvirt library?
What is QEMU?
Where does the scheduler sit in relationship to the operating system?
How will I get the hypervisor to invoke my scheduler versus another scheduler?
Project Requirements
You need to implement two separate C programs, one for the vCPU scheduler (vcpu_scheduler.c) and another for the memory coordinator (memory_coordinator.c)