I learned that with an L3 Microkernel approach, each OS runs in their own address space and that they are indistinguishable from the end-user applications running in user land. Because they run in user land, it seems intuitive that this kill performance due to border crossings (not just necessarily context switching, but address space switching and inter process communication). Turns out that this performance loss has been debunked and that border crossings do not cost 900 cycles — this number can be dropped to around 100 cycles if the mikrokernel, unlike Mach, stray away from writing code that is platform dependent. In other words, the expensive border crossing was a result of a heavy code base, code with portability as one of its main goals.
L02d: The L3 Microkernel Approach
Learned why you might not need to flush TLB on context switch with address space tag in the TLB.
Focus on this lesson is evaluating L3, a micro kernel based design, a system and design with a contrarian view point
Microkernel-Based OS Structure
Each of the OS services run in their own address space, the services indistinguishable from applications running in user space
Potentials for Performance Loss
The primary performance killer is border crossings, between user space and privileged space, a crossing required for user applications and well as operating system services. Also, that’s the explicit cost. There’s an implicit cost of procedure calls as well: loss of locality
L3 Microkernel argues that the OS structure is sound but what really needs to be the focus is efficient implementation since its possible to share hardware address space while offering protection domains for the OS services. How that works? Will find out soon
Strikes against the Microkernel
Three explicit strikes of providing application level service in a microkernel structure (i.e. border crossings, address space switches, thread switches and IPC requiring the kernel. One implicit cost: memory subsystem and loss of locality
Debunking User Kernel Borer Crossing Myth
SPIN and Exokernel used Mach as basis for decrying micro kernel based design. In reality, the border crossing costs is much cheaper: 123 processor cycles versus 900 cycles (in March)
Address Space Switches
For address space switch, do you need to flush the TLB? It depends. I learned that for address space tags (basically a flag for a process ID), you do not have to. But I also thought that if you used a virtually indexed physically tagged address you don’t have to either.
Address Space Switches with AS Tagged TLB
With address space tags (discussed in previous section), you do not need to flush the TLB during a context switch. This is because the hardware will not check two fields: the tag and the AS (i.e. the process ID). If and only if those two attributes match do we have a TLB hit).
Liedke’s Suggestions for Avoiding a TLB Flush
Basically leverage the hardware’s capabilities, like segment registers. By defining a protection domain, the hardware will check whether or not the virtual address being translated falls between the lower and upper bound defined in the segment registered, the base and bound registers. Still don’t understand how/why we can use multiple protection domains.
Large Protection Domains
If a process occupies all the hardware, then there are explicit and implicit costs. Explicit in the sense that a TLB flush may take up to 800+ CPU cycles. Implicit in the sense that we lose locality
Upshot for Address space switching?
Small address space then no problem (in terms of expensive costs) for context switching. Large address space becomes problematic because of not only flushing the TLB but more costly is the lost in locality, not having a warm cache
Thread switches and IPC
The third myth that L3 Microkernel debunks is the expensive cost of thread switching. By construction (not entirely sure what this means or how this is proved)
Implicit costs can be debunked by putting protection domains in the same hardware address space, but requires that we use segment registers to protect processes from one another. For large protection domains, the costs cannot be avoided
Reasons for Mach’s expensive border crossing
Because Mach’s focus was on portability, there’s code bloat and as a result, there’s less locality and incurs longer latency for border crossing. In short, Mach’s memory footprint (due to code bloat and portability) is the main culprit, not the actual philosophy of micro kernel
I’m Matt Chung. I’m a software engineer, seasoned technology leader, and father currently based in Seattle and London. I love to share what I know. I write about topic developing scalable & fail-safe software running in the AWS cloud, digital organization as a mechanism for unlocking your creativity, and maximizing our full potentials with personal development habits.View all articles