Author: mattchung

  • Barrier Synchronization (Part 1/2)

    As mentioned previously, there are different types of synchronization primitives that we operating system designers offer. If, as an application designer, you need to ensure only one thread can access a piece of shared memory at a time, use a mutual exclusion synchronization primitive. But what about a different scenario, in which you need all threads to reach a certain point in the code, and only once all threads reach that point do they continue? That’s where barrier synchronization comes into play.

    This post covers two types of barrier synchronization. The first is the naive, centralized barrier and the second is the tree barrier.

    In a centralized barrier, we basically have a global count variable, and as each thread enters the barrier, it decrements the shared count. After decrementing the count, threads hit a predicate and branch: if the count is not zero, the thread enters a busy spin loop, spinning while the count is greater than zero. However, if the counter equals zero after the decrement, then all threads have arrived at the barrier.

    Simple enough, right? Yes it is, but the devil is in the details because there’s a subtle bug, a subtle edge case. It is entirely possible (based on the algorithm just described) that when the last thread enters the barrier and decrements the count, all the other threads suddenly move beyond the barrier (since the count is no longer greater than zero). In other words, the last thread never gets to reset the count back to N, the number of threads.

    How to avoid this problem? Simple: add another while loop that guarantees that the threads do not leave the barrier until the counter gets reset. Very elegant. Very simple.
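    To make this concrete, here is a minimal C11 sketch of a reusable centralized barrier. One caveat: instead of re-checking the shared count in a second loop as described above, this variant has the last arrival reset the count and then bump a generation number that everyone else spins on, which avoids the re-use race in the same spirit. All names (centralized_barrier, NUM_THREADS, the demo harness) are illustrative, not from the lecture.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_THREADS 4
#define ROUNDS 3

static atomic_int count = NUM_THREADS;   /* threads yet to arrive */
static atomic_int generation = 0;        /* completed barrier episodes */

static void centralized_barrier(void)
{
    int gen = atomic_load(&generation);
    if (atomic_fetch_sub(&count, 1) == 1) {
        /* Last arrival: reset the count BEFORE releasing anyone;
           this ordering is exactly what the naive version gets wrong. */
        atomic_store(&count, NUM_THREADS);
        atomic_fetch_add(&generation, 1);    /* release the waiters */
    } else {
        while (atomic_load(&generation) == gen)
            ;                                /* spin until released */
    }
}

/* Demo: each thread marks its arrival, crosses the barrier, and then
   verifies that everyone else had arrived too. */
static atomic_int arrived[ROUNDS];
static atomic_bool ok = true;

static void *worker(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&arrived[r], 1);
        centralized_barrier();
        if (atomic_load(&arrived[r]) != NUM_THREADS)
            atomic_store(&ok, false);
    }
    return NULL;
}

bool run_barrier_demo(void)
{
    pthread_t tid[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&ok);
}
```

    Compile with -pthread; run_barrier_demo() returns true only if, in every round, no thread got past the barrier before all four had arrived.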

    One way to optimize the centralized barrier is to introduce a sense reversing barrier (as I described in “making sense of the sense reversing barrier”).

    The next type of barrier is a tree barrier. The tree barrier groups multiple processes together at multiple levels (the number of levels is log n, where n is the number of processors), each group maintaining its own count and local sense variables. The benefit? Each group spins on its own lock sense. The downside? The spin location is dynamic, not static, which can impede performance on NUMA architectures.

    Centralized Barrier

    Summary

    Centralized barrier synchronization is pretty simple: keep a counter that decrements as each thread reaches the barrier. Every thread/process will spin until the last thread arrives, at which point the last thread resets the barrier counter so that it can be reused later on.

    Problems with Algorithm

    Summary

    Race condition: while the last thread is resetting the counter, all the other threads may move forward past the barrier.

    Counting Barrier

    Summary

    Such a simple and elegant solution: adding a second spin loop (still inefficient, but neat nonetheless). This leads into the sense reversing barrier algorithm.

    Sense Reversing Barrier

    Summary

    One way to optimize the centralized barrier is to introduce a sense reversing barrier. Essentially, each process maintains its own unique local “sense” that flips from 0 to 1 (or 1 to 0) each time a synchronization barrier is needed. This local variable is compared against a shared flag, and only when the two are equal can all the threads/processes proceed past the current barrier and move on to the next.

    Tree Barrier

    Summary

    Group processes (or threads) and each group has its own shared variables (count and lock sense). Before flipping the lock sense, the final process needs to move “up to the next level” and check if all other processors have arrived at the next level. Things are getting a little more spicy and complicated with this type of barrier

    Tree Barrier (Continued)

    Summary

    With a tree barrier, a process arrives at its group (with its count and lock sense), decrements the count variable, and then checks the lock sense variable. If the lock sense does not match, then spin. If it is the last process to arrive at its group, it moves up to the next level.

    Tree Barrier (continued)

    Summary

    Once the last process reaches the root, it is responsible for waking up the lower levels, traversing back down the tree. At each level, it flips the lock sense flag.

    Tree Barrier (Continued)

    Summary

    As always, there’s a trade-off or hidden downside with this implementation. First, the spin location is not statically determined. This dynamic allocation may be problematic, especially on NUMA (non-uniform memory access) architectures, because a process may end up spinning on a remote memory location. But my question is: are there any systems that do not offer coherence?
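    A minimal sketch of the idea, assuming four threads and a fan-in of two (the names are mine, and this is an illustration of a combining-tree barrier rather than the exact lecture algorithm): each node in the tree is its own small sense-reversing barrier, and the last arrival at a node moves “up to the next level” before waking its own group on the way back down.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FAN_IN   2
#define NTHREADS 4      /* two leaves of two threads each, plus a root */

struct tree_node {
    atomic_int  count;          /* arrivals still expected here */
    atomic_bool sense;          /* this group's "lock sense" flag */
    struct tree_node *parent;
};

static struct tree_node root    = { FAN_IN, false, NULL };
static struct tree_node leaf[2] = { { FAN_IN, false, &root },
                                    { FAN_IN, false, &root } };

/* Arrive at node n; the last arrival recurses up, then wakes its group. */
static void tree_barrier(struct tree_node *n, bool my_sense)
{
    if (atomic_fetch_sub(&n->count, 1) == 1) {
        if (n->parent)
            tree_barrier(n->parent, my_sense);  /* move up a level */
        atomic_store(&n->count, FAN_IN);        /* reset for next use */
        atomic_store(&n->sense, my_sense);      /* flip the lock sense */
    } else {
        while (atomic_load(&n->sense) != my_sense)
            ;                                   /* spin on group flag */
    }
}

/* Demo: every thread counts its arrival before each barrier episode. */
static atomic_int arrivals;
static atomic_bool all_ok = true;

static void *tree_worker(void *arg)
{
    struct tree_node *my_leaf = arg;
    bool my_sense = false;
    for (int round = 0; round < 3; round++) {
        my_sense = !my_sense;       /* local sense flips every episode */
        atomic_fetch_add(&arrivals, 1);
        tree_barrier(my_leaf, my_sense);
        if (atomic_load(&arrivals) < (round + 1) * NTHREADS)
            atomic_store(&all_ok, false);
    }
    return NULL;
}

bool run_tree_demo(void)
{
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, tree_worker, &leaf[i / FAN_IN]);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&all_ok);
}
```

    Note how each group spins only on its own node’s sense flag; the dynamic spin location mentioned above is visible here, since which node a thread spins on depends on where it lands in the tree.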

  • On letting go & Daily Review – Day ending in 2020/09/17

    With working remotely and establishing a (somewhat) daily routine (that has become pretty monotonous), it’s sometimes easy to forget that we’re in the midst of a global pandemic. But that reality is amplified by the recent wildfires, forcing those of us living in the Pacific Northwest (PNW) — and those living in Northern California — to remain indoors.  This additional layer of lockdown definitely impacts the mental health of both my wife and me. I’m almost certain that not breathing in fresh air from outside negatively affects my 11-month-old daughter, who has not stepped (or been carried) outside this past week — not even for a few minutes. I really want her to join me for the daily walks with the dogs, even if only for a few minutes, but my wife and I agreed that it’s best for the long term if she doesn’t inhale any of the ash trickling down from the sky.

    Source: https://www.kingcounty.gov/depts/health/covid-19/data/key-indicators.aspx

    On Letting Go

    Unrelated to the above comment about the pandemic, I was reading Write It Down, Make It Happen on my Kindle while sitting on the can (you’d be surprised how much reading you can get done during your trips to the bathroom), and there’s a section in the book that suggests sometimes the way to make things happen is to simply let go of the reins and relinquish all control.

    On some level, I agree that sometimes you’ll find the very thing you’ve been looking for when you stop searching. This concept rings true to me because during my mid-twenties, I stopped searching for “the one” and instead began looking inward and healing myself (my life was spiraling out of control due to my compulsive and addictive behavior). On the path to recovery, I ended up meeting the one, my now wife, the two of us bumping into each other while volunteering at a children’s orphanage (she often jokes and tells people that I adopted her … not everyone gets the joke).

    Yesterday

    What I learned yesterday

    • Learned a new data structure called an n-ary tree (i.e. each node can have up to N children). Picked up this new graph-theory-inspired data structure from watching the video lectures for advanced operating systems

    Funny moments

    • The professor made me laugh while I was watching his recorded lecture videos: he described the tournament barrier, how the algorithm “rigs” the competition, and how that seems applicable to real life, given professional sports rig competitions all the time (e.g. baseball and the World Series). Okay, reading this out to myself now, it doesn’t sound that funny, but I guess it’s something you have to hear firsthand.

    Writing

    Best parts of my day

    • Eating dinner with Jess while grubbing on some delicious Vietnamese takeout from our favorite (for now) local restaurant, Moonlight. They sell the best vegan Vietnamese food, dishes including Ca Co (clay pot fish), Canh Chua (Vietnamese sweet and sour soup), and Pho. All this wonderful food while the two of us were planted in front of the cinema-sized television playing “Fresh Off The Boat”.
    • Bridging the gap between theory (computer science) and practice (life as a software engineer at Amazon). While walking through code with a principal engineer, he described the data structure he came up with that (classically) traded space for lookup performance, describing an index that bounds the binary search. To clarify the data structure, he “whiteboarded” with me, using (an internally hosted) draw.io, and the figures made me realize that the data structure resembles how page tables in operating systems are designed.

    Mental and Physical Health

    • Remembered to stretch my hamstrings a couple of times throughout the day. Not really enough exercise, but the best I could do given that the air outside is still labeled “very unhealthy” due to the wildfire smoke.

    Graduate School

    • Watched a few more lectures from the “Barrier Synchronization” series, learning more about the tree barrier and the tournament barriers, two slightly more advanced data structures than the simplistic centralized barrier
    • Finished putting together my submission, a .zip file containing: source code, Makefile, log files (from running my program against the test cases).

    Work

    • Met with a Vietnamese senior principal engineer yesterday whom I reached out to for two reasons: asking him to participate in a future event that I’ll host on behalf of Asians@ and asking him for some tips, given he’s excelled (at least on paper) in his career while bringing his full self to work (i.e. being a father).
    • Walked through some C data structures with a principal engineer and successfully imported the code (with a few minor tweaks) into my team’s package
    • During my 1:1 with my manager, he straight up asked me if I was leaving the team. I realized that I may have given him that impression because during a team meeting, I spoke up about how if the operations on the team continue to disturb me and wake me up consistently in the middle of the night, then I would reconsider and switch teams, which is true. Although I love my position and love what I’m doing at work, I do value other things, such as a good night’s sleep, because poor sleep means poor mental health, which leads me down a very dark path.

    Family and Friends

    • Watched after my daughter yesterday morning from 06:00 to 06:45, allowing my wife to catch up on some sleep after a difficult night with Elliott, who we think is teething, given that she’s been waking up in the middle of the night more than usual.
    • Left a couple of voice notes in WhatsApp for my brother-in-law, who is getting into writing and who asked me for some suggestions on writing platforms to host his blogging content. In short, I said it doesn’t really matter all that much; what does matter is that he owns his content and syndicates it out.

    Miscellaneous

    • Downloaded 30 days of Wells Fargo and Morgan Stanley transaction history, proving that my wife and I have the funds needed to complete the transaction on September 30th, the day we close on our house located in Renton

    Today

    Writing

    • Publish Part 1 of Barrier Synchronization notes
    • Publish this post (my daily review)

    Mental and Physical Health

    • Stretch stretch stretch! Add a 5 minute event that notifies you on your phone so that you step back from your desk and reach for your toes. Squat a couple times. Drop to the ground for a couple push ups. Get that heart pumping!

    Graduate School

    Work

    • Attend weekly operational meetings
    • Submit code review for a small feature that optimizes lookup time on the packet path

    Administrative

    • Attend my dentist appointment at 02:30 in the afternoon. How the hell is that going to work with the pandemic? Obviously I won’t be able to wear a mask during the appointment itself … I should probably give them a ring this afternoon, before stepping into the appointment

    Family

    • Pack pack pack! There’s only a couple weeks left until we are out of this house. It’s difficult to pack at the end of the day because Jess and I are both exhausted, her from watching Elliott and me from working. But maybe we can pack while eating dinner instead of sprawling out on the couch and watching “Fresh Off The Boat” (one of my favorite TV series, I think)

  • Making sense of the “sense reversing barrier” (synchronization)

    What’s the deal with a sense reversing barrier? Even after watching the lectures on the topic, I was still confused as to how a single flag that toggles between two values (true and false) can communicate whether or not all processes (or threads) have finished the same critical section. This concept completely baffled me.

    Trying to make sense of it all, I did some “research” (aka a Google search) and landed on a great article titled Implementing Barriers1.  The article contains source code written in C that helped demystify the topic.

    Sense Reversal Barrier. Credit: (Nkindberg 2013)

    The article explains how each process must maintain its own local sense variable and compare that variable against a shared flag. That was the key: the fact that each process maintains its own local variable, separate from the shared flag. Each time a process enters a barrier, its local sense toggles, switching from 1 to 0 (or from 0 to 1), depending on the variable’s last value. If the local sense variable and the shared flag differ, then the process (or thread) must wait before proceeding beyond the current barrier.

    For example, let’s say process 0 initializes its local sense variable to 0. The process enters the barrier, flipping its local sense from 0 to 1. Once the process reaches the end of the critical section, it compares the shared flag (also initialized to 0) with its local sense variable, and since the two are not equal, the process waits until all other processes complete.
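    The walkthrough above can be condensed into a small C11 sketch. This is a hedged illustration following the article’s description; the names (sense_reversing_barrier, NPROCS, the demo harness) are mine, not from the article.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NPROCS 4

static atomic_int  remaining    = NPROCS;  /* processes yet to arrive */
static atomic_bool shared_sense = false;   /* the single shared flag  */

static void sense_reversing_barrier(bool *local_sense)
{
    /* Entering the barrier toggles my private copy of the sense. */
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&remaining, 1) == 1) {
        /* Last arrival: reset the count, then flip the shared flag,
           which is what releases everyone spinning below. */
        atomic_store(&remaining, NPROCS);
        atomic_store(&shared_sense, *local_sense);
    } else {
        /* My local sense and the shared flag differ, so I must wait. */
        while (atomic_load(&shared_sense) != *local_sense)
            ;
    }
}

/* Demo: every process checks that all NPROCS arrived before it left. */
static atomic_int checkpoint;
static atomic_bool ok = true;

static void *proc(void *arg)
{
    (void)arg;
    bool local_sense = false;    /* every process starts at 0 (false) */
    for (int round = 1; round <= 3; round++) {
        atomic_fetch_add(&checkpoint, 1);
        sense_reversing_barrier(&local_sense);
        if (atomic_load(&checkpoint) < round * NPROCS)
            atomic_store(&ok, false);
    }
    return NULL;
}

bool run_sense_demo(void)
{
    pthread_t t[NPROCS];
    for (int i = 0; i < NPROCS; i++) pthread_create(&t[i], NULL, proc, NULL);
    for (int i = 0; i < NPROCS; i++) pthread_join(t[i], NULL);
    return atomic_load(&ok);
}
```

    Because the last arrival resets the count before flipping the shared flag, the barrier is immediately reusable, which is precisely the property the naive counting barrier lacks.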

    References

    1. Nkindberg, Laine. 2013. “Implementing Barriers.” Retrieved September 17, 2020 (http://15418.courses.cs.cmu.edu/spring2013/article/43).
  • Crashing and burning during lunch & Daily Review – Day ending in 2020/09/16

    I had mentioned yesterday that I slept horribly, waking up early and starting the day off at around 03:45 AM. That wake-up time was brutal, and as a result, I crashed and burned in the afternoon, leveraging those precious 30 minutes of my lunch to nap in my wife’s / daughter’s bedroom (a room with an actual mattress, unlike my office, where I’m sleeping on the floor on top of a tri-folding piece of foam).

    I slept so well during that afternoon nap that I didn’t hear the ear-piercing alarm that I set on my TIMER YS-390; luckily I had warned my wife beforehand and asked her to wake me up just in case I overslept. Thankfully she did.

    Yesterday, I had suspected that I woke up so early and slept so poorly due to the loud air conditioner kicking on and off throughout the night, but I’m 100% confident about what woke me up this morning at 03:00: my daughter belting out a loud scream (and then immediately falling back asleep).

    Yesterday

    I’m in great company while working from home. Here’s Metric sleeping by my foot

    Writing

    • Published my daily review

    Best parts of my day

    • The afternoon 30 minute nap. Seriously. The nap makes me wonder how I would’ve taken that sort of break if I were not working remotely and were back in the office.

    Mental and Physical Health

    • Attended my weekly therapy session with good old Roy. As anticipated, we followed up on our tension filled conversation that occurred last week. What was comforting and brought me solace was that he opened up (just a little) and shared that he was similar to me in the sense that he often will take on additional work that just needs to be done, a person who sees a hole and fills it. That conversation made me think of the term transference: “a phenomenon in which an individual redirects emotions and feelings, often unconsciously, from one person to another.”

    Graduate School

    • Watched no lectures yesterday (as mentioned, I was like a zombie) and instead ran my programs (i.e. virtual CPU scheduler and memory coordinator) across the various use cases, collecting all the terminal output that I’ll need to include as part of the submission

    Work

    • Conducted an “on site” interview. I say onsite, but because of COVID-19, all interviews are held over (Amazon) Chime.
    • Debugged an unexpected drop in free memory and realized that it pays off to be able to distinguish memory allocations happening on the stack from memory allocations happening elsewhere (like shared memory in the kernel).

    Family and Friends

    Miscellaneous

    • Got my second hair cut this year (damn you COVID-19) at nearby hair salon. I love the hair salon for a multiple reasons. First, the entire trip — from the moment I leave my house, to the moment I return back — takes 30 minutes, a huge time saver. Second, the stylist actually listens to what I want (you’d be surprised how many other stylists get offended when I tell them what I want and what I don’t want) and gives me a no non-sense hair cut. And third, the hair cut runs cheap: $20.00.  What a steal! I don’t mind paying more for hair cuts (I was previously paying triple that price at Steele Barber).

    Today

    Mainly will just try to survive the day and not collapse from over exhaustion. If I can somehow afford the time, I’d like to nap this afternoon for half an hour. Will gladly trade off 30 minutes of lunch if that means not being zapped of all my energy for the remainder of the day.

    Writing


    Mental and Physical Health

    Graduate School

    Work

    • 1:1 meeting with my manager Tim
    • Follow up with fuzzing issue and determine whether or not the issue can be reproduced on other hosts

    Family

    • Respond to my brother-in-law, whom I shared my article with and who wants to get into writing and asked me what tools I suggest
  • Synchronization notes (part 2/2) – Linked Based Queuing lock

    In part 1 of synchronization, I talked about the more naive spin locks and other approaches that offer only marginally better performance by adding delays or reading cached values (i.e. avoiding bus traffic) and so on. Of all the locks discussed thus far, the array based queuing lock offers low latency, low contention and low waiting time. And best of all, this lock offers fairness. However, there’s no free lunch, and what we trade for performance is memory: each lock contains an array with N entries, where N is the number of physical CPUs. That can be wasteful on machines with a high number of CPUs (i.e. when not all processors contend for a lock).

    That’s where linked based queuing locks come in. Instead of allocating upfront a contiguous block of memory reserved for each CPU competing for the lock, the linked based queuing lock dynamically allocates a node on the heap and maintains a linked list. That way, we only allocate the structure as requests trickle in. Of course, we need to be extremely careful with maintaining the linked list structure and ensure that we atomically update the list using instructions such as fetch_and_store during the lock operation.

    Most importantly, we (as the OS designers that implement this lock) need to carefully handle the edge cases, especially removing the “latest” node (i.e. setting its next to NULL) while another node is being allocated. To overcome this classic race condition, we need to rely on a new (to me at least) atomic operation: compare_and_swap. This instruction is called like this: compare_and_swap(latest_node, me, nil). If the latest node’s next pointer matches me, then set the next pointer to NIL. Otherwise, we know that there’s a new node in flight and will have to handle the edge case.
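    Here is a sketch of the lock and unlock pair using C11 atomics, with atomic_exchange standing in for fetch_and_store and atomic_compare_exchange_strong standing in for compare_and_swap. The field names are illustrative, loosely following the MCS scheme described above, not copied from the paper.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             waiting;    /* true while spinning */
};

struct mcs_lock {
    _Atomic(struct qnode *) tail;       /* newest request, or NULL */
};

static void mcs_acquire(struct mcs_lock *L, struct qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->waiting, true);
    /* fetch_and_store: swap myself in as the tail, getting back my
       predecessor (NULL means the lock was free). */
    struct qnode *prev = atomic_exchange(&L->tail, me);
    if (prev != NULL) {
        atomic_store(&prev->next, me);      /* link in behind prev */
        while (atomic_load(&me->waiting))
            ;                               /* spin on my own node */
    }
}

static void mcs_release(struct mcs_lock *L, struct qnode *me)
{
    struct qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: compare_and_swap(tail, me, NULL).
           If the tail is still me, the queue is empty; we're done. */
        struct qnode *expected = me;
        if (atomic_compare_exchange_strong(&L->tail, &expected, NULL))
            return;
        /* CAS failed: a new request is in flight; wait for the link. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->waiting, false);    /* hand the lock over */
}

/* Tiny demo: four threads bump a shared counter under the lock. */
static struct mcs_lock mcslock;     /* tail implicitly NULL */
static long counter;

static void *mcs_worker(void *arg)
{
    (void)arg;
    struct qnode me;                /* queue node lives on my stack */
    for (int i = 0; i < 10000; i++) {
        mcs_acquire(&mcslock, &me);
        counter++;                  /* protected critical section */
        mcs_release(&mcslock, &me);
    }
    return NULL;
}

long run_mcs_demo(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, mcs_worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return counter;
}
```

    Note that each thread spins only on its own qnode, which is the whole point: no shared location gets hammered by every waiter.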

    Anyways, check out the last paragraph on this page if you want a nice matrix that compares all the locks discussed so far.

    Linked Based Queuing Lock

    Summary

    Similar approach to Anderson’s array based queueing lock, but using a linked list (the authors are MCS: Mellor-Crummey and Scott). Basically, each lock has a qnode associated with it, the structure containing a got_it and next. When I obtain the lock, I point the qnode towards me and can proceed into the critical section

    Linked Based Queuing Lock (continued)

    Summary

    The lock operation (i.e. Lock(L, me)) requires a double atomic operation, and to achieve this, we’ll use a fetch_and_store operation, an instruction that retrieves the previous address and then stores mine

    Linked Based Queuing Lock

    Summary

    When calling unlock, the code removes the current node from the list and signals the successor, achieving similar behavior to the array based queue lock. But still an edge case remains: a new request is forming while the current thread is setting the head to NIL

    Linked Based Queuing Lock (continued)

    Summary

    Learned about a new primitive atomic operation called compare_and_swap, a useful instruction for handling the edge case of updating the dummy node’s next pointer when a new request is forming

    Linked Based Queuing Lock (continued)

    Summary

    Hell yea. Correctly anticipated the answer when professor asked: what will the thread spin on if compare_and_swap fails?

    Linked Based Queuing Lock (continued)

    Summary

    Take a look at the papers to clarify the synchronization algorithms on a multi-processor. Lots of primitives are required, like fetch_and_store and compare_and_swap. If these hardware primitives are not available, we’ll have to implement them

    Linked Based Queuing Lock (continued)

    Summary

    Pros and cons with the linked based approach: space complexity is bounded by the number of dynamic requests, but the downside is the maintenance overhead of the linked list. Another potential downside is if there are no underlying primitives for compare_and_swap etc.

    Algorithm grading Quiz

    Comparison between various spin locks

    Summary

    If the amount of contention is low, then use spin with delay plus exponential backoff. If the lock is highly contended, then use a statically assigned lock.

  • Tired like a zombie & Daily Review – Day ending in 2020/09/15

    Today is going to be rough. I slept horribly, waking up multiple times throughout the night. Ultimately, I rolled out of my tri-folding foam mattress (a temporary bed while my daughter and wife sleep on the mattress in a separate room so as not to wake me up: that parent life) at 03:45 AM this morning. Perhaps the gods above are giving me what I deserve, since I had complained yesterday that I had “slept in” and as a result didn’t get a chance to put in any meaningful work before work. So now they are punishing me. Touché. Touché.

    Yesterday

    Fell into a black hole of intense focus while hunting down a bug that was crashing my program (for project 1 of advanced operating systems).  Sometimes I make the right call and distance myself from a problem before falling into a vicious mental loop, and sometimes (like this scenario) I make the right call and keep at a problem and ultimately solve it.

    Writing

    Best parts of my day

    • A poop explosion. Elliott’s nugget rolling out of her diaper and landing on the bathroom floor, making us two parents chuckle
    • Teaching Elliott how to shake her head and signal “no”. For the past few months, I’ve tried to teach her during our daily baths, but when I had tried previously, her body and head were not cooperating with her. When she tried to say no, she was unable to independently control her head movement; her entire body would turn left and right along with her head. But yesterday, she got it, and now she loves saying “no” even though she really means yes. She’s so adorable.
    • Jess yelling out for me to rush over to the bathroom to help her … pick up Elliott’s poop that rolled out of her diaper, two nuggets falling out, one landing on the tile floor while the other smashed on the floor mat
    • Catching up over the phone with my friend Brian Frankel. He’s launched a new startup called Cocoon, his company aiming to solve the problem of slimy mushrooms and slimy strawberries in the refrigerator. I bought one of his new inventions mainly to support his vision (it’s always nice to support friends), but also because I’m a huge fan of refrigerator organization and cleanliness. Unfortunately, the box arrived broken (looks like something heavy in the delivery truck landed on the box, crushing it into pieces)

    Mental and Physical Health

    • At the top of the hour (not every hour, unfortunately), I hit the imaginary pause button, pulling my hands off the keyboard and stepping back from my standing desk to stretch my hamstrings and strengthen my hips with Asian squats

    Graduate School

    • Watched no lectures yesterday; all the time (about 2 hours) was dumped into polishing up the virtual CPU scheduler, adding a new “convergence” feature that skips the section of code that (re)pins the virtual CPUs to physical CPUs, skipping when the standard deviation falls below 5.0% (an arbitrary number that I chose)
    • Wrestled with my program crashing. The crash’s backtrace was unhelpful since it pointed to a location in the code that had nothing to do with the code that I had just added.

    Work

    • Attended a meeting led by my manager reviewing the results of the “Tech Survey”. The survey is released by the company every year, asking engineers to answer candidly questions such as “Is your work sustainable?” or “Is your laptop’s hardware sufficient for your work?”. Basically, it allows the company to keep a pulse on the developer experience and is a good starting point for igniting necessary changes.
    • Stepped through code written by a principal engineer, C code that promised to bound a binary search by trading off 2 bytes to serve as an index.

    Family and Friends

    • Fed Elliott during my lunch. It was extremely tiring (but at the same time enjoyable) chasing her around the kitchen floor, requiring me to constantly squat and crawl. She’s mobile now, working her way up to taking one to two steps.
    • Bathed Elliott last night and taught her how to touch her shoulders, a body part she’s been completely unaware of. Since she loves playing with my wedding ring, I let her play with it during our nighttime routine, and last night I would take the ring, place it on her infant-sized shoulder, point to it, and guide her opposite hand to reach out and grab it.
    • Caught up with one of our friends over FaceTime. Always nice to see a familiar face during COVID-19, a very isolating experience that all of society will look back on in a few years, all of us wondering if it was all just a bad dream, because that’s what it feels like

    Miscellaneous


    Today


    Writing


    Mental and Physical Health

    Graduate School

    • Write up the README for the two parts of my project
    • Change the directory structure of project 1 so that the submission validator passes
    • Submit the project to Canvas
    • Watch 30 minutes worth of lectures (if my tired brain can handle it today)

    Work

    • Interview a candidate “on-site” later this afternoon
    • Continue troubleshooting unreproducible fuzzing failure (will try to tweak our fuzzer for a potential out of memory issue)

    Family

    • Pack a couple more boxes with Jess. Only a couple more weeks and we move our family unit into a new home in Renton.
  • Synchronization (Notes) – Part 1

    I broke down the synchronization topic into two parts; this part covers material up to and including the array based queuing lock. I’ll follow up with part 2 tomorrow, which will include the linked list based queuing lock.

    There are a couple of different types of synchronization primitives: mutual exclusion and barrier synchronization. Mutual exclusion ensures that only one thread (or process) at a time can write to a shared data structure, whereas barrier synchronization ensures that all threads (or processes) reach a certain point in the code before continuing. The rest of this summary focuses on the different ways we as operating system designers can implement mutual exclusion and offer it as a primitive to system programmers.

    To implement a mutual exclusion lock, three instructions need to be bundled together: reading, checking, and writing. These three operations must happen atomically, all at once. To this end, we need to leverage the underlying hardware’s instructions, such as test_and_set, fetch_and_increment, or (generically) fetch_and_phi.
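    As an illustration (not the lecture’s code), C11 exposes exactly this kind of bundled read-modify-write as atomic_flag_test_and_set, which is enough to build the most basic mutual exclusion lock. The function names and demo harness below are mine.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag lock_bit = ATOMIC_FLAG_INIT;

/* test_and_set atomically reads the old value and writes 1; we hold
   the lock once the old value we read back was 0. */
static void naive_lock(void)
{
    while (atomic_flag_test_and_set(&lock_bit))
        ;   /* spin: someone else holds the lock */
}

static void naive_unlock(void)
{
    atomic_flag_clear(&lock_bit);
}

/* Demo: the classic read-check-write on a shared counter, which would
   lose updates without the lock. */
static long shared_counter;

static void *tas_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        naive_lock();
        shared_counter++;
        naive_unlock();
    }
    return NULL;
}

long run_tas_demo(void)
{
    pthread_t t[4];
    shared_counter = 0;
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, tas_worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return shared_counter;
}
```

    This is the naive spin lock evaluated below: correct, but every waiter hammers the same location with atomic operations.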

    Whatever way we implement the lock, we must evaluate our implementation with three performance characteristics:

    • Latency (how long it takes to acquire the lock)
    • Waiting time (how long a single thread must wait before acquiring the lock)
    • Contention (how long it takes to acquire the lock when multiple threads compete)

    In this section, I’ll list each of the implementations, ordered from most basic to most advanced.

    • Naive Spin Lock – a simple while loop that calls the test_and_set atomic operation. Downside? Lots of bus traffic for cache invalidation or cache updating. What can we do instead?
    • Caching Spin Lock – Almost identical to the previous implementation, but instead of spinning on test_and_set, we spin on a read, followed by a check with test_and_set. This reduces the noisy bus traffic of test_and_set (which, again, bypasses the cache and writes to memory every time)
    • Spinlock with delay – Adds a delay to each test_and_set. This reduces spurious checks and ensures that not all threads perform test_and_set at the same time
    • Ticket Lock – Each new request that arrives obtains a ticket. This is analogous to how a deli hands out tickets to its customers; it is fair but still signals all other threads to wake up
    • Array based queuing lock – An array that is N in size (where N is the number of processors). Each entry in the array can be in one of two states: has-lock and must-wait. Fair and efficient, but trades off space: each lock gets its own array with N entries, where N is the number of processors. This can be expensive for machines with thousands of processors and potentially wasteful if not all of them contend for the lock
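    To make the “spin on a read” and “add a delay” ideas from the list above concrete, here is a hedged sketch that combines the caching spin lock with exponential backoff. The delay constants are arbitrary choices of mine, and atomic_exchange plays the role of test_and_set.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int spin_lock_word = 0;   /* 0 = free, 1 = held */

static void ttas_lock(void)
{
    unsigned delay = 4;
    for (;;) {
        /* Caching spin lock: spin on a plain read, served from the
           local cache, so it generates no bus traffic while held. */
        while (atomic_load(&spin_lock_word) != 0)
            ;
        /* The lock looked free: now attempt the expensive atomic RMW. */
        if (atomic_exchange(&spin_lock_word, 1) == 0)
            return;                     /* acquired */
        /* Lost the race: delay so the waiters don't all retry at once. */
        for (volatile unsigned i = 0; i < delay; i++)
            ;
        if (delay < 4096)
            delay *= 2;                 /* exponential backoff */
    }
}

static void ttas_unlock(void)
{
    atomic_store(&spin_lock_word, 0);
}

/* Demo: four threads bump a shared counter under the lock. */
static long ttas_counter;

static void *ttas_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        ttas_lock();
        ttas_counter++;
        ttas_unlock();
    }
    return NULL;
}

long run_ttas_demo(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, ttas_worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return ttas_counter;
}
```

    The plain-read loop is what keeps waiters in their caches; the backoff is what stops them from stampeding the bus the instant the lock is released.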

    Lesson Summary

    Summary

    There are two types of locks: exclusive and shared locks. An exclusive lock guarantees that one and only one thread can access/write data to a location. A shared lock allows multiple readers, while guaranteeing no other thread is writing at that time.

    Synchronization Primitives

    Summary

    Another type of synchronization primitive (besides mutual exclusion) is a barrier. A barrier ensures that all threads reach a certain execution point, and only then can the threads advance. It’s similar to arriving at a restaurant: you cannot be seated until your entire party arrives. That’s an example of a barrier.

    Programmer’s Intent Quiz

    Summary

    We can write code that enforces a barrier with a simple while loop using atomic reads and writes (but it’s inefficient)

    Programmer’s Intent Explanation

    Summary

    Programmer’s Intent Quiz

    Because of the atomic reads and writes offered by the instruction set, we can easily use a flag to coordinate between processes. But are these instructions sufficient for creating a mutual exclusion lock primitive?
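To make the flag idea concrete, here is a hypothetical producer/consumer pair sketched with C11 atomics: plain atomic loads and stores are enough to signal “data is ready,” even though (as the quiz hints) they are not enough to build a lock, since a lock needs read-check-write as one indivisible step. The names (producer, consumer, ready) are mine:

```c
#include <stdatomic.h>
#include <pthread.h>

static int data;                 /* payload, written before the flag */
static atomic_int ready = 0;     /* 0 = not ready, 1 = ready */

void *producer(void *arg)
{
    (void)arg;
    data = 42;                                       /* plain write */
    /* Publish the flag; release ordering makes the data write
     * visible before the flag flips. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

void *consumer(void *arg)
{
    /* Spin on an atomic read until the producer raises the flag. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                                            /* busy wait */
    *(int *)arg = data;          /* safe: release/acquire pairing */
    return NULL;
}
```

The coordination works with one writer and one reader, but two threads both trying to “check then set” the flag could each see it clear and both proceed, which is exactly why the lock primitive needs an atomic read-modify-write.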

    Atomic Operations

    Summary

    The instruction set ensures that each individual read and write operation is atomic. But to implement a lock, there are three operations (read, check, write) that need to be bundled together into a single instruction by the hardware. This grouping can be done via test_and_set or, generically, fetch_and_increment or fetch_and_phi.

    Scalability Issues with Synchronization

    Summary

    What are the three performance issues with implementing a lock, and which of them are under the control of the OS designer? The three scalability issues are as follows: latency (time to acquire the lock), waiting time (how long a thread waits until it acquires the lock), and contention (when all threads compete for the lock, how long it takes for the lock to be handed to a single thread). The first and third (i.e. latency and contention) are under the control of the OS designer. The second is not: waiting time is largely driven by the application itself, because the critical section might be very long.

    Naive Spinlock (Spin on T+S)

    Summary

    Why is the spin lock considered naive? How does it work? Although the professor did not explicitly call out why it’s naive, I can guess why (and my guess will be confirmed in the next slide, during the quiz). Basically, each thread eats up CPU cycles by repeatedly performing test_and_set. Why not just signal the waiters instead? That would be more efficient. But, as usual, nothing is free, so there must be a trade-off in terms of complexity or cost. Returning to how the naive spin lock works: it’s basically a while loop around test_and_set (whose semantics guarantee that only a single thread will receive the lock).
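A minimal sketch of the naive lock in C11, assuming atomic_flag as the test_and_set primitive (the type and function names are mine). Every waiter hammers test_and_set in a tight loop, and each attempt is a read-modify-write that hits memory, which is where the bus traffic comes from:

```c
#include <stdatomic.h>

typedef struct { atomic_flag held; } spinlock_t;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

void spin_lock(spinlock_t *l)
{
    /* Spin directly on test_and_set: it returns the old value, so
     * the loop exits only for the one thread that saw "clear". */
    while (atomic_flag_test_and_set(&l->held))
        ;
}

void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear(&l->held);
}
```

Correct, but every spinning processor keeps issuing bus transactions even while the lock is held, which is the problem the next variants attack.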

    Problems with Naive Spin Lock

    Problems with naive spinlock

    Summary

    The spin lock is naive for three reasons: too much contention (when the lock is released, all threads, maybe thousands, jump at the opportunity); it does not exploit the cache (test_and_set by nature must bypass the cache and go straight to memory); and it disrupts useful work (only one thread can make forward progress, yet the rest keep spinning).

    Caching Spinlock

    Caching Spin lock (spin on read)

    Summary

    To take advantage of the cache, processors first spin on while(L == locked), a read served out of their local caches. Once the lock is released, all processors then race to perform test_and_set. Can we do better than this?
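The spin-on-read idea can be sketched like this (a sketch, not the lecture’s code; the names cspin_lock/cspin_unlock are mine, and atomic_exchange stands in for test_and_set):

```c
#include <stdatomic.h>

typedef struct { atomic_int locked; } cspin_t;  /* 0 = free, 1 = held */

void cspin_lock(cspin_t *l)
{
    for (;;) {
        /* Phase 1: spin locally on a cached read; no bus traffic
         * until the holder's release invalidates/updates the line. */
        while (atomic_load(&l->locked) == 1)
            ;
        /* Phase 2: the lock looked free, so race the other waiters
         * with an atomic exchange (test_and_set). The loser goes
         * back to spinning on the read. */
        if (atomic_exchange(&l->locked, 1) == 0)
            return;
    }
}

void cspin_unlock(cspin_t *l)
{
    atomic_store(&l->locked, 0);
}
```

The win is that waiters are quiet while the lock is held; the remaining cost is the burst of simultaneous exchanges right after every release.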

    Spinlocks with Delay

    Summary

    Any sort of delay will improve the performance of a spin lock when compared to the naive solution.
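One common form of delay is exponential backoff: after each failed attempt the thread waits longer before retrying, so the waiters don’t all pound the bus at the same moment. A sketch under my own assumptions; the delay loop and the cap of 1024 are illustrative, not tuned values from the lecture:

```c
#include <stdatomic.h>

typedef struct { atomic_int locked; } dspin_t;  /* 0 = free, 1 = held */

static void pause_for(unsigned spins)
{
    /* Crude busy-wait delay; real code might use a pause/yield hint. */
    for (volatile unsigned i = 0; i < spins; i++)
        ;
}

void dspin_lock(dspin_t *l)
{
    unsigned delay = 1;
    /* atomic_exchange plays the role of test_and_set. */
    while (atomic_exchange(&l->locked, 1) != 0) {
        pause_for(delay);
        if (delay < 1024)        /* cap the backoff */
            delay *= 2;
    }
}

void dspin_unlock(dspin_t *l)
{
    atomic_store(&l->locked, 0);
}
```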

    Ticket Lock

    Summary

    A fair algorithm that uses a deli or restaurant analogy: people who arrive before will get the lock before people who arrive after. This is great for fairness, but performance is still lacking in the sense that when a thread releases the lock, an update is sent across the entire bus. I wonder: can this even be avoided, or is it just the reality of the situation?
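A sketch of the ticket lock in C11 (my own naming; atomic_fetch_add is the fetch_and_increment). Each arrival takes a ticket, and the lock serves tickets in order, which is exactly the deli analogy:

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   /* next number to hand out */
    atomic_uint now_serving;   /* ticket currently allowed in */
} ticket_lock_t;

void ticket_lock(ticket_lock_t *l)
{
    /* fetch_and_increment: atomically take a ticket. */
    unsigned my = atomic_fetch_add(&l->next_ticket, 1);
    /* Wait until it's our turn; every waiter watches now_serving,
     * which is why a release still generates bus-wide traffic. */
    while (atomic_load(&l->now_serving) != my)
        ;
}

void ticket_unlock(ticket_lock_t *l)
{
    /* Only the lock holder writes now_serving, so a plain
     * load-increment-store is safe here. */
    atomic_store(&l->now_serving, atomic_load(&l->now_serving) + 1);
}
```

FIFO fairness falls out of the ticket order; the noise remains because all waiters spin on the same now_serving location.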

    Spinlock Summary

    Array-based queuing lock

    Summary

    A recap of the options so far: 1) spin on read with test_and_set (no fairness), 2) test_and_set with delay (no fairness), 3) ticket lock (fair but noisy). How can we do better? How can we signal one and only one thread to wake up and attempt to gain access to the lock? Apparently, I’ll learn this shortly, with queueing locks. This lesson reminds me of a conversation I had with my friend Martin about avoiding a busy while() loop for lock contention, a conversation we had maybe a year or so ago (maybe even two years ago)

    Array Based Queuing Lock

    Summary

    Create an array that is N in size (where N is the number of processors). Each entry in the array can be in one of two states: has-lock and must-wait. Only one entry can ever be in the has-lock state. I’m proud that I can understand the circular queue (just by looking at the array) with my understanding of the mod operator. But even so, I’m having a hard time understanding how we’ll use this data structure in real life.

    Array Based Queuing Lock (continued)

    Summary

    Lock acquisition looks like: fetch_and_increment (on queue_last) followed by while(flags[myplace mod N] == must_wait). Still not sure how this avoids generating the cache bus traffic described earlier…

    Array-based queue lock

    Summary

    Array-based queuing lock (mod)

    Using mod, we basically update the “next” entry in the array and set it to has-lock
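Putting the acquire and release steps together, here is a sketch of the array-based queuing lock (names are mine; NPROC = 4 is illustrative, and the per-slot cache-line padding a real implementation would want is omitted for brevity). Each waiter spins on its own slot, so a release touches only the next waiter’s location:

```c
#include <stdatomic.h>

#define NPROC 4                      /* N = number of processors */
enum { MUST_WAIT = 0, HAS_LOCK = 1 };

typedef struct {
    atomic_int flags[NPROC];         /* exactly one slot is HAS_LOCK */
    atomic_uint queue_last;          /* next slot to hand out */
} alock_t;

void alock_init(alock_t *l)
{
    for (int i = 0; i < NPROC; i++)
        atomic_store(&l->flags[i], MUST_WAIT);
    atomic_store(&l->flags[0], HAS_LOCK);
    atomic_store(&l->queue_last, 0);
}

/* Acquire: fetch_and_increment claims a slot; spin on that slot only.
 * Returns the slot so the caller can hand it to alock_release. */
unsigned alock_acquire(alock_t *l)
{
    unsigned my_place = atomic_fetch_add(&l->queue_last, 1) % NPROC;
    while (atomic_load(&l->flags[my_place]) == MUST_WAIT)
        ;
    return my_place;
}

/* Release: reset my slot, then grant the lock to the next slot (mod N). */
void alock_release(alock_t *l, unsigned my_place)
{
    atomic_store(&l->flags[my_place], MUST_WAIT);
    atomic_store(&l->flags[(my_place + 1) % NPROC], HAS_LOCK);
}
```

This is where the earlier bus-traffic question gets answered: since each thread spins on a distinct array entry, the release invalidates only one waiter’s cached line instead of waking everyone.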

  • Honoring my body’s internal alarm clock & Daily Review – Day ending in 2020/09/14

    Honoring my body’s internal alarm clock & Daily Review – Day ending in 2020/09/14

    This morning my body woke me up later than usual. After a few blinks, I squeezed the corner of my Casio G-Shock watch, the green background lighting up and shining the time: 05:55 AM. Ugh. About an hour later than I wanted to wake up.

    On one hand, I’m bummed because I won’t be able to squeeze in as much uninterrupted time before work, but on the other hand, my body and brain probably needed the extra sleep. Otherwise, why “sleep in”? I try to honor and listen to my body’s signals, which is another reason why, over the past 5 years, I’ve stopped setting an alarm clock and instead permitted my body to wake up naturally, whenever it’s ready.

    Oh well. Let’s get cracking.

    Yesterday

    Writing

    Best parts of my day

    • My co-worker unintentionally making me chuckle. During my team’s daily stand up meeting yesterday, I had asked my co-workers how they were coping with all the smoke blanketing the Seattle skies. And my co-worker’s response caught me by surprise. She said that back home in India, there’s always a thick cloud of smoke resting above their heads, and really, the wildfire smoke in Seattle reminds her of home.

    Mental and Physical Health

    • Yesterday I resumed my ritual of switching back and forth between sitting and standing (thank you Jarvis standing desk). At the top of every hour, I try to hit the imaginary pause button and stretch my hamstrings by reaching for the ground with my finger tips. Not much exercise but every little bit counts.

    Graduate School

    • Started watching the barrier synchronization lectures. It’s so amazing how deep I am diving into computer science. I really do enjoy learning how an OS designer (maybe me someday) implements primitives such as mutual exclusion and barrier synchronization.
    • Started updating virtual CPU scheduler code to support the notion of “convergence”. The idea is that if the underlying physical CPU utilization only ever so slightly deviates (the exact percentage is yet to be determined), then the scheduler should leave the CPUs as is and not try to re-balance the workload.

    Work

    • Scheduled an ad-hoc meeting with a principal engineer that took place that same day, the two of us troubleshooting a fuzzing failure, the worst kind of failure: one that cannot be reproduced.
    • Read a design proposal written up by a different principal engineer who evaluated different types of data structures. Realized that the data structure I had envisioned us using will not meet performance requirements for IPv6 traffic

    Family

    • Took dogs on a very short walk of about 15 minutes at the local park. I skipped walking them over the weekend because of the thick smoke but the two puppies were getting a bit restless so I compromised, sacrificing a (hopefully) very tiny bit of our long term health in order to get them the physical stimulation they needed for the rest of the day

    Today

    Quote of the day

    Have you ever felt out of place in your place?

    I pulled that quote from a rap line from the song Breathing (Part 2) by Hamzaa featuring Wretch 32 & Ghetts.

    Most of my life, I have felt “out of place”. One instance was during my teenage years, being one of the only (or perhaps the only) Vietnamese kids attending Calabasas High School, a school where almost everyone is white and where everyone thinks they are white.

     

    Writing

    • Publish “Synchronization” notes
    • Publish daily review (this one that I’m writing right here)

    Mental and Physical Health

    • Won’t be running in this smoke, so instead, let’s spend 2 minutes (really, that’s all) throughout the day to get the heart pumping. Maybe some jumping jacks or some push ups. And of course, stretching the hamstrings and hip rotators while working behind my standing desk

    Graduate School

    • Fix memory leak with my recent additions to the CPU scheduler
    • Start writing up documentation that will accompany the submission for project 1
    • Watch 15-20 minutes worth of lectures from the barrier synchronization series

    Work

    • Meetings and meetings and meetings (not a really fun day): sprint planning, Asians@ planning and development meetings, and an interview debrief

    Miscellaneous

    • Hair cut at 4:00 PM. This will be the second hair cut this year, the last one taking place in June. Obviously I’m trying to minimize unnecessary interactions with other people but damn, I look like a shaggy dog with my heavy and coarse hair weighing me down.
  • Shared Memory Machine Model (notes)

    Shared Memory Machine Model (notes)

    You need to take away the following two themes from the shared memory machine model lectures:

    • The difference and relationship between cache coherence (dealt with in hardware) and memory consistency (the contract relied upon by the operating system and programmer)
    • The different memory machine models (e.g. dance hall, symmetric multiprocessing, and distributed shared memory architecture)

    Cache coherence is the promise delivered by the underlying hardware architecture. The hardware employs one of two techniques: write-invalidate or write-update. With write-invalidate, when a memory address gets updated by one of the cores, the system sends a message on the bus to invalidate the cache entry stored in all the other private caches. With write-update, the system instead updates all the private caches with the correct data. Regardless, the mechanism by which coherence is maintained is an implementation detail that the operating system is not privy to.

    Although the OS has no insight into how the hardware delivers cache coherence, the OS does rely on cache coherence to deliver memory consistency, the hardware and software working in harmony.

    Shared Memory Machine Model

    Summary

    Shared memory model – Dance Hall Architecture
    Symmetric multiprocessing – each processor’s access time to memory is the same
    Distributed shared memory – each CPU has its own private cache and its own memory, although they can access each other’s addresses

    There are three memory architectures: dance hall (all the CPUs sit on one side of the interconnect, all the memory on the other), SMP (from the perspective of each CPU, access time to memory is the same), and distributed shared memory (each CPU has some memory that is faster for it to access). The lecture doesn’t go much deeper than this, but I’m super curious about the distributed architecture.

    Shared Memory and Caches

    Summary

    Because each CPU has its own private cache, we may run into a coherence problem: the caches tied to different CPUs can contain different values for the same memory address. To reason about what values a program can observe, we need a memory consistency model.

    Quiz: Processes

    Summary

    The quiz tests us by offering two processes, each using shared memory and each updating different variables. Then the quiz asks what the possible values for the variables are. Apart from the last one, all are possible; the last one would break the intuition of a memory consistency model (discussed next)

    Memory consistency model

    Summary

    Introduces the sequential consistency memory model, a model that exhibits two characteristics: program order and arbitrary interleaving. This is analogous to someone shuffling two halves of a deck of cards together: each half keeps its own order, but the interleaving is arbitrary.

    Sequential Consistency and cache coherence (Quiz)

    Summary

    Cache Coherence Quiz – Sequential Consistency

    Which of the following are possible values for the given instructions?

    Hardware Cache Coherence

    Summary

    Hardware Cache Coherence

    There are two strategies for maintaining cache coherence: write-invalidate and write-update. In the former, the system bus broadcasts an invalidate message when one of the caches modifies an address held in its private cache. In the latter, the system sends an update message to each of the caches, each cache updating its private copy with the correct data. Obviously, the lecture oversimplifies the intricacies of maintaining a coherent cache (if you want to learn more, check out the high performance computing architecture lectures, or maybe future modules in this course will cover this in more detail)

    Scalability

    Summary

    Scalability – expectations with more processors. Pros: exploit parallelism. Cons: increased overhead

    Adding processors increases parallelism and can improve performance. However, past a point, performance decreases due to the additional overhead of maintaining the bus (another example of making trade-offs and how nothing is free).

  • Losing 2 hours searching for a website bookmark & Weekly Review: week ending in 2020/09/06

    Losing 2 hours searching for a website bookmark & Weekly Review: week ending in 2020/09/06

    My weekly review that normally takes place first thing in the morning on Sundays was completely derailed this time around, all because I couldn’t find the URL to a website that I had sworn I bookmarked for my wife’s birthday present. I ended up coughing up two hours of searching: searching directly on Reddit’s website (where I was 100% confident I had stumbled upon the post), searching through 6 months of my Firefox browser history, and searching through 20 or so pages of Google results.

    I ultimately found the page after some bizarre combination of keywords using Google, the result popping up on the 6th page of results (I would share the URL with you, but I want to keep it tucked away for the next two weeks until my wife’s birthday, or at least until her present arrives and I gift it to her).

    How about you: when you stumble on something interesting on the internet, what steps do you take to make sure that you can successfully retrieve the page again in the future? Do you simply bookmark the page using your browser’s built-in bookmark feature? Do you tag the entry with some unique or common label? Or do you store it away in some third-party bookmarking service like Pinboard? Or maybe you archive the entire contents of the page offline to your computer using DevonThink? Or something else?

    So many options.

    Ultimately, I don’t think the tool itself really matters: I just need to save the URL in a consistent fashion.

    Writing

    Family and Friends


    • Got around to finally calling my Grandma and video chatting with her so that she could see Elliott, who has grown exponentially over the last couple months
    • Signed off on tons of paper work for the new house and pulled the trigger on selling a butt load of my Amazon stocks that will cover the down payment and the escrow costs that we’re going to get hit with on September 30th (my wife’s birthday)
    • Packed about 5 more boxes worth of our belongings (e.g. books, clothing, kitchen goods)

    Music

    • Recorded about 5 different melodies and harmonies using the voice memo app on my iPhone, then moved the recordings off my phone and over to my MacBook using AirDrop
    • Attended my (zoom) bi-weekly guitar lesson with Jared, the lessons focusing on three areas: song writing (creative aspect), jamming (connecting with other musicians, mainly my little brother), developing a deeper understanding of the guitar (mastery).

    Mental and Physical Health

    Graduate School

    • I’d estimate I put roughly 15 hours into graduate school this week: reading research papers, writing code for project 1 (i.e. writing a virtual CPU scheduler and memory coordinator), and of course watching the Udacity lectures.
    • For the development project, the majority of my time gets eaten up trying to grok the API documentation for libvirt. In second place would be debugging crashes in my code (which is why I always riddle my code with assert statements, a practice I picked up working at Amazon).
    • I really enjoyed watching and taking notes for this past week’s lectures. I’m taking the class at the perfect time in my career and in my graduate studies, after taking graduate operating systems and after taking high performance computing architecture. Both these courses prepared me well and provided me the foundation necessary to more meaningfully engage with the lectures. What I mean by this is that instead of passively watching and scribbling down notes, I tend to frequently click on the video to pause the stream and try to anticipate what the professor is about to say or try to answer the questions he raises. This active engagement helps the material stick better.

    Organization

    Brother label maker
    • Tossed out the cheap $25.00 label maker from Target and instead invested in a high-quality Brother PTD600V label maker. Well worth the investment.
    • Culled my e-mail inbox, dropping the unread count from hundreds down to zero (will need to perform same activity this week)

    Work

    • Wrapped up my design for a new feature, getting sign-off from the technical leadership team at work. The only open action item is to benchmark the underlying Intel DPDK library against IPv6 lookups (which I think I already have data for)