Author: mattchung

  • Lamport’s Clocks (notes)

    Lamport’s Clocks (notes)

    Introduction

    Summary

    Now that we have talked about “happened before” events, we can talk about Lamport clocks.

    Lamport’s Logical Clock

    Summary

    Each process keeps a logical clock (a counter) that increases monotonically as events unfold. For example, if event A happens before event B in the same process, then A’s clock value must be less than B’s. The same applies across processes to events related by “happened before”: the timestamp of a message send must be less than the timestamp of its receipt. To achieve this, the receiving process sets its counter to a value greater than the maximum of the timestamp carried by the message and its own local counter. But what happens with concurrent events? Will hopefully learn soon.
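    The update rules above can be sketched in a few lines of C. This is a minimal illustration, not code from the lecture: the names lamport_clock_t, lamport_tick and lamport_receive are made up here, and the sketch assumes the common convention of adding 1 on every event and on every receipt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One logical clock per process (hypothetical type name). */
typedef struct {
    uint64_t counter;
} lamport_clock_t;

/* Local event or message send: advance the clock and return the timestamp. */
uint64_t lamport_tick(lamport_clock_t *clock)
{
    assert(clock != NULL);
    return ++clock->counter;
}

/* Message receipt: move past both the local counter and the sender's timestamp. */
uint64_t lamport_receive(lamport_clock_t *clock, uint64_t msg_timestamp)
{
    assert(clock != NULL);
    uint64_t max = clock->counter > msg_timestamp ? clock->counter : msg_timestamp;
    clock->counter = max + 1;
    return clock->counter;
}
```

    A sender would call lamport_tick() and attach the returned value to the message; the receiver passes that value to lamport_receive(), which guarantees the receipt is stamped later than the send.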

    Events Quiz

    Summary

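    C(a) < C(b) means one of two things. Either a happened before b in the same process, or a is the send of a message and b is its receipt, in which case the receiver picked a value greater than the max of the message’s timestamp and its local clock.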

    Logical Clock Conditions

    Lamport’s logical clock conditions

    Summary

    Key Words: partial order, converse of condition

    Lamport clocks give us a partial order of the events in a distributed system. Key idea: the converse of condition 1 is not true. That is, just because the timestamp of x is less than the timestamp of y does not necessarily mean that x happened before y; the two events may be concurrent. Also, is a partial order sufficient for us distributed system designers?

    Need For A Total Order

    Need for total order

    Summary

    Imagine a situation in which there’s a shared resource and individual people want access to that resource. They can send each other a text message (or any message) with a timestamp. The sender with the earliest timestamp wins. But what happens if two (or more) timestamps are the same? Then each individual person (or node) must make a local decision (in this case, the oldest person wins).

    Lamport’s Total Order

    Lamport’s total order

    Summary

    Use timestamps to order events and break ties with some arbitrary function (e.g. oldest age wins).
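    As a rough sketch, the comparison could look like the following in C. The tie-breaker here is the process ID (an assumption; any function that every node evaluates the same way would do), and event_id_t and total_order_before are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Identifies an event for ordering purposes (hypothetical type). */
typedef struct {
    uint64_t timestamp;   /* Lamport timestamp of the event */
    uint32_t process_id;  /* unique per process; used only to break ties */
} event_id_t;

/* Returns true when a is ordered before b in the total order. */
bool total_order_before(event_id_t a, event_id_t b)
{
    if (a.timestamp != b.timestamp)
        return a.timestamp < b.timestamp;
    return a.process_id < b.process_id;  /* tie: lower process ID wins */
}
```

    Because every node applies the same rule to the same pair of values, all nodes reach the same decision without any extra communication.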

    Total Order Quiz

    Total order quiz

    Summary

    Identify the concurrent events and then figure out how to order them by breaking the ties.

    Distributed Mutual Exclusion Lock Algorithm

    Distributed mutual exclusion lock

    Summary

    This algorithm requires each process to send its lock requests as messages, and the other processes must acknowledge them. Each request carries a timestamp, and each host makes its own local decision by queueing (and sorting) the lock requests locally and breaking ties in favor of the lowest process ID. What’s interesting to me is that this algorithm makes a ton of assumptions: no broken network links, no split brain, no unidirectional communication. So many failure modes haven’t been taken into consideration. Wondering if this will be discussed in the next video lectures.
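    To make the “local decision” concrete, here is a small sketch of the per-process bookkeeping in C. It is only an illustration under stated assumptions: the names (lock_state_t, enqueue_request, can_enter, and so on) are invented, MAX_PROCS is an arbitrary limit, and the networking itself (sending requests, receiving acknowledgements) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 8  /* arbitrary limit for this sketch */

typedef struct {
    uint64_t timestamp;  /* Lamport timestamp of the request */
    uint32_t pid;        /* requesting process */
} lock_request_t;

typedef struct {
    lock_request_t queue[MAX_PROCS];  /* kept sorted, earliest request first */
    size_t len;
    bool acked[MAX_PROCS];            /* acked[p]: process p acknowledged our request */
    uint32_t self;                    /* this process's ID */
    uint32_t nprocs;                  /* number of processes in the system */
} lock_state_t;

/* Order by timestamp, breaking ties in favor of the lowest process ID. */
bool request_before(lock_request_t a, lock_request_t b)
{
    if (a.timestamp != b.timestamp)
        return a.timestamp < b.timestamp;
    return a.pid < b.pid;
}

/* Insert a request (ours or a peer's) into the sorted local queue. */
void enqueue_request(lock_state_t *s, lock_request_t r)
{
    assert(s->len < MAX_PROCS);
    size_t i = s->len;
    while (i > 0 && request_before(r, s->queue[i - 1])) {
        s->queue[i] = s->queue[i - 1];
        i--;
    }
    s->queue[i] = r;
    s->len++;
}

/* Drop a process's request once it releases the lock. */
void remove_request(lock_state_t *s, uint32_t pid)
{
    for (size_t i = 0; i < s->len; i++) {
        if (s->queue[i].pid == pid) {
            for (size_t j = i + 1; j < s->len; j++)
                s->queue[j - 1] = s->queue[j];
            s->len--;
            return;
        }
    }
}

/* We hold the lock when our request heads the queue and every peer has acked. */
bool can_enter(const lock_state_t *s)
{
    if (s->len == 0 || s->queue[0].pid != s->self)
        return false;
    for (uint32_t p = 0; p < s->nprocs; p++) {
        if (p != s->self && !s->acked[p])
            return false;
    }
    return true;
}
```

    Each process runs this logic on its own copy of the queue. The scheme relies on the assumptions the lecture calls out (ordered delivery, no lost messages) so that all copies of the queue eventually agree.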

    Distributed Mutual Exclusion Lock Algorithm (continued)

    Distributed mutual exclusion lock continued

    Summary

    The distributed mutual exclusion lock algorithm makes lots of assumptions, including 1) all messages are received in the order they are sent and 2) no messages are lost (this is definitely not robust enough, for me at least).

    Messages Quiz

    Summary

    With Lamport’s distributed mutual exclusion lock, each lock/unlock cycle costs 3(N-1) messages: N-1 requests (each carrying a timestamp), N-1 acknowledgements, and N-1 release messages.

    Message Complexity

    Summary

    A process can defer its acknowledgement by sending it during the unlock, reducing the message complexity from 3(N-1) to 2(N-1).
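    The arithmetic is simple enough to write down as two helper functions. The function names are made up for this sketch, and n is the number of processes in the system.

```c
#include <assert.h>

/* Request + acknowledge + release, each sent to the other n-1 processes. */
unsigned lock_messages_basic(unsigned n)
{
    assert(n >= 1);
    return 3 * (n - 1);
}

/* Acknowledgement folded into the release: request + release only. */
unsigned lock_messages_deferred_ack(unsigned n)
{
    assert(n >= 1);
    return 2 * (n - 1);
}
```

    For example, with 5 processes the basic scheme needs 3 * 4 = 12 messages per lock/unlock cycle, while the deferred acknowledgement variant needs 2 * 4 = 8.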

    Real World Scenario

    Real world scenario

    Summary

    Logical clocks may not be good enough in real world scenarios. What happens if a system clock (like the bank’s clock) drifts? What should we be doing instead?

    Lamport’s Physical Clock

    Lamport’s physical clock

    Summary

    Key Words: Individual Clock Drift, Mutual Clock Drift

    Two conditions must be met for Lamport’s physical clock to work. First, the individual clock drift of any given node must be relatively small. Second, the mutual clock drift (i.e. between two nodes) must be negligible as well. We shall see soon that the clock drift must be negligible when compared to the interprocess communication (IPC) time.

    IPC time and clock drift

    IPC and Clock Drift

    Summary

    Clock drift must be negligible when compared to the IPC time. Collectively, this is captured in the inequality M >= E / (1 - k), where M is the IPC time, E (epsilon) is the mutual clock drift, and k is the individual clock drift.
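    As a quick sanity check, the condition can be written as a one-line predicate. This sketch assumes the form given in Lamport’s paper, M >= E / (1 - k), with M and E in the same time unit and k a dimensionless drift rate; the function name is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * ipc_time         M: minimum time for a message to travel between processes
 * mutual_drift     E: bound on the difference between any two clocks
 * individual_drift k: bound on how far one clock's rate deviates from real time
 */
bool drift_condition_holds(double ipc_time, double mutual_drift, double individual_drift)
{
    assert(individual_drift >= 0.0 && individual_drift < 1.0);
    return ipc_time >= mutual_drift / (1.0 - individual_drift);
}
```

    For instance, with an IPC time of 1 ms, a mutual drift of 0.5 ms and an individual drift rate of 1e-6, the condition holds; shrink the IPC time to 0.1 ms and it no longer does, so anomalies become possible.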

    Real World Example (continued)

    Real world example

    Summary

    Mutual clock drift must be lower than the IPC time. So the key takeaway is this: the IPC time (M) must be greater than the mutual clock drift (E), adjusted for the individual clock drift (k). We want the individual clock drift to be low, and to eliminate anomalies, we need the IPC time to be greater than the mutual clock drift.

    Conclusion

    Summary

    We can use Lamport’s clocks to stipulate conditions that ensure deterministic behavior and avoid anomalies.

  • Daily Review – Day Ending in 2020/10/12

    Daily Review – Day Ending in 2020/10/12

    Graduate School

    • Wrote up my analysis on the various barrier synchronization algorithms that I implemented. I had to describe the various algorithms (e.g. dissemination barrier, tournament barrier, centralized sense reversal barrier) for the documentation that will accompany our code and experiments as part of Project 2 for advanced operating systems.
    • Finished watching lectures on Active Networks. Didn’t find the material too interesting; however, I realized there are certainly concepts (like overlay networks) that apply to our (AWS) networks.

    What I learned

    • Quote from Donald Knuth: “Beware of bugs in the above code; I have only proved it correct, not tried it.”

    Work

    • Represented my team (Blackfoot Edge Applications) at the weekly org-wide operations meeting. Every week, our senior manager runs an organization-wide (Blackfoot) operations meeting where each team reviews its high severity events and dashboards from the previous week. During the meeting, I shared three particularly interesting events that took place.
  • What I learned from writing synchronization barriers

    What I learned from writing synchronization barriers

    Before starting project 2 (for my advanced operating systems course), I took a snapshot of my understanding of synchronization barriers. In retrospect, I’m glad I took 10 minutes out of my day to jot down what I did (and did not) know because now I get a clearer picture of what I learned. Overall, I feel the project was worthwhile: I gained some theoretical knowledge of computer science and was also able to flex my C development skills, writing about 500 lines of code.

    Discovered a subtle race condition with lecture’s pseudo code

    Just by looking at the diagram below, it’s not obvious that there’s a subtle race condition hidden. I was only able to identify it after whipping up some code (below) and analyzing the concurrent flows. I elaborate a little more on the race condition, which results in a deadlock, in this blog post.

    Centralized Barrier

     

    [code lang=”C”]

    /*
     * Race condition possible here. Say 2 threads enter, thread A and
     * thread B. Thread A scheduled first and is about to enter the while
     * (count > 0) loop. But just before then, thread B enters (count == 0)
     * and sets count = 2. At which point, we have a deadlock, thread A
     * cannot break free out of the barrier
     */

    if (count == 0) {
        count = NUM_THREADS;
    } else {
        while (count > 0) {
            printf("Spinning …. count = %d\n", count);
        }
        while (count != NUM_THREADS) {
            printf("Spinning on count\n");
        }
    }

    [/code]
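    One common way to avoid this kind of race is sense reversal: instead of spinning on the counter itself, each thread spins on a shared flag that the last arriver flips only after it has reset the counter. The sketch below uses C11 atomics and is not the code from the project or the lecture; barrier_t, barrier_init and barrier_wait are names chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int count;   /* threads still expected at the barrier */
    atomic_bool sense;  /* flipped once per barrier episode */
    int nthreads;
} barrier_t;

void barrier_init(barrier_t *b, int nthreads)
{
    atomic_init(&b->count, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* local_sense is per-thread state and must start out false. */
void barrier_wait(barrier_t *b, bool *local_sense)
{
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arriver: reset the counter first, then release everyone. */
        atomic_store(&b->count, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            ;  /* spin until the last arriver flips the shared sense */
    }
}
```

    Because waiters never look at count after decrementing it, a fast thread re-entering the barrier cannot confuse a slow thread that is still leaving the previous episode.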

    Data Structures and Algorithms

    How to represent a tree based algorithm using multi-dimensional arrays in C

    For both the dissemination and tournament barriers, I had to build multi-dimensional arrays in C. I initially had a difficult time envisioning the data structure described in the research papers, asking myself questions such as “what do I use to index into the first dimension?”. My initial intuition was that for the tournament barrier, I’d index into the first array using the round ID, but in fact you index into the array using the rank (or thread ID), and that array stores the role for each round.

    [code lang=”C”]
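    #include <stdbool.h>
    #include <stddef.h>

    /* PARITY_BIT, MAX_ROUNDS and MAX_NUM_THREADS are assumed to be defined elsewhere. */
    typedef struct {
        bool myflags[PARITY_BIT][MAX_ROUNDS];
        bool *partnerflags[PARITY_BIT][MAX_ROUNDS];
    } flags_t;

    /* One flags_t per thread, so the array is indexed by thread ID (rank). */
    void flags_init(flags_t flags[MAX_NUM_THREADS])
    {
        int i, j, k;

        for (i = 0; i < MAX_NUM_THREADS; i++) {
            for (j = 0; j < PARITY_BIT; j++) {
                /* The second dimension holds MAX_ROUNDS entries, so bound k by that. */
                for (k = 0; k < MAX_ROUNDS; k++) {
                    flags[i].myflags[j][k] = false;
                    flags[i].partnerflags[j][k] = NULL;
                }
            }
        }
    }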
    [/code]
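    For the dissemination barrier, the remaining piece is deciding whose myflags each partnerflags pointer should target. In the usual formulation, thread i signals thread (i + 2^k) mod P in round k, and there are ceil(log2 P) rounds. The helpers below are a small sketch of that arithmetic; the function names are invented and this is not taken from the project code.

```c
#include <assert.h>

/* Number of rounds needed: the smallest r such that 2^r >= nthreads. */
int dissemination_rounds(int nthreads)
{
    assert(nthreads >= 1);
    int rounds = 0;
    while ((1 << rounds) < nthreads)
        rounds++;
    return rounds;
}

/* Thread `rank` signals this partner in the given round. */
int dissemination_partner(int rank, int round, int nthreads)
{
    assert(rank >= 0 && rank < nthreads && round >= 0);
    return (rank + (1 << round)) % nthreads;
}
```

    With 5 threads there are 3 rounds; thread 3 signals thread 4 in round 0, thread 0 in round 1, and thread 2 in round 2.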

    OpenMP and OpenMPI

    Prior to starting, I had never heard of either OpenMP or OpenMPI. Overall, they are two impressive pieces of software that make multi-threading (and message passing) way easier, much better than dealing with the Linux pthreads library.

    Summary

    Overall, the project was rewarding and my understanding of synchronization barriers (and the various flavors) was strengthened by hands-on development. And if I ever need to write concurrent software for a professional project, I’ll definitely consider using OpenMP and OpenMPI instead of low-level libraries like pthreads.

  • Daily Review – Day ending in 2020/10/11

    Daily Review – Day ending in 2020/10/11

    I’m thrilled to be “off call” in about 4.5 hours, no longer tied to my pager and no longer anxious about the possibility of waking up to the sound of a nasty alarm. Really, the anxiety revolves around the randomness and the unknown of being paged. What’s also variable is the length of these engagements: sometimes the troubleshooting takes 5 minutes and sometimes 5 hours. You just never know.

    The point is this: I’m happy to return to a normal work week.

    Best parts of my day

    • Lying in bed next to my wife at night. For the past four or five months I’ve been sleeping on the floor on a foam tri-fold mattress laid out on the uncomfortable carpet. And now that we have finally moved into our new home, I’m sleeping on a real bed, and last night my wife and I lay next to one another. Sure, it was only about 5-10 minutes but hey: it’s the little things, right?

    Graduate School

    • Wrote up a paper summary on “Building Reliable High Performance Communication Systems from Components”
    • Drew figures of barrier synchronization on my iPad
    • Wrote up Project 2 (barrier synchronization) work log into Google Docs to share with my project partner
    My doodle of dissemination barrier

    Thoughts

    • Amazon Web Services draws inspiration from academia. For example, the techniques and principles used to build overlay networks within the EC2 network resemble the principles from the paper Active Networks (although there are probably even earlier papers to draw inspiration from)
  • Weekly Review – Week ending in 2020/10/11

    Weekly Review – Week ending in 2020/10/11

    I’m shattered. This past week really broke me, the numerous 3:30 AM wake ups and the long operational issues running until 09:30 PM (past the time I’d like to be asleep). To recover from this taxing work week, I’m taking next Thursday and Friday off.

    Despite the rough week, I’m relieved that my wife and I are totally moved into our new home — not fully unboxed — but at least I’m waking up to a warm home.

    Work

    • Work consumed my entire life this week
    • Being tied to the laptop and pager impacts not just me but my family as well
    • Sporadic wake ups and late nights threw off the entire schedule, breaking many of my rituals
    • Because of all this I’m taking 2 days off next week to recover and catch up on lost time eaten by the heavy work week
    • Over 300 people signed up for my event at Amazon
      • Hosting a panel discussion with (4) senior engineers at Amazon on career growth and promotions
    • This particular week was more difficult than others
      • Waking up at 03:30 AM multiple nights in a row
      • Operational issues lasting until 10:00 pm

    Family

    • First week living in the new house in Renton
    • Having a child underscores how fleeting time is
    • Family has changed my entire world, flipping it upside down
    • I don’t notice the changes every day but sometimes I’ll pause and take a look at her and she’s not only radically physically changing but developmentally as well
      • She’s taking her first steps
      • She’s couch surfing
      • She’s uttering her first words (surprisingly it’s “ball”)
    • Moving to a bigger home in Renton in retrospect has been the best thing that has happened
      • Neighbor was mowing their front lawn and offered to mow ours at the same time (took them up on that offer)

    Marketing

    • Chipping away at “This is marketing” book while in bed at night

     

  • Don’t break the (writing) chain … has been broken

    Don’t break the (writing) chain … has been broken

    This week, my cumulative “write every day” streak has been broken (almost 2 months of consistent writing every day), thanks to one of the roughest weeks at work. I normally start every day off with some light blogging, even if it’s for 5 or 10 minutes, but almost every day this week I was woken up prematurely by my pager alarming me out of bed. So honestly, I couldn’t be happier that it’s Friday (TGIF, for real) even though I have 2 more days (over the weekend) of being on call; I haven’t felt this physically, mentally and emotionally exhausted in a long time. Every time I get paged out of bed I’m forced to get my mental gears in motion and it’s very difficult to switch off, making it nearly impossible to go back to sleep for a nap.

    So the days have been … very long.

    Oh well.

    So now, it’s time to reset the “cumulative days” of writing counter

  • Super long day & Having good co-workers

    Super long day & Having good co-workers

    In yesterday’s post, I mentioned that I was paged out of bed at around 03:30 AM because of an operational issue. For the rest of the day, my mind was fried and I practically looked like a zombie, my words constantly slurred. On top of all that, a different issue cropped up in the evening, the event lasting several hours and robbing me of dinner with my family. Ugh.

    Silver lining: my colleague Paul, a god damn saint, overrode the night time shift and took my on call overnight, allowing me to get some much needed rest. The next morning, I sent an e-mail over to my manager, praising Paul and making sure that his good deed(s) do not go unnoticed.

  • Being paged out of bed at 3 AM …

    Being paged out of bed at 3 AM …

    Sadly I didn’t get to start the day off with writing, my morning routine, since my phone paged me out of bed at 3 AM due to an operational issue from work that lasted about three hours. Because of this, I know that I’ll feel “off” the rest of the day given that I’ve been awake for over 5 hours and the time is barely 9 AM. Oh well. Time to heat up another Chai Latte to keep me awake for the day …

    Sometimes I wonder if this is all worth it …

  • What does “invariant partial ordering” mean in Leslie Lamport’s “Time, Clocks, and the Ordering of Events in a Distributed System”

    What does “invariant partial ordering” mean in Leslie Lamport’s “Time, Clocks, and the Ordering of Events in a Distributed System”

    In the conclusion of Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport states that: the concept of “happening before” defines an invariant partial ordering of the events in a distributed multiprocess system.

    According to a stackoverflow post, Jacob Baskin states that an invariant is a property of the program state that is always true. Tying that together with the original question I asked above, I think what Leslie is trying to say is that because of the “happened before” relation, we know that [in a distributed system] events will always be partially ordered, not totally ordered.

  • Farmhouse style doors – sliding door ideas for my home office

    Below are a couple of photos of farmhouse doors that I think have good taste. I’m thinking of adding one of these types of doors to my new home office and wanted to share a couple of styles that I thought were both warm and aesthetically pleasing, two qualities I’m aiming to bring out of this new setup of mine.

    Grey farm house door. Credit: jettsetfarmhouse.com

     

    Beautiful brown grey farmhouse door. Credit – https://www.instagram.com/sheltercustombuiltliving/