Category: Distributed Computing

  • Distributed Computing @ OMSCS over – what a ride!


    Last semester, I decided to enroll in Georgia Tech’s brand spanking new Distributed Computing course, offered for the first time (as part of OMSCS) this past Spring 2021. What a ride! I learned a ton, including Lamport’s logical clocks, the FLP theorem, and the notorious PAXOS for consensus. Hats off to Professor Ada and the wonderful teaching assistants for delivering a rigorous and rewarding course. The lectures were top notch, and the distributed systems labs (originally created at the University of Washington by Ellis Michael) were challenging to say the least.

    Only 2 more semesters until I graduate from Georgia Tech with my M.S. in Computer Science!

     

  • CS7210 Distributed Computing – A course review


    Distributed Computing was offered in the OMSCS program for the first time this past semester (i.e. Spring 2021), and when the course opened up for registration, a storm of newly admitted and seasoned students signed up — me included. I was fully aware that I was walking into unknown territory, a bleeding edge course, and expected a lot of rough edges, of which there were many. That being said, the course is great and, with some tweaks around pacing, has the potential to be one of the best courses offered, especially for students specializing in computing systems.

    Overview

    The course quality is top-notch. The lectures are intellectually challenging, the assigned readings are seminal pieces of work, and the projects really drill in the theoretical concepts. Overall, depending on your programming experience, expect to put in at least 20+ hours per week (with some students reporting anywhere between 30-50 hours).

    Recommendation: If you are a seasoned engineer (with at least a couple years of programming experience under your belt) and someone who can handle ambiguity with little hand holding, then I highly recommend taking the course. But if you are just starting out in your computer science journey, then I would hold off for at least a couple semesters; take the recommended prerequisites (i.e. graduate introduction to operating systems, advanced operating systems, computer networks) and wait until the course’s rough edges are smoothed out. As another student on omscentral pointed out, this class is “for experienced engineers, not students.”

    What to expect from the course

    Pros

    • Lectures are easy to watch and are packed into digestible ~5 minute chunks
    • Assigned readings (from the first half of the semester) are seminal pieces of work by famous computer scientists like Leslie Lamport and Eric Brewer
    • Skills and knowledge acquired directly apply to my career as a software engineer and computer scientist
    • Instructors and teaching assistants are extremely professional, care about students’ well-being, and are quite generous with the grading curve

    In this class, you’ll develop a foundation for designing and building distributed systems. You’ll understand the importance of systems keeping track of time and the different ways to implement clocks (e.g. scalar clocks, vector clocks, matrix clocks). In addition, you’ll appreciate how systems achieve consensus and how to make trade-offs between different consistency models, such as strict consistency and eventual consistency. You’ll end the semester by learning about the infamous CAP theorem and FLP theorem and how, as a system designer, you’ll make trade-offs between consistency, availability, and the ability to withstand network partitions. Of course, you’ll eat and breathe Leslie Lamport’s PAXOS. So if any of these topics interest you, you’re in for a treat.

    Cons

    • Bleeding edge course means that there were lots of rough edges
    • Projects were very demanding, often requiring multiple hours to pass a single test worth very little towards grades
    • Triggered lots of uncertainty and desperation among students throughout the second half of the semester

    As mentioned above, this class induced a lot of unnecessary stress in students. Even someone like me, who cares less about the actual letter grades on my transcript, felt pretty anxious (this class could potentially have held me back another semester, since up until the grades were actually released, I had assumed I would get a C or lower).

    Impact on mental health

    One concerned student published a post on the forum, asking if students were mentally okay:

    I just wanted to check in with everyone on here in the class. I know these projects are stressful and for me it’s been something of a mental health hurdle to keep pushing despite knowing I may very well not succeed. Hope everyone is doing ok and hanging in there. Remember no assignment is worth your sanity or mental health and though we are distanced we are all in this together.

    Anonymous Calc

    Many other students chimed in, sharing the same frustrations:

    I found both of the projects very frustrating. Specially this one. I am working for last 2 weeks (spending 50+ hours in writing/rewriting) and still passing only 7/8 tests. I never had unfinished academy projects. This is the first course I am having this.

    Adam

    I couldn’t help but agree:

    Honestly, I was fairly stressed for the past two weeks. Despite loving the course — content and rigor of the project — I seriously contemplated dropping the course (never considered this avenue before, and I’m 2 courses away from graduating after surviving compilers and other difficult systems courses) as to avoid potentially receiving a non-passing grade (got an A on the midterm but its looking pretty bleak for Project 4 with only 12 tests passing). At this point, I’ve fallen behind on lectures and although there is 1 (maybe 2) days left for Project 4, I’ve decided to distance myself from the project. Like many others, I’ve poured an insane number of hours into this project, which doesn’t reflect in the points in Gradescope. I suspect both the professor and the TAs are aware of the large number of people struggling with the project and will take this all into account as part of the final grading process.

    Tips

    Programming Projects

    Here’s a list of the projects, their weight towards the final grade, and the amount of time allocated to each assignment.

    • Project 1 – Environment Setup – 5% – 2 weeks
    • Project 2 – Client/Server – 10% – 2 weeks
    • Project 3 – Primary/Backup – 15% – 3 weeks
    • Project 4 – PAXOS – 15% – 3 weeks
    • Project 5 – Sharded KV Store – 15% – 4 weeks

    Projects 1 and 2 are a walk in the park. The final three projects are brutal. Make sure you start early, as soon as the projects are released. I repeat: start early. Some people reported spending 100+ hours on the latter projects.

    Unless you are one of the handful of people who can pour 50+ hours per week into the class, do not expect to get an A on the programming projects. But don’t sweat it. Your final grade will be okay — you just need to have faith and ride the curve. All you can do is try to pass as many tests as possible and mentally prepare for receiving a C or D (or worse) on these assignments.

    Summary

    The course is solid but needs serious tweaking around the pacing. For future semesters, the instructors should modify the logistics of the programming assignments, stealing a couple weeks from the first couple of projects and tacking them onto the final projects (i.e. Primary/Backup system, PAXOS, Sharded Key-Value Store). With these modifications, students will stress out way less and the overall experience will be much smoother.

     

  • PAXOS made moderately complex – slots


    In the paper “Paxos Made Moderately Complex”, the authors introduce concepts not mentioned in the original PAXOS paper, concepts such as slots, slot in, slot out, and WINDOW. I found these concepts difficult to understand despite reading both the accompanying pseudocode and the authors’ Python implementation.

    This post aims to shed light on these concepts and on how the following invariant holds:

    R5: A replica proposes commands only for slots for which it knows the configuration: slot_in < slot_out + WINDOW

    The rest of the post focuses on the replica only. We will not discuss the actual consensus algorithm, also known as the Synod protocol; that topic will be covered in a separate post.

    SLOTS Overview

    • Slots are to be occupied by decisions
    • SLOT_IN points to the next “free” unoccupied slot
    • SLOT_OUT points to the next decision to be executed
    • SLOT_IN advances by 1 with every new decision received from the leaders
    • SLOT_OUT advances by 1 with every decision the replica executes
    • WINDOW bounds how many slots may be outstanding: SLOT_IN may not advance to SLOT_OUT + WINDOW (invariant R5)
    • SLOT_OUT trails behind SLOT_IN as more decisions flow into the replica
    • SLOT_OUT == SLOT_IN means the replica has executed all of its decisions
    Figure 1 – Replica initial state. Imagine WINDOW=5. Initially, both slot_in and slot_out point to slot 1

     

    Then the replica, as it receives requests from clients, will send proposals to the leaders. The leaders will, as part of their consensus algorithm, respond with decisions, each decision tied to a particular slot position. We won’t discuss the consensus algorithm as part of this post (but perhaps in another).

    Figure 2 – Replica sending a proposal for slot 1

     

    Then, as the replica receives decisions, it will fill the slot accordingly. Every time a slot fills, the slot in pointer advances by one.

    Figure 3 – As replica receives decisions, it inserts into its buffer, advancing slot in by 1

    Next is the perform phase. The replica will execute the decision — its associated command — and will advance the slot out index by 1.

    After receiving a decision, the replica will fill in the slot associated with that particular decision.

    Figure 4 – Replica receives a decision and fills in the particular slot associated with it

    Then, for illustrative purposes, imagine that the replica sends out another proposal (not shown in the figure below), advances the slot in by 1, then fills in the second slot.

    Figure 5 – Slot IN points to index 3 and Slot OUT still points to slot 1

    Finally, during the perform phase, the replica will advance the slot out pointer.

    Figure 6 – As the replica executes commands in the slot, the slot OUT index advances. The slot, previously yellow, is now green, symbolizing that the occupying command has been executed

     

    In this example, the replica then executes the next command, advancing SLOT OUT, which now points to the same (unoccupied) slot as SLOT IN.

    Figure 7 – Replica executes its second outstanding decision, advancing SLOT OUT by 1, which now points to the same (unoccupied) slot as SLOT IN
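
    To tie the figures together, here’s a minimal Python sketch (my own illustration, not the paper’s code, with made-up commands) of the replica-side bookkeeping: decisions fill slots and advance slot_in, performing a decision advances slot_out, and WINDOW caps how far slot_in may run ahead of slot_out (invariant R5). If I recall correctly, the paper advances slot_in when the replica proposes; here I advance it as decisions arrive, mirroring figures 3 through 5, since the end state is the same in this walkthrough.

    ```python
    WINDOW = 5

    class Replica:
        def __init__(self):
            self.slot_in = 1        # next "free" unoccupied slot
            self.slot_out = 1       # next decision to execute
            self.decisions = {}     # slot number -> decided command

        def can_propose(self):
            # invariant R5: only propose while slot_in < slot_out + WINDOW
            return self.slot_in < self.slot_out + WINDOW

        def on_decision(self, slot, command):
            self.decisions[slot] = command
            # advance slot_in past contiguously filled slots
            while self.slot_in in self.decisions:
                self.slot_in += 1

        def perform(self):
            # execute buffered decisions in slot order, advancing slot_out
            while self.slot_out in self.decisions:
                print(f"executing slot {self.slot_out}: {self.decisions[self.slot_out]}")
                self.slot_out += 1

    r = Replica()
    r.on_decision(1, "put x=1")      # figures 3-4: slot 1 fills, slot_in -> 2
    r.on_decision(2, "put y=2")      # figure 5: slot 2 fills, slot_in -> 3
    r.perform()                      # figures 6-7: slot_out advances to 3
    print(r.slot_in, r.slot_out)     # 3 3 -- SLOT_OUT == SLOT_IN, all decisions executed
    ```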

    Summary

    WORK IN PROGRESS

  • PAXOS – I’m coming for you!


    I’m now halfway through the Distributed Computing course at Georgia Tech, and we students are now tackling the penultimate project: building a replicated state machine using PAXOS. This project will be challenging (probably going to require 40+ hours); it’ll put my theoretical knowledge to the test, and in a couple of weeks I’ll reflect back on how much I learned.

    Presently, here’s my limited understanding of the consensus algorithm:

    Current PAXOS understanding

    • Servers play different roles – Proposer, Acceptor, Learner
    • Proposers send proposals with monotonically increasing proposal numbers (a tiny sketch follows this list)
    • A proposal is accepted only if a majority (quorum) of acceptors accept it
    • The 2PC (two-phase commit) protocol essentially tells us whether or not a particular transaction is committed or aborted
    • Guaranteeing linearizability means that, from the client’s perspective, real time (i.e. wall clock time) should be respected and the client should view the system as if there were a single replica
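
    To make the first two bullets concrete, here’s a tiny sketch (my own notes, not project code): one common way to generate monotonically increasing, per-proposer-unique proposal numbers, plus the majority-quorum check from the references below (floor(n/2) + 1).

    ```python
    import itertools

    def proposal_numbers(proposer_id, num_proposers):
        """Yield monotonically increasing, globally unique proposal numbers
        by combining a round counter with the proposer's id."""
        for round_no in itertools.count(start=1):
            yield round_no * num_proposers + proposer_id

    def is_majority(acks, num_acceptors):
        """A value is chosen only once a majority of acceptors accept it."""
        return acks >= num_acceptors // 2 + 1

    gen = proposal_numbers(proposer_id=0, num_proposers=3)
    print(next(gen), next(gen), next(gen))        # 3 6 9 -- strictly increasing
    print(is_majority(2, 3), is_majority(2, 5))   # True False
    ```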

    Future PAXOS understanding

    • How exactly does PAXOS guarantee consensus via its two-phase protocol?
    • How does a server determine its role (or does it play multiple roles)?
    • How do you handle edge cases (say, two proposals arriving at the same time)?
    • What role does a client play? Does it serve as a proposer?
    • How does leader election work in PAXOS?
    • Should I just try to mimic the Python-based code described in “Paxos Made Moderately Complex”?
    • How will replication work as the number of nodes in the system scales (say from 3 to 5 to 10)?
    • How do you detect (and perhaps avoid) split brain (i.e. multiple leaders)?

    References

    1. Majority quorum can be defined as floor(n/2) + 1
    2. Python implementation described at https://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf
  • Understanding linearizability

    I’m preparing for my Distributed Systems midterm, and I was struggling to understand the differences between serializability and linearizability (why are these two words so difficult to spell right?). Apparently, these two concepts are very different. To gain some clarity, I searched online and found an awesome YouTube video posted by Martin Kleppmann; in the video, he dives deep into linearizability, and I wanted to share some of the key takeaways.

    Key Takeaways

    • The happens-before relationship (coined by Leslie Lamport) deals with causality, applies to message sends and receives, and is related to logical clocks.
    • Linearizability is concerned not with logical clocks, but with real time
    • Linearizability states that an operation must take effect at some point between when it started and when it ended. That’s not entirely clear, so let’s imagine a scenario with two clients: client 1 and client 2. Client 1 performs a write key=x value=0, and client 2 performs a get key=x. Finally, suppose that the key-value store currently contains key=x, value=100. With linearizability, if the two operations overlap, it’s entirely possible that client 2 gets either value=0 or value=100 (see the toy sketch below).
    • Linearizability is not serializability. They are not the same. The latter is isolation between transactions, as if they were executed in some serial order. The former is multiple replicas behaving as if there were a single replica.
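
    Here’s a toy sketch (my own, not from the video) of that overlapping-operation example: a write of x=0 overlaps in real time with a read of x, and the store starts with x=100. Because either ordering of the two overlapping operations is a valid linearization, the read may legally return 0 or 100.

    ```python
    from itertools import permutations

    initial_state = {"x": 100}                    # store already contains x=100
    ops = [("write", "x", 0), ("read", "x")]      # two overlapping client operations

    def read_result(order, state):
        """Apply the operations in a chosen (linearized) order; return what the read sees."""
        state = dict(state)
        result = None
        for op in order:
            if op[0] == "write":
                state[op[1]] = op[2]
            else:
                result = state[op[1]]
        return result

    print({read_result(order, initial_state) for order in permutations(ops)})  # {0, 100}
    ```
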
  • Distributed system snapshots: consistent vs inconsistent cuts


    In “Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms”, the authors propose capturing a distributed system’s computation using a time series graph. Each row in the graph represents a process (e.g. P1, P2, P3), and each tick (e.g. e1, e2) within that row represents an event: a local event, a message send, or a message receive. For example, looking at the figure above, P1’s first event is a local event, but P1’s second event represents the reception of a message sent from P2.

    Snapshot of distributed system using cuts

    Now, say we wanted to take a snapshot in time of all the processes. How do we model the system’s state? Well, the authors describe a technique they call cuts. Essentially, a cut is a tuple, each element signifying the last event of each process (i.e. the frontier) that should be considered part of the snapshot state.

    Cuts of a distributed computation

    In the figure above, two cuts exist. The first cut is (5, 2, 4), the second (3, 2, 6). For the first cut, we include P1’s events up to (and including) event 5. For the same cut, we include P2’s events up to (and including) event 2.

    Consistent versus Inconsistent

    Now, there are two types of cuts: consistent and inconsistent. Returning to the figure above, the first cut (C) is considered “consistent” while the latter (C’) is “inconsistent”.

    Why?

    A consistent cut, according to the authors, can be defined using mathematical notation:

    Definition of consistent cut

    If that’s not clear, they offer another explanation: “In other words, a consistent cut is left closed under the causal precedence relation.”

    Inconsistent cuts in layman’s terms

    Okay, what does this really mean? I still find it a bit confusing, so let me try to put it in my own words.

    If an event (event_x) included in the cut was caused by another event (event_y), then the causal event (event_y) must also be included in the cut. As an example, take a look back at the figure above: P3’s event 6 was caused by P1’s event 5. However, the cut’s frontier event for process 1 is event 3, which fails to include P1’s event 5. Therefore, this cut is classified as inconsistent.
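
    Paraphrasing the definition: a cut C is consistent if, for every event e in C, any event e’ that happens before e (e’ → e) is also in C. Here’s a small sketch (my own illustration, using the event labels from the figure) that checks this rule for the two cuts above, assuming the only cross-process message is the one from (P1, 5) to (P3, 6). Because a frontier already includes every earlier event on the same process, checking each message’s send/receive pair is enough.

    ```python
    def is_consistent(frontier, messages):
        """A cut (given by its frontier) is consistent if every receive event inside
        the cut has its matching send event inside the cut as well."""
        for (sender, send_e), (receiver, recv_e) in messages:
            if recv_e <= frontier[receiver] and send_e > frontier[sender]:
                return False          # the effect is in the cut, but its cause is not
        return True

    # The message from (P1, 5) to (P3, 6) discussed above.
    messages = [(("P1", 5), ("P3", 6))]

    print(is_consistent({"P1": 5, "P2": 2, "P3": 4}, messages))  # True  -- cut C  = (5, 2, 4)
    print(is_consistent({"P1": 3, "P2": 2, "P3": 6}, messages))  # False -- cut C' = (3, 2, 6)
    ```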


  • PAXOS (not) made simple


    Leslie Lamport, a world-renowned computer scientist, first wrote the paper “The Part-Time Parliament” back in 1990. Unfortunately, the paper was not well received by computer scientists, primarily because the consensus protocol was described in the context of an obscure, ancient Paxos civilization. Lamport eventually followed up by publishing another version of the paper titled “Paxos Made Simple.”

    Over the last couple days, I’ve been reading and carefully studying the paper, making (so far) three passes. From the title, you would think that the paper is written such that it could be easily understood by us computer scientists. It’s supposed to be simple, right?

    Nope.

    In fact, I’m finding it incredibly difficult to conceptualize the consensus protocol, unable to succinctly describe the protocol in my own words. The paper’s title misleads its readers!

    Frustrated with my own inability to understand the paper, I set it aside and decided to pick up another (optional) reading assignment prescribed by my distributed systems course, a paper titled “In Search of an Understandable Consensus Algorithm.” After reading the first couple of paragraphs, I felt a sense of relief because, on the second page, the authors share both my frustration and confusion about PAXOS:

    “The first drawback is that Paxos is exceptionally difficult to understand. The full explanation [15] is notoriously opaque; few people succeed in understanding it, and only with great effort. As a result, there have been several attempts to explain Paxos in simpler terms [16, 20, 21]. These explanations focus on the single-decree subset, yet they are still challenging. In an informal survey of attendees at NSDI 2012, we found few people who were comfortable with Paxos, even among seasoned researchers. We struggled with Paxos ourselves; we were not able to understand the complete protocol until after reading several simplified explanations and designing our own alternative protocol, a process that took almost a year”

    After realizing that I’m not alone in failing to grok PAXOS, I started searching the internet for other resources that better describe it, and found a couple of great write-ups as well as the “de facto” lecture describing PAXOS.

    So, sorry “PAXOS made simple”, it’s not me, it’s you.

  • The FLP theorem: impossibility of achieving consensus within distributed systems


    For this week, my distributed systems course just assigned us students a reading assignment: “Impossibility of Distributed Consensus with One Faulty Process”.

    Apparently, this paper is a seminal piece of work that describes and proves that, given a single process failure within a distributed system, the underlying system cannot be guaranteed to achieve consensus (i.e. the nodes cannot reach a state where they all agree on some proposed value). That’s right: not difficult, but impossible. Assuming that this theorem holds true, how the hell do we even build robust distributed systems that can tolerate failure? Surely there must be a way. Otherwise, how the hell do popular cloud services like Amazon S3 and Amazon DynamoDB guarantee high availability and consistency?

    My gut tells me that the authors of the paper — Fischer, Lynch, and Paterson — make strict assumptions about the underlying system. And if we can somehow relax these assumptions, then we can in fact build highly available distributed systems. But I’ll find out soon, since I’ll be reading both this paper as well as another seminal piece of work, “Paxos Made Simple” by Leslie Lamport.

    I’ll follow up on a separate post once I’ve read through the paper three times.

  • Why is Lamport’s Scalar Clock only consistent, not strongly consistent?


    For the last couple of days, I’ve been watching the distributed systems video lectures and reading the recommended papers that cover logical clocks. Even after making multiple passes over the material, the concepts just hadn’t clicked: I could not wrap my mind around why Lamport’s clocks satisfy only consistency — not strong consistency. But now I think I have a handle on it, after sketching a few of my own process diagrams (below).

    Before jumping into the examples, let’s first define strong consistency. We can categorize a clock as strongly consistent if and only if we can compare any two events’ timestamps — and only the timestamps — and conclude whether the two events are causally related. In other words, can we say “event A caused event B” just by inspecting their timestamps? In short, the timestamps must carry enough information to determine causality.

    Example: Time Diagram of Distributed Execution

    In my example below, I’ve limited the diagram to two processes: P1 and P2.

    P1 executes three of its own events (i.e. events 1, 2, and 3) and then receives a message (i.e. event 4) from P2. P2 executes two local events (i.e. events 1 and 2), sends a message (i.e. event 3) to P1, and then executes a fourth and final event. To show the sending of a message, I’ve drawn an arrow connecting the two events, (P2, 3) to (P1, 4).

    Logical clocks with an arrow from P2 to P1, showing causality between events (P2, 3) and (P1, 4)

     

    Cool. This diagram makes sense. It’s obvious from the arrow that (P2, 3) caused (P1, 4). But for the purposes of understanding why Lamport’s clocks are not strongly consistent, let’s remove the arrow from the diagram and see if we can reach the same conclusion of causality by inspecting only the timestamps.

    Removing arrow that connects (P2, 3) to (P1, 4)

     

    Next, let’s turn our attention to the timestamps themselves.

    Can we in any way say that (P2, 3) caused (P1, 4)?

     

    Just by comparing only the timestamps, what can we determine?

    We can say that the event with the higher timestamp did not happen before the event with the lower timestamp — but that’s it. The two events might be causally related, or they might simply be concurrent, so we cannot make any guarantee that event 3 caused event 4. The timestamps by themselves do not carry sufficient information to determine causality.
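
    To make this concrete, here’s a small sketch (my own, not course code) of the scalar clock rules applied to the two-process diagram above: increment the clock on every local event and send, and take max(local, received) + 1 on a receive.

    ```python
    class Process:
        def __init__(self, name):
            self.name = name
            self.clock = 0
            self.timestamps = {}          # event number -> Lamport timestamp

        def local_event(self, event_no):
            self.clock += 1
            self.timestamps[event_no] = self.clock

        def send(self, event_no):
            self.clock += 1
            self.timestamps[event_no] = self.clock
            return self.clock             # timestamp piggybacked on the message

        def receive(self, event_no, msg_timestamp):
            self.clock = max(self.clock, msg_timestamp) + 1
            self.timestamps[event_no] = self.clock

    p1, p2 = Process("P1"), Process("P2")

    for e in (1, 2, 3):                   # P1's three local events
        p1.local_event(e)
    p2.local_event(1)                     # P2's two local events
    p2.local_event(2)
    msg = p2.send(3)                      # (P2, 3): send to P1
    p1.receive(4, msg)                    # (P1, 4): receive from P2
    p2.local_event(4)                     # P2's fourth and final event

    print(p1.timestamps)                  # {1: 1, 2: 2, 3: 3, 4: 4}
    print(p2.timestamps)                  # {1: 1, 2: 2, 3: 3, 4: 4}

    # (P2, 3) -> (P1, 4): 3 < 4, so consistency holds.
    # But (P1, 3) also has timestamp 3 and (P2, 4) has timestamp 4, and those
    # two events are concurrent -- a smaller timestamp alone does not prove causality.
    ```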

    If Lamport’s scalar clocks do not determine causality, what can we use instead? Vector clocks! I’ll cover these types of clocks in a separate, future blog post.

  • 8 fallacies of distributed computing


    Rotem-Gal-Oz, A. (2005). Fallacies of Distributed Computing Explained.

    Cognitive biases (built-in patterns of thinking) and fallacies (errors in reasoning) creep into our everyday lives, sometimes without us even knowing it. For example, ever wonder why you work just a little harder, a little quicker, when you think someone is standing over your shoulder, watching you? This is known as the Hawthorne effect: people tend to change their behavior when they know (or think) they are being watched.

    Or how about situations when we are trying to solve a problem and reach for the same tool (i.e. the proverbial hammer)? That might be the exposure effect: people prefer the same tools and processes just because they are familiar.

    Essentially, cognitive biases and fallacies trick us: they throw off our memory, perception, and rationality. So we must surface these blind spots to the forefront of our brains when designing and building distributed systems.

    8 Fallacies

    According to Schneider, there are 8 fallacies that bite distributed system designers. These fallacies have stood the test of time: they were drafted over 25 years ago, and designers have been building distributed systems for over 55 years. Despite advances in technology, these fallacies still remain true today.

    1. The network is reliable
    2. Latency is zero
    3. Bandwidth is infinite
    4. The network is secure
    5. Topology doesn’t change
    6. There is one administrator
    7. Transport cost is zero
    8. The network is homogeneous

    Notice anything in common among these fallacies? They all tend to revolve around assumptions we make about the underlying communication, the network.

    Network assumptions

    I wouldn’t go as far as to say that networks are fragile, but failures (e.g. hardware failures, power outages, a faulty cable) happen on a daily basis. Sometimes these failures are not observed by our users: their traffic is automatically routed toward an alternative — albeit sometimes slightly slower — path, and their dropped messages are resent thanks to the underlying (transport layer) communication protocol.

    So, what does this mean for you, the system designer?

    To overcome network hiccups, we should either “use a communication medium that supplies full reliable messaging” or employ the following messaging techniques: retry, acknowledge important messages, identify/ignore duplicates, reorder messages (or do not depend on message order), and verify message integrity.
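
    As a rough sketch (my own illustration, not from the paper) of a few of those techniques — retrying, acknowledging messages, and ignoring duplicates — imagine an unreliable transport represented here by a plain Python callable:

    ```python
    import uuid

    def send_with_retry(transport, payload, max_attempts=3):
        """Attach a unique id and resend until the receiver acknowledges it."""
        message = {"id": str(uuid.uuid4()), "payload": payload}
        for _ in range(max_attempts):
            ack = transport(message)           # returns the acked id, or None on a "timeout"
            if ack == message["id"]:
                return True
        return False                           # give up; surface the failure to the caller

    class Receiver:
        """Acknowledge every message, but only process a given id once."""
        def __init__(self):
            self.seen = set()
            self.log = []

        def handle(self, message):
            if message["id"] not in self.seen:
                self.seen.add(message["id"])
                self.log.append(message["payload"])   # "process" the payload once
            return message["id"]                      # (re-)acknowledge either way

    receiver = Receiver()

    def flaky_transport(message, _calls=[0]):
        """Delivers the message but drops the first acknowledgment."""
        _calls[0] += 1
        ack = receiver.handle(message)
        return None if _calls[0] == 1 else ack

    print(send_with_retry(flaky_transport, "hello"))  # True: retried, duplicate ignored
    print(receiver.log)                               # ['hello'] -- processed exactly once
    ```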

    In addition to hardening our software, be prepared to thoroughly stress test your system. You’ll want to hamper network performance: drop messages, increase latency, congest the pipes. How well can your system tolerate these conditions? Can your system sustain these performance degradations? Or does your system fail in unexpected ways?

    Main takeaway

    In short, we should assume that the network is unreliable, and in response we should both employ techniques (e.g. retrying messages) that harden our systems and thoroughly test our systems/software under non-ideal conditions, including device failures, congestion, and increased latency.

    References

    1. Hunt, A. Pragmatic Thinking and Learning: Refactor Your Wetware
    2. https://fallacyinlogic.com/fallacy-vs-bias/