Tag: sequential consistency

  • Distributed Computing – Lesson 1 Summary

    Summary

    Distributed systems are everywhere: social media, the Internet of Things, even systems that look like a single server are usually part of larger, distributed systems. But how do you define a distributed system? According to Leslie Lamport (widely regarded as a father of distributed computing), it is a system in which the failure of a component (or node) you may not even know about can impact the larger system, rendering the whole thing unstable or unusable.

    A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable – Leslie Lamport

    Leslie Lamport. Source: https://www.heidelberg-laureate-forum.org/laureate/leslie-lamport.html

    To better understand distributed systems, we can model their behavior using nodes and messages. Nodes send messages to one another over unreliable communication links, in a 1:1 or 1:N fashion. Because communication is assumed to be flaky and nodes can fail, messages may be delayed or never arrive at all. This simple model of nodes and messages, however, does not capture the events occurring at each node; for that, the model must be extended into a slightly more complex one that introduces the concept of “state” stored at each node.

    When evaluating a distributed system, we can and should leverage models. Without them, we would have to rely on building prototypes and running experiments to validate the system; that approach is untenable due to issues such as a lack of resources or the sheer scale of the system, which may involve hundreds of thousands of nodes. When selecting a model, we want to ensure that it can accurately represent the problem and be used to analyze the solution. In other words, the model should hold two qualities: it must be accurate and it must be tractable.

    Using Models. Source: Georgia Tech

    Building distributed systems is difficult for three reasons: asynchrony, failures, and consistency. Will the system assume messages are delivered within a fixed amount of time, or will it be asynchronous, with messages potentially delayed or lost? How will the system handle the different types of failures: total, gray, and Byzantine (a node misbehaves, or we cannot even tell that it has failed)? How will the system remain consistent if we introduce caching or replication? All of these concerns echo the well-known eight fallacies of distributed computing.

    CAP Theorem. Source: Georgia Tech

     

    Ultimately, when it comes to building reliable and robust systems, we cannot have our cake and eat it too. We can choose only two of the three properties: consistency, availability, and partition tolerance. This tradeoff is known as the CAP theorem (originally stated as a conjecture by Eric Brewer and later formalized and proven by Gilbert and Lynch). If you want availability and partition tolerance (e.g. Cassandra, DynamoDB), then you sacrifice consistency; similarly, if you want consistency and partition tolerance (e.g. MySQL, Megastore), you sacrifice availability.

  • Distributed Shared Memory (Part 1 of 2) Notes

    Introduction

    Summary

    The main question is this: can we make a cluster of machines look like a shared memory machine?

    Cluster as a parallel machine (sequential program)

    Summary

    One strategy is to not write explicitly parallel programs at all and instead use language-assisted features (such as compiler directives, e.g. pragmas) that signal to the compiler that a particular section can be parallelized automatically. But there are limits to how much parallelism this implicit approach can extract.
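    As a concrete illustration (my own, not taken from the lecture, which may have other directive-based systems in mind), here is a minimal OpenMP-style sketch in C: the pragma is only a hint, and the compiler/runtime decides whether and how to split the loop across processors.

    ```c
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];

        /* The directive is a hint: the program text stays sequential, and the
           compiler/runtime decides how to divide the loop across processors. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = 2.0 * b[i] + 1.0;
        }

        printf("a[0] = %f\n", a[0]);
        return 0;
    }
    ```

    Built with a flag such as `gcc -fopenmp`, the loop runs in parallel; without it, the pragma is ignored and the program is still correct, which is exactly the appeal of the implicit approach.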

    Cluster as a parallel machine (message passing)

    Summary

    Key Words: message passing

    One of the two styles for explicitly writing parallel programs is message passing. Libraries such as MPI, PVM, and CLF use this technique: true to a process’s nature, each process does not share its memory, and if it needs to communicate with another entity, it does so by passing messages. The downside? More effort from the application developer’s perspective.
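    Here is a minimal sketch of the message-passing style using MPI, one of the libraries named above (the program itself is my own illustration, not from the lecture). Each process owns its private memory, and data crosses process boundaries only through explicit sends and receives.

    ```c
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Process 0 must explicitly ship the data it wants to share. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Process 1 must explicitly receive it; nothing is shared. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }
    ```

    Run with something like `mpirun -np 2 ./a.out`; the extra bookkeeping (ranks, tags, explicit receives) is the developer effort referred to above.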

    Cluster as a parallel machine (DSM)

    Summary

    The advantage of DSM (distributed shared memory) is that application developers can ease their way in: their style of programming need not change. They can still use locks, barriers, and the pthreads style of programming, just the way they always have, because the DSM library provides the illusion of one large shared memory address space across the cluster.
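    A small pthreads sketch of that familiar style (my own illustration, written for a single shared-memory machine): the promise of DSM is that the same lock-and-shared-data structure could, in principle, run across a cluster, with the DSM library rather than the hardware supplying the shared address space.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    long counter = 0;   /* lives in the (real or DSM-provided) shared address space */

    void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                      /* protected access to shared data */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* 200000 */
        return 0;
    }
    ```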

    History of shared memory systems

    Summary

    In a nutshell, software DSM has its roots in the 1980s, when Ivy League academics wanted to scale beyond the SMP (the early IVY system came out of Yale). And now, in the 2000s, we are looking at clusters of symmetric multiprocessors (SMPs).

    Shared Memory Programming

    Summary

    There are two types of synchronization: mutual exclusion and barriers. And there are two types of memory accesses: normal reads/writes to shared data, and reads/writes to synchronization variables.
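    A sketch showing both synchronization types with pthreads (my own illustration; it assumes POSIX barrier support, e.g. on Linux). The mutex provides mutual exclusion, the barrier makes every thread wait until all have arrived, the reads/writes to `total` are the “normal” accesses to shared data, and the lock and barrier operations are accesses to synchronization variables.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_barrier_t barrier;
    int total = 0;                         /* shared data */

    void *worker(void *arg) {
        int id = *(int *)arg;

        pthread_mutex_lock(&lock);         /* mutual exclusion */
        total += id;                       /* normal write to shared data */
        pthread_mutex_unlock(&lock);

        pthread_barrier_wait(&barrier);    /* barrier: everyone waits here */

        if (id == 0)
            printf("total = %d\n", total); /* safe to read after the barrier */
        return NULL;
    }

    int main(void) {
        pthread_t threads[NTHREADS];
        int ids[NTHREADS];

        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (int i = 0; i < NTHREADS; i++) {
            ids[i] = i;
            pthread_create(&threads[i], NULL, worker, &ids[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(threads[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }
    ```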

    Memory consistency and cache coherence

    Summary

    Key Words: Cache Coherence, Memory consistency

    Memory consistency is the contract between the programmer and the system, and it answers the question of “when”: when will a change to the shared memory address space be reflected in another process’s private cache? Cache coherence answers the “how”: which mechanism (write-invalidate or write-update) will be used to keep the caches in agreement.
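    A tiny two-thread sketch of the “when” question (my own illustration, using C11 atomics on a single machine rather than a DSM system): the consistency model is the contract that bounds when the reader is guaranteed to observe the writer’s update; the coherence mechanism (write-invalidate or write-update) is the invisible “how” underneath.

    ```c
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    atomic_int shared = 0;

    void *writer(void *arg) {
        (void)arg;
        atomic_store(&shared, 42);     /* update to the shared location */
        return NULL;
    }

    void *reader(void *arg) {
        (void)arg;
        /* Whether this prints 0 or 42 depends on timing; the consistency model
           defines which outcomes are legal, and coherence implements them. */
        printf("reader sees %d\n", atomic_load(&shared));
        return NULL;
    }

    int main(void) {
        pthread_t w, r;
        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }
    ```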

    Sequential Consistency

    Summary

    Key Words: Sequential Consistency

    With sequential consistency, program order is respected on each processor, but the operations of different processors may be interleaved arbitrarily. In other words, each individual read/write operation appears atomic to every processor.
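    The classic illustration of what this buys us (my own rendering in C, using C11 sequentially consistent atomics so that the compiler and hardware actually honor SC): with x and y initially 0, sequential consistency forbids the outcome r1 == 0 && r2 == 0, because some single interleaving of the two program orders must explain whatever is observed.

    ```c
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    atomic_int x = 0, y = 0;   /* seq_cst ordering by default */
    int r1, r2;

    /* Thread 1: x = 1; then read y.   Thread 2: y = 1; then read x. */
    void *t1(void *arg) { (void)arg; atomic_store(&x, 1); r1 = atomic_load(&y); return NULL; }
    void *t2(void *arg) { (void)arg; atomic_store(&y, 1); r2 = atomic_load(&x); return NULL; }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, t1, NULL);
        pthread_create(&b, NULL, t2, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* (1,0), (0,1), (1,1) are all possible; (0,0) is ruled out under SC. */
        printf("r1=%d r2=%d\n", r1, r2);
        return 0;
    }
    ```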

    SC Memory Model

    Summary

    With sequential consistency, the memory model does not distinguish a normal read/write access from a synchronization read/write access. Why is this important? It means we always get the coherence action on every access, regardless of the memory access type.

    Typical Parallel Program

    Summary

    Okay, I think I get the main gist of what the author is trying to convey. Since the memory model cannot distinguish normal reads/writes from synchronization reads/writes, cache coherence actions continue to take place throughout the critical section (even though we called that uniformity out as a benefit earlier). The downside is that much of this coherence traffic is completely unnecessary: what we really want is for the data modified inside the critical section to be propagated to the other processors’ caches only after we unlock.
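    A sketch of that scenario in pthreads (my own illustration): under an SC-style model every write inside the critical section is a candidate for coherence traffic, even though no other processor may legally touch the data until the unlock.

    ```c
    #include <pthread.h>

    #define N 1024

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    int shared_data[N];

    void *update(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);       /* synchronization access (acquire) */
        for (int i = 0; i < N; i++) {
            shared_data[i] += 1;         /* normal accesses: under SC, each write
                                            may trigger invalidate/update traffic */
        }
        pthread_mutex_unlock(&lock);     /* synchronization access (release): only
                                            here do other processors need the data */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, update, NULL);
        pthread_create(&b, NULL, update, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }
    ```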

    Release Consistency

    Summary

    Key Words: release consistency

    Release consistency is an alternative memory consistency model to sequential consistency. Unlike sequential consistency, which blocks a process until coherence has been achieved for each memory operation, release consistency does not block; instead, it guarantees coherence by the time the lock is released. This is, in my opinion, a huge performance improvement. An open question remains for me: does this coherence guarantee affect processes that are spinning on a variable, like a consumer?
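    As a loose analogy only (my own sketch with C11 acquire/release atomics on one machine, not the DSM protocol from the lecture): the writes performed before a release need only become visible to another thread by the time that thread’s acquire observes the release. In this analogy, a consumer spinning on the synchronization variable does see the producer’s data once its acquire succeeds; whether the lecture’s DSM model gives exactly the same guarantee is the open question above.

    ```c
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    int data = 0;            /* normal shared data */
    atomic_int ready = 0;    /* synchronization variable */

    void *producer(void *arg) {
        (void)arg;
        data = 123;                                               /* ordinary write */
        atomic_store_explicit(&ready, 1, memory_order_release);   /* publish at release */
        return NULL;
    }

    void *consumer(void *arg) {
        (void)arg;
        /* Spin until the release is observed; the acquire guarantees that the
           writes made before the release (data = 123) are visible afterwards. */
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;
        printf("consumer sees data = %d\n", data);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }
    ```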