Tag: cache coherence

  • Distributed Shared Memory (Part 1 of 2) Notes

    Introduction

    Summary

    The main question is this: can we make a cluster look like a shared memory machine?

    Cluster as a parallel machine (sequential program)

    Summary

    One strategy is to not write explicitly parallel programs at all and instead use language-assisted features (like pragmas) that signal to the compiler that a section should be parallelized. But there are limitations to this implicit approach.
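
    To make this concrete, here's a minimal sketch of the implicit style in C using an OpenMP pragma (my choice of directive system; the lecture talks about language-assisted annotations generally):

    ```c
    #include <stdio.h>

    int main(void) {
        static double a[1000000];

        /* The pragma merely hints that the iterations are independent; an
           OpenMP-aware compiler (e.g. gcc -fopenmp) parallelizes the loop,
           while a plain compiler ignores the hint and runs it serially. */
        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++)
            a[i] = i * 2.0;

        printf("%f\n", a[999999]);
        return 0;
    }
    ```

    The limitation: the compiler can only exploit the parallelism it can prove safe from such hints, which covers far fewer programs than explicit parallelization does.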

    Cluster as a parallel machine (message passing)

    Summary

    Key Words: message passing

    One (of two) styles for explicitly writing parallel programs is message passing. Certain libraries (e.g. MPI, PVM, CLF) use this technique, and, true to a process’s nature, a process does not share its memory: if it needs to communicate with another entity, it does so by passing messages. The downside? More effort from the perspective of the application developer.
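
    As an illustration, here is a minimal MPI sketch in C (the ranks, tag, and value are arbitrary choices of mine). Notice that the only way data moves between the two processes is an explicit send/receive pair:

    ```c
    #include <mpi.h>
    #include <stdio.h>

    /* run with: mpirun -np 2 ./a.out */
    int main(int argc, char **argv) {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;  /* lives only in rank 0's private memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* no shared memory: the only way to learn the value is a message */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }
    ```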

    Cluster as a parallel machine (DSM)

    Summary

    The advantage of a DSM (distributed shared memory) is that an application developer can ease their way in; their style of programming need not change: they can still use locks, barriers, and pthreads-style code, just the way they always have. The DSM library provides the illusion of one large shared memory address space.
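
    For instance, pthreads-style code like the following sketch (hypothetical; `balance` stands in for a location in the DSM's shared virtual address space) reads the same whether the threads share physical memory or the memory is actually scattered across a cluster:

    ```c
    #include <pthread.h>

    /* Under a DSM library, this global could live in the shared virtual
       address space spanning the cluster; the lock discipline below is
       unchanged either way. */
    int balance = 0;
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void deposit(int amount) {
        pthread_mutex_lock(&lock);   /* same locking style as ever */
        balance += amount;           /* looks like an ordinary shared write */
        pthread_mutex_unlock(&lock);
    }
    ```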

    History of shared memory systems

    Summary

    In a nutshell, software DSM has its roots in the 80s, when Ivy League academics (Kai Li's IVY system being the canonical example) wanted to scale beyond the SMP. And now, in the 2000s, we are looking at clusters of symmetric multiprocessors (SMPs).

    Shared Memory Programming

    Summary

    There are two types of synchronization: mutual exclusion and barriers. And two types of memory accesses: normal reads/writes to shared data, and reads/writes to synchronization variables.
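
    A small runnable sketch of both synchronization types (the names are mine): the lock/unlock and barrier calls are accesses to synchronization variables, while `sum += my_part` is a normal read/write to shared data:

    ```c
    #include <pthread.h>
    #include <stdio.h>

    #define N 4

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* mutual exclusion */
    pthread_barrier_t bar;                          /* barrier          */
    int sum = 0;                                    /* shared data      */

    void *worker(void *arg) {
        int my_part = *(int *)arg;

        pthread_mutex_lock(&m);      /* synchronization access        */
        sum += my_part;              /* normal access to shared data  */
        pthread_mutex_unlock(&m);    /* synchronization access        */

        pthread_barrier_wait(&bar);  /* nobody proceeds until all arrive */
        return NULL;
    }

    int main(void) {
        pthread_t t[N];
        int parts[N] = {1, 2, 3, 4};

        pthread_barrier_init(&bar, NULL, N);
        for (int i = 0; i < N; i++)
            pthread_create(&t[i], NULL, worker, &parts[i]);
        for (int i = 0; i < N; i++)
            pthread_join(t[i], NULL);

        printf("sum = %d\n", sum);   /* always 10 */
        return 0;
    }
    ```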

    Memory consistency and cache coherence

    Summary

    Key Words: Cache Coherence, Memory consistency

    Memory consistency is the contract between the programmer and the system, and it answers the question of “when”: when will a change to the shared memory address space be reflected in another process’s private cache? Cache coherence answers the “how”: what mechanism will be used (write-invalidate or write-update).
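
    A tiny sketch of the “when” question (hypothetical names):

    ```c
    int shared_x = 0;   /* a location in the shared address space */

    void p1(void) { shared_x = 42; }   /* runs on processor 1 */

    int p2(void) { return shared_x; }  /* runs on processor 2: when is this
                                          read guaranteed to return 42? The
                                          memory consistency model answers
                                          that; cache coherence (write-
                                          invalidate or write-update) is the
                                          mechanism that makes it so. */
    ```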

    Sequential Consistency

    Summary

    Key Words: Sequential Consistency

    With sequential consistency, program order is respected on each processor, but the interleaving across processors is arbitrary, and each individual read/write operation is atomic.
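
    The classic flag/data idiom shows what this buys you (a sketch; on real hardware you'd need atomics, but it illustrates what the SC model itself guarantees):

    ```c
    #include <stdio.h>

    int data = 0, flag = 0;

    /* Processor 1: program order says the data write precedes the flag write. */
    void p1(void) {
        data = 100;
        flag = 1;
    }

    /* Processor 2: under SC, no interleaving lets us observe flag == 1 while
       data is still 0, because all processors see one global order that
       respects each program's order, and each access is atomic. */
    void p2(void) {
        while (flag == 0)
            ;
        printf("%d\n", data);  /* must print 100 under SC */
    }
    ```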

    SC Memory Model

    Summary

    With sequential consistency, the memory model does not distinguish a normal read/write access from a synchronization read/write access. Why is this important? Because we always get the coherence action, regardless of the memory access type.

    Typical Parallel Program

    Summary

    Okay, I think I get the main gist of what the author is trying to convey. Since the memory model cannot distinguish normal reads/writes from synchronization reads/writes, cache coherence (which we called out as a benefit earlier) keeps firing throughout the critical section. The downside is that all of that coherence traffic is unnecessary: what we probably want is for the data modified inside the critical section to be propagated to the other processors’ caches only after we unlock.
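
    A sketch of the problem (my own example): every write inside the critical section below can generate coherence traffic under SC, even though no other processor can read `shared[]` before the lock is released:

    ```c
    #include <pthread.h>

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int shared[1024];

    void update(void) {
        pthread_mutex_lock(&m);
        for (int i = 0; i < 1024; i++)
            shared[i]++;          /* under SC, each of these 1024 writes may
                                     trigger an invalidate/update on the bus
                                     immediately; wasted traffic, since the
                                     lock keeps everyone else out anyway */
        pthread_mutex_unlock(&m); /* ideally, propagate all the changes once,
                                     here at the release point */
    }
    ```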

    Release Consistency

    Summary

    Key Words: release consistency

    Release consistency is an alternative memory consistency model to sequential consistency. Unlike sequential consistency, which blocks a process until coherence has been achieved for each access, release consistency does not block; it only guarantees coherence by the time the lock is released. This is a huge (in my opinion) performance improvement. An open question remains for me: does this coherence guarantee impact processes that are spinning on a variable, like a consumer?
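
    The release/acquire idea survives today in C11 atomics. Here's a minimal sketch (not the lecture's DSM API) that also speaks to my open question: a consumer spinning with acquire loads is safe, because once it observes the released value it is guaranteed to see everything written before the release:

    ```c
    #include <stdatomic.h>

    int payload = 0;         /* normal shared data       */
    atomic_int ready = 0;    /* synchronization variable */

    /* Producer: the release store publishes everything written before it. */
    void produce(void) {
        payload = 42;
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    /* Consumer: spins on the synchronization variable with acquire loads. */
    int consume(void) {
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;                /* spin */
        return payload;      /* guaranteed to be 42 */
    }
    ```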

  • Shared Memory Machine Model (notes)

    You need to take away the following two themes from the shared memory machine model lectures:

    • Difference and relationship between cache coherence (dealt with in hardware) and memory consistency (the contract offered to the programmer, built by the system software on top of that hardware guarantee)
    • The different memory machine models (e.g. dance hall, symmetric multiprocessing, and distributed shared memory architecture)

    Cache coherence is the promise delivered by the underlying hardware architecture. The hardware employs one of two techniques: write-invalidate or write-update. In the former, when a memory address gets updated by one of the cores, the system sends a message on the bus to invalidate the cache entry stored in all the other private caches. In the latter, the system instead updates all the private caches with the correct data. Regardless, the mechanism by which the caches are kept coherent is an implementation detail that the operating system is not privy to.

    Although the OS has no insight into how the hardware delivers cache coherence, it does rely on that coherence to build memory consistency: the hardware and software working in harmony.

    Shared Memory Machine Model

    Summary

    Shared memory model – Dance Hall Architecture: processors sit on one side of an interconnection network, and all the memories sit on the other
    Symmetric multiprocessing – each processor’s time to access the memory is the same
    Distributed shared memory – each CPU has its own private cache and a memory close to it, although it can access the other CPUs’ memory as well

    There are three memory architectures: dance hall (CPUs on one side of an interconnect, memories on the other, so every access crosses the network), SMP (from the perspective of each CPU, access time to memory is the same), and distributed shared memory (some memory is faster to access for a given CPU because it sits close by). The lecture doesn’t go much deeper than this, but I’m super curious about the distributed architecture.

    Shared Memory and Caches

    Summary

    Because each CPU has its own private cache, we may run into the cache coherence problem: the caches tied to each CPU can contain different values for the same memory address. To resolve this, we need the hardware to keep the caches coherent, and a memory consistency model that tells the programmer what to expect.

    Quiz: Processes

    Summary

    The quiz tests us by offering two processes, each using shared memory and each updating different variables, then asks what the possible values for the variables are. Apart from the last option, all are possible; the last one would break the intuition behind a memory consistency model (discussed next).

    Memory consistency model

    Summary

    Introduces the sequential consistency memory model, a model that exhibits two characteristics: program order and arbitrary interleaving. This is analogous to someone shuffling a deck of cards: each half keeps its internal order, but the merge interleaves them arbitrarily.

    Sequential Consistency and cache coherence (Quiz)

    Summary

    Which of the following are possible values for the instructions shown in the quiz?

    Hardware Cache Coherence

    Summary

    There are two strategies for maintaining cache coherence: write-invalidate and write-update. In the former, the system bus broadcasts an invalidate message when one of the caches modifies an address in its private cache, so the other caches mark their copies stale. In the latter, the system instead sends an update message, and each cache updates its private copy with the correct data. Obviously, the lecture oversimplifies the intricacies of maintaining a coherent cache (if you want to learn more, check out the high performance computer architecture lectures, or maybe future modules in this course will cover it in more detail).
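
    To nail down the difference, here's a toy, single-cache-line model (entirely my own sketch, ignoring line states, ownership, and the bus protocol itself):

    ```c
    #include <stdbool.h>

    #define NCORES 4

    struct line { bool valid; int data; };
    struct line cache[NCORES];  /* each core's private copy of one line */

    /* Write-invalidate: the writer keeps the only valid copy; every other
       core's copy is marked stale and must be re-fetched on its next read. */
    void write_invalidate(int core, int value) {
        for (int i = 0; i < NCORES; i++)
            cache[i].valid = (i == core);
        cache[core].data = value;
    }

    /* Write-update: the new value is pushed into every valid copy. */
    void write_update(int core, int value) {
        cache[core].valid = true;
        for (int i = 0; i < NCORES; i++)
            if (cache[i].valid)
                cache[i].data = value;
    }
    ```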

    Scalability

    Summary

    Scalability – the expectation when adding more processors. Pros: exploit parallelism. Cons: increased overhead.

    Adding processors increases parallelism and improves performance. However, some of that gain is lost to the added overhead of keeping the shared bus coherent (another example of making trade-offs and how nothing is free).