Tag: watermark

  • Distributed Shared Memory (Part 2 of 2) Notes

    Distributed Shared Memory (Part 2 of 2) Notes

    An example

    Example of release consistency model

    Summary

    Key Words: Conditional variable, pthread_signal, pthread_wait

    in the concrete example (screenshot below), P1 instructions that update memory (e.g. flag = 1) can be run in parallel with that of P2 because of release consistency model

    Advantage of RC over SC

     

    Summary

    In a nutshell, we gain performance in a shared memory model using release consistency by overlapping computation with communication, because we no longer wait for coherence actions for every memory access

    Lazy RC (release consistency)

    Lazy Release Consistency

    Summary

    Key Words: Eager

    The main idea here is that the “release consistency” is eager, in the sense that cache coherence traffic is generated immediately after unlock occurs. But with lazy RC, we defer that cache coherence traffic until the acquisition

    Eager vs Lazy RC

    Eager vs Lazy Consistency

    Summary

    Key Words: Eager, Lazy

    Basically, eager and lazy goes boils down to a push (i.e. eager) versus pull (i.e. lazy) model. In the former, every time the lock is released, coherence traffic broadcasts to all other processes

    Pros and Cons of Lazy and Eager

    Summary

    Advantage of lazy (over eager) is that there are less messages however there will be more latency during acquisition

    Software DSM

    Software DSM

    Summary

    Address space is partitioned, meaning each processor is responsible for a certain set of pages. This model of ownership is a distributed, and each node holds metadata about the page and is responsible for sending coherence traffic (at the software level)

    Software DSM (Continued)

    Software DSM (continued)

    Summary

    Key Words: false sharing

    DSM software runs on each processor (cool idea) in a single writer multiple reader model. This model can be problematic because, coupled with false sharing, will cause significant bus traffic that ping pongs updates when multiple data structures live within the same cache line (or page)

    LRC with Mutli-Writer Coherence Protocol

    Lazy Release Consistency with multi-writer coherence

    Summary

    With lazy release consistency, a process will (during the critical section) generate a diff of the pages that have been modified, the diff later applied when another process performs updates to those same pages

    LRC with Multi-Writer Coherence Protocol (Continued)

    Summary

    Need to be able to apply multiple diffs in a row, say Xd and Xd’ (i.e. prime)

    LRC with Multi Writer Coherence Protocol (Continued)

    Summary

    Key Words: Multi-writer

    The same page can be modified at the same time by multiple threads, just so as long as a separate lock is used

    Implementation

    Implementation of LRC

    Summary

    Key Words: Run-length encoded

    During a write operation (inside of a lock), a twin page will get created, essentially a copy of the original page. Then, during release, a run-length encoded diff is computed. Following this step, the memory access is then write protected

    Implementation (continued)

    LRC Implementation (Continued)

    Summary

    Key Words: Data Race, watermark, garbage collection

    A daemon process (in every node) wakes up periodically and if the number of diffs exceed the watermark threshold, then daemon will apply diffs to original page. All in all, keep in mind that there’s overhead involved with this solution: overhead with space (for the twin page) and overhead in runtime (due to computing the run-length encoded diff)

    Non Page Based DSM

    Non-page-based DSM

    Summary

    Two types of library based that offer alternatives, both that do not require OS support. The two approaches are library-based (variable granularity) and structured DSM (API for structures that triggers coherence actions)

    Scalability

    Scalability

    Summary

    Do our (i.e. programmer’s) expectations get met as the number of processors increase: does performance increase accordingly as well? Yes, but there’s substantial overhead. To be fair, the same is true with true shared memory multiple processor