Tag: transaction

  • Recovery management in Quicksilver  – Notes and Summary

    Recovery management in Quicksilver – Notes and Summary

    The original paper “Recovery management in quicksilver” introduces a transaction manager that’s responsible for managing servers and coordinates transactions. The below notes discusses how this operating system handles failures and how it makes recovery management a first class citizen.

    Cleaning up state orphan processes

    Cleaning up stale orphan processes

    Key Words: Ophaned, breadcrumbs, stateless

    In client/server systems, state gets created that may be orphaned, due to a process crash or some unexpected failure. Regardless, state (e.g. persistent data structures, network resources) need to be cleaned up

    Introduction

    Key Words: first class citizen, afterthought, quicksilver

    Quicksilver asks if we can make recovery a first class citizen since its so critical to the system

    Quiz Introduction

    Key Words: robust, performance

    Users want their cake and eat it too: they want both performance and robustness from failures. But is that possible?

    Quicksilver

    Key Words: orphaned, memory leaks

    IBM identified problems and researched this topic in the early 1980s

    Distributed System Structure

    Distributed System Structure

    Key Words: microkernel, performance, IPC, RPC

    A structure of multiple tiers allows extensibility while maintaining high performance

    Quicksilver System Architecture

    Quicksilver: system architecture

    Key Words: transaction manager

    Quicksilver is the first network operating system to propose transactions for recovery management. To that end, there’s a “Transaction Manager” available as a system service (implemented as a server process)

    IPC Fundamental to System Services

    IPC fundamental to system services

    Key Words: upcall, unix socket, service_q data structure, rpc, asynchronous, synchronous, semantics

    IPC is fundamental to building system services. And there are two ways to communicate with the service: synchronously (via an upcall) and asynchronously. Either, the center of this IPC communication is the service_q, which allows multiple servers to perform the body of work and allows multiple clients to enqueue their request

    Building Distributed IPC and X Actions

    Bundling Distributed IPC and Transactions

    Key Words: transaction, state, transaction link, transaction tree, IPC, atomicity, multi-site atomicity

    During a transaction, there is state that should be recoverable in the event of a failure. To this end, we build transactions (provided by the OS), the secret sauce for recovery management

    Transaction Management

    Transaction management
    Transaction management

    Key Words: transaction, shadow graph structure, tree, failure, transaction manager

    When a client requests a file, the client’s transaction manager becomes the owner (and root) of the transaction tree. Each of the other nodes are participants. However, since client is suspeptible to failing, ownership can be transferred to other participants, allowing the other participants to clean up the state in the event of a failure

    Distributed Transaction

    Key Words: IPC, failure, checkpoint records, checkpoint, termination

    Many types of failures are possible: connection failure, client failure, subordinate transaction manager failure. To handle these failures, transaction managers must periodically store the state of the node into a checkpoint record, which can be used for potential recovery

    Commit Initiated by Coordinator

    Commit initiated by Coordinator

    Key Words: Coordinator, two phase commit protocol

    Coordinator can send different types of messages down the tree (i.e. vote request, abort request, end commit/abort). These messages help clean up the state of the distributed system. For more complicated systems, like a file system, may need to implement a two phased commit protocol

    Upshot of Bundling IPC and Recovery

    Upshot of bundling IPC and Recovery

    Key Words: IPC, in memory logs, window of vulnerability, trade offs

    No extra communication needed for recovery: just ride on top of IPC. In other words, we have the breadcrumbs and the transaction manager data, which can be recovered

    Implementation Notes

    Key Words: transaction manager, log force, persistent state, synchronous IO

    Need to careful about choosing mechanism available in OS since log force impacts performance heavily, since that requires synchronous IO

    Conclusion

    Key Words: Storage class memories

    Ideas in quicksilver are still present in contemporary systems today. The concepts made their way into LRVM (lightweight recoverable virtual machine) and in 2000, found resurgence in Texas operating system

  • Lightweight recoverable virtual machine – Summary and Notes

    Lightweight recoverable virtual machine – Summary and Notes

    Summary and main take away

    As system designers, we can make persistence into the virtual memory manager, offering persistence to application developers. However, it’s no easy feat: we need to ensure that the solution performs well. To this end, the virtual machine manager offers an API that allows developer to wrap their code in transactions; underneath the hood, the virtual machine manager uses redo logs that persists the user changes to disk which can defend against failures.

    Persistence

    Why is persistence needed?

    Key Words: inode, subsystem, virtual memory management, log sequence

    We can bake persistent into the virtual memory manager (VMM) but building an abstraction is not enough. Instead, we need to ensure that the solution is performant and instead of committing each VMM change to disk, we aggregate them into a log sequence (just like the previous approaches in distributed file system) so that 1) we write in a contiguous block

    Server Design

    Server Design – persist metadata, normal data structures

    Key Words: inodes, external data segment

    The designer of the application gets to decide which virtual addresses will be persisted to external data storage

    Server Design (continued)

    Key Words: inodes, external data segment

    The virtual memory manager offers external data segments, allowing the underlying application to map portions of its virtual address space to segments backed by disk. The model is simple, flexible, and performant. In a nutshell, when the application boots up, the application selects which portions of memory must be persisted, giving the application developer full control

    RVM Primitives

    Key Words: transaction

    RVM Primitives: initialization, body of server code

    There are three main primitives: initialize, map, and unmap. And within the body of the application code, we use transactions: begin transaction, end transaction, abort transaction, and set range. The only non obvious statement is set_range: this tells the RVM runtime the specific range of addresses within a given transaction that will be touched. Meaning, when we perform a map (during initialization), there’s a larger memory range and then we create transactions within that memory range

    RVM Primitives (continued)

    RVM Primitives – transaction code and miscellaneous options

    Key Words: truncation, flush, truncate

    Although RVM automatically handles the writing of segments (flushing to disk and truncating log records), application developers can call those procedures explicitly

    How the Server uses the primitives

    How the server uses the primitives – begin and end transaction

    Key Words: critical section, transaction, undo record

    When transaction begins, the LRVM creates an undo record: a copy of the range specified, allowing a rollback in the event an abort occurs

    How the Server uses the primitives (continued)

    How the server uses the primitives – transaction details

     

    Key Words: undo record, flush, persistence

    During end transaction, the in memory redo log will get flushed to disk. However, by passing in a specific mode, developer can explicitly not call flush (i.e. not block) and flush the transaction themselves

    Transaction Optimizations

    Transaction Optimizations – ways to optimize the transaction

     

    Key Words: window of vulnerability

    With no_restore mode in begin transaction, there’s no need to create a in memory copy; similarly, no need to flush immediately with lazy persistence; the trade off here is that there’s an increase window of vulnerability

    Implementation

    Implementation – redo log and commit

     

    Key Words: forward displacement, transaction, reverse displacement

    Redo log allows traversal in both directions (reverse for recovery) and only new values are written to the log: this implementation allows good performance

    Crash Recovery

    Crash Recovery – resuming from a crash

     

    Key Words: crash recovery

    In order to recover from a crash, the system traverses the redo log, using the reverse displacement.Then, each range of memory (along with the changes) are applied

    Log Truncation

    Log truncation – runs in parallel with forward processing

     

    Key Words: log truncation, epoch

    Log truncation is probably the most complex part of LRVM. There’s a constant tug and pull between performance and crash recovery. Ensuring that we can recover is a main feature but it adds overhead and complexity since we want the system to make forward progress while recovering. This end, the algorithm breaks up data into epochs