Category: Advanced Operating Systems

  • Squeezing the most out of your study sessions (midterm and final exam preparation)

    Squeezing the most out of your study sessions (midterm and final exam preparation)

    Why publish my studying techniques?

    This semester, I manage to pull off an A not only for the midterm and final exams, but for the class as a whole. My intention of revealing my grade is not to boast (that’s poor taste), but to give some credibility to the techniques and strategies below, techniques and strategies (rooted in pedagogy) for someone who is working full time, someone who is is growing as a (freshly minted) father and husband, someone who pursues a wide array of other interests outside of graduate school. That being said, receiving the highest letter grade is not my goal in the program (although I admit it once was). My goal is to learn as much as I can, absorb as much computer science, connect the dots between all the other classes, bridge the gap between theory and practice, and apply the knowledge to my job as a software engineer.

    Exam logistics

    Assumptions

    This blog post assumes that you can openly collaborate with other students. During my term (Fall 2020), us students shared our answers using a two pronged approach: studying with one another over Zoom (see my post on War Rooms) and comparing answers in a shared Google Document. Crowd source at its finest.

    Time Bound

    You have up to 3 days to take exams. And before the exam window opens, the study guides (i.e. all the previous exams bundled up in a .zip file) are published. That’s about a week for you to prepare. And once the exam window opens, the professor and teacher assistants publish the actual exam. That’s right: they provide all the questions (even then, the exam is no walk in the park). In other words, you walk into the exam with no surprises, knowing exactly what questions to expect.

    Spaced repetition

    To make the most use out of my time bounded study sessions, I apply three techniques rooted in research: forgetting curve, spacing effect, and testing effect (check out a paper I wrote titled “An argument for a spaced repetition standard”).

    Put simply, instead of one massive, multi-hour study session, I break down the study sessions into a fistful of sessions, around five to ten, and spread the sessions across multiple days (i.e. “spaced repetition”), each session lasting anywhere between 10-20 minutes. During each study session, I test myself (i.e. “testing effect”) and try to answer questions just as I’m about to forget them (i.e. “forgetting curve”). With this approach, I optimize memory retention for both short and long term. This technique aligns with what Bacon stated back in 1620:

    “If you read a piece of text through twenty times, you will not learn it by heart so easily as if you read it ten times while attempting to recite from time to time and consulting the text when your memory fails”.

    Studying previous exams

    Here’s how I study previous exams.

    First, I create a blank page in Google Docs. Then, I select one of the previous exams (only one because of time constraints) and copy all of its questions into this blank document. Next, I attempt to answer each question on my own: no peeking at the solution guide! This is how I test myself.

    My google doc sheet

    After answering each question, I compare my guess — sometimes it is way off — with the answer printed in the answer sheet. If my answer is incorrect, I then copy the correct answer, word by word, below my original answer. The act of copying helps reinforces the right answer and positioning the two answers — mine and the correct one — next to each other helps me see the holes in my logic.

    I apply this circular technique — copy question, answer question, compare the two, copy correct answer — for each question listed in the exam. Bear in mind that I apply this circular technique only once. Repetition will come in later when I study the actual final exam.

    Studying final exam

    Studying for the final exam approach closely resembles my circular technique (describe previously). The main difference is this: instead of only making a single pass through the questions, I make two passes. The first pass is the same (described above) circular technique of copying each questions and taking a stab at them. In the second pass, I review my answers against the collaborated google docs, where all the other students dump their answers.

    Collaborated Google Doc study guide

    Once my answer guide — not the collaborated Google Docs — is filled out with (hopefully) the right answers, I then the contents into my space repetition software: Anki. After copying, I simply rote memorize, drilling each questions into my head until I stop answering incorrectly, which normally takes about an hour in total, spread across 2 days.
    Finally, after the spaced repetition sessions using Anki, with the contents of the exam sitting fresh in my working memory, I take the exam and simply brain dump my answers.

    Using Anki to study for exam

    References

    https://blog.mattchung.me/2020/09/27/my-anki-settings-for-cramming-for-an-exam/

     

  • Giant Scale Services – Summary and notes

    Giant Scale Services – Summary and notes

    Introduction

    We’ll address some questions like “how to program big data systems” and how to “store and disseminate content on the web in scalable manners”

    Quiz: Giant Scale Services

    Basically almost every service is backed by “Giant Scale” services

    Tablet Introduction

    This lesson covers three issues: system issues in giant scale services, programming models for applications working on big data and content distribution networks

    Generic Service Model of Giant Scale Services

    Generic Service Model of Giant Scale Services – Source: Udacity Advanced OS Course

    Key Words: partial failures, state

    A load manager sits between clients and the back end services and is responsible for hiding partial failures by observing state of the servers

    Clusters as workhorses

    Clusters as workhorses – Source: Udacity Advanced OS Course

    Key Words: SMP, backplane, computational clusters

    Treat each node in the cluster as the same, connecting the nodes via a high performance backplane. This strategy offers advantages, allowing easy horizontal scaling, making it easy for a systems administrator manage the load and system

    Load Management Choices

    Load Management Choices – Source: Udacity Advanced OS Course

    Key Words: OSI, application layer

    The higher the layer in which you construct the load manager, the more functionality you can have

    Load management at the network level

    Load Management at the network level – Source: Udacity Advanced OS Course

    Key Words: DNS, semantics, partition

    We can use DNS to load balancer traffic, but this approach does not hide server failures well. By moving up the stack, and performing balancing at the transport layer, we can balance traffic based off of service

    DQ Principle

    DQ Principle – Source: Udacity Advanced OS Course

    Key Words: DF (full data set), corpus of data, harvest (D), yield

    We’re getting a bit more formal here. Basically, there are two ratios: Q (yield) and D (harvest). Q’s formula is Qc (completed requests)/ Q0 (offered load). Ideally want this ratio to be 1, which means all client requests were serviced. For D (harvest), formula is Dv (available data) / DF (full data). Again, want this ratio to be 1 meaning all data available to service client request

    DQ Principle (continued)

     

    Key Words: IOPS, metrics, uptime, assumptions, MTTR, MTBF, corpus of data, DQ

    DQ principle very powerful, helps architect the system. We can increase harvest (data), but keep the yield the same. Increase the yield, but keeping D constant. Also, we can track several metrics including uptime, which is a ration between MTBF (mean time between failures) and MTTR (mean time to repair). Ideally, this ratio is 1 (but probably improbable). Finally, these knobs that a systems administrator can tweak assumes that the system is network bound, not IO bound

    Replication vs Partitioning

    Replication vs Partioning – Source: Udacity Advanced OS Course

    Key Words: replication, corpus of data, fidelity, strategy, saturation policy

    DQ is independent of replication or partioning. And beyond a certain point, replication is preferred (from user’s perspective). During replication, harvest data is unaffected but yield decreases. Meaning, some users fail for some amount of time

    Graceful Degradation

    Graceful Degradation – Source: Udacity Advanced OS Course

    Key Words: harvest, fidelity, saturation, DQ, cost based admission control, value based admission control, reduce data freshness

    DQ provides an explicit strategy for handling saturation. The technique allows the systems administrator to tweak the fidelity, or the yield. Do we want to continue servicing all customers, with degraded performance … or do we want to, once DQ limit is reached, service existing clients with 100% fidelity

    Online Evolution and Growth

    Online evolution and growth – Source: Udacity Advanced OS Course

    Key Words: diurmal server property

    Two approaches for deploying software on large scale systems: fast and rolling. With fast deployment, services are upgraded off peak, all nodes down at once. Then there’s a rolling upgrade, in which the duration is longer than a fast deployment, but keeps the service available

    Online evolution and growth (continued)

    Online evolution and growth – Source: Udacity Advanced OS Course

    Key Words: DQ, big flip, rolling, fast

    With a big flip, half the nodes are down, the total DQ down by half for U units of time

    Conclusion

    DQ is a tool for system designers to optimize for yield or for harvest. Also helps designer deal with load saturation, failed, or upgrades are planned

  • Free E-book: Advanced operating systems (AOS) refresher course – summary and study guide

    Free E-book: Advanced operating systems (AOS) refresher course – summary and study guide

    Click here to download “Advanced OS refresher course – summary and study guide”

    I compiled my various blog posts from the advanced operating systems refresher course and bundled them together into a nicely packed e-book. So, if you are about to enroll in Georgia Tech’s advanced operating system course (AOS) and want to step through the refresher material (without sitting through and watching the entire lecture series) or just get a sense of whether you can skip the refresher course all together, then definitely check out the e-book above.

  • Enterprise Java Beans – notes and summary

    Enterprise Java Beans – notes and summary

    Introduction

    Key Words: EJB, enterprise java beans

    Discuss how we can structure system software for large scale distributed sytem service

    Inter Enterprise View

    Inter Enterprise View: The motivation of using enterprise java beans

    Key Words: monolithic, supply chain model, amalgam, survivability, complexity

    From a user perspective, we view a system (like Ebay or Google) as a blackbox. But in reality, much more complex than that and within the system, there may be multiple enterprise working together and this can be difficult when trying to handle failures, complexity, and so on

    An Example

    Key Words: scheduling, parallel systems, synchronization, communication, atomicity, concurrency, object technology, reuse

    Using a system like Expedia, we face same issues in parallel systems like synchronization and concurrency and scheduling and atomicity and so on

    N-tier applications

    N tier applications

    Key Words: concurrency, parallelism, embarrassingly parallel applications

    Want to reduce network communication (latency), security (for users) by not compromising business logic, and increase concurrency

    Structuring N Tier Applications

    Structuring N tier applications

    Key Words: JEE, protection domain, bean

    We can split the protection domain uses containers, at the client or at the presentation layer, business layer or database layer. Each container contains N beans, bundle of java objects

    Design alternative (Coarsegrain session beans)

    Coarse grain approach. Although this approach protects business logic from outside network, fails to expose concurrency at the EJB container.

    Key Words: monolithic, concurrency

    Each session bean is very granular, the session bean encapsulating most of the logic. The upshot of this approach is that the container offers little service and the logic is not exposed outside the corporate network. The downside is that we do not exploit concurrency. There’s a missed opportunity where the EJB container and pull in data for multiple sessions

    Design Alternative

    Key Words: trade offs, persistent, bean managed persistence, container managed persistence, data access object

    With data access object design, we confront the short comings of the coarse grain session bean, by exploiting concurrency for data access, limiting I/O and network latency when accessing the database. The trade off? Business logic is exposed to the web container

    Design alternative (session bean with entity bean)

    Design alternative: session bean with entity bean. Hybrid approach of coarse grain and data access object.

    Key Words: session facade, design pattern, remote interface, RMI

    We now embed a session facade, that provides concurrency AND does not exposed business logic. To that end, we use RMI (remote interface) to communicate between the containers and/or between the session facade and entity bean

  • Recovery management in Quicksilver  – Notes and Summary

    Recovery management in Quicksilver – Notes and Summary

    The original paper “Recovery management in quicksilver” introduces a transaction manager that’s responsible for managing servers and coordinates transactions. The below notes discusses how this operating system handles failures and how it makes recovery management a first class citizen.

    Cleaning up state orphan processes

    Cleaning up stale orphan processes

    Key Words: Ophaned, breadcrumbs, stateless

    In client/server systems, state gets created that may be orphaned, due to a process crash or some unexpected failure. Regardless, state (e.g. persistent data structures, network resources) need to be cleaned up

    Introduction

    Key Words: first class citizen, afterthought, quicksilver

    Quicksilver asks if we can make recovery a first class citizen since its so critical to the system

    Quiz Introduction

    Key Words: robust, performance

    Users want their cake and eat it too: they want both performance and robustness from failures. But is that possible?

    Quicksilver

    Key Words: orphaned, memory leaks

    IBM identified problems and researched this topic in the early 1980s

    Distributed System Structure

    Distributed System Structure

    Key Words: microkernel, performance, IPC, RPC

    A structure of multiple tiers allows extensibility while maintaining high performance

    Quicksilver System Architecture

    Quicksilver: system architecture

    Key Words: transaction manager

    Quicksilver is the first network operating system to propose transactions for recovery management. To that end, there’s a “Transaction Manager” available as a system service (implemented as a server process)

    IPC Fundamental to System Services

    IPC fundamental to system services

    Key Words: upcall, unix socket, service_q data structure, rpc, asynchronous, synchronous, semantics

    IPC is fundamental to building system services. And there are two ways to communicate with the service: synchronously (via an upcall) and asynchronously. Either, the center of this IPC communication is the service_q, which allows multiple servers to perform the body of work and allows multiple clients to enqueue their request

    Building Distributed IPC and X Actions

    Bundling Distributed IPC and Transactions

    Key Words: transaction, state, transaction link, transaction tree, IPC, atomicity, multi-site atomicity

    During a transaction, there is state that should be recoverable in the event of a failure. To this end, we build transactions (provided by the OS), the secret sauce for recovery management

    Transaction Management

    Transaction management
    Transaction management

    Key Words: transaction, shadow graph structure, tree, failure, transaction manager

    When a client requests a file, the client’s transaction manager becomes the owner (and root) of the transaction tree. Each of the other nodes are participants. However, since client is suspeptible to failing, ownership can be transferred to other participants, allowing the other participants to clean up the state in the event of a failure

    Distributed Transaction

    Key Words: IPC, failure, checkpoint records, checkpoint, termination

    Many types of failures are possible: connection failure, client failure, subordinate transaction manager failure. To handle these failures, transaction managers must periodically store the state of the node into a checkpoint record, which can be used for potential recovery

    Commit Initiated by Coordinator

    Commit initiated by Coordinator

    Key Words: Coordinator, two phase commit protocol

    Coordinator can send different types of messages down the tree (i.e. vote request, abort request, end commit/abort). These messages help clean up the state of the distributed system. For more complicated systems, like a file system, may need to implement a two phased commit protocol

    Upshot of Bundling IPC and Recovery

    Upshot of bundling IPC and Recovery

    Key Words: IPC, in memory logs, window of vulnerability, trade offs

    No extra communication needed for recovery: just ride on top of IPC. In other words, we have the breadcrumbs and the transaction manager data, which can be recovered

    Implementation Notes

    Key Words: transaction manager, log force, persistent state, synchronous IO

    Need to careful about choosing mechanism available in OS since log force impacts performance heavily, since that requires synchronous IO

    Conclusion

    Key Words: Storage class memories

    Ideas in quicksilver are still present in contemporary systems today. The concepts made their way into LRVM (lightweight recoverable virtual machine) and in 2000, found resurgence in Texas operating system

  • Operating System Transactions – Summary and notes

    Operating System Transactions – Summary and notes

    This post is a cliff notes version I scrapped together after reading the paper Operating Systems Transactions. Although I strongly recommend you read the paper if you are interested in how the authors pulled inspiration from database systems to create a transactional operating system, this post should give you a good high overview if you are short on time and need a quick and shallow understanding.

    Abstract

    • System transactions enable application developers to update OS resources in an ACID (atomic, consistent, isolated, and durable) fashion.
    • TxOS is a variant of Linux that implements system transactions using new techniques, allowing fairness between system transactions and non-transaction activities

    Introduction

    • The difficulty lies in making updates to multiple files (or shared data structures) at the same time. One example of this is updating user accounts, which requires making changes to the following files: /etc/passwd, /etc/shadow, /etc/group
    • One way for ensuring that a file is atomically updates is by using a “rename” operation, this system call replacing the contents of a file.
    • But for more complex updates, we’ll need to use something like flock for handling mutual exclusion. These advisory locks are just that: advisory. Meaning, someone can bypass these control, like an administrator, and just update the file directly.
    • Although one approach to fix these concurrency problems is by adding more and more system calls. But instead of taking this approach of constantly identifying and eliminating race conditions, why not percolate the responsibility up to the end user, by allowing system transactions?
    • These system transactions is what the paper proposes and this technique allows developers to group their transaction using system calls: sys_xbegin() and sysx_xend().
    • This paper focuses on a new approach to OS implementation and demonstrates the utility of system transactions by creating multiple prototypes.

    Motivating Examples

    • Section covers two common application consistency problems: software upgrade and security
    • Both above examples and their race conditions can be solved by using ”’system transactions”’

    Software installation or upgrade

    • Upgrading software is common but difficult
    • There are other approaches, each with their own drawbacks
    • One example is using a checkpoint based system. With checpoints, system can rollback. However, files not under the control of the checkpoint cannot be restored.
    • To work around the shortcomings of checkpoint, system transactions can be used to atomically roll forward or rollback the entire installation.

    Eliminating races for security

    • Another type of attack is interleaving a symbolic link in between a user’s access and open system calls
    • By using transactions, the symbolic link is serialized (or ordered) either before or after and cannot see partial updates
    • The approach of adding transactions is more effective long term, instead of fixing race conditions as they pop up

    Overview

    • System transactions make it easy on the developer to implement
    • Remainder of section describes the API and semantics

    System Transactions

    • System transactions provide ACID (atomic, consistent, isolation, durability) semantics – but instead of at the database level, at the operating system level
    • Essentially, application programmer wraps their code in sys_xbegin() and sys_xend()

    System transaction semantics

    • Similar to database semantics, system transactions are serializable and recoverable
    • Transactions are atomic and can be rolled back to a previous state
    • Transactions are durable (i.e. once transaction results are committed, they survive system crashes)
    • Kernel enforces the following invariant: only a single writer at a time (per object)
    • If there are multiple writers, system will detect this condition and abort one of the writers
    • Kernel enforces serialization
    • Durability is an option

    Interaction of transactional and non-transactional threads

    • Serialization of transaction and non-transational updates is caclled strong isolation
    • Other implementations do not take a strong stance on the subject and are semantically murkey
    • By taking a strong stance, we can avoid unexpected behavior in the presence of non-transactional updates

    System transaction progress

    • OS guarantees system transactions do not livelock with other system transactions
    • If two transactions are in progress, OS will select one of the transactions to commit, while restarting the other transaction
    • OS can enforce policies to limit abuse of transactions, similar to how OS can control access to memory, disk space, kernel threads etc

    System transactions for system state

    • Key point: system transactions provide ACID semantics for system state but not for application state
    • When a system transaction aborts, OS will restore kernel data structures, but not touch or revert application state

    Communication Model

    • Application programmer is responsible for not adding code that will communicate outside of a transaction. For example, by adding a request to a non-transactional thread, the application may deadlock

    TxOS overview

    TXOS Design

    • System transactions guarantee strong isolation

    Interoperability and fairness

    • Whether or not a thread is a transactional or non transactional thread, it must check for conflicting annotation when accessing a kernel object
    • Often this check is done at the same time when a thread acquires a lock on the object
    • When there’s a conflict between a transaction and non-transactional thread, this is called asymmetric conflict. Instead of aborting the transaction, TxOS will suspend the non-transactional thread, promoting fairness between transactions and non-transactional threads.

    Managing transactional state

    • Historically, databases and transactional OS will update data in place and maintain an undo log: this is known as eager version management
    • ”Isn’t the undo log approach the approach the light recoverable virtual machine takes?”
    • In eager version management, systems hold lock until the commit is completed and is also known as two-phase locking
    • Deadlocking can happen and one typical strategy is to expose a timeout parameter to users
    • Too short of a timeout starves long transactions. Too long of a deadlock and can starve performance (this is a trade off, of course)
    • Unfortunately, eager version management can kill performance since the transaction must process its redo log and jeopardizes system’s overall performance
    • Therefore, TxOS uses lazy version management, operating on private copies of data structures
    • Main disadvantage of lazy versioning is the additional commit latency due to copying updates of the underlying data structures

    Integration with transactional memory

    • Again, system transactions protect system state: not application state
    • Users can integrate iwth user level transaction memory systems if they want to protect application state
    • System calls are forbidden during user transactions since allowing so would violate transactional semantics

    TxOS Kernel Implementation

    Versioning data

    • TxOS applies a technique that’s borrowed from software transactional memory systems
    • During a transaction, a private copy of the object is made: this is known as a the shadow object
    • The other object is known as “stable”
    • During the commit, shadow object replaces the stable
    • A naive approach would be to simply replace the stable pointer, since the object may be the target of pointers from several other objects
    • For efficient commit of lazy versioned data, need to break up data into header and data.
    • ”Really fascinating technique…”
    • Maintain a header and the header pointers to the object’s data. That means, other objects always access data via the header, the header never replaced by a transaction
    • Transactional code always has speculative object
    • The header splits data into different payloads, allowing the data to be accessed disjointly
    • OS garbage collects via read-copy update
    • Although read only data avoids cost of duplicating data, doing so complicates the programming model slightly
    • Ultimately, RCU is a technique that supports efficient, concurrent access to read-mostly data.

    Conflict detection and resolution

    • TxOS provides transactions for 150 of 303 system calls in Linux
    • Providing transactions for these subset system calls requires an additional 3,300 lines of code – just for transaction management alone
    • A conflict occurs when transaction is about to write to an object but that object has been written by another transaction
    • Header information is used to determine the reader count (necessary for garbage collection)
    • A non-null writer pointer indicates an active transactional writer. Similarly, an empty reader lists means there are no readers
    • All conflicts are arbitrated by the contention manager
    • During a conflict, the contention manager arbitrates by using an osprio policy: the process with the higher scheduling process wins. But if both processes have the same priority, then the older one wins: this policy is known as timestamp.

    Asymmetric conflicts

    • non-transactional threads cannot be rolled back, although transactional threads can always be rolled back. That being said, there must be mechanism to resolve the conflict in favor of the transactional thread otherwise that policy always favor the non-transactional thread
    • non-transactional threads cannot be rolled back but they can be preemted, a recent feature of Linux

    Minimizing conflicts on lists

    • Kernel relies heavily on linked lists data structures

    Managing transaction state

    • TxOS adds transaction objects to the kernel
    • Inside of transaction struct, the status (probably an alias to uint8_t) is updated atomically with a compare and swap operation
    • If transaction system call cannot complete because of conflict, it must abort
    • Roll back is possible by saving register state on the stack at the beginning of the system call, in the “checkpointed_registers” field
    • During abort, restore register state and call longjmp
    • Certain operations must not be done until commit; these operations are stored in deferred_ops. Similarly, some operations must be done during abort, and these operations are stored in undo_ops field.
    • Workset_list is a skip list that contains references to all objects in the transaction and the transaction’s private copies

    Commit protocol

    • When sys_xend (i.e. transaction ends), transaction acquires lock for all items in (above mentioned) workset.
    • Once all locks are acquired, transaction performs one final check in its status word and verifies that the status has been set to abort.

    Abort protocol

    • Abort must happen when transaction detects that it lost a conflict
    • Transaction must decrement the reference count and free the shadow objects

    User level transactions

    • Can only support user-level transactions by coordinating commit of application state with system transaction’s commit

    Lock-based STM requirements

    • Used a simplified variant of two-phase commit protocol
    • Essentially, user uses sys_xend() system call and must inspect the return code so that the user application can then decide what to do based off of the system call’s transaction

    TxOS Kernel Subsystems

    • Remainder will discuss ACID semantics
    • Example will include ext3 file system

    Transactional file system

    • Managed versioned data in the virtual filesystem layer
    • File system only needs to provide atomic updates to stable storage (i.e. via a journal)
    • By guaranteeing writes are done in a single journal transaction, ext3 is now transactional

    Multi-process transactions

    • Forked children execute until sys_xend() or the process exits

    Signal delivery

    • Application can decide whether to defer a signal until a later point
    • If deferred, signals are placed into queue

    Future work

    • TxOS does not provide transactional semantics for all OS resources
    • If attempting to use transaction on unsupported resource, transaction will be aborted
  • RioVista – Summary and notes

    RioVista – Summary and notes

    Introduction

    Lesson outline for RioVista

    Key Words: ACID, transactions, synchronous I/O

    RioVista picks up where LRVM left off and aims for a performance conscience transaction. In other words, how can RioVista reduce the overhead of synchronous I/O, attracting system designers to use transactions

    System Crash

    Two types of failures: power failure and software failure

    Key Words: power crash, software crash, UPS power supply

    Super interesting concept that makes total sense (I’m guessing this is actually implemented in reality). Take a portion of the memory and battery back it up so that it survives crashes

    LRVM Revisited

    Upshot: 3 copies by LRVM

    Key Words: undo record, window of vulnerability

    In short, LRVM can be broken down into begin transaction, end transaction. In the former, portion of memory segment is copied into a backup. At the end of the transaction, data persisted to disk (blocking operation, but can be bypassed with NO_FLUSH option). Basically, increasing vulnerability of system to power failures in favor of performance. So, how will a battery backed memory region help?

    Rio File Cache

    Creating a battery backed file cache to handle power failures

    Key Words: file cache, persistent file cache, mmap, fsync, battery

    In a nutshell, we’ll use a battery backed file cache so that writes to disk can be arbitrarily delayed

    Vista RVM on Top of RIO

    Vista – RMV on top of Rio

    Key Words: undo log, file cache, end transaction, memory resisdent

    Vista is a library that offers same semantics of LRVM. During commit, throw away the undo log; during abort, restore old image back to virtual memory. The application memory is now backed by file cache, which is backed by a power. So no more writes to disk

    Crash Recovery

    Key Words: idempotency

    Brilliant to make the crash recovery mechanism the exact same scenario as an abort transaction: less code and less edge cases. And if the crash recovery fails: no problem. The instruction itself is idempontent

    Vista Simplicity

    Key Words: checkpoint

    RioVista simplifies the code, reducing 10K of code down to 700. Vista has no redo logs, no truncation, all thanks to a single assumption: battery back DRAM for portion of memory

    Conclusion

    Key Words: assumption

    By assuming there’s only software crashes (not power), we can come to an entirely different design

  • Lightweight recoverable virtual machine – Summary and Notes

    Lightweight recoverable virtual machine – Summary and Notes

    Summary and main take away

    As system designers, we can make persistence into the virtual memory manager, offering persistence to application developers. However, it’s no easy feat: we need to ensure that the solution performs well. To this end, the virtual machine manager offers an API that allows developer to wrap their code in transactions; underneath the hood, the virtual machine manager uses redo logs that persists the user changes to disk which can defend against failures.

    Persistence

    Why is persistence needed?

    Key Words: inode, subsystem, virtual memory management, log sequence

    We can bake persistent into the virtual memory manager (VMM) but building an abstraction is not enough. Instead, we need to ensure that the solution is performant and instead of committing each VMM change to disk, we aggregate them into a log sequence (just like the previous approaches in distributed file system) so that 1) we write in a contiguous block

    Server Design

    Server Design – persist metadata, normal data structures

    Key Words: inodes, external data segment

    The designer of the application gets to decide which virtual addresses will be persisted to external data storage

    Server Design (continued)

    Key Words: inodes, external data segment

    The virtual memory manager offers external data segments, allowing the underlying application to map portions of its virtual address space to segments backed by disk. The model is simple, flexible, and performant. In a nutshell, when the application boots up, the application selects which portions of memory must be persisted, giving the application developer full control

    RVM Primitives

    Key Words: transaction

    RVM Primitives: initialization, body of server code

    There are three main primitives: initialize, map, and unmap. And within the body of the application code, we use transactions: begin transaction, end transaction, abort transaction, and set range. The only non obvious statement is set_range: this tells the RVM runtime the specific range of addresses within a given transaction that will be touched. Meaning, when we perform a map (during initialization), there’s a larger memory range and then we create transactions within that memory range

    RVM Primitives (continued)

    RVM Primitives – transaction code and miscellaneous options

    Key Words: truncation, flush, truncate

    Although RVM automatically handles the writing of segments (flushing to disk and truncating log records), application developers can call those procedures explicitly

    How the Server uses the primitives

    How the server uses the primitives – begin and end transaction

    Key Words: critical section, transaction, undo record

    When transaction begins, the LRVM creates an undo record: a copy of the range specified, allowing a rollback in the event an abort occurs

    How the Server uses the primitives (continued)

    How the server uses the primitives – transaction details

     

    Key Words: undo record, flush, persistence

    During end transaction, the in memory redo log will get flushed to disk. However, by passing in a specific mode, developer can explicitly not call flush (i.e. not block) and flush the transaction themselves

    Transaction Optimizations

    Transaction Optimizations – ways to optimize the transaction

     

    Key Words: window of vulnerability

    With no_restore mode in begin transaction, there’s no need to create a in memory copy; similarly, no need to flush immediately with lazy persistence; the trade off here is that there’s an increase window of vulnerability

    Implementation

    Implementation – redo log and commit

     

    Key Words: forward displacement, transaction, reverse displacement

    Redo log allows traversal in both directions (reverse for recovery) and only new values are written to the log: this implementation allows good performance

    Crash Recovery

    Crash Recovery – resuming from a crash

     

    Key Words: crash recovery

    In order to recover from a crash, the system traverses the redo log, using the reverse displacement.Then, each range of memory (along with the changes) are applied

    Log Truncation

    Log truncation – runs in parallel with forward processing

     

    Key Words: log truncation, epoch

    Log truncation is probably the most complex part of LRVM. There’s a constant tug and pull between performance and crash recovery. Ensuring that we can recover is a main feature but it adds overhead and complexity since we want the system to make forward progress while recovering. This end, the algorithm breaks up data into epochs

  • Testing your gRPC services using grpc_cli

    Testing your gRPC services using grpc_cli

    This post may be helpful for you if you are building gRPC services and want a convenient way to test your service using a command line tool. Similar to using cURL when testing HTTP(s) services, I wanted an easy way to test the gRPC services that I’m building.

    Originally, I had originally planned to whip together a tiny C++ program that sends protobuf messages to my MapReduce service that I’m building for advanced operating systems course. Fortunately, a testing tool already exists: grpc_cli. Even better is that the tool ships with the grpc source code.

    So follow along if you want to install the grpc command line tool, enable server reflection, and execute a few examples.

    Note: This post assumes that you are programming in C++ and your operating system is Ubuntu

    Install grpc_cli

    Follow the steps below if you are running on Ubuntu. If you are running gRPC on your mac, then you’ll want to substitute the apt-get command with brew install, as described in the grpc command line documentation.

    1. Clone the grpc repository

      [code lang=”bash”]git clone git@github.com:grpc/grpc.git[/code]

    2. Add submodules

      [code lang=”bash”]$ git submodule update –init[/code]

    3. Install libflags-dev

      [code lang=”bash”]$ sudo apt-get install libgflags-dev[/code]

    4. Compile grpc

      [code lang=”bash”]$ mkdir -p cmake/build
      $ cd cmake/build
      $ cmake -DgRPC_BUILD_TESTS=ON ../..
      $ make grpc_cli
      [/code]

    Enable Reflection for your service

    In your grpc service code, before you bind and listen, you’ll need to enable reflection. Although not entirely necessary for interacting with your service, not having reflection enabled means you cannot use many of the grpc_cli commands like list.

    [code lang=”bash”]
    grpc::reflection::InitProtoReflectionServerBuilderPlugin();
    grpc::ServerBuilder builder;
    builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
    builder.RegisterService(&service);

    [/code]

    Add grpc++_reflection library

    Make sure that you are adding grpc++ library when building your project. If you are a student in advanced operating systems, you’ll need to update GeneratedProtos.cmake and link the gRPC::grpc++_reflection library as follows:

    [code lang=”bash”]
    % git diff
    diff –git a/src/GenerateProtos.cmake b/src/GenerateProtos.cmake
    index c6a80bc..cce2d51 100644
    — a/src/GenerateProtos.cmake
    +++ b/src/GenerateProtos.cmake
    @@ -75,5 +75,5 @@ add_custom_command(
    )

    add_library(p4protolib ${ProtoHeaders} ${ProtoSources})
    -target_link_libraries(p4protolib PUBLIC protobuf::libprotobuf gRPC::grpc++)
    +target_link_libraries(p4protolib PUBLIC protobuf::libprotobuf gRPC::grpc++ gRPC::grpc++_reflection)
    target_include_directories(p4protolib PUBLIC ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_CURRENT_BINARY_DIR})
    [/code]

     

    Using grpc_cli

    I’d encourage you to explore the grpc_cli by checking out the official user guide. However, for the purpose of this post, I want to show you how to list your service as well as how to invoke the “MapReduce” service.

    Examples

    Start your gRPC service

    In the example below, I’m staring my worker process that listens on localhost:50051. Here’s a snippet of my very bare service:

    [code lang=”proto3″]
    service MapReduce {
    rpc map(MapRequest) returns (MapResponse) {}
    rpc reduce(ReduceRequest) returns (ReduceResponse) {}
    [/code]

    Starting the service

    [code lang=”bash”]
    $ ./mr_worker localhost:50051
    Listening on ip address: localhost
    Listening on port: 50051
    Server listening on localhost:50051
    [/code]

    List the services

    After starting my service (that has reflection enabled), the service shows up when I execute the list command:

    [code lang=”bash”]
    $ ./grpc_cli ls localhost:50051
    masterworker.MapReduce
    grpc.reflection.v1alpha.ServerReflection
    grpc.health.v1.Health
    [/code]

    Invoking “MapReduce”

    [code lang=”bash”]
    $ ./grpc_cli call localhost:50051 masterworker.MapReduce.map "shard_data: ‘123’"
    connecting to localhost:50051
    Rpc succeeded with OK status
    [/code]

    Passing in multiple fields

    [code lang=”bash”]
    $ ./grpc_cli call localhost:50051 masterworker.MapReduce.reduce "filename: ‘123’ worker_hostname: ‘localhost’ worker_port: 50051"
    connecting to localhost:50051
    Rpc succeeded with OK status
    [/code]

    References

    • https://grpc.github.io/grpc/cpp/md_doc_server_reflection_tutorial.html
    • https://github.com/grpc/grpc/blob/master/doc/command_line_tool.md
    • https://stackoverflow.com/questions/61641174/unable-to-link-the-grpc-libraries-in-windows