Matt Chung's Site

Author: mattchung

RioVista – Summary and notes

Introduction

Lesson outline for RioVista

Key Words: ACID, transactions, synchronous I/O

RioVista picks up where LRVM left off and aims for a performance conscience transaction. In other words, how can RioVista reduce the overhead of synchronous I/O, attracting system designers to use transactions

System Crash

Two types of failures: power failure and software failure

Key Words: power crash, software crash, UPS power supply

Super interesting concept that makes total sense (I’m guessing this is actually implemented in reality). Take a portion of the memory and battery back it up so that it survives crashes

LRVM Revisited

Upshot: 3 copies by LRVM

Key Words: undo record, window of vulnerability

In short, LRVM can be broken down into begin transaction, end transaction. In the former, portion of memory segment is copied into a backup. At the end of the transaction, data persisted to disk (blocking operation, but can be bypassed with NO_FLUSH option). Basically, increasing vulnerability of system to power failures in favor of performance. So, how will a battery backed memory region help?

Rio File Cache

Creating a battery backed file cache to handle power failures

Key Words: file cache, persistent file cache, mmap, fsync, battery

In a nutshell, we’ll use a battery backed file cache so that writes to disk can be arbitrarily delayed

Vista RVM on Top of RIO

Vista – RMV on top of Rio

Key Words: undo log, file cache, end transaction, memory resisdent

Vista is a library that offers same semantics of LRVM. During commit, throw away the undo log; during abort, restore old image back to virtual memory. The application memory is now backed by file cache, which is backed by a power. So no more writes to disk

Crash Recovery

Key Words: idempotency

Brilliant to make the crash recovery mechanism the exact same scenario as an abort transaction: less code and less edge cases. And if the crash recovery fails: no problem. The instruction itself is idempontent

Vista Simplicity

Key Words: checkpoint

RioVista simplifies the code, reducing 10K of code down to 700. Vista has no redo logs, no truncation, all thanks to a single assumption: battery back DRAM for portion of memory

Conclusion

Key Words: assumption

By assuming there’s only software crashes (not power), we can come to an entirely different design

November 16, 2020
Lightweight recoverable virtual machine – Summary and Notes

Summary and main take away

As system designers, we can make persistence into the virtual memory manager, offering persistence to application developers. However, it’s no easy feat: we need to ensure that the solution performs well. To this end, the virtual machine manager offers an API that allows developer to wrap their code in transactions; underneath the hood, the virtual machine manager uses redo logs that persists the user changes to disk which can defend against failures.

Persistence

Why is persistence needed?

Key Words: inode, subsystem, virtual memory management, log sequence

We can bake persistent into the virtual memory manager (VMM) but building an abstraction is not enough. Instead, we need to ensure that the solution is performant and instead of committing each VMM change to disk, we aggregate them into a log sequence (just like the previous approaches in distributed file system) so that 1) we write in a contiguous block

Server Design

Server Design – persist metadata, normal data structures

Key Words: inodes, external data segment

The designer of the application gets to decide which virtual addresses will be persisted to external data storage

Server Design (continued)

Key Words: inodes, external data segment

The virtual memory manager offers external data segments, allowing the underlying application to map portions of its virtual address space to segments backed by disk. The model is simple, flexible, and performant. In a nutshell, when the application boots up, the application selects which portions of memory must be persisted, giving the application developer full control

RVM Primitives

Key Words: transaction

RVM Primitives: initialization, body of server code

There are three main primitives: initialize, map, and unmap. And within the body of the application code, we use transactions: begin transaction, end transaction, abort transaction, and set range. The only non obvious statement is set_range: this tells the RVM runtime the specific range of addresses within a given transaction that will be touched. Meaning, when we perform a map (during initialization), there’s a larger memory range and then we create transactions within that memory range

RVM Primitives (continued)

RVM Primitives – transaction code and miscellaneous options

Key Words: truncation, flush, truncate

Although RVM automatically handles the writing of segments (flushing to disk and truncating log records), application developers can call those procedures explicitly

How the Server uses the primitives

How the server uses the primitives – begin and end transaction

Key Words: critical section, transaction, undo record

When transaction begins, the LRVM creates an undo record: a copy of the range specified, allowing a rollback in the event an abort occurs

How the Server uses the primitives (continued)

How the server uses the primitives – transaction details

Key Words: undo record, flush, persistence

During end transaction, the in memory redo log will get flushed to disk. However, by passing in a specific mode, developer can explicitly not call flush (i.e. not block) and flush the transaction themselves

Transaction Optimizations

Transaction Optimizations – ways to optimize the transaction

Key Words: window of vulnerability

With no_restore mode in begin transaction, there’s no need to create a in memory copy; similarly, no need to flush immediately with lazy persistence; the trade off here is that there’s an increase window of vulnerability

Implementation

Implementation – redo log and commit

Key Words: forward displacement, transaction, reverse displacement

Redo log allows traversal in both directions (reverse for recovery) and only new values are written to the log: this implementation allows good performance

Crash Recovery

Crash Recovery – resuming from a crash

Key Words: crash recovery

In order to recover from a crash, the system traverses the redo log, using the reverse displacement.Then, each range of memory (along with the changes) are applied

Log Truncation

Log truncation – runs in parallel with forward processing

Key Words: log truncation, epoch

Log truncation is probably the most complex part of LRVM. There’s a constant tug and pull between performance and crash recovery. Ensuring that we can recover is a main feature but it adds overhead and complexity since we want the system to make forward progress while recovering. This end, the algorithm breaks up data into epochs

November 15, 2020
Distributed File Systems – Summary and notes

This lesson introduces network file system (NFS) and presents the problems with it, bottlenecks including limited cache and expensive input/output (I/O) operations. These problems motivate the need for a distributed file system, in which there is no longer a centralized server. Instead, there are multiple clients and servers that play various roles including serving data

Quiz

Key Words: computer science history

Sun built the first ever network file system back in 1985

NFS (network file system)

NFS – clients and server

Key Words: NFS, cache, metadata, distributed file system

A single server that stores entire network file system will bottle neck for several reasons, including limited cache (due to memory), expensive I/O operations (for retrieving file metadata). So the main question is this: can we somehow build a distributed file system?

DFS (distributed file system)

Distributed File Server – each file distributed across several nodes

Key Words: Distributed file server

The key idea here is that there is no longer a centralized server. Moreover, each client (and server) can play the role of serving data, caching data, and managing files

Lesson Outline

Key Words: cooperative caching, caching, cache

We want to cluster the memory of all the nodes for cooperative caching and avoid accessing disk (unless absolutely necessary)

Preliminaries (Striping a file to multiple disks)

Key Words: Raid, ECC, stripe

Key idea is to write files across multiple disks. By adding more disks, we increase the probability of failure (remember computing those failures from high performance computing architecture?) so we introduce a ECC (error correcting) disk to handle failures. The downside of striping is that it’s expensive, not just in cost (per disk) but expensive in terms of overhead for small files (since a small file needs to be striped across multiple disks)

Preliminaries

Preliminaries: Log structured file system

Key Words: Log structured file system, log segment data structure, journaling file system

In a log structured file system, the file system will store changes to a log segment data structure, the file system periodically flushing the changes to disk. Now, anytime a read happens, the file is constructed and computed based off of the delta (i.e. logs). The main problem this all solves is the small file problem (the issue with striping across multiple disks using raid). With log structure, we now can stripe the log segment, reducing the penalty of having small files

Preliminaries Software (RAID)

Preliminaries – Software Raid

Key Words: zebra file system, log file structure

The zebra file system combines two techniques for handling failures: log file structure (for solving the small file problem) and software raid. Essentially, error correction lives on a separate drive

Putting them all together plus more

Pputting them all together: log based, cooperative caching, dynamic management, subsetting, distributed

Key Words: distributed file system, zebra file system

The XFS file system puts all of this together, standing on top of the shoulders who built Zebra and built cooperating caching. XFS also adds new technology that will be discussed in later videos

Dynamic Management

Dynamic Management

Key Words: Hot spot, metadata, metadata management

In a traditional NFS server, data blocks reside on disk and memory includes metadata. But in a distributed file system, we’ll extend caching to the client as well

Log Based Striping and Stripe Groups

Log based striping and stripe groups

Key Words: append only data structure, stripe group

Each client maintains its own append only log data structure, the client periodically flushing the contents to the storage nodes. And to prevent reintroducing the small file problem, each log fragment will only be written to a subset of the storage nodes, those subset of nodes called the stripe group

Stripe Group

Stripe Group

Key Words: log cleaning

By dividing the disks into stripe groups, we promote parallel client activities and increases availability

Cooperating Caching

Cooperative Caching

Key Words: coherence, token, metadata, state

When a client requests to write (to a block), the manager (who maintains state, in the form of metadata, about each client) will cache invalidate the clients and grant the writer a token to write for a limited amount of time

Log Cleaning

Log Cleaning

Key Words: prime, coalesce, log cleaning

Periodically, node will coalesce all the log segment differences into a single, new segment and then run a garbage collection to clean up old segments

Unix File System

Unix File System

Key Words: inode, mapping

On any unix file system, there are inodes, which map filenames to data blocks on disk

XFS Data Structures

XFS Data Structures

Key Words: directory, map

Manager node maintains data structures to map a filename to the actual data blocks from the storage servers. Some data structures include the file directory, and i_map, and stripe group map

Client Reading a file own cache

Client Reading a file – own cache

Key Words: Pathological

There are three scenarios for client reading a file. The first (i.e. best case) is when the data blocks sit in the unix cache of the host itself. The second scenario is the client querying the manager, and the manager signals another peer to send its cache (instead of retrieving from disk). The worst case is the pathological case (i.e. see previous slide) where we have to go through the entire road map of talking to manager, then looking up metadata for the stripe group, and eventually pulling data from the disk

Client Writing a File

Client Writing a file

Key Words: distributed log cleaning

When writing, client will send updates to its log segments and then update the manager (so manager has up to date metadata)

Conclusion

Techniques for building file systems can be reused for other distributed systems

November 12, 2020
Daily Review: Day ending in 2020/11/11
The last couple days at work have taken a toll on me emotionally. To lift me up, Jess used her arts and crafts skills to make a doll — made from Metric’s shedded hair (see featured image).

Family
- Felt myself tear up when pushing Elliott in the stroller. The two of us were having a blast while walking the dogs at the local park this morning. With Elliott strapped into her stroller, I pushed her across the green grassy park, the two of us racing against our imaginary nascar opponents. Elliott had such a blast and there was a little wrinkle in her nose and she put on a wide smile. And in that moment, a tremendous amount of sadness poured over me — totally unexpected emotion. Suddenly I was reminded of my own child hood and how often I felt alone … I never want Elliott to feel that same way. Isn’t that the point of parenting? Making our children’s lives a little (or a lot) better than ours? It’s true when they say that your children will bring out your best and worst memories from childhood. And this was my first experience of own childhood creeping back into my parenting life … I wonder what’s in store for me in the future.
Work
- Reviewed pull requests from multiple colleagues. As a project lead, I’m trying my best to divide my own time between implementing features while ensuring that the project makes forward progress.
- Read through the description and lightly scanned all the outstanding pull requests against our code base. With 50+ developers working on the team, it’s nearly impossible to stay on top of what features are being developed. One way to stay in touch is to simply read through the pull requests that are coming through.
- Finished filling out the threat model for a security review that I’m submitting. Many of the questions are unrelated to the feature that I’m launching.
Graduate School
- Laid out the threading model that allows me and my project partner to use synchronous communication over gRPC while achieving asynchronous handling.
Writing
- Published a blog post on how to build and easily test grpc service using a command line tool. After writing up the blog post that targets other students enrolled in my advanced operating systems course, I posted the link on Piazza and it was nice to see that my write up will assist other students, the entire point of me spending an extra 30 minutes writing the documentation.
November 12, 2020
Testing your gRPC services using grpc_cli
This post may be helpful for you if you are building gRPC services and want a convenient way to test your service using a command line tool. Similar to using cURL when testing HTTP(s) services, I wanted an easy way to test the gRPC services that I’m building.

Originally, I had originally planned to whip together a tiny C++ program that sends protobuf messages to my MapReduce service that I’m building for advanced operating systems course. Fortunately, a testing tool already exists: grpc_cli. Even better is that the tool ships with the grpc source code.

So follow along if you want to install the grpc command line tool, enable server reflection, and execute a few examples.

Note: This post assumes that you are programming in C++ and your operating system is Ubuntu

Install grpc_cli

Follow the steps below if you are running on Ubuntu. If you are running gRPC on your mac, then you’ll want to substitute the apt-get command with brew install, as described in the grpc command line documentation.
1. Clone the grpc repository
  [code lang=”bash”]git clone git@github.com:grpc/grpc.git[/code]
2. Add submodules
  [code lang=”bash”]$ git submodule update –init[/code]
3. Install libflags-dev
  [code lang=”bash”]$ sudo apt-get install libgflags-dev[/code]
4. Compile grpc
  [code lang=”bash”]$ mkdir -p cmake/build
  $ cd cmake/build
  $ cmake -DgRPC_BUILD_TESTS=ON ../..
  $ make grpc_cli
  [/code]
Enable Reflection for your service

In your grpc service code, before you bind and listen, you’ll need to enable reflection. Although not entirely necessary for interacting with your service, not having reflection enabled means you cannot use many of the grpc_cli commands like list.

[code lang=”bash”]
grpc::reflection::InitProtoReflectionServerBuilderPlugin();
grpc::ServerBuilder builder;
builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
builder.RegisterService(&amp;amp;amp;service);
…
[/code]

Add grpc++_reflection library

Make sure that you are adding grpc++ library when building your project. If you are a student in advanced operating systems, you’ll need to update GeneratedProtos.cmake and link the gRPC::grpc++_reflection library as follows:

[code lang=”bash”]
% git diff
diff –git a/src/GenerateProtos.cmake b/src/GenerateProtos.cmake
index c6a80bc..cce2d51 100644
— a/src/GenerateProtos.cmake
+++ b/src/GenerateProtos.cmake
@@ -75,5 +75,5 @@ add_custom_command(
)

add_library(p4protolib ${ProtoHeaders} ${ProtoSources})
-target_link_libraries(p4protolib PUBLIC protobuf::libprotobuf gRPC::grpc++)
+target_link_libraries(p4protolib PUBLIC protobuf::libprotobuf gRPC::grpc++ gRPC::grpc++_reflection)
target_include_directories(p4protolib PUBLIC ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_CURRENT_BINARY_DIR})
[/code]

Using grpc_cli

I’d encourage you to explore the grpc_cli by checking out the official user guide. However, for the purpose of this post, I want to show you how to list your service as well as how to invoke the “MapReduce” service.

Examples

Start your gRPC service

In the example below, I’m staring my worker process that listens on localhost:50051. Here’s a snippet of my very bare service:

[code lang=”proto3″]
service MapReduce {
rpc map(MapRequest) returns (MapResponse) {}
rpc reduce(ReduceRequest) returns (ReduceResponse) {}
[/code]

Starting the service

[code lang=”bash”]
$ ./mr_worker localhost:50051
Listening on ip address: localhost
Listening on port: 50051
Server listening on localhost:50051
[/code]

List the services

After starting my service (that has reflection enabled), the service shows up when I execute the list command:

[code lang=”bash”]
$ ./grpc_cli ls localhost:50051
masterworker.MapReduce
grpc.reflection.v1alpha.ServerReflection
grpc.health.v1.Health
[/code]

Invoking “MapReduce”

[code lang=”bash”]
$ ./grpc_cli call localhost:50051 masterworker.MapReduce.map "shard_data: ‘123’"
connecting to localhost:50051
Rpc succeeded with OK status
[/code]

Passing in multiple fields

[code lang=”bash”]
$ ./grpc_cli call localhost:50051 masterworker.MapReduce.reduce "filename: ‘123’ worker_hostname: ‘localhost’ worker_port: 50051"
connecting to localhost:50051
Rpc succeeded with OK status
[/code]

References
- https://grpc.github.io/grpc/cpp/md_doc_server_reflection_tutorial.html
- https://github.com/grpc/grpc/blob/master/doc/command_line_tool.md
- https://stackoverflow.com/questions/61641174/unable-to-link-the-grpc-libraries-in-windows
November 11, 2020
3 tips on getting eyeballs on your code review
“Why is nobody reviewing my code?”

I sometimes witness new engineers (or even seasoned engineers new to the company) submit code reviews that end up sitting idle, gaining zero traction. Often, these code reviews get published but comments never flow in, leaving the developer left scratching their head, wondering why nobody seems to be taking a look. To help avoid this situation, check out the 3 tips below for more effective code reviews.

3 tips for more effective code reviews

Try out the three tips for more effective code reviews. In short, you should:
1. Assume nobody cares
2. Strive for bite sized changes
3. Add a descriptive summary
1. Assume nobody cares

After you hit the publish button, don’t expect other developers to flock to your code review. In fact, it’s safe to assume that nobody cares. I know, that sounds a bit harsh but as Neil Strauss suggests,

“Your challenge is to assume — to count on — the completely apathy of the reader. And from there, make them interested.”

At some point in our careers, we all fall into this trap. We send out a review, one that lacks a clear description (see section below “Add a descriptive summary”) and then the code review would sometimes sits there, patiently waiting for someone to sprinkle comments. Sometimes, those comments never come.

Okay, it’s not that people don’t necessary care. It has more to do with the fact people are busy, with their own tasks and deliverable. They too are writing code that they are trying to ship. So your code review essentially pulls them away from delivering their own work. So, make it as easy as possible for them to review.

One way to do gain their attention is simply by giving them a heads up.

Before publishing your code review, send them an instant message or e-mail, giving them a heads up. Or if you are having a meeting with that person, tell them that you plan on sending out a code review and ask them if they can take a look at the code review. This puts your code review on their radars. And if you don’t see traction in an appropriate (which varies, depending on change and criticality), then follow up with them.

2. Strive for bite sized code reviews

Anything change beyond than 100-200 lines of code requires a significant amount of mental energy (unless the change itself is a trivial updates to comments or formatting). So how can you make it easier for your reviewer?

Aim for small, bite sized code reviews.

In my experience, a good rule of them is submit less than 100 lines of code. What if there’s no way your change can squeeze into double digits? Then consider breaking down the single code review into multiple, smaller sized code reviews and once all those independent code reviews are approved, submit a single code review that merges all those changes in atomically.

And if you still cannot break down a large code review into these lengths and find that it’s unavoidable to submit a large code review, then make sure you schedule a 15-30 minute meeting to discuss your large code review (I’ll create a separate blog post for this).

3. Add a descriptive summary for the change

I’m not suggesting you write a miniature novel when adding a description to your code review. But you’ll definitely need to write something with more substance than a one-liner: “Adds new module”. Rob Pike put’s it succinctly and his criteria for a good description includes “What, why, and background”.

In addition to adding this criteria, be sure to describe how you tested your code — or, better yet, ship your code review with unit tests. Brownie points if you explicitly call out what is out of scope. Limiting your scope reduces the possibility of unnecessary back-and-forth comments for a change that falls outside your scope.

Finally, if you want some stricter guidelines on how to write a good commit message, you might want to check out Kabir Nazir’s blog post on “How to write good commit messages.”

Summary

If you are having trouble with getting traction on your code reviews, try the above tips. Remember, it’s on you, the submitter of the code review, to make it as easy as possible for your reviews to leave comments (and approve).

Let’s chat more and connect! Follow me on Twitter: @memattchung
November 9, 2020
Daily Review – Day ending in 2020/11/05
Family
- Assembled our Berkey water filtering system. The instructions are quite complicated, actually. In addition to reading the manuals, I had to pull up a couple instructional YouTube videos to make sure that I was priming the filters correctly.
- Took Elliott on a late night walk. She had missed her nap and we needed to stretch her and keep her awake until her 7pm bed time.
Work
- Spent about an hour or so chipping away at codifying our operational dashboard, as part of hackathon.
- Read through a ton of documentation and watched a couple videos on virtual private connection (VPC). I should’ve read and watched all this training when I first joined the organization. Oh well. At least the material makes more sense now, now that I’ve had first hand exposure to the dataplane code. Anyways, all this material helped me build a better mental model needed to deliver my tech talk.
Graduate School
- The word coordinator kept popping up in the lectures. This coordinator often shows up in different code bases and systems within Amazon and I didn’t realize that the term has roots in the theory of distributed systems.
- Learning more about the value of checkpoints. I can definitely see me adopting and integrating checkpoints into the software I design and build, at least for the stateful applications that need to have robust recovery mechanisms.
- I’d like to learn more about two phase commit protocol. Similar to the word coordinator, two phase commit protocol term continues to pop up in the lectures and I bet it’s worth learning more about the specifics of this transaction strategy.
- Window of vulnerability. The trade off between persisting in memory log records to disk
November 6, 2020
Daily Review – Day ending in 2020/11/04
My favorite part of the day was waking up to a video that Jess recorded while I was asleep, a video capturing a little frog dancing on our window facing the backyard. My wife: she’s super cute.

Work
- Hosted and moderated a tech panel on career growth and promotions. On behalf of Asians@ at Amazon, I lead a conversational fire side chat with four (Asian) senior software development engineers about some of the non technical barriers that our community often faces throughout our career. Not speaking up. Not advocating for one self. Everyone on the panel nailed it (despite how nervous they were) and I look forward to creating and hosting new events in the future.
Health
- Fought off a killer headache that kicked in after I ate che (a Vietnamese dessert). My body rejects processed sugars and when I consume too much of it, my body sends me a strong signal in the form of a painful headache that lasts several hours, the only way to fight it off is to gulp down as much water as possible and flush the sugars out of my system.
Daily Photo

Dancing frog on the window
November 5, 2020
Daily Review – Day ending in 2020/11/03
Family
- Jess and I felt nervous about the presidential election results while watching the news online. Despite Biden leading in the polls, just as Hilary four years ago back in 2016, I’m sitting at the edge of my seat as the results come in, not confident at all that Biden can pull off a victory. Again, as I mentioned in earlier posts, although I cast my vote for Biden, I’m more voting to get Trump out.
Graduate School
- Learned of a brilliant technique to simplify crash recovery by removing an edge case and exercising the same code as the abort transaction. Technique from RioVista papers. The crash recovery is also idempotent so it can withstand power failure during the crash recovery itself. Idempotent. First heard of this term when I was learning Ansible back in the day. Something to think about again.
Work
- Checked in some GRE header code that basically implements RFC 2784 and 2890. Stepping through the RFC, I learned how much unnecessary overhead the protocol carries with it. A 3 bit version field that MUST always be zero? Granted, I understand that the protocol designers were trying to future proof the protocol but now I understand why a protocol like IPIP prevails given that IPIP will encapsulate only IP.
November 4, 2020
Daily Review – Day ending in 2020/11/02

Family

Elliott and I drove to Lowe’s home improvement store and picked up a new faucet. Normally, Elliott will cry after sitting in the car for more than 10 minutes but not this trip. She did amazing and the two us had a blast at Lowe’s and she helped me select our new faucet since the one installed in our kitchen busted, causing water to leak into and underneath the sink. Luckily, my dad is in town and helped replace the existing broken faucet with the one I just purchased.

Graduate School

Submitted project 3 assignment on gRPC and multi-threading. Similar to project 2, I partnered up with another Seattle based student (who also happens to be Vietnamese as well) and we divided the work down the middle and stitched our code together.

Watched half the lectures on RioVista. This lecture series seems to be a continuation of the last lecture series (on lightweight recoverable virtual machines), adding on top of the idea of using transactions to keep a log record of changes to the virtual memory address space. The key different with RioVista though is that they are targeting performance as their primary goal.

Health

Walked about 3 miles on the treadmill yesterday while working and studying. Last week, I could not find the remote that controls my treadmill but luckily Jess found it buried in the back of my office and luckily I was able to get some exercise in.

Work

Held a dry-run kick off meeting for a panel that I’m hosting and moderating this upcoming Wednesday. The purpose of the panel is to give people a sense of what it looks like to grow one’s career as a software engineer at Amazon.

November 3, 2020