Author: mattchung

  • Friday night in

    My wife (Jess) and I were both dead tired—friends had come over to our house the night before and cooked a Vietnamese meal, and we fell asleep just before midnight, a little over two hours past our bedtime—so we decided to spend Friday night indoors, eating leftover vegan soup and streaming a movie.

    Whenever we plop ourselves down in front of the TV, a “smart” LED television, we spend what feels like an eternity searching for the perfect movie that matches our mood, loading Netflix and scrolling up and down through the vast collection of their originals, switching to Amazon Prime, watching trailer after trailer after trailer.

    Eventually, we settled on Hidden Figures.

    Hidden Figures centers on three black women, all working at the prestigious NASA during the civil rights movement of the 1960s.  The movie’s protagonist, Katherine, is a math prodigy who was widowed and left to raise three children while juggling a full-time job as a “computer”.  Because of her mathematical genius, Katherine is pulled into Freedom 7, a project aiming to send an American astronaut into orbit, a response to Russia’s recent victory of sending the first man into space.

    But I’m not here to discuss the movie.

    I’m here to reflect on my feelings that immediately followed watching the film.  If I had to put a label on my emotions, I would say that I felt inspired, followed by disappointment.

    I was inspired by how three women accomplished such great feats: one petitioning to take night classes at a local, segregated high school and becoming the first African American engineer at NASA; another seeing her role as a computer becoming obsolete, teaching herself Fortran, and eventually leading the IBM team; a third crunching the numbers for the rocket’s landing coordinates.  How can you not be inspired?

    That is, until I started reflecting on my own, ordinary life.  A number of questions popped into my head: What have I done with my life so far?  What have I accomplished? How am I sitting here on the warm leather couch after a “long” day of work, when these three underprivileged, hardworking women were out hustling?

    But then I return to my blessings.  I’m lucky. I have a beautiful, loving, nurturing wife whom I adore.  I have two tail-wagging dogs that snuggle with us in bed, keeping us warm on those unexpectedly cold nights.

  • A brief introduction to cache organization

    As a software programmer, I’ve always had a vague understanding of how a system fetches data from memory.  At an abstract level, I understood that the processor requests data from the memory controller, sending a message (with the memory address encoded) across the bus.

    But I learned recently that in addition to the system’s physical memory—the same memory I used to squeeze into the motherboard when installing RAM—there are multiple hardware caches, completely abstracted from the programmer.

    These caches sit between the processor and main memory and are called the L1, L2, and L3 caches.  Each cache differs in cost, in size, and in distance from the CPU: the lower the digit, the higher the cost, the smaller the size, and the closer it sits to the CPU.  For example, if we compare the L1 and L2 caches, L1 costs more, holds less data, and sits closer to the processor.

    When the processor wants to retrieve data from memory, it sends a request that first lands in the L1 cache.  If L1 holds a copy of that data, it immediately sends the data back to the processor, preventing the request from unnecessarily flowing towards the memory controller.  This pattern of checking the local cache and forwarding the request repeats until the request eventually reaches the memory controller, where the data is actually stored.

    The further we allow the CPU’s request to travel down the bus, the more we penalize the CPU, forcing it to wait, like a car at a stop sign, for more cycles. For example, the CPU waits about 4 cycles for the L1 cache, 12 cycles for L2, 36 cycles for L3, and—wait for it—62 cycles when accessing main memory.  Therefore, we strive to design systems that cache as much data as possible, as close to the CPU as possible, increasing overall system performance.

    We break down a cache into the following components:

    • Blocks
    • Lines
    • Sets
    • Tag
    Cache organized into sets, lines, and blocks

    As you can see from the image above, we organize our cache into sets (S), lines (E), and blocks (B).  One block of data represents 8 bits (1 byte), and every block is addressed by a physical memory address. For example, the memory address 0x0000 may store 01010101 and 0x0001 may store another byte, 01110111.  We group these blocks together into a line, which stores sequential blocks.  A line may store two or four or eight or sixteen bytes—it all depends on how we design the system.  Finally, each line belongs to a set, a bucket that stores one or more lines.  Like the number of bytes a line stores, the number of lines a set holds can be one or two or three or forty—again, it all depends on our design.

    Together, the number of sets (S), the number of lines per set (E), and the number of bytes per line (B) determine the cache’s size, calculated with the following formula: cache size = S x E x B.
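
    To make the formula concrete, here’s a minimal C sketch (mine, not from the post); the S, E, and B values below are arbitrary examples, not taken from any particular processor:

    #include <stdio.h>

    int main(void) {
        unsigned S = 64;   /* number of sets                */
        unsigned E = 8;    /* lines per set (associativity) */
        unsigned B = 64;   /* bytes per line (block size)   */

        /* cache size = S x E x B */
        unsigned size = S * E * B;
        printf("cache size = %u bytes (%u KB)\n", size, size / 1024);
        return 0;
    }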

    In the next post, I’ll cover how a cache processes a memory address, determining whether it retrieves memory from cache or forwards the request to the next cache (or memory controller).

  • Defusing a Binary Bomb (phase 1)

    http://blog.itsmemattchung.com/2017/02/28/csapp-defusing-the-bomb-phase-1/

    I password protected the original post (email me for the password if you are interested in reading it).  When I posted the original link on reddit/r/compsci, multiple commenters suggested that I delete the article to keep students from cheating (facilitating that was never my intention).  I then sent an e-mail to the professors (at CMU), and they kindly replied, asking me to remove the post:

    Matt,

    Thanks so much for your kind words. It’s great to hear that the book is helpful to you. While every student gets a slightly different bomb, the solution strategies for each phase are very similar. So it would be good if you could remove those posts.
    Thanks!
    Dave

  • How does the modulus operator work?

    As a programmer, I’ve written a line or two of code that includes the modulus operator (e.g., “return x % 2”).  But never have I paused to think: “How does the underlying system carry out this operation?” In this post, I limit “underneath the hood” to the lowest-level human-readable programming language: assembly.

    So, I’ll take a program (written in C) and dump it into assembly instructions. Then, I’ll explain each instruction in detail.

    My background in assembly programming

    Up until a few weeks ago, I had never studied assembly—I did flip through a page or two of an assembly book about five years ago, when a colleague mentioned that his uncle programmed in assembly—and I certainly underestimated the role that assembly plays in my career.

    Sure—the days of writing pure assembly evaporated decades ago, but the treasure lies in understanding how programs written in higher-level languages (e.g., C, Perl, PHP, Python) ultimately boil down to assembly instructions.

    Modulus program

    So, I contrived a trivial modulus program written in C—let’s dissect it:

    // modulus.c
    
    int modulus(int x, int y){
        return x % y;
    }

    Converting C code into assembly

    Before a program can be executed by an operating system, we must first convert it into machine code—we call this compiling.  Before we run our program (modulus.c) through the compiler, we need to discuss two arguments that we’ll pass to the compiler in order to alter its default behavior.  The first argument, -Og, limits how aggressively the compiler optimizes the assembly instructions.  By default, the compiler (we’re using gcc) intelligently optimizes code—one way is by reducing the number of instructions.  The second argument, -S, instructs the compiler to stop just before it creates the machine code (unreadable by humans) and, instead, directs it to create a file (modulus.s) containing the assembly instructions.

    # gcc -Og -S modulus.c
    

    The command above outputs a file, modulus.s, with the contents (unrelated lines removed):

    modulus:
    movl	%edi, %eax
    cltd
    idivl	%esi
    movl	%edx, %eax
    ret

    Let’s step through and explain each of the assembly instructions, one-by-one.

    mov

    When we want our CPU to perform any action, such as adding, subtracting, multiplying, or dividing numbers, we first need to move bytes of data (an immediate value, or data from memory) into a register.  We move data to registers with the mov instruction, which is capable of moving data from:

    • register to memory
    • memory to register
    • register to register
    • immediate value to memory
    • immediate value to register

    The assembly above first moves data from one register (%edi) to another register (%eax).  This is necessary since subsequent instructions, such as cltd, rely on data being present in the %eax register.

    cltd

    cltd stands for “convert long to double long.”  But before we dig into why we need this instruction, we must detour and briefly explain the next instruction in line, idivl.

    When we issue an idivl (signed divide long) instruction, the processor performs the division and stores the quotient in one register (%eax) and the remainder in another (%edx).  The dividend, however, is twice as wide as either result: idivl expects a 64-bit dividend spread across the register pair %edx:%eax, and produces a 32-bit quotient and a 32-bit remainder.

    Therefore, if we expect a 32-bit quotient and a 32-bit remainder, then the 32-bit dividend must first be widened (sign-extended) to 64 bits, which is exactly what cltd does[1].  In our modulus program, that dividend is the int argument x, already moved into %eax.
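
    As an aside, here’s a minimal C sketch (mine, not from the post) of the same idea: widening a signed 32-bit value to 64 bits preserves its value by copying the sign bit upward, which is the job cltd performs when it extends %eax into %edx:%eax.

    #include <stdio.h>

    int main(void) {
        int x = -7;
        long wide = x;   /* sign-extended from 32 to 64 bits, like cltd */

        printf("x    = %d\n", x);      /* -7 */
        printf("wide = %ld\n", wide);  /* still -7 */
        return 0;
    }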

    idivl

    idivl divides the dividend (stored across %edx:%eax) by the operand we pass to the instruction (the divisor).  In our assembly example above, idivl divides the value built up in %edx:%eax by the value in %esi—x by y.
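
    To see that a single division yields both results, here’s a minimal C sketch (mine, not from the post); the comments note where idivl leaves each value:

    #include <stdio.h>

    int main(void) {
        int x = 17, y = 5;
        int q = x / y;   /* quotient  (idivl leaves it in %eax) */
        int r = x % y;   /* remainder (idivl leaves it in %edx) */

        printf("%d / %d = %d, %d %% %d = %d\n", x, y, q, x, y, r);
        printf("check: %d\n", y * q + r);   /* prints 17 */
        return 0;
    }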

    movl

    When the function returns, the calling convention dictates that the return value be whatever is stored in the register %eax (%rax for 64-bit values).  Therefore, the final instruction moves the remainder, not the quotient, from %edx into %eax.

    Wrapping up

    I hope I was able to share a thing or two on how a higher level program ultimately breaks down into simple assembly instructions.

    [1] I was banging my head against the wall until I found an easy-to-understand explanation of why we must convert long to double long: http://www.programmingforums.org/post12771.html

  • Here’s some assembly instructions … now write the corresponding C code

    A wide grin formed on my face after successfully completing an exercise (from Computer Systems: A Programmer’s Perspective) that required me to write C code based on a sequence of six assembly instructions:

    void decode1(long *xp, long *yp, long *zp) {
    /* xp stored in %rdi, yp in %rsi, zp in %rdx */
    
    decode1:
        movq (%rdi), %r8
        movq (%rsi), %rcx
        movq (%rdx), %rax
        movq %r8, (%rsi)
        movq %rcx, (%rdx)
        movq %rax, (%rdi)

    The exercise consists solely of mov instructions, which are capable of moving bytes (movq specifically moves a quad word) from:

    • register to memory
    • register to register
    • immediate value to register
    • immediate value to memory

    So, I pulled a pen and paper from my backpack and began tracing the flow of data between the multiple registers and memory locations.  I then took that chicken scratch, and wrote the corresponding C code.  Finally, I flipped to the end of the chapter to compare my code.  Bingo—the C code I wrote mapped exactly to the author’s answer key.

    I celebrated the tiny victory.

    The purpose of the exercise is twofold.  First, it illustrates that higher-level programming languages—including C, Python, and Java—compile down to a lower-level language: assembly.  Moreover, this compels programmers, I think, to pause while writing code and question the performance implications: does this line require two instructions, three, four? Second, the exercise demystifies the concept of C pointers, a construct that many novice programmers stumble over. After completing this exercise, I find pointers intuitive—nothing more than the value of a memory address.
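
    For the curious, here’s a C sketch of what that trace yields (reconstructed from the instructions above, not copied from the book): read all three values first, then write them back, rotated.

    void decode1(long *xp, long *yp, long *zp) {
        long x = *xp;   /* movq (%rdi), %r8  */
        long y = *yp;   /* movq (%rsi), %rcx */
        long z = *zp;   /* movq (%rdx), %rax */

        *yp = x;        /* movq %r8, (%rsi)  */
        *zp = y;        /* movq %rcx, (%rdx) */
        *xp = z;        /* movq %rax, (%rdi) */
    }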

  • Let’s get lower than Python

    Like a huge swath of other millennials, I dabbled in building websites—writing HTML, CSS, and JavaScript—during my youth, but these days, I primarily code (for a living) in my favorite programming language: Python.

    I once considered Python one of the lower-level programming languages (to a certain degree, it is), but as I dive deeper into studying computer science—reading Computer Systems: A Programmer’s Perspective at my own pace, and watching the professors’ lectures online, for free[1]—I find the language creates too big a gap between me and the system, leaving me not fully understanding what’s really going on underneath the hood.  Therefore, it’s time to bite the bullet and dive a bit deeper into learning the next not-so-new language on my list: C.

    Why C?  One could argue that if you want to really understand the hardware, you should learn the language closest to the hardware: assembly (the assembler translates it into object code, which is ultimately executed by the machine).  Yes—assembly is the closest one could get to programming the system, but C strikes a balance.  C can easily be translated into assembly while maintaining its utility (many systems at work still run on C).

    Now, I’m definitely not going to stop writing and learning Python.  I love Python. I learn something new—from discovering standard libraries to writing more idiomatic code—every day.  I doubt that will ever change; I’ll never reach a point where I say, “Yup, that’s it, I’ve learned everything about Python.”

    But I am devoting a large chunk of my time (mostly outside of working hours) to learning C.

    So, my plan is this: finish “The C Programming Language” by Brian Kernighan and Dennis Ritchie, the de facto book for learning C.

    [1] https://scs.hosted.panopto.com/Panopto/Pages/Sessions/List.aspx#folderID=%22b96d90ae-9871-4fae-91e2-b1627b43e25e%22

  • Calculating page table entry size

    How many page table entries are required given a virtual address size and page size?

    I was confronted with this question when reading the chapter on virtual memory management (VMM) in Computer Systems: A Programmer’s Perspective, which forced me to reread the chapter over and over. It wasn’t until I forced myself to close the book and step away from the problem for a day that I was able to fully grasp what was being asked:

    Page table entries problem

    Let’s narrow the scope and focus on a single instance, where n is 16 and P is 4K; the question can be easily rephrased into something that makes more sense (to me): “How many 4K pages can fit into a 16-bit address space?”

    First, how many bits are required to address a single 4K page? Well, 2^12 is 4096, so 12 bits is sufficient. Therefore, we can rephrase the abstract question into one we can solve mathematically: “How many 12-bit pages can fit into a 16-bit virtual address space?”

    So, the problem now boils down to simple division—2^16 / 2^12, which can be expressed as 2^(16-12) = 2^4, which is sixteen (16). That’s 16 page table entries required for a 16-bit virtual address combined with a page size of 4K.

    The same approach can be taken for all the remaining practice problems: convert the page size (e.g., 4K, 8K) into a number of bits and divide it into the virtual address space (e.g., 16 bits, 32 bits).
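
    Here’s a minimal C sketch of that arithmetic (my own, not from the book), using the example above (n = 16, 4K pages):

    #include <stdio.h>

    int main(void) {
        unsigned n = 16;              /* virtual address size, in bits */
        unsigned long page = 4096;    /* page size in bytes (4K)       */

        unsigned p = 0;               /* p = log2(page size)           */
        while ((1UL << p) < page)
            p++;

        /* number of entries = 2^n / 2^p = 2^(n - p) */
        unsigned long entries = 1UL << (n - p);
        printf("%lu page table entries\n", entries);   /* prints 16 */
        return 0;
    }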

  • Fakebook

    Every day for the past two weeks, my friend’s stream of Facebook posts poured onto my timeline, engulfing my entire newsfeed with photos of him and his latest girlfriend having such a comical time together. They can’t get enough of each other. They really can’t.  They are glued at the hip.  They must share every dish of food.  Pictures and pictures of them gazing passionately into one another’s eyes.

    Then, suddenly, his posts seemed to stop appearing on my feed.  At first, I assumed that Facebook’s algorithms were becoming so advanced that they had somehow read my mind and purged all the posts I considered rubbish.  But I was wrong. I hopped on over to his Facebook page and, lo and behold, most of his recent posts were gone.  Vanished. Disappeared.  Well, not all of them, just the ones with his girlfriend.  It’s as if she never existed.  What did replace those posts, though, were images with italicized inspirational quotes on how to avoid dating a sociopath.

    Then, no more than one week after they had presumably broken up, he began posting pictures of them together.

    I cannot seem to find the right word to encapsulate how I feel about the entire situation.  But I do find it interesting. Interesting that you can digitally erase someone from your history, as if you never met the person. One click here, one click there, and poof, problem solved. You’ve rewritten your entire history (it’s like git rebase, but for your life instead of source code).

    I find that unless the material you are posting is self-incriminating, it should be left intact. Those memories—failed relationships, embarrassing moments during a drunken stupor, convictions of being with “the one”—are an important part of your story.  They paint an important picture.  They tell us who you are, and perhaps how much you’ve changed. So please, don’t delete your history. I genuinely want to know if you dated a sociopath.

  • Graduate record examination (GRE) in 2 months

    I’m pursuing a master’s degree in computer science, and most of the schools I’m applying to—Seattle University, University of Washington, University of Southern California—require that I take the general GRE (graduate record examination).  Although I don’t necessarily agree with standardized tests, especially the GRE, I recognize the need to establish some sort of bar for applications.  So, instead of fighting the process by attempting to convince admissions to waive the GRE requirement (although some do), I reluctantly scheduled my exam for December 16th. That gives me about two months to study.

    That’s not a whole lot of time to prepare. Magoosh, an online platform that prepares exam takers, suggests that most students should aim to study for about three to four months: ain’t nobody got time for that[1].  Therefore, I’m condensing my studying into two months.  This plan will still require anywhere between two and three hours of study every day.

    So, here it is. There’s a myriad of resources available (I’m starting to think that the GRE is a cash cow; money flows between the Educational Testing Service and the institutions, I think), and I’ve made a conscious effort to narrow down the material:

    Time to get cracking.

    [1] If you haven’t already watched this video at least a dozen times, you’ve been doing yourself a disservice.