Author: mattchung

  • How does the modulus operator work?

As a programmer, I’ve written a line or two of code that includes the modulus operator (e.g. “return x % 2“).  But never have I paused to think: “How does the underlying system carry out this operation?”  In this post, I limit “under the hood” to the lowest-level human-readable programming language: assembly.

    So, I’ll take a program (written in C language) and dump it into assembly instructions. Then, I’ll explain each instruction in detail.

    My background in assembly programming

Up until a few weeks ago, I had never studied assembly—I did flip through a page or two of an assembly book when a colleague mentioned, about five years ago, that his uncle programmed in assembly—and I certainly underestimated the role that assembly plays in my career.

Sure—the days of writing pure assembly evaporated decades ago, but the treasure lies in understanding how programs written in higher-level languages (e.g. C, Perl, PHP, Python) ultimately boil down to assembly instructions.

    Modulus program

    So, I contrived a trivial modulus program written in C—let’s dissect it:

    // modulus.c
    
    int modulus(int x, int y){
        return x % y;
    }

    Converting C code into assembly

Before a program can be executed by an operating system, we must first convert the program into machine code—we call this compiling.  Before we run our program (modulus.c) through the compiler, we need to discuss two arguments that we’ll pass to the compiler in order to alter its default behavior.  The first argument, -Og, tells the compiler to apply only optimizations that don’t interfere with debugging.  By default, the compiler (we’re using gcc) optimizes aggressively—one way is by reducing the number of instructions—which makes the output harder to follow.  The second argument, -S, instructs the compiler to stop just before it creates the machine code (unreadable by humans) and instead emit a file (modulus.s) containing the assembly instructions.

    # gcc -Og -S modulus.c
    

    The command above outputs a file, modulus.s, with the contents (unrelated lines removed):

    modulus:
    movl	%edi, %eax
    cltd
    idivl	%esi
    movl	%edx, %eax
    ret

    Let’s step through and explain each of the assembly instructions, one-by-one.

    mov

When we want our CPU to perform any action, such as adding, subtracting, multiplying, or dividing numbers, we first need to move bytes of data (an immediate value, or data from memory) into a register.  We move data to registers with the mov instruction, which is capable of moving data from:

    • register to memory
    • memory to register
    • register to register
    • immediate value to memory
    • immediate value to register

    The assembly above first moves data from one register (%edi) to another register (%eax).  This is necessary since subsequent instructions, such as cltd, rely on data being present in the %eax register.

    cltd

cltd stands for “convert long to double long”: it sign-extends the 32-bit value in %eax into the 64-bit pair %edx:%eax.  But before we dig into why we need this instruction, we must detour and briefly explain the next instruction in line, idivl.

When we issue an idivl (signed divide long) instruction, the processor divides the 64-bit value held across %edx:%eax by the 32-bit operand, storing the quotient in one register (%eax) and the remainder in another (%edx).  Notice that the dividend register pair is twice the width of the operand.

Therefore, if we expect a 32-bit quotient and a 32-bit remainder, the dividend (which is, presumably, 32 bits—in our modulus program, the return value is of type int) must be doubled in width—to 64 bits—before the division.  That widening is exactly what cltd does.

    idivl

idivl divides the numerator (held across %edx:%eax) by the argument (or denominator)—the register that we pass to the instruction.  In our assembly example above, idivl divides the value built from %eax by the value in %esi—x and y, respectively.

    movl

By calling convention, a function returns whatever value is stored in the register %eax.  Therefore, the final instruction moves the remainder, not the quotient, from %edx into %eax.

    Wrapping up

    I hope I was able to share a thing or two on how a higher level program ultimately breaks down into simple assembly instructions.

[1] I was banging my head against the wall until I found an easy-to-understand explanation of why we must convert long to double long: http://www.programmingforums.org/post12771.html

  • Here’s some assembly instructions … now write the corresponding C code

A wide grin formed on my face after successfully completing an exercise (from Computer Systems: A Programmer’s Perspective) that required me to write C code based off a sequence of six assembly instructions:

void decode1(long *xp, long *yp, long *zp)
/* xp stored in %rdi, yp in %rsi, zp in %rdx */

decode1:
    movq (%rdi), %r8
    movq (%rsi), %rcx
    movq (%rdx), %rax
    movq %r8, (%rsi)
    movq %rcx, (%rdx)
    movq %rax, (%rdi)

The exercise consists solely of mov instructions, which are capable of moving bytes (movq specifically moves a quad word) from:

    • register to memory
    • register to register
    • immediate value to register
    • immediate value to memory

So, I pulled a pen and paper from my backpack and began tracing the flow of data between the registers and memory locations.  I then took that chicken scratch and wrote the corresponding C code.  Finally, I flipped to the end of the chapter to compare my code.  Bingo—the C code I wrote mapped exactly to the author’s answer key.

    I celebrated the tiny victory.

The purpose of the exercise is twofold.  First, the exercise illustrates that higher-level programming languages—including C, Python, and Java—compile down to a lower-level language: assembly.  Moreover, this compels programmers, I think, to pause while writing code and question the performance implications: does this line require two instructions, three, four?  Second, the exercise demystifies the concept of C pointers, a construct that many novice programmers stumble over.  But after completing this exercise, I find pointers intuitive—nothing more than a memory address.

  • Let’s get lower than Python

Like a huge swath of other millennials, I dibbled and dabbled in building websites—writing HTML, CSS, and JavaScript—during my youth, but these days, I primarily code (for a living) in my favorite programming language: Python.

I once considered Python one of the lower-level programming languages (to a certain degree, it is), but as I dive deeper into studying computer science—reading Computer Systems: A Programmer’s Perspective at my own pace, and watching the professors’ lectures online, for free—I find the language creates too big of a gap between me and the system, leaving me not fully understanding what’s really going on under the hood.  Therefore, it’s time to bite the bullet and dive a bit deeper into the next not-so-new language on my list: C.

Why C?  One could argue that if you want to really understand the hardware, you should learn the language closest to the hardware: assembly (the assembler translates assembly into object code, which is ultimately executed by the machine).  Yes—assembly is the closest one can get to programming the system, but C strikes a balance.  C translates readily into assembly while maintaining its utility (many systems at work still run on C).

Now, I’m definitely not going to stop writing and learning Python.  I love Python.  I learn something new—from discovering standard libraries to writing more idiomatic code—every day.  I doubt that will ever change; I’ll never reach a point where I’ll say, “Yup, that’s it, I’ve learned everything about Python.”

    But, I am devoting a large chunk of my time (mostly outside of working hours) on learning C.

So, my plan is this: finish “The C Programming Language” by Brian Kernighan and Dennis Ritchie—the de facto book for learning C.

    [1] https://scs.hosted.panopto.com/Panopto/Pages/Sessions/List.aspx#folderID=%22b96d90ae-9871-4fae-91e2-b1627b43e25e%22

  • Calculating page table entry size

    How many page table entries are required given a virtual address size and page size?

I was confronted with this question while reading the virtual memory management (VMM) chapter of Computer Systems: A Programmer’s Perspective, which forced me to reread the chapter over and over.  It wasn’t until I forced myself to close the book and step away from the problem for a day that I was able to fully grasp what was being asked:

    Page table entries problem

Let’s narrow the scope and focus on a single instance, where n is 16 and P is 4K; the question can be rephrased into something that makes more sense (to me): “How many 4K pages fit into a 16-bit address space?”

First, how many bits are required to address every byte within a single 4K page?  Well, 2^12 is 4096, so 12 bits suffice.  Therefore, we can rephrase the question into one we can solve mathematically: “How many 12-bit pages fit into a 16-bit virtual address space?”

So, the problem now boils down to simple division—2^16 / 2^12, which can be expressed as 2^(16-12) = 2^4, which is sixteen (16).  That’s 16 page table entries required for a virtual address size of 16 bits combined with a page size of 4K.

The same approach works for all the remaining practice problems: convert the page size (e.g. 4K, 8K) into bits and divide it into the virtual address size (e.g. 16, 32).

  • Fakebook

Every day for the past two weeks, my friend’s stream of Facebook posts poured onto my timeline, engulfing my entire newsfeed with photos of him and his latest girlfriend having such a comical time together.  They can’t get enough of each other.  They really can’t.  They are glued at the hips.  They must share every dish of food.  Pictures and pictures of them gazing passionately into one another’s eyes.

Then, suddenly, his posts seemed to stop appearing on my feed.  At first, I assumed that Facebook’s algorithms were becoming so advanced that they somehow read my mind and purged all the posts that I considered rubbish.  But I was wrong.  I hopped on over to his Facebook page and, lo and behold, most of his recent posts were gone.  Vanished.  Disappeared.  Well, not all of them, just the ones with his girlfriend.  It’s as if she never existed.  What did replace those posts, though, were images with italicized inspirational quotes on how to avoid dating a sociopath.

    Then, no more than one week after they had presumably broken up, he began posting pictures of them together.

    I cannot seem to find the right word that encapsulates how I feel about the entire situation.  But, I do find it interesting. Interesting that you can digitally erase someone from your history, as if you never met the person. One click here, one click there and poof, problem solved. You’ve rewritten your entire history (it’s like git rebase, but for your life instead of source code).

I find that unless the material you are posting is self-incriminating, it should be left intact.  Those memories—failed relationships, embarrassing moments during your drunken stupor, convictions of being with “the one”—are an important part of your story.  They paint an important picture.  They tell us who you are, and perhaps how much you’ve changed.  So please, don’t delete your history.  I genuinely want to know if you dated a sociopath.

  • Graduate record examination (GRE) in 2 months

I’m pursuing a master’s degree in computer science, and most of the schools I’m applying to—Seattle University, University of Washington, University of Southern California—require that I take the general GRE (graduate record examination).  Although I don’t necessarily agree with standardized tests, especially the GRE, I recognize the necessity of establishing some sort of bar for applications.  So, instead of fighting the process by attempting to convince admissions to waive the GRE requirement (although some do), I reluctantly scheduled my exam for December 16th.  That gives me about two months to study.

That’s not a whole lot of time to prepare.  Magoosh, an online platform that prepares exam takers, suggests that most students should aim to study for about three to four months: ain’t nobody got time for that[1].  Therefore, I’m condensing my studying into two months.  This plan of mine will still require anywhere between two and three hours, every day.

    So, here it is. There’s a myriad of resources available (I’m starting to think that the GRE is a cash cow.  Money flows between Educational Testing Service and the institutions, I think) and I’ve made a conscious reduction of the material:

    Time to get cracking.

[1] If you haven’t already watched this video, at least a dozen times, you’ve been doing yourself a disservice.


  • Reflections on 1984

Working the normal nine-to-five job leaves little time for personal reading, which is why every morning, as soon as I situate myself on the bus, I immediately rip a book from my backpack and read.  I guard this meager time like a gambler and his poker chips.  Without these short thirty-minute rides to and from work, I wouldn’t have been able to finish 1984.

    But I did.

    Flipping to the final page, uttering the final line of 1984, left a gaping hole in my stomach. A profound sense of loss.  It’s rare for a book, or anything for that matter, to leave me devastated.  I just sat there, on the couch, with the book folded over my lap, staring into space. Contemplating.

O’Brien’s systematic interrogation—broken down into three concrete phases—destroyed Winston mentally, emotionally, and physically.  By the time O’Brien had finished with Winston, he was only a “shell of a man.”

    Big brother won.

On the plus side, though, the timing of reading the book could not have been more perfect.  Had I been forced to read this in high school (it’s not uncommon for this book to be compulsory reading), my unprepared mind would not have been able to process Orwell’s dystopia.  With global surveillance programmes revealed (thank you, Edward Snowden), is the idea of Big Brother that out of the question?  I think not.

You know when you are reading, mid-paragraph, and you stumble across an unexpectedly beautiful sentence?  Here’s one of my favorite quotes, when Winston has an epiphany, in the final scene, as he’s being painfully tortured:

    Perhaps one did not want to be loved as much as to be understood

I’m returning 1984 to Billy, my bookshelf, with plans to re-read it.  It’s definitely worth a second read.


  • Data structures and algorithms in Java – Inspectional reading

My plan to complete Data Structures and Algorithms in Java by December 31st, 2016, requires changing my reading strategy.  Reading technical books cover to cover, like a novel, is rewarding but inefficient and impractical.  How about a different approach?

I’m applying active reading principles from How to Read a Book[1] and A Mind for Numbers: reading actively by questioning the author’s intention and identifying the questions the author is attempting to answer.

    I skimmed the 15 chapters (although I had already read the chapters on maps) and below are my notes.

    The skeleton

Before I skimmed each chapter, I paused and flipped to the page that maps each chapter/section to the ACM guidelines[2].  I bookmarked this page because it helps frame the entire study session—constantly questioning what, and why, I am reading the material.

    Mapping between book and ACM guidelines


    Chapter 1

Introduction to Java and basic programming principles.  Conditionals, loops—basic stuff.

    Chapter 2

Object-oriented design.  Introduction to design patterns such as the adapter pattern.  The three criteria for good software: robustness (handling unexpected inputs), adaptability (adapting to changes in requirements), and reusability (reusing code somewhere else).

    Chapter 3

Fundamental data structures, including arrays and linked lists—singly, doubly, and circularly linked.  Covers the advantages of linked lists over arrays, and vice versa.  Compares shallow and deep copies: a shallow copy duplicates the references, while a deep copy instantiates new copies of the referenced objects.

    Chapter 4

Establishing nomenclature for comparing performance: big-O.  The what and why of big-O notation, and its history.  The three levels of performance: best, worst, and average.  The 7 mathematical functions, described in ascending order, from best to worst:

1. Constant
2. Logarithmic
3. Linear
4. N log N (linearithmic)
5. Quadratic
6. Cubic
7. Exponential

    Chapter 5

Recursion.  Implementing a recursive search algorithm: binary search.  When is recursion a bad idea?  What tail recursion is and how to eliminate it.

    Chapter 6

Additional data structures.  Introduces additional abstract data types (ADTs): stacks and queues.  Two primary methods of implementing a stack: array and linked list.  Leveraging the modulus technique to maintain a circular queue.  The deque, pronounced “deck,” implements properties of both stack and queue: add to head or tail, and remove from head or tail.

    Chapter 7

High-level concepts of object-oriented programming.  Instead of relying on the client to size the array up front, a more client-friendly solution is offered: dynamic arrays.  The iterator interface is introduced, as well as positional lists.

    Chapter 8

Generic trees and ordered trees—including binary and binary search trees—and techniques to traverse them: preorder, postorder.  A peek at breadth-first traversal—I thought this concept was reserved for graphs.  The chapter ends with use cases showing how trees can model and solve real problems.

    Chapter 9

An additional ADT: the priority queue, which can be implemented with an unsorted positional list or a sorted list.  The unsorted list offers O(1) to insert, but O(N) for retrieving the minimum.  The sorted list can, instead, offer O(1) for the minimum but O(N) for inserting.  Is there a data structure that offers the best of both worlds, something in between?  Yes: the heap.  The heap offers O(log N) for both inserting and retrieving the minimum value.  Additional features, such as removing an arbitrary (non-minimum) entry from a priority queue, require adapting the ADT.

    Chapter 10

The map ADT is introduced.  A primitive implementation of a map, via an array, is introduced, followed by the hash table.  The chapter ends with sorted maps, skip lists, and sets.

    Chapter 11

Binary search trees, a special type of tree.  To combat the worst-case scenario (inserting sorted items produces, effectively, a linked list), another ADT is introduced: the AVL tree.  Two other balancing trees are introduced: splay and (2,4) trees.  The chapter ends with red-black trees.

    Chapter 12

Up until Chapter 12, only two elementary sorting algorithms have been introduced: insertion sort and selection sort.  Divide and conquer, an algorithm design pattern, is introduced with two concrete implementations: merge sort and quicksort—both with a runtime complexity of O(n log n).  Radix and bucket sort improve on that runtime complexity: O(N).  The concept of stable sorting algorithms is introduced, and we face another type of problem: selection, picking out particular items.

    Chapter 13

A pretty neat topic: tackling “string matching.”  Discusses a primitive method for string matching—I think we can all guess how this brute-force method works—followed by advanced techniques: the Boyer-Moore and Knuth-Morris-Pratt algorithms.  Covers compression algorithms, including the Huffman code.  Covers dynamic programming, a method for matching sequences of strings that reduces the runtime complexity from exponential by implementing “a few loops.”

    Chapter 14

Introduces another ADT—the graph—and its respective terms: edges, vertices, directed, undirected.  Presents two different strategies for keeping track of neighbors: the adjacency matrix and the adjacency list.  Discusses two methods for searching for nodes: depth-first and breadth-first search.  Covers several methods for constructing a minimum spanning tree.  Like OSPF, introduces Dijkstra’s algorithm for determining shortest paths.


    Chapter 15

Dives into characteristics of the JVM: garbage collection.  Introduces the PC—program counter, not personal computer—which keeps track of function/procedure calls.  Compares the types of memory and the hierarchy in terms of performance.  How data is stored on external disks, using structures such as B-trees.

    Footnotes

    [1] – How to read a book – http://pne.people.si.umich.edu/PDF/howtoread.pdf and a similar article: https://www.farnamstreetblog.com/how-to-read-a-book/
    [2] – ACM – https://www.acm.org/education/CS2013-final-report.pdf

  • My GSD alarm

    I’m an early bird.  I find that I’m the most productive with a little extra time in the morning, which I devote to personal development — meditating, reading, and writing. I wake, usually, to the sound of the alarm; today, though, I woke up naturally. Or so I thought.

I glanced at my watch.  05:30AM.  I gave an approving kudos to my body for blessing me with an extra 30 minutes to kick-start the day.  But it was not until I reviewed the Arlo footage that I discovered Maxella, my sister’s German Shepherd, struggling to sneak into my bed—her lack of finesse and lion-sized paws sent a small wave of vibrations through the bed, enough to wake me up.


  • A simple C problem

I’m slowly working through Computer Systems: A Programmer’s Perspective[1], struggling every bit of the way.  I’ve stopped beating myself up about it, though.

The code snippets below are in C.  The first function, inplace_swap, swaps two integers.  The code itself obscures its intent, but I promise you, it swaps two integers.

The second function reverses an array.  Coming from the Python world, I wondered why in the world we need to pass in cnt as an additional argument.  I discovered the additional argument is absolutely necessary: C always collapses an array argument into a pointer, and the length of the array cannot be recovered from a pointer, which is nothing more than a memory address [2].

    #include <stdio.h>
    
    void inplace_swap(int *x, int *y){
        *y = *x ^ *y;
        *x = *x ^ *y;
        *y = *x ^ *y;
    }
    
    void reverse_array(int a[], int cnt){
        int first, last;
        for (first = 0, last = cnt - 1; first <= last; first++, last--){
            inplace_swap(&a[first], &a[last]);
        }
    
    }
    

This function appears to work just fine, at first.  It reverses even-length arrays flawlessly; however, it fails to reverse odd-length arrays.

    void print_array(int a[], int cnt){
        int i;
        for (i=0; i < cnt; i++){
            printf(" %d", a[i]);
        }
        printf("\n");
    }
    
    int main(){
        int a[] = {1,2,3,4};
        int a_length = sizeof(a)/sizeof(int);
        printf ("Reversing 1 2 3 4 : ");
        reverse_array(a, a_length);
        print_array(a, a_length);
    }
    
    $ ./reverse_array
    Reversing 1 2 3 4 :  4 3 2 1
    
    int b[] = {1,2,3,4,5};
    int b_length = sizeof(b)/sizeof(int);
    printf ("Reversing 1 2 3 4 5:");
    reverse_array(b, b_length);
    print_array(b, b_length);
    
    $ ./reverse_array
    Reversing 1 2 3 4 5: 5 4 0 2 1
    

    3 practice problems

    The author reveals there is a bug and poses 3 questions:

    1. For an array of odd length cnt = 2k + 1, what are the values of variables first and last in the final iteration of function reverse_array?
2. Why does this call to function inplace_swap set the array element to 0?
    3. What simple modification to the code for reverse_array would eliminate this problem?

In an odd-length array, the variables first and last end up with the same value in the final iteration.  This explains why the middle array element is set to 0: any number XOR’d with itself will always produce 0.  Here are a few examples of XOR’ing two identical binary values:

    0001 ^ 0001 = 0000
    0111 ^ 0111 = 0000
    

I grew frustrated with the last question.  For ten minutes, I held back my impulse to flip to the back of the book and reveal the solution.  Instead, I slapped a piece of paper and a pen onto my desk, and stepped through each iteration.  There’s something about putting pen to paper that activates a different part of the brain.

    1 2 3 4 5
    5 2 3 4 1
    5 4 3 2 1
    
    1 2 3 4 5 6 7
    7 2 3 4 5 6 1
    7 6 3 4 5 2 1
    7 6 5 4 3 2 1
    

    The little fix

In an array with an odd number of elements, the middle item is already in its final position.  We should never call inplace_swap for the middle item.  Instead, we change the condition in the for loop from first <= last to first < last.

    Footnotes

[1] My colleague Richard—who graduated from Carnegie Mellon—recommended this book.  It’s the only book he’s read front to back, twice.
    [2] http://stackoverflow.com/questions/492384/how-to-find-the-sizeofa-pointer-pointing-to-an-array