Category: Computer Science

  • CPU and Device Virtualization (notes)

    This post is a continuation of my virtualization series. In the previous post, I covered memory virtualization; this post instead discusses CPU and device virtualization.

    Ultimately, as system designers, one of our chief aims is to provide the illusion to each guest operating system that it individually owns the underlying hardware: the CPU and the I/O devices. But how will the guest OS transfer control back and forth to and from the hypervisor?

    In a fully virtualized environment (i.e. the guest OS has no awareness that it is virtualized), the guest operating system communicates with the hypervisor the standard way: by trapping. Unfortunately, this leaves little room for innovation, which is why a paravirtualized environment (i.e. the guest OS contains specific code to communicate with the hypervisor) optimizes the communication by sharing a ring buffer (i.e. a producer and consumer model).

    Through this shared buffer, the guest VM and hypervisor send messages to one another, the messages containing pointers (not raw data) to buffers allocated by either the guest OS or the hypervisor. This efficient method of communication avoids copying buffers between the two entities.

    Introduction

    Summary

    With CPU virtualization, there are two main focuses: giving the guest the illusion that it owns the CPU, and having the hypervisor field events on the guest's behalf (a "discontinuity", as best I can tell, being an event such as an interrupt or exception that breaks the normal flow of execution).

    CPU Virtualization

    Summary

    Two techniques for CPU scheduling: the proportional-share scheduler (CPU time commensurate with the shares a VM holds) and the fair-share scheduler (everyone gets the same slice of the pie).
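    To make the distinction concrete, here's a tiny sketch contrasting the two policies (the ticket counts are my own hypothetical numbers, not anything from the lectures):

    ```c
    #include <stdio.h>

    /* Proportional share: a VM's slice of the CPU is commensurate with the
     * "tickets" (shares) it holds. Fair share: every VM gets an equal slice. */
    static double proportional_share(int tickets, int total_tickets)
    {
        return 100.0 * tickets / total_tickets;
    }

    static double fair_share(int num_vms)
    {
        return 100.0 / num_vms;
    }

    int main(void)
    {
        int tickets[] = {100, 50, 50};   /* hypothetical ticket counts */
        int n = 3, total = 200;

        for (int i = 0; i < n; i++)
            printf("VM%d: proportional=%.1f%% fair=%.1f%%\n",
                   i, proportional_share(tickets[i], total), fair_share(n));
        return 0;
    }
    ```

    With these numbers, the VM holding half the tickets gets half the CPU under proportional share, while fair share hands each of the three VMs a third regardless.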

    Device Virtualization Intro

    Summary

    Want to give guests the illusion that they own the I/O devices.

    Device Virtualization

    Device Virtualization: full virtualization vs Para

    Summary

    In a fully virtualized guest OS, there’s little room to innovate for device I/O since the mechanism is really just trap-and-emulate (by the hypervisor). In contrast, paravirtualization offers lots of opportunities for optimization, including utilizing shared buffers. However, we need to figure out how to transfer control back and forth between the guest OS and the hypervisor.

    Control Transfer

    Summary

    In a full virtualization environment, control transfers from guest to hypervisor through a trap, whereas in paravirtualization, the guest OS sends hypercalls. In both full and paravirtualization, however, the hypervisor communicates with the guest OS via software interrupts.

    Data Transfer

    Summary

    Data transfer in paravirtualization uses a ring buffer in a producer/consumer model.

    Data transfer in a paravirtualized system is fascinating: the guest VM and the hypervisor share an I/O circular buffer. The guest VM produces requests and the hypervisor consumes them, the guest VM maintaining a pointer to the position of its data. On the response path, the hypervisor maintains its own pointer, on the same I/O circular buffer, that tracks the responses (while the guest VM keeps a private pointer of its own).
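    Here's a minimal sketch of that shared ring in C (my own toy version, not Xen's actual I/O ring API): the descriptors carry pointers to guest-allocated buffers, and each side advances only its own index, so no payload bytes are ever copied.

    ```c
    #include <stddef.h>
    #include <stdio.h>

    #define RING_SIZE 8   /* power of two so indices wrap cleanly */

    /* A request descriptor carries a pointer to guest-allocated data,
     * so no payload is copied between guest and hypervisor. */
    struct request {
        void *data;
        size_t len;
    };

    struct ring {
        struct request slots[RING_SIZE];
        unsigned prod;   /* advanced only by the guest (producer)      */
        unsigned cons;   /* advanced only by the hypervisor (consumer) */
    };

    /* Guest side: publish a request if there is room. */
    static int ring_produce(struct ring *r, void *data, size_t len)
    {
        if (r->prod - r->cons == RING_SIZE)
            return -1;                       /* ring full */
        r->slots[r->prod % RING_SIZE] = (struct request){ data, len };
        r->prod++;
        return 0;
    }

    /* Hypervisor side: consume the next request, if any. */
    static int ring_consume(struct ring *r, struct request *out)
    {
        if (r->cons == r->prod)
            return -1;                       /* ring empty */
        *out = r->slots[r->cons % RING_SIZE];
        r->cons++;
        return 0;
    }

    int main(void)
    {
        struct ring r = {0};
        char payload[] = "packet bytes";

        ring_produce(&r, payload, sizeof(payload));

        struct request req;
        if (ring_consume(&r, &req) == 0)
            printf("consumed %zu bytes at %p\n", req.len, req.data);
        return 0;
    }
    ```

    A real shared ring would also need memory barriers (the two sides run concurrently) and a paired response path, but the pointer-passing idea is the same.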

    Control and Data Transfer in Action

    Summary

    Control and Data Transfer in Action – ring buffers for transmitting and receiving packets

    Pretty amazing how the hypervisor and guest VM work together to avoid copying data between address spaces. To this end, the guest VM transmits data by maintaining a circular buffer and passing pointers to the hypervisor, so no data is copied (just pointers). The same approach applies in the opposite direction, when the hypervisor receives a packet and then forwards it to the guest.

    Disk IO Virtualization

    Summary

    Disk Virtualization – ring buffer

    Pretty much identical to data transfer in network I/O virtualization, except that Xen offers a reorder barrier to support higher-level semantics like write-ahead logging (a topic that seems really interesting to me: I’d be curious how to build a database from the ground up).

    Measuring Time

    Summary

    The fundamental tenet is utility computing, so we need a way to accurately bill users.

    Xen and guests

    Summary

    Xen and guest – paravirtualization

    Xen offers paravirtualization and VMware offers full virtualization (i.e. the guest VM has no clue). Regardless, virtualization focuses on protection and flexibility, trading off performance.

  • Operating Systems – Memory Virtualization – Paging

    In my other blog post on memory segmentation, I talked about dividing the process’s virtual address space into segments: code, heap, stack. Memory segmentation is just one approach to memory virtualization; another approach is paging. Whereas segmentation divides the address space into variable-sized segments, paging chops the space up into fixed-sized pieces.

    Each virtual address is divided into a virtual page number (VPN) and an offset; the number of bits that make up the offset is determined by the size of the page itself, and the number of pages dictates the number of bits of the virtual page number.

    To support the translation from virtual address to physical address, we need a page table, a per-process data structure that maps each virtual page number to a physical frame number (PFN).
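    To make the split concrete, here's a small sketch assuming 32-bit virtual addresses and 4 KB pages (so a 12-bit offset); the page table contents are made up purely for illustration:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12                    /* 4 KB pages -> 12 offset bits */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)

    /* Toy page table: index = VPN, value = PFN. */
    static uint32_t translate(uint32_t vaddr, const uint32_t *page_table)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;        /* top bits: page number */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);    /* low 12 bits: offset   */
        uint32_t pfn    = page_table[vpn];            /* per-process lookup    */
        return (pfn << PAGE_SHIFT) | offset;          /* physical address      */
    }

    int main(void)
    {
        uint32_t page_table[] = {7, 4, 9};            /* VPN 0->PFN 7, 1->4, 2->9 */
        uint32_t va = (1u << PAGE_SHIFT) | 0x123;     /* VPN 1, offset 0x123      */
        printf("0x%x -> 0x%x\n", va, translate(va, page_table));  /* 0x1123 -> 0x4123 */
        return 0;
    }
    ```

    Note that the offset passes through untouched; only the page number gets remapped.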

  • Week 1 and Week 2 of compilers

    Last week, I couldn’t carve out enough time to write a post reflecting on week 1 of my compilers course, so I’ll do the next best thing: write a post that combines week 1 and week 2. The quarter kicked off on January 10th and since then, I’ve felt a range of emotions, from excitement to anxiety. Meanwhile, I’ve been scarfing down a ton of knowledge about compilers (previously, I viewed the compiler as an elusive, magical and mysterious black box). On top of all this, I’ve been experimenting with the Cornell note-taking system, switching back and forth between typing up notes on my laptop, writing notes down on my iPad, and using the good old pen and paper.

    Study Habits

    I’m constantly refining the way I study, optimizing my sessions for increased understanding and maximizing long-term retention. Last quarter, I took notes using LaTeX, generating aesthetically beautiful documents. The idea behind this was that I would one day look back at them in awe. But in reality, I haven’t looked back at any previously taken notes, not for any of the four prior classes, so there’s really no point (as of right now) in spending precious cycles typing out LaTeX notes when (I think) I’m more actively engaged when scribbling notes down with pen and paper.

    Emotions and feelings

    A couple of things have been making me feel anxious. First, the sheer number of hours required for the course. According to the omscentral reviews, students estimate roughly 20-25 hours per week. To compare against these estimates, I’ve been tracking the hours I put into this course. So far, on average, I’m studying — watching lectures, reading the textbook, creating flash cards — roughly 2.5 hours a day; the same applies to the weekends. That means, in a given week, I’m putting in roughly 17.5 hours, give or take. Will the number of hours increase once the first project is announced (two days from now)? Perhaps. One option would be to reduce the number of commitments (e.g. singing, writing music, writing blog posts like this one) or sacrifice some sleep — not ideal at all. Instead, I think I’ll just accept that, given the finite number of hours in a week, I may only be able to pull off a B, which I’m totally fine with.

    What have I learned so far

    As mentioned before, I have zero prior background with compilers. But in just the past two weeks, I’ve discovered that a compiler is not just a black box. It actually consists of three major components: front end, optimizer, back end.

    The front end’s job is to scan and parse text and convert it into a format (an intermediate representation) that’s consumable by the back end (more on this below). And if we drill down a bit further, we’ll see that the front end actually consists of subcomponents: a scanner, a parser, and a semantic analyzer. The scanner reads one character at a time and groups the characters into tokens. These tokens (tuples of type and value) are passed back to the parser (which initially requested them), which then calls the semantic analyzer to ensure that the parsed constructs adhere to the semantic rules — that they are meaningful and make sense.
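    Here's a toy sketch of that scanner stage (my own simplification, not from the course materials): it reads one character at a time and groups runs of digits into NUMBER tokens and runs of letters into IDENT tokens.

    ```c
    #include <ctype.h>
    #include <stdio.h>

    enum token_type { TOK_NUMBER, TOK_IDENT, TOK_EOF };

    struct token {
        enum token_type type;
        char value[32];        /* the lexeme: the grouped characters */
    };

    /* Scan the next token starting at *pos; advances *pos past it. */
    static struct token next_token(const char *src, int *pos)
    {
        while (isspace((unsigned char)src[*pos]))
            (*pos)++;                               /* skip whitespace */

        struct token tok = { TOK_EOF, "" };
        int n = 0;

        if (isdigit((unsigned char)src[*pos])) {    /* group digits */
            tok.type = TOK_NUMBER;
            while (isdigit((unsigned char)src[*pos]) && n < 31)
                tok.value[n++] = src[(*pos)++];
        } else if (isalpha((unsigned char)src[*pos])) {  /* group letters */
            tok.type = TOK_IDENT;
            while (isalpha((unsigned char)src[*pos]) && n < 31)
                tok.value[n++] = src[(*pos)++];
        }
        tok.value[n] = '\0';
        return tok;
    }

    int main(void)
    {
        const char *src = "count 42";
        int pos = 0;
        struct token t;
        while ((t = next_token(src, &pos)).type != TOK_EOF)
            printf("(%s, \"%s\")\n",
                   t.type == TOK_NUMBER ? "NUMBER" : "IDENT", t.value);
        return 0;
    }
    ```

    A real scanner would of course handle operators, keywords, comments, and error reporting, but the grouping loop is the essence of it.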

    How does the compiler know whether something is both syntactically and semantically correct? Based on the rules of the language: the grammar. These are the set of rules that specify the language; the lexical rules in particular can be formalized using finite automata and represented as state machines. What I found most interesting is that we can express these rules in a human-readable format called regular expressions.

    Regular Expressions

    If you are a programmer, you certainly used regular expressions (in your favorite language like Python) for searching text. But regular expressions are much more powerful than just creating search patterns. They are fundamental to compiler design.

    By using regular expressions, we can define a formal language, one that can be recognized by a finite automaton consisting of the following:
    • Alphabet – finite set of allowed characters
    • Transition Function – maps a state and an input character to the next state
    • States – finite set of states
    • Accepting States – the subset of states in which the input is accepted
    • Starting State – initial state of the state machine
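    Those pieces map directly onto code. As a sketch (a toy example of my own), here's a DFA over the digit alphabet that accepts the regular expression [0-9]+, i.e. nonempty strings of digits:

    ```c
    #include <ctype.h>
    #include <stdbool.h>

    /* States: START (nothing read), DIGITS (seen >= 1 digit, accepting),
     * DEAD (saw a non-digit; can never accept). */
    enum { START = 0, DIGITS = 1, DEAD = 2 };

    /* Transition function: next state given current state and input char. */
    static int step(int state, char c)
    {
        if (state == DEAD)
            return DEAD;
        return isdigit((unsigned char)c) ? DIGITS : DEAD;
    }

    /* Run the DFA: accept iff we end in the accepting state. */
    static bool matches(const char *s)
    {
        int state = START;                 /* starting state */
        for (; *s; s++)
            state = step(state, *s);
        return state == DIGITS;            /* accepting state */
    }

    int main(void)
    {
        return matches("123") && !matches("") && !matches("12a") ? 0 : 1;
    }
    ```

    The empty string is rejected because we never leave the start state, and any non-digit sends us to the dead state for good — exactly what the + in [0-9]+ demands.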

    State Machines

    As mentioned above, regular expressions can be converted into state machines (similarly, state machines can be converted back into regular expressions) that fall into one of two categories: deterministic and non-deterministic. Next week will cover this topic in more detail.

    Summary

    So that about wraps it up. I’m looking forward to next week (i.e. week 3)!

  • Masters in CS paying off

    Taking computer science courses is already paying off in my career. Nothing too significant (yet), but I am witnessing small wins.

    For example, this past summer I suffered through HPCA (High Performance Computer Architecture), a course historically offered only in the longer semesters (e.g. fall, spring). In the course, I learned a lot of theory: barrier synchronization, processor pipelines, instruction scheduling, multi-level caches, branch predictors, cache coherence protocols, and much more (much of which I have already purged from my memory). However, since most of that knowledge was primarily theory, I’ve been able to directly apply very little of it to my day-to-day job.

    Until a couple days ago.

    While at work, a co-worker pointed out that calls to the unlikely C macro were sprinkled throughout our code base, used for branch prediction. And thanks to theory, I was able to describe to another co-worker how the compiler (with its understanding of the underlying CPU architecture) will rearrange the assembly (or machine code) in order to avoid the cost of a branch misprediction: if the wrong instruction is fetched, the processor must flush all the previous instructions queued in the pipeline, temporarily stalling it, which ultimately reduces throughput.
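    For reference, unlikely in Linux-style C code is typically a thin macro over GCC's __builtin_expect, which tells the compiler which way a branch will almost always go so it can keep the hot path as straight-line fall-through code. The definitions below mirror the kernel's common idiom; process is just a made-up example function:

    ```c
    #include <stdio.h>

    /* Hint to the compiler: the condition is almost always false (or true),
     * so lay out the generated code to favor the common path. */
    #define unlikely(x) __builtin_expect(!!(x), 0)
    #define likely(x)   __builtin_expect(!!(x), 1)

    static int process(const char *buf)
    {
        if (unlikely(buf == NULL)) {   /* error path: predicted not-taken */
            fprintf(stderr, "null buffer\n");
            return -1;
        }
        /* hot path: reached with no taken branch in the common case */
        return 0;
    }

    int main(void)
    {
        printf("ok=%d err=%d\n", process("payload"), process(NULL));
        return 0;
    }
    ```

    Note that __builtin_expect is a GCC/Clang extension, not standard C, and it changes only code layout and prediction hints, never the result of the condition.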

    And this is just one example of how taking my master’s has paid off. There have been a number of other situations (like understanding shared memory and interprocess communication) where I’ve been able to apply theory from academia to practice at work.

    Thanks to the program, I’m feeling much more competent and confident working as a software developer at Amazon Web Services.

  • Information Security – Project 4

    This afternoon, I started on project 4 for introduction to information security (IIS). The goal of this project is to have us students learn more about web security; it consists of three objectives, manufacturing three web attacks: cross-site scripting, cross-site request forgery, and a structured query language (SQL) injection attack. And although I’m very familiar with the terms, I’ve never actually carried out any of those attacks in either an academic or a professional setting. In this post, I’ll share some of the things I learned while spending 4 hours in a cafe chipping away at the project.

    Cross site request forgery (CSRF)

    This attack was very straightforward: inspect the source code of the (PHP) files and carefully tease out which form inputs could be set in the body of the HTTP POST.

    Cross site scripting (XSS)

    This task was a ton of fun, though initially I headed down an entirely wrong path and found myself getting very frustrated. Because of the video lectures, I had wrongly assumed that the only way to perform the attack was to embed an iFrame in my hand-crafted HTML page, the iFrame loading the contents of the remote website, the target of the attack. And although this is entirely possible, embedding an iFrame is unnecessary: what I really needed to do was send an HTTP POST to the remote site, embedding JavaScript in one of the form values, carefully ensuring that when the page renders in the browser, it’s identical to the original website.

  • Week 1 of master’s in computer science

    January 7th, 2019 marks the first day of my computer science master’s program through Georgia Tech. The week leading up to the first day was somewhat stressful: I was mid-flight (returning to Seattle from London), my plane hovering over the massive blue sea, while I frantically typed away on my keyboard trying to register for a course in my allotted time window. Because of the spotty internet connection on the airplane, it took me about an hour and a half to register for my course, and by that point the open spots had filled up so fast that I was positioned 100+ on the wait list (which, I discovered later after reading online posts, is normal, and that I would likely get into the course, which I did).

    Anyways, despite all that headache, I’m officially enrolled in a course that I’ve wanted to take for a very very long time: Introduction to Operating Systems. So far, although it’s only been 1 week, I love the course, for multiple reasons.

    First, I love the collaborative nature and sense of community of the program. Before getting into this master’s program, I was taking a handful of undergraduate computer science courses (e.g. computer organization, discrete mathematics, data structures) from the University of North Dakota and the University of Northern Iowa, two excellent schools that offered courses through their distance learning platforms. Now although I learned a lot from those courses, I always felt like I was working in isolation, by myself, my only interaction being a few short threaded e-mails with the professors. But now, with this course, there’s an online forum (i.e. Piazza) and a chatty chatroom (via Slack, paid for out of pocket by one of the TAs of the course), where students fire off questions and comments (especially random comments in the #random Slack channel). So in a sense, despite never meeting these folks, there’s a sense of camaraderie, a shared goal.

    Second, I’m learning a ton of material that’s not only applicable to my day-to-day job (as a software engineer) but also material that I’m genuinely interested in. For the first week, the professor of the course has us reading a 30-page white paper (titled “Introduction to Threading”) written in 1989, a seminal piece of work (on threads and concurrency) that gives me perspective on, and appreciation of, my industry. In addition to reading the white paper, I’m watching lectures covering fundamental operating system concepts (e.g. processes, virtual memory) and, above all, writing C code! A ton of C code!

    The project has us writing a multi-threaded client and a multi-threaded web server (think implementing the HTTP protocol), intended to teach us how to write safe, concurrent systems that utilize the threading facilities offered by the operating system.

  • History of i,j,k variables ?

    Any time you read code (in production or from a textbook), you’ll often see the control variable of a for loop declared as i, j, or k. And for most of my programming career, I never really questioned why we specifically choose those three letters. Why not m (a great letter), or c or d or e, or any other letter for that matter? Seems rather arbitrary.

    But I suspect that it has to do with William Rowan Hamilton, a famous Irish mathematician who published a book in the 1800s titled “A Theory of Systems of Rays”. In his work, Hamilton used the i, j, k notation when representing vectors (in R3). This representation of vectors became the standard notation, and he may well be the person we need to thank when we type those three letters while programming.

  • Linear algebra – exam 1

    Earlier this morning, before starting the work week, I took my first linear algebra exam at the nearby Northgate Testing center.  The proctored exam covered the first four modules in the course, topics including:

    • Gaussian elimination (row echelon form)
    • Gauss-Jordan elimination (reduced row echelon form)
    • Adding, subtracting, and multiplying matrices
    • Calculating the inverse of a matrix (using the identity matrix)
    • LU factorization

    I felt well prepared (thanks again to Khan Academy and 3Blue1Brown): going into the exam, I had worked (and reworked) the suggested exercises in the textbook. Honestly, I think if I hadn’t attempted the exercises, my mind would’ve tricked me into believing that I understood the material. But that obviously wasn’t the case, because the different math problems had me fumbling around, furiously scribbling pencil all over my notebook paper followed by vigorous strokes of erasing. Anyways, 1 exam down — 2 more to go.

    Next topic: calculating the determinant of a matrix (at this present moment, I have no idea what that means)

  • Assembly RA (return address)

    About a year has passed since I took computer organization, and I’m a bit disappointed by how much of the material has faded from my memory. I only realized this while reading about Exception Control Flow (ECF) in the book Computer Systems: A Programmer’s Perspective, when the authors mentioned:

    As with a procedure call, the processor pushes a return address on the stack before branching to the handler

    Section 8.1 Exceptions

    I had to pause for a moment because, despite having written a handful of assembly programs — one of them where I implemented quicksort — I had totally forgotten how the return address gets handled in assembly. But after reviewing an assembly program that I wrote earlier this year, I realized that it’s the jal (jump and link) instruction that directs the processor to store the PC value into the $ra (return address) register. Then it’s the callee’s responsibility to save that value on the stack and, after its routine finishes, restore it and call jr $ra (jump register).

    Here’s some sample code

    main:
    	la $a0, numbers		# Load the address numbers into $a0, the first argument
    	li $a1, 0		# Load 0 into $a1, the second argument
    	lw $t0, length		# Load the length of the array into temporary register $t0
    	sub $a2, $t0, 1		# Subtract 1 from the length and store into $a2, the third argument
    	jal quicksort
    	li $v0, 10
    	syscall
    
    quicksort:
    
    quicksort_if:
    
    	bge $a1, $a2, quicksort_end_if	# If low >= high, skip the body
    
    quicksort_body:
    	addi $sp, $sp, -24	# Update the stack pointer, allocating room to save 6 registers
    	sw $ra, 0($sp)		# Store return address on the stack
    	sw $s0, 4($sp)		# Store $s0 on the stack
    	sw $s1, 8($sp)		# Store $s1 on the stack
    	sw $s2, 12($sp)		# Store $s2 on the stack
    	sw $s3, 16($sp)		# Store $s3 on the stack
    	sw $s4, 20($sp)		# Store $s4 on the stack
    	move $s0, $a0		# Store $a0 (address of array) into $s0
    	move $s1, $a1		# Store $a1 (low) into $s1
    	move $s2, $a2		# Store $a2 (high) into $s2
    	jal partition		# Call the subprogram partition
    	move $s4, $v0		# Store the pivot position (returned in $v0) into $s4
    	move $a0, $s0		# Store $s0, array, into $a0
    	move $a1, $s1		# Store $s1, low, into $a1
    	move $a2, $s4		# Store pivot position in $a2
    	sub $a2, $a2, 1		# Subtract 1 from $a2 (i.e. pivot - 1)
    	jal quicksort		# First call to quicksort (i.e. quickSort(array, low, pivotPosition - 1)
    	move $a0, $s0		# Store array into $a0
    	move $a1, $s4		# Move pivot into $a1
    	add $a1, $a1, 1		# Add 1 to $a1 (i.e. pivot + 1)
    	move $a2, $s2		# Move high (i.e. $s2) into $a2
    	jal quicksort		# Second call to quicksort (i.e. quickSort(array, pivotPosition + 1, high)
    	lw $ra, 0($sp)		# Restore the return address from the stack and back into $ra
    	lw $s0, 4($sp)		# Restore $s0 from the stack
    	lw $s1, 8($sp)		# Restore $s1 from the stack
    	lw $s2, 12($sp)		# Restore $s2 from the stack
    	lw $s3, 16($sp)		# Restore $s3 from the stack
    	lw $s4, 20($sp)		# Restore $s4 from the stack
    	addi $sp, $sp, 24	# Pop the call frame
    
    quicksort_end_if:
    	jr $ra			# Return to the caller
    
  • Flexing C muscles

    I’ve been sharpening my C programming skills because I’m now working on a team (i.e. EC2 Networking) within Amazon Web Services that requires lower-level software development skills. And I must say, I’m truly rusty. A noob. On top of that, my only experience with C in the past was fairly superficial: dabbling here and there in the K&R book, long before I had taken any computer science courses. But since taking computer organization and data structures, I have a tighter understanding of pointers and their practical uses.

    But I digress.

    I’m currently working through a book by Zed Shaw titled “Learn C the Hard Way.” I picked up the book at the local library in Seattle, stumbling upon it while running my fingers along the spines of books in the computer science section. This book has mixed reviews: some websites strongly dissuade readers from reading anything published by this particular author, while others recommend it. So, I’m taking both camps’ advice with a grain of salt and forming my own opinion.

    At the end of every chapter, the author recommends some extra credit exercises.  In one of these exercises, the author suggests implementing a stack data structure in C.  Now, there are two popular implementations when building a stack: using an array and using a linked list. So, I took this exercise as an opportunity to write a simple stack data structure using linked lists (you gotta love them C pointers).

    Here’s the program output:

    ./stack_linked_list
    
    Stack is empty. Pushing 10 to the top of stack
    Stack is not empty. Pushing 20 to the top of the stack
    Stack is not empty. Pushing 30 to the top of the stack
    Popped item: 30
    Peeking at next item: 20
    Peeking at next item: 20
    Peeking at next item: 20
    Popped item: 20
    Popped item: 10
    Popped item: -1

    And the corresponding code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    
    struct Node {
        int value;
        struct Node * next;
    };
    
    struct Stack {
        int count;
        struct Node * top;
    };
    
    
    bool isEmpty(struct Stack * stack)
    {
        return stack->top == NULL;
    }
    
    void push(struct Stack * stack, int item)
    {
        /*
         * Edge case: first item to push
         */
    
        struct Node * node = (struct Node *) malloc(sizeof(struct Node));
        node->value = item;
        node->next = stack->top;    /* NULL when the stack is empty, so both cases are covered */
    
        if (stack->top == NULL){
            printf("Stack is empty. Pushing %i to the top of stack\n", item);
        } else {
            printf("Stack is not empty. Pushing %i to the top of the stack\n", item);
        }
        stack->top = node;
    
        stack->count++;
        return;
    
    }
    int pop(struct Stack * stack)
    {
        if (isEmpty(stack)){
            return -1;
        }
    
        struct Node * node = stack->top;
        int value = node->value;
        stack->top = node->next;
        stack->count--;
        free(node);                 /* release the popped node to avoid a leak */
    
        return value;
    }
    
    int peek(struct Stack * stack)
    {
        if (isEmpty(stack)){
            return -1;
        }
    
        return stack->top->value;
    }
    
    
    int main(int argc, char * argv[])
    {
        struct Stack stack = { 0, NULL };   /* must be initialized: a garbage top pointer would break isEmpty */
        push(&stack, 10);
        push(&stack, 20);
        push(&stack, 30);
        printf("Popped item: %i\n", pop(&stack));
        printf("Peeking at next item: %i\n", peek(&stack));
        printf("Peeking at next item: %i\n", peek(&stack));
        printf("Peeking at next item: %i\n", peek(&stack));
        printf("Popped item: %i\n", pop(&stack));
        printf("Popped item: %i\n", pop(&stack));
        printf("Popped item: %i\n", pop(&stack));
        return 0;
    }