Week 1 and Week 2 of compilers

January 23, 2020

Last week, I couldn’t carve out enough time to write a post on my reflections for week 1 of my compilers course, so I’ll do the next best thing: write a post that combines week 1 and week 2. The quarter kicked off on January 10th and since then, I’ve felt a range of emotions, from excitement to anxiety. Meanwhile, I’ve been scarfing down a ton of knowledge about compilers (previously, I viewed the compiler as an elusive, magical, and mysterious black box). On top of all this, I’ve been experimenting with the Cornell note-taking system, switching back and forth between typing up notes on my laptop, writing notes down on my iPad, and using good old pen and paper.

Study Habits

I’m constantly refining the way I study, optimizing my sessions for better understanding and long-term retention. Last quarter, I took notes using LaTeX, generating aesthetically beautiful documents. The idea was that I would one day look back at them in awe. But in reality, I haven’t looked back at any previously taken notes, not for any of the four prior classes, so there’s really no point (as of right now) in continuing to spend precious cycles typing out LaTeX notes when (I think) I’m more actively engaged when scribbling notes down with pen and paper.

Emotions and feelings

A couple of things have been making me feel anxious, chief among them the sheer number of hours required for the course. According to the omscentral reviews, students estimate roughly 20-25 hours per week. To compare against these estimates, I’ve been tracking the hours I put into this course. So far, on average, I’m studying (watching lectures, reading the textbook, creating flash cards) for roughly 2.5 hours a day, weekends included. That means, in a given week, I’m putting in roughly 17.5 hours, give or take. Will the number of hours increase once the first project is announced (two days from now)? Perhaps. One way to make room would be to reduce the number of other commitments (e.g. singing, writing music, writing blog posts like this one) or to sacrifice some sleep, which is not ideal at all. Instead, I think I’ll just accept that, given the finite number of hours in a week, I may only be able to pull off a B, which I’m totally fine with.

What have I learned so far

As mentioned before, I have zero prior background with compilers. But in just the past two weeks, I’ve discovered that a compiler is not just a black box. It actually consists of three major components: front end, optimizer, back end.

The front end’s job is to scan and parse text and convert that text into a format (an intermediate representation) that’s consumable by the back end. And if we drill down a bit further, we’ll see that the front end actually consists of subcomponents: scanner, parser, and semantic analyzer. The scanner reads one character at a time and groups the characters into tokens. These tokens (tuples of type and value) are passed back to the parser (which initially requested them). The parser checks that the tokens form a syntactically valid program and then calls the semantic analyzer to ensure that the result adheres to the semantic rules of the language, meaning that it actually makes sense.
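To make the scanner's job concrete, here is a minimal sketch of a hand-written scanner for a toy expression language. The token type names (`IDENT`, `NUMBER`, `OP`) and the `scan` function are my own assumptions for illustration, not something from the course material.

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (type, value)


def scan(text: str) -> Iterator[Token]:
    """Read the input one character at a time and group characters into tokens."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace separates tokens but is not a token itself.
            i += 1
        elif ch.isdigit():
            # Consume a run of digits as a single NUMBER token.
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            yield ("NUMBER", text[start:i])
        elif ch.isalpha() or ch == "_":
            # Consume letters, digits, and underscores as a single IDENT token.
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            yield ("IDENT", text[start:i])
        elif ch in "+-*/=()":
            yield ("OP", ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")


tokens = list(scan("total = count + 42"))
```

A parser would typically pull tokens from `scan` one at a time instead of building the whole list up front, which is why the sketch is written as a generator.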

How does the compiler know whether something is both syntactically and semantically correct? Based on the rules of the language: the grammar. This is the set of rules that specifies the language. At the token level, these rules can be formalized using finite automata, which are represented as state machines. What I found most interesting is that we can express those token-level rules in a human-readable format called regular expressions.

Regular Expressions

If you are a programmer, you have almost certainly used regular expressions (in your favorite language, like Python) for searching text. But regular expressions are much more powerful than just creating search patterns. They are fundamental to compiler design.
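As a small illustration of regular expressions describing tokens rather than search patterns, here is a sketch using Python's `re` module. The two patterns and the `classify` helper are hypothetical examples for a toy language, not definitions from the course.

```python
import re

# Hypothetical token patterns for a toy language.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"[0-9]+")


def classify(lexeme: str) -> str:
    """Return the token type whose pattern matches the entire lexeme."""
    if IDENTIFIER.fullmatch(lexeme):
        return "IDENT"
    if INTEGER.fullmatch(lexeme):
        return "NUMBER"
    return "UNKNOWN"
```

Here `classify("x1")` is an identifier and `classify("42")` is a number, while `classify("4x")` matches neither pattern because `fullmatch` requires the whole string to fit.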

By using regular expressions, we can describe a formal language. The state machine (finite automaton) that recognizes that language consists of the following:
• Alphabet – finite set of allowed characters
• Transition function – maps a state and an input character to the next state
• States – finite set of states
• Accepting states – finite set of states in which the input is accepted
• Starting state – initial state of the state machine
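The five pieces listed above map naturally onto a small data structure. Below is a minimal sketch, assuming a hypothetical `DFA` class of my own naming, of a deterministic machine that recognizes unsigned integers (the regular expression `[0-9]+`).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class DFA:
    alphabet: FrozenSet[str]                  # finite set of allowed characters
    states: FrozenSet[str]                    # finite set of states
    start: str                                # starting state
    accepting: FrozenSet[str]                 # accepting states
    transitions: Dict[Tuple[str, str], str]   # (state, character) -> next state

    def accepts(self, text: str) -> bool:
        """Run the machine over text and report whether it ends in an accepting state."""
        state = self.start
        for ch in text:
            if ch not in self.alphabet:
                return False
            next_state = self.transitions.get((state, ch))
            if next_state is None:
                # No transition defined for this input: reject.
                return False
            state = next_state
        return state in self.accepting


DIGITS = frozenset("0123456789")

# Two states: "start" (nothing read yet) and "digits" (at least one digit read).
unsigned_int = DFA(
    alphabet=DIGITS,
    states=frozenset({"start", "digits"}),
    start="start",
    accepting=frozenset({"digits"}),
    transitions={(s, d): "digits" for s in ("start", "digits") for d in DIGITS},
)
```

With this machine, `unsigned_int.accepts("2020")` is true, while the empty string and `"20a"` are rejected.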

State Machines

As mentioned above, regular expressions can be converted into state machines (and, conversely, state machines can be converted back into regular expressions). These state machines fall into one of two categories: deterministic and non-deterministic. Next week’s lectures will cover this topic in more detail.
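As a rough preview of the difference: in a deterministic machine each state has at most one next state per input character, while a non-deterministic machine may have several, so a simulation has to track a set of possible states. The sketch below hand-builds a small non-deterministic machine for the regular expression `(a|b)*ab`. The state numbering and the `nfa_accepts` helper are my own assumptions, and the sketch leaves out empty (epsilon) transitions.

```python
from typing import Dict, Set, Tuple

# State 0 loops on both characters, but on "a" it may also guess that
# the final "ab" has started and move to state 1. That choice is the
# non-determinism.
NFA_TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START = 0
ACCEPTING = {2}


def nfa_accepts(text: str) -> bool:
    """Simulate the machine by tracking every state it could be in."""
    current = {START}
    for ch in text:
        current = {
            nxt
            for state in current
            for nxt in NFA_TRANSITIONS.get((state, ch), set())
        }
    # Accept if at least one possible path ended in an accepting state.
    return bool(current & ACCEPTING)
```

Under these assumptions, `nfa_accepts("aab")` is true and `nfa_accepts("aba")` is false, since only strings ending in `ab` reach state 2.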


So that about wraps it up. I’m looking forward to next week (i.e. week 3)!

I’m Matt Chung. I’m a software engineer, seasoned technology leader, and father currently based in Seattle and London. I love to share what I know. I write about developing scalable & fail-safe software running in the AWS cloud, digital organization as a mechanism for unlocking your creativity, and maximizing our full potential with personal development habits.
