Author: mattchung

  • Georgia Tech OMSCS CS6515 (Graduate Algorithms) Course Review

    Georgia Tech OMSCS CS6515 (Graduate Algorithms) Course Review

    To pass this class, you should

    1. digest everything written in Joves’s notes (he’s a TA and will release these notes gradually throughout the semester so pay close attention to his Piazza posts)
    2. join or form a study group of a handful of students
    3. dedicate at least 20+ hours per week to drill, memorize, and apply algorithms
    4. complete all the homework assignments, (easy) project assignments, and quizzes (these are all easy points and you’ll need them given that exams make up 70% of your final grade)
    5. drill ALL the practice problems (both assigned and extra ones published on the wiki) over and over again until you’ve memorized them

    Almost failing this class

    This class kicked me in the ass. Straight up. Words can barely describe how relieved I feel right now; now that the summer term is over, my cortisol levels are finally returning to normal.

    I’m not exaggerating when I say I teared up when I learned that I received a passing grade. I barely (and I mean barely, by less than 1%) passed this class with a B, a 71%. Throughout the last week of the summer semester, while waiting for the final grades to be published on Canvas, I had fully prepared myself (both mentally and emotionally) for repeating this class, a level of anxiety and stress I haven’t felt throughout the last 3 years in the OMSCS program.

    Other students in the class felt the same level of despair. One other student shared that he has:

    never felt that much pressure and depression from a class in [his] entire academic career.

    One other student definitely did not hold back any punches on Piazza:

    I am going to open up a new thread after this course finishes out. I am tired of the arrogant culture that is in this program and specifically in this course! There is a lack of trying to understand other perspectives and that is critical for creating a thriving diverse intellectual community.

    So yes — this course is difficult.

    All that being said, take my review with a pinch of salt. Other reviewers have mentioned that you just need to “put in the work” and “practice all the assigned and wiki problems”. They’re right. You do need to do both those things.

    But the course may still stress you out; other courses in the program pretty much guarantee that you’ll pass (with an A or B) if you put in x number of hours; this doesn’t apply for GA. You can put in all the hours and still not pass this class.

    Before getting into the exam portion of my review, it’s worth noting that the systems classes I mentioned above play to my strengths as a software engineer building low level systems; in contrast, Graduate Algorithms predominantly focuses on theory and is heavy on the math, a weakness of mine. Another factor is that I’ve never taken an algorithms course before, so many of the topics were brand spanking new to me. Finally, my mind wasn’t entirely focused on this class given that I had quit my job at FAANG during the first week this class started.

    Okay, enough context. Let’s get into discussing more about the exams.

    Exams

    As mentioned above, do ALL the practice problems (until you can solve them without thinking about it) and really make sure you understand everything in Joves’s notes. I cannot emphasize these two tips enough. You might be okay with just working the assigned practice problems but I highly recommend that you attempt the homework assignments listed on the wiki since questions from the exam seem to mirror (almost exactly) those questions. And again, Joves’s notes are essential since he structures the answers in the same way they are expected on the exam.

    Exam 1

    Exam 1 consists of 1) dynamic programming and 2) Divide and Conquer (DC)

    Read the dynamic programming (DP) section from the DPV textbook. Practice the dynamic programming problems over and over and over again.

    Attempt to answer all the dynamic programming (DP) problems from both the assigned practice problems and all the problems listed on the wiki. Some other reviewers suggest only practicing a subset of these problems but just cover your bases and practice ALL of practice problems — over and over again, until they become intuitive and until you can (with little to no effort) regurgitate the answers.

    For the divide and conquer question, you MUST provide an optimal solution. If you provide a suboptimal solution, you will be dinged heavily: I answered the question with a correct solution that ran in O(n) instead of O(log n), and I lost half the points. A 50%. So, make sure you understand recursion really well.
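
    To make the gap between an O(n) and an O(log n) answer concrete, here is a minimal sketch using a stand-in problem (finding a value in a sorted list). This is not the actual exam question, and the function names are illustrative only. Both functions return a correct index, but only the recursive one halves the search range on every call.

```python
def linear_search(sorted_values, target):
    """O(n): correct, but scans every element in the worst case."""
    for index, value in enumerate(sorted_values):
        if value == target:
            return index
    return -1


def binary_search(sorted_values, target, low=0, high=None):
    """O(log n): recurse on the half of the range that can contain target."""
    if high is None:
        high = len(sorted_values) - 1
    if low > high:
        return -1  # empty range: target is not present
    mid = (low + high) // 2
    if sorted_values[mid] == target:
        return mid
    if sorted_values[mid] < target:
        return binary_search(sorted_values, target, mid + 1, high)
    return binary_search(sorted_values, target, low, mid - 1)
```

    The recurrence for the recursive version is T(n) = T(n/2) + O(1), which solves to O(log n).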

    Exam 2

    Exam 2 focuses on graph theory. You’ll likely get a DFS/Dijkstra/BFS question and another question that requires you understand spanning trees.

    The instructors want you to demonstrate that you can use the algorithms as black boxes (no need to prove their correctness so you can largely skip over the graph lectures). That is, you must understand when/why to use the algorithms, understand their inputs and outputs, and memorize their runtime complexity.

    For example, given a graph, you need to find out if a path exists from one vertex to another.

    To solve this problem, you should know the Explore algorithm like the back of your hand. You need to know that the algorithm takes a graph (undirected or directed) and a source vertex as inputs. And the algorithm returns a visited[] array, where visited[u] is set to True if a path exists from the source vertex to u.
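
    As a rough sketch of those inputs and outputs, assuming the graph is given as an adjacency-list dictionary with every vertex present as a key (that representation is an assumption, and this version uses an explicit stack rather than recursion):

```python
def explore(graph, source):
    """Return visited, where visited[u] is True if u is reachable from source.

    graph: dict mapping each vertex to a list of its neighbors.
    Runs in O(n + m) time for n vertices and m edges.
    """
    visited = {vertex: False for vertex in graph}
    stack = [source]
    while stack:
        u = stack.pop()
        if visited[u]:
            continue
        visited[u] = True
        for v in graph[u]:
            if not visited[v]:
                stack.append(v)
    return visited
```

    To answer "is there a path from s to t?", call explore(graph, s) and check visited[t].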

    That’s just one example. There are many other algorithms (e.g. DFS, BFS, Kruskal’s MST) you need to memorize. Again, see Joves’s notes (recommendation #1 at the top of this page). Seriously, Joves, if you are reading this, thanks again. Without your notes, I would 100% have failed the course.

    Exam 3

    Understand the difference between NP, NP Hard, NP-Complete.

    I cannot speak much to the multiple choice questions (MCQ) since I bombed this part of the exam. But I did relatively well on the single free-form question, again, thanks to Joves’s notes. Make sure that you 1) prove that the problem is in NP (i.e. a solution can be verified in polynomial time) and 2) reduce a known NP-Complete problem to this new problem (in that direction; DO NOT do this backwards and lose all the points).
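
    For step 1, "verified in polynomial time" means you can describe a checker like the one sketched below. SAT is used here purely as a familiar illustration (it is not necessarily the exam problem), and the clause encoding, with positive integers for variables and negative integers for negated variables, is an assumed convention.

```python
def verify_sat(clauses, assignment):
    """Check that assignment satisfies every clause of a CNF formula.

    clauses: list of clauses, each a list of non-zero ints
             (3 means x3, -3 means NOT x3).
    assignment: dict mapping variable number to True/False.
    Runs in time linear in the total number of literals.
    """
    for clause in clauses:
        # A clause is satisfied if at least one of its literals is true.
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False
    return True
```

    Step 2, the reduction itself, is a written proof rather than code: you transform any instance of the known NP-Complete problem into an instance of the new problem in polynomial time.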

    Summary

    Some students will cruise through this class. You’ll see them on Piazza and Slack, celebrating their near perfect scores. Don’t let that discourage you. Most students find this topic extremely challenging.

    So just brace yourself: it is a difficult course. Put the work in. You’ll do fine. And I’ll be praying for you.

     

  • Leaps of faiths

    Leaps of faiths

    Today marks my last day at Amazon Web Services. The last 5 years have flown by. Typically, when I share the news with my colleagues or friends or family, their response is almost always “Where are you heading next?”.

    Having a job lined up is the logical, rational and responsible thing to do before making a career transition. Having a plan is not only the safe thing to do, but probably even the right thing to do, especially if you have a family you need to financially support. And up until recently, I was really doubting myself, questioning my decision to leave a career behind without a bullet-proof plan.

    But then, I started to reflect on the last 10 years and all of the leaps of faith I took. In retrospect, many of those past decisions made no sense whatsoever.

    At least not at that time.

    Seven years ago, I left my position as a director of technology at Fox and, with nothing lined up, reduced my belongings to a single suitcase and moved to London for a girl I had only briefly met for 2 hours while volunteering at an orphanage in Vietnam. When I booked my flight from Los Angeles to London, almost everyone was like, “Matt, you just met her. This makes no sense.”

    They were right. It made no sense.

    Around the same time, another leap of faith: confessing to my family and friends that I was living a double life and subsequently checking myself into rehab and therapy. Many could not fathom why I was asking for help since issues, especially around addiction, were something our family didn’t talk about. Shame and guilt were something we kept to ourselves, something one battles alone, in isolation.

    Again, my decision made no sense.

    But now, looking back, those decisions were a no-brainer. That relationship I took a shot on blossomed into a beautiful marriage. And attending therapy every week for the past 5 years quite literally saved my life from imploding into total chaos. These decisions, making no sense at the time, were made out of pure instinct.

    But somehow, they make total sense now.

    Because it’s always easy to connect the dots looking backwards — never forwards.

    So here I am, right now, my instinct nudging me to take yet another leap of faith. It’s as if I have this magic crystal ball, showing me loud and clear what my path is: a reimagined life centered around family.

    How is this all going to pan out?

    No clue.

    But it’ll probably all make sense 5 years from now.

  • “Is my service up and running?” Canaries to the rescue

    “Is my service up and running?” Canaries to the rescue

    You launched your service and are rapidly onboarding customers. You’re moving fast, repeatedly deploying one new feature after another. But with the uptick in releases, bugs are creeping in and you’re finding yourself having to troubleshoot, roll back, squash bugs, and then redeploy changes. Moving fast but breaking things. What can you do to quickly detect issues before your customers report them?

    Canaries.

    In this post, you’ll learn about the concept of canaries, example code, best practices, and other considerations including both maintenance and financial implications with running them.

    Back in the early 1900s, canaries were used by miners for detecting carbon monoxide and other dangerous gases. Miners would bring their canaries down with them to the coal mine and when their canary stopped chirping, it was time for everyone to immediately evacuate.

    In the context of computing systems, canaries perform end-to-end testing, aiming to exercise the entire software stack of your application: they behave like your end-users, emulating customer behavior. Canaries are just pieces of software that are always running and constantly monitoring the state of your system; they emit metrics into your monitoring system (more discussion on monitoring in a separate post), which then triggers an alarm when some defined threshold breaches.

    What do canaries offer?

    Canaries answer the question: “Is my service running?” More sophisticated canaries can offer a deeper look into your service. Instead of canaries just emitting a binary 1 or 0 — up or down — they can be designed such that they emit more meaningful metrics that measure latency from the client’s perspective.

    First steps with building your canary

    If you don’t have any canaries running that monitor your system, you don’t necessarily have to start with rolling your own. Your first canary can require little to no code. One way to gain immediate visibility into your system would be to use synthetic monitoring services such as Better Uptime, Pingdom, or StatusCake. These services offer a web interface, allowing you to configure HTTP(S) endpoints that their canaries will periodically poll. When their systems detect an issue (e.g. TCP connection failing, bad HTTP response), they can send you email or text notifications.

    Or if your systems are deployed in Amazon Web Services, you can write Python or Node scripts that integrate with CloudWatch (see the Amazon CloudWatch documentation).

    But if you are interested in developing your own custom canaries that do more than a simple probe, read on.

    Where to begin

    Remember, canaries should behave just like real customers. Your customer might be a real human being or another piece of software. Regardless of the type of customer, you’ll want to start simple.

    Similar to the managed services described above, your first canary should start with emitting a simple metric into your monitoring system, indicating whether the endpoint is up or down. For example, if you have a web service, perform a vanilla HTTP GET. When successful, the canary will emit http_get_homepage_success=1 and under failure, http_get_homepage_success=0.
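
    A minimal sketch of such a probe, using only the Python standard library, might look like the following. The publish_metric function is a hypothetical stand-in that just prints; in practice you would replace it with a call into whatever monitoring system you use.

```python
import urllib.request


def check_homepage(url, timeout=5.0):
    """Return 1 if an HTTP GET of url yields a 2xx response, otherwise 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 1 if 200 <= response.status < 300 else 0
    except Exception:
        # Any failure (DNS, TCP, TLS, timeout, 4xx/5xx) counts as "down".
        return 0


def publish_metric(name, value):
    """Hypothetical stand-in for your monitoring client; here it only prints."""
    print(f"{name}={value}")


def run_once(url):
    """Probe the endpoint once and emit the up/down metric."""
    publish_metric("http_get_homepage_success", check_homepage(url))
```

    Calling run_once("https://example.com/") on a schedule (cron, a loop with a sleep, or a scheduled job) gives you the always-on behavior described above.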

    Example canary – monitoring cache layer

    Imagine you have a simple key/value store system that serves as a caching layer. To monitor this layer, every minute our canary will: 1) perform a write 2) perform a read 3) validate the response.

     
     

    [code lang="python"]
    from time import sleep

    # cache_put, cache_get, publish_metric, and log_exception are assumed to be
    # provided by your cache client and monitoring/logging libraries.
    SLEEP_SECONDS = 60

    while True:
        successful_run = False
        try:
            # 1) Write a known key/value pair
            put_response = cache_put('foo', 'bar')
            write_successful = put_response == 'OK'
            publish_metric('cache_engine_successful_write', int(write_successful))
            # 2) Read it back and 3) validate the response
            value = cache_get('foo')
            read_successful = value == 'bar'
            publish_metric('cache_engine_successful_read', int(read_successful))
            successful_run = True
        except Exception as error:
            log_exception("Canary failed due to error: %s" % error)
        finally:
            publish_metric('cache_engine_canary_successful_run', int(successful_run))
        sleep(SLEEP_SECONDS)
    [/code]

    Cache Engine failure during deployment

    With this canary in place emitting metrics, we might then choose to integrate the canary with our code deployment pipeline. In the example below, I triggered a code deployment (riddled with bugs) and the canary detected an issue, triggering an automatic rollback:

    Canary detecting failures

    Best Practices

    The above code example was very unsophisticated and you’ll want to keep the following best practices in mind:

    • The canaries should NOT interfere with real user experience. Although a good canary should test different behaviors/states of your system, they should in no way interfere with the real user experience. That is, their side effects should be self contained.
    • They should always be on, always running, and should be testing at regular intervals. Ideally, the canary runs frequently (e.g. every 15 seconds, every 1 minute).
    • The alarms that you create when your canary reports an issue should only trigger off more than one datapoint. If your alarms fire off on a single data point, you increase the likelihood of false alarms, engaging your service teams unnecessarily.
    • Integrate the canary into your continuous integration/continuous deployment pipeline. Essentially, the deployment system should monitor the metrics that the canary emits and if an error is detected for more than N minutes, the deployment should automatically roll back (more on the safety of automated rollbacks in a separate post).
    • When rolling your own canary, do more than just inspect the HTTP headers. Success criteria should be more than verifying that the HTTP status code is a 200 OK. If your web service returns a payload in the form of JSON, analyze the payload and verify that it’s both syntactically and semantically correct.
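
    The "more than one datapoint" rule can be sketched as an M-out-of-N check. The class name and the default window and threshold below are illustrative assumptions, not values from any particular monitoring product.

```python
from collections import deque


class DatapointAlarm:
    """Alarm only when at least `threshold` of the last `window` datapoints failed."""

    def __init__(self, window=3, threshold=2):
        self.recent = deque(maxlen=window)  # oldest datapoints fall off automatically
        self.threshold = threshold

    def record(self, success):
        """Record one canary datapoint (truthy = success); return True if in alarm."""
        self.recent.append(bool(success))
        failures = sum(1 for ok in self.recent if not ok)
        return failures >= self.threshold
```

    With window=3 and threshold=2, a single failed datapoint never fires the alarm, which is the point: one blip should not page your service team.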

    Cost of canaries

    Of course, canaries are not free. Regardless of whether or not you rely on a third party service or roll your own, you’ll need to be aware of the maintenance and financial costs.

    Maintenance

    A canary is just another piece of software. The underlying implementation may be just a few bash scripts cobbled together or a full-blown client application. In either case, you need to maintain them just like any other code package.

    Financial Costs

    How often is the canary running? How many instances of the canary are running? Are they geographically distributed to test from different locations? These are some of the questions that you must ask since they impact the cost of running them.

    Beyond canaries

    When building systems, you want a canary that behaves like your customer, one that allows you to quickly detect issues as soon as your service(s) chokes. If you are vending an API, then your canary should exercise the different URIs. If you are testing the front end, then your canary can be programmed to mimic a customer in a browser using libraries such as Selenium.

    Canaries are a great place to start if you are just launching a service. But there’s a lot more work required to create an operationally robust service. You’ll want to inject failures into your system. You’ll want a crystal clear understanding of how your system should behave when its dependencies fail. These are some of the topics that I’ll cover in the next series of blog posts.

    Let’s Connect

    Let’s connect and talk more about software and devops. Follow me on Twitter: @memattchung

  • 3 project management tips for the Well-Rounded Software Developer

    3 project management tips for the Well-Rounded Software Developer

    This is the second in the series of The Well Rounded Developer. See previous post “Network Troubleshooting for the Well-Rounded Developer”

    Whether you are a solo developer working directly with your clients, or a software engineer part of a larger team that’s delivering a large feature or service, you need to do more than just shipping code. To succeed in your role, you also need good project management skills, regardless of whether there’s an officially assigned “project manager”. By upping your project management skills, you’ll increase the odds of delivering consistently and on time — necessary for earning trust among your peers and stakeholders.

    3 Project Management Tips

    Just like programming, project management is another skill that requires practice; you’ll get better at it over time. Sometimes you’ll grossly underestimate a task, thinking it’ll take 3 days … when it really took 10 days (or more!). Don’t sweat it. Project management gets easier the more you do it.

    Capturing Requirements

    This seems obvious and almost goes without saying, but as a developer, you need to be able to extract the mental image your customer or product manager has in mind. Then, distill it into words, often referred to as “user stories”: “When I do X, Y happens” or “As a [role] … I want [goal] … so that [benefit].”

    These conversations will require a lot of back and forth discussion. With each iteration, aim to be as specific as possible. Include numbers, pictures, diagrams. The more detail, the better. And most important, beyond defining your acceptance criteria, spell out your assumptions — loud and clear. Because if any of the assumptions get violated while working on the task, you need to sound the alarm and communicate (see “send frequent communication updates” below) that the current estimated time has been derailed.

    Example

    Task Description

    When we receive a packet with a length exceeding the maximum transmission unit (MTU) of 1514 bytes, the packet gets dropped and the counter “num_dropped_packets_exceeding_mtu” is incremented.
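
    A requirement this specific translates almost directly into code and a test. Here is a rough sketch; the handle_packet function and the module-level counters object are hypothetical names, while the 1514-byte limit and the counter name come from the task description above.

```python
from collections import Counter

MTU_BYTES = 1514  # limit stated in the task description

counters = Counter()


def handle_packet(packet: bytes) -> bool:
    """Return True if the packet is accepted, False if it is dropped."""
    if len(packet) > MTU_BYTES:
        counters["num_dropped_packets_exceeding_mtu"] += 1
        return False
    return True
```

    Notice how the boundary case (exactly 1514 bytes is accepted, 1515 is dropped) falls straight out of the wording "exceeding", which is the kind of detail worth confirming with your stakeholder.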

    Sending frequent communication updates

    Most importantly, keep your stakeholders in the loop. Regardless of whether the task at hand is trending on time, slipping behind, or being delivered ahead of schedule, send an update. That might be in the form of an e-mail, or closing out your task in your project management system.

    Example of a short status update

    More often than not, we developers tend to send updates too infrequently and as a result, our stakeholders are often guessing where the project(s) stand. These updates can be short and simple: “Completed task X. Code has been pushed to feature branch but still needs to be merged into mainline and deployed through pipeline.”

    Breaking tasks into small deliverables

    It pays off to break down large chunks of work into small, actionable items.

    The smaller, the better. Ideally, although not always possible to achieve, strive to break down tasks such that they can be completed within a single day. This isn’t an absolute requirement but serves as a forcing function to crystalize requirements. Of course, some tasks just require more days, like fleshing out a design document. For ambiguous tasks, create spike stories (i.e. research tasks) that are time-bound.

    Summary

    Project management is an essential skill that every well-rounded developer must have in their toolbox. This skill combined with your technical depth will help you stand out as a strong developer: not someone who just delivers code, but someone who does it consistently and on time.

    Let’s chat more about being a well-rounded software developer. If you are curious about learning how to move from front-end to back-end development, or from back-end development to low-level systems programming, follow me on Twitter: @memattchung

  • Why all developers should learn how to perform basic network troubleshooting

    Why all developers should learn how to perform basic network troubleshooting

    (Also published on Hackernoon.com and Dev.to)

    Regardless of whether you work on the front-end or back-end, I think all developers should gain some proficiency in network troubleshooting. This is especially true if you find yourself gravitating towards lower level systems programming.

    The ability to troubleshoot the network and systems separates good developers from great developers. Great developers understand not just code abstractions, but also the TCP/IP model:

    Source: https://www.guru99.com/tcp-ip-model.html

    Some basic network troubleshooting skills

    If you are just getting into networking, here are some basic tools you should add to your toolbelt:

    • Perform a DNS query (e.g. dig or nslookup command)
    • Send an ICMP echo request to test end to end IP connectivity (i.e. ping command)
    • Analyze the various network hops (i.e. traceroute X.X.X.X)
    • Check whether you can establish a TCP socket connection (e.g. telnet X.X.X.X [port])
    • Test application layer (i.e. curl https://somedomain)
    • Perform a packet capture (e.g. tcpdump -i any) and inspect what bits are sent on the wire
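
    If you would rather script the first few checks than run them by hand, the DNS query and TCP connection test can be sketched with Python’s standard socket module. The function names are illustrative, and this is a rough equivalent of dig and telnet rather than a replacement for them.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

    For example, resolve("dev.to") lists the addresses your browser may connect to, and can_connect(address, 443) tells you whether something is listening on that port.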

    What IP address is my browser connecting to?

    % dig dev.to
    
    ; <<>> DiG 9.10.6 <<>> dev.to
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39029
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;dev.to.                IN  A
    
    ;; ANSWER SECTION:
    dev.to.         268 IN  A   151.101.2.217
    dev.to.         268 IN  A   151.101.66.217
    dev.to.         268 IN  A   151.101.130.217
    dev.to.         268 IN  A   151.101.194.217
    

    Is the web server listening on the HTTPS port?

    % telnet 151.101.2.217 443
    Trying 151.101.2.217...
    Connected to 151.101.2.217.
    Escape character is '^]'.
    

    Each of the above tools helps you isolate connectivity issues. For example, if your client receives an HTTP 5XX error, you can immediately rule out any TCP level issue. That is, you don’t need to use telnet to check whether there’s a firewall issue or whether the server is listening on the right socket: the server already sent an application level response.

    Summary

    Learning more about the network stack helps you quickly pinpoint and isolate problems:

    • Is it my client-side application?
    • Is it a firewall blocking certain ports?
    • Is there a transient issue on the network?
    • Is the server up and running?

    Let’s chat more about network engineering and software development

    If you are curious about learning how to move from front-end to back-end development, or from back-end development to low level systems programming, hit me up on Twitter: @memattchung

  • Building an audience: A lesson from the younger me

    When it comes to building an audience as a solo-entrepreneur, the younger me was much smarter, much more in tune with himself. These days, I operate 95% of my life from the left side of my brain, analyzing and taking a data driven, logical approach. While necessary in many respects, I need to make more decisions using my intuition … that’s how one of my first YouTube videos published 14 years ago received over 400,000 views:

    (I admit that I feel a little embarrassed sharing the above video!)

    I was 20 years old at the time. I was getting into parkour and tricking, and after learning how to back flip, I landed the “butterfly twist”. For this martial arts move, I had practiced several hours a day with a community of other aspiring trickers, and after many face plants and overcoming my fears, I recorded a tutorial that I then uploaded to this thing called “YouTube”.

    Back then, the number of views did not matter to me. I was just documenting my progress, sharing with the world, and most importantly: helping others on a similar journey.

    That’s it.

    I’m sure there are more sophisticated ways to gain an audience, a more strategic approach. But, I think I’ll start with just continuing to 1) help others and 2) be part of a community of like-minded people.

    So, what are some ways in which you are building and engaging with your audience?

    Come chat with me on Twitter @memattchung

     

  • My introduction in the Piazza forum for Graduate Algorithms (GA)

    My introduction in the Piazza forum for Graduate Algorithms (GA)

    At the beginning of every semester, each student is encouraged to post on the forum (i.e. Piazza), introducing themselves and answering the following questions:

    What is your name? Where do you live? Why take Graduate Algorithms? What do you hope to learn? What other OMS courses have you taken? What is something interesting about you? (Optional) LinkedIn profile

    Here’s my response:

    What is your name?

    Matt Chung

    Where do you live?

    Seattle, WA

    Why take Graduate Algorithms? What do you hope to learn?

    Of course, this class is mandatory for us in the computing systems specialization, so that’s a big reason. Another reason is that I’ve taken a data structures course, but never an algorithms course. So, I’m excited to learn more about the second half of DSA.

    What other OMS courses have you taken? 

    2 more to go!!!

    Graduate Introduction to Operating Systems (GIOS), Computer Networks (CN), Information Security (IS), High Performance Computing Architecture (HPCA), Educational Technology, Compilers, Advanced Operating Systems (AOS), Distributed Computing

    What is something interesting about you?

    I published a YouTube video 14 years ago when I was into parkour and tricking, and the video has close to 500,000 views (https://www.youtube.com/watch?v=zXrot9ShfvM).

    I took a non-linear path to becoming a software engineer at Amazon Web Services, where I currently work as a C developer building network packet processing devices for EC2 Networking.

    Most recently, I’m reevaluating my career as a software engineer at FAANG and looking to pivot (not sure where/what exactly) so I can spend more time with my wife, daughter, and our two dogs; spending more time with them has been the silver lining of all the COVID-19 lockdowns.

    (Optional) LinkedIn profile

    https://www.linkedin.com/in/matchu/

    https://blog.mattchung.me

    https://twitter.com/memattchung

  • Distributed Computing @ OMSCS over – what a ride!

    Distributed Computing @ OMSCS over – what a ride!

    Last semester, I decided to enroll in Georgia Tech’s brand spanking new Distributed Computing course, offered for the first time (as part of OMSCS) this past Spring 2021. What a ride! I learned a ton, including Lamport’s Logical Clocks, the FLP theorem, and the notorious PAXOS for consensus. Hats off to Professor Ada and the wonderful teaching assistants for delivering a rigorous and rewarding course. The lectures were top notch and the distributed systems labs (originally created at University of Washington by Ellis Michael) were challenging to say the least.

    Only 2 more semesters until graduating from Georgia Tech with my M.S. in Computer Science!

     

  • Distributed Computing CS7210 Distributed Computing – A course review

    Distributed Computing CS7210 Distributed Computing – A course review

    Distributed Computing was offered in the OMSCS program for the first time this past semester (i.e. Spring 2021) and when the course opened up for registration, a storm of newly admitted and seasoned students signed themselves up, me included. I was fully aware that I was walking into unknown territory, a bleeding edge course, and expected a lot of rough edges, of which there were many. That being said, the course is great and, with some tweaks around pacing, has the potential to be one of the best courses offered for students, especially those specializing in computing systems.

    Overview

    The course quality is top-notch. The lectures are intellectually challenging, the assigned readings are seminal pieces of work, and the projects really drill the theoretical concepts. Overall, depending on your programming experience, expect to put in 20+ hours per week (with some students reporting anywhere between 30-50 hours).

    Recommendation: If you are a seasoned engineer (with at least a couple years of programming experience under your belt) and someone who can handle ambiguity with little hand-holding, then I highly recommend taking the course. But if you are just starting out in your computer science journey, then I would hold off for at least a couple semesters; take the recommended pre-requisites (i.e. graduate introduction to operating systems, advanced operating systems, computer networks) and wait until the course’s rough edges are smoothed out. As another student on omscentral pointed out, this class is “for experienced engineers, not students.”

    What to expect from the course

    Pros

    • Lectures are easy to watch and are packaged into digestible ~5 minute chunks
    • Assigned readings (from first half of semester) are seminal pieces of work by famous computer scientists like Leslie Lamport and Eric Brewer
    • Skills and knowledge acquired directly apply to my career as a software engineer and computer scientist
    • Instructors and teaching assistants are extremely professional, care about the students’ well-being, and are quite generous with the grading curve

    In this class, you’ll develop a foundation around designing and building distributed systems. You’ll understand why systems need to keep track of time and the different ways to implement clocks (e.g. scalar clocks, vector clocks, matrix clocks). In addition, you’ll appreciate how systems achieve consensus and learn to make trade-offs between consistency models such as strict consistency and eventual consistency. You’ll end the semester by learning about the infamous CAP theorem and FLP theorem and how, as a system designer, you’ll make trade-offs between consistency, availability, and the ability to withstand network partitions. Of course, you’ll eat and breathe Leslie Lamport’s PAXOS. So if any of these topics interest you, you’re in for a treat.
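
    To give a flavor of the clock material, here is a minimal vector clock sketch in Python. It is my own illustration, not code from the course labs, and the names (tick, merge, happened_before, concurrent) are made up for this example. It assumes each node is identified by a string and that a missing entry counts as zero.

```python
from typing import Dict

Clock = Dict[str, int]  # node id -> number of events seen from that node


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with the node's own counter incremented.

    Called for every local event, including sending a message.
    """
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: Clock, received: Clock, node: str) -> Clock:
    """On receive: take the element-wise max, then count the receive as a local event."""
    nodes = local.keys() | received.keys()
    merged = {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}
    return tick(merged, node)


def _leq(a: Clock, b: Clock) -> bool:
    """True if every entry of a is <= the matching entry of b (missing = 0)."""
    return all(a.get(n, 0) <= b.get(n, 0) for n in a.keys() | b.keys())


def happened_before(a: Clock, b: Clock) -> bool:
    """a happened before b if a <= b element-wise and the two clocks differ."""
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: Clock, b: Clock) -> bool:
    """Two events are concurrent if neither clock dominates the other."""
    return not _leq(a, b) and not _leq(b, a)


if __name__ == "__main__":
    a1 = tick({}, "A")         # node A does some work and sends a message
    b1 = tick({}, "B")         # node B does unrelated work
    b2 = merge(b1, a1, "B")    # node B receives A's message
    print(happened_before(a1, b2))  # A's send precedes B's receive
    print(concurrent(a1, b1))       # neither event knows about the other
```

    The point of carrying a counter per node (instead of a single scalar) is that you can tell "these two events are unrelated" apart from "this one came first", which a scalar clock alone cannot do.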

    Cons

    • Bleeding edge course means that there were lots of rough edges
    • Projects were very demanding, often requiring multiple hours to pass a single test worth very little towards grades
    • Triggered lots of uncertainty and desperation among students throughout the second half of the semester

    As mentioned above, this class induced a lot of unnecessary stress in students. Even someone like me, who cares little about the actual letter grades on transcripts, felt pretty anxious (this class potentially could’ve held me back another semester, since up until the grades were actually released, I had assumed I would get a C or lower).

    Impact on mental health

    One concerned student published a post on the forum, asking if students were mentally okay:

    I just wanted to check in with everyone on here in the class. I know these projects are stressful and for me it’s been something of a mental health hurdle to keep pushing despite knowing I may very well not succeed. Hope everyone is doing ok and hanging in there. Remember no assignment is worth your sanity or mental health and though we are distanced we are all in this together.

    Anonymous Calc

    Many other students chimed in, sharing the same frustrations:

    I found both of the projects very frustrating. Specially this one. I am working for last 2 weeks (spending 50+ hours in writing/rewriting) and still passing only 7/8 tests. I never had unfinished academy projects. This is the first course I am having this.

    Adam

    I couldn’t help but agree:

    Honestly, I was fairly stressed for the past two weeks. Despite loving the course — content and rigor of the project — I seriously contemplated dropping the course (never considered this avenue before, and I’m 2 courses away from graduating after surviving compilers and other difficult systems courses) as to avoid potentially receiving a non-passing grade (got an A on the midterm but its looking pretty bleak for Project 4 with only 12 tests passing). At this point, I’ve fallen behind on lectures and although there is 1 (maybe 2) days left for Project 4, I’ve decided to distance myself from the project. Like many others, I’ve poured an insane number of hours into this project, which doesn’t reflect in the points in Gradescope. I suspect both the professor and the TAs are aware of the large number of people struggling with the project and will take this all into account as part of the final grading process.

    Tips

    Programming Projects

    Here’s a list of the projects, their weight towards the final grade, and the amount of time allocated to each assignment.

    • Project 1 – Environment Setup – 5% – 2 weeks
    • Project 2 – Client/Server – 10% – 2 weeks
    • Project 3 – Primary/Backup – 15% – 3 weeks
    • Project 4 – PAXOS – 15% – 3 weeks
    • Project 5 – Sharded KV Store – 15% – 4 weeks

    Projects 1 and 2 are a walk in the park. The final 3 projects are brutal. Make sure you start early, as soon as the projects are released. I repeat: start early. Some people reported spending 100+ hours on the latter projects.

    Unless you are one of the handful of people who can pour 50+ hours per week into the class, do not expect to get an A on the programming projects. But don’t sweat it. Your final grade will be okay: you just need to have faith and ride the curve. All you can do is try to pass as many tests as possible and mentally prepare for receiving a C or D (or worse) on these assignments.

    Summary

    The course is solid but needs serious tweaking around the pacing. For future semesters, the instructors should modify the logistics for the programming assignments, stealing a couple weeks from the first couple projects and tacking them onto the final three projects (i.e. Primary/Backup system, PAXOS, Sharded Key-Value Store). With these modifications, students will stress out way less and the overall experience will be much smoother.