Tag: openmp

What I learned from writing synchronization barriers

Before starting project 2 (for my advanced operating systems course), I took a snapshot of my understanding of synchronization barriers. In retrospect, I’m glad I took 10 minutes out of my day to jot down what I did (and did not) know because now, I get a clearer pictur eof what I learned. Overall, I feel the project was worthwhile and I gained not only some theoretical knowledge of computer science but I was also able to flex my C development skills, writing about 500 lines of code.

Discovered a subtle race condition with lecture’s pseudo code

Just by looking at the diagram below, it’s not obvious that there’s a subtle race condition hidden. I only was able to identify it after whipping up some code (below) and analyzing the concurrent flows. I elaborate a little more on the race condition — which results in a deadlock — in this blog post.

Centralized Barrier

[code lang=”C”]

/*
* Race condition possible here. Say 2 threads enter, thread A and
* thread B. Thread A scheduled first and is about to enter the while
* (count &gt; 0) loop. But just before then, thread B enters (count == 0)
* and sets count = 2. At which point, we have a deadlock, thread A
* cannot break free out of the barrier
*
*/

if (count == 0) {
count = NUM_THREADS;
} else {
while (count &gt; 0) {
printf("Spinning …. count = %d\n", count);
}
while (count != NUM_THREADS){
printf("Spinning on count\n");
}
}

[/code]

Data Structures and Algorithms

How to represent a tree based algorithm using multi-dimensional arrays in C

For both the dissemination and tournament barrier, I had to build multi-dimensional arrays in C. I initially had a difficult time envisioning the data structure described in the research papers, asking myself questions such as “what do I use to index into the first index?”. Initially, my intuition thought that for the tournament barrier, I’d index into the first array using the round ID but in fact you index into the array using the rank (or thread id) and that array stores the role for each round.

[code lang=”C”]
typedef struct {
bool myflags[PARITY_BIT][MAX_ROUNDS];
bool *partnerflags[PARITY_BIT][MAX_ROUNDS];
} flags_t;

void flags_init(flags_t flags[MAX_ROUNDS])
{
int i,j,k;

for (i = 0; i < MAX_NUM_THREADS; i++) {
for (j = 0; j < PARITY_BIT; j++) {
for (k = 0; k < MAX_NUM_THREADS; k++) {
flags[i].myflags[j][k] = false;
}
}
}
}
[/code]

OpenMP and OpenMPI

Prior to starting I never heard of neither OpenMP nor OpenMPI. Overall, they are two impressive pieces of software that makes multi-threading (and message passing) way easier, much better than dealing with the Linux pthreads library.

Summary

Overall, the project was rewarding and my understanding of synchronization barriers (and the various flavors) were strengthen by hands on development. And if I ever need to write concurrent software for a professional project, I’ll definitely consider using OpenMP and OpenMPI instead of low level libraries like PThread.

October 13, 2020
OpenMP tutorial notes (Part 1)

I’m watching the YouTube learning series called “Introduction to OpenMP” in order to get a better understanding of how I can use the framework for my second project in advanced operating systems. You might find the below notes useful if you don’t want to sit through the entire video series.

Introduction to OpenMP: 02 part 1 Module 1

Moore’s Law

Summary

Neat that he’s going over the history, talking about Moore’s Law. The key take away now is that hardware designers are now going to optimize for power and software developers have to write parallel programs: there’s no free lunch. No magic compiler that will take sequential code and turn it into parallel code. Performance now c

Introduction to OpenMP – 02 Part 2 Module 1

Concurrency vs Parallelism

Summary

The last step should be picking up the OpenMP Library (or any other parallel programming library). What should be done firs and foremost is breaking down your problem (the act of this has not been able to automated) into concurrent parts (this requires understanding of the problem space) and then figure out which can run in parallel. Once you figure that out, then you can use compiler syntactical magic (i.e. pragmas) to direct your compiler and then sprinkle some additional pragmas that will help the compiler tell where the code should enter and where it should exit.

Introduction to OpenMP – 03 Module 02

OpenMP Solution Stack

Summary

It’s highly likely that the compiler that you are using already supports OpenMP. For gcc, pass in -fopenmp. And then include the prototype, and add some syntactic sugar (i.e. #pragma amp parallel) which basically gives the program a bunch of threads.

Introduction to OpenMP: 04 Discussion 1

Shared address space

Summary

OpenMP assumes a shared memory address space architecture. The last true SMP (symmetric multi-processor) machine was in the late 1980s so most machines now run on NUMA (non uniformed multiple access) architectures. To exploit this architecture, we need to schedule our threads intelligently (which map to the same heap but contain different stacks) and place data in our cache’s as close as possible to their private caches. And, just like the professor said in advanced OS, we need to limit global data sharing and limit (as much as possible) the use of synchronization variables since they both slow down performance

Programming Shared Memory

September 29, 2020

Tag: openmp

What I learned from writing synchronization barriers

Discovered a subtle race condition with lecture’s pseudo code

Data Structures and Algorithms

How to represent a tree based algorithm using multi-dimensional arrays in C

OpenMP and OpenMPI

Summary

OpenMP tutorial notes (Part 1)

Introduction to OpenMP: 02 part 1 Module 1

Summary

Introduction to OpenMP – 02 Part 2 Module 1

Summary

Introduction to OpenMP – 03 Module 02

Summary

Introduction to OpenMP: 04 Discussion 1

Summary