For project 2 of my advanced operating systems course, I quickly typed up what I thought was a working implementation of a simple centralized counting barrier. However, after launching the compiled executable several times in a row, I noticed that the program would sometimes hang instead of exiting ... damn deadlock. Instead of loading the debugger and inspecting each thread's stack frames, I revisited the code and reasoned about why it would deadlock.
In the code (below), lines 29-32, the if (count == 0) / else branch, are the culprit for the race condition. Just as one thread (say thread B) is about to enter the while (count > 0) loop, another thread (the last one to arrive) could see count == 0 and reset count = NUM_THREADS. Thread B then never observes count reaching 0, so it keeps spinning forever.
Centralized Barrier Example from Lecture Slides
Code Snippet
[code lang="C" highlight="29-32"]
#include <stdbool.h>
#include <omp.h>
#include <stdio.h>
#define NUM_THREADS 3
int main(int argc, char **argv)
{
    int count = NUM_THREADS;
    bool globalsense = true;
    #pragma omp parallel num_threads(NUM_THREADS) shared(count)
    {
        #pragma omp critical
        {
            count = count - 1;
        }
        /*
         * Race condition possible here. Say thread A is still waiting and
         * thread B is the last to arrive. Thread A is about to enter the
         * while (count > 0) loop, but just before it does, thread B sees
         * count == 0 and resets count = NUM_THREADS. Thread A then never
         * observes count == 0, so it spins forever and cannot break out
         * of the barrier: the program hangs.
         */
        if (count == 0) {
            count = NUM_THREADS;
        } else {
            while (count > 0) {
                printf("Spinning .... count = %d\n", count);
            }
            while (count != NUM_THREADS){
                printf("Spinning on count\n");
            }
        }
    }
    printf("All done\n");
}
[/code]
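One way to sidestep the race described above is sense reversal, which is presumably what the unused globalsense variable in the listing was meant for. What follows is only a minimal sketch under that assumption, not the version from the lecture slides: waiters spin on a separate sense flag instead of on count, so resetting count can no longer strand them. The names barrier_wait, run_rounds, localsense, arrived and errors are hypothetical, count and globalsense are moved to file scope for brevity, and the sense flag is an int rather than a bool. The sketch also asks the runtime for the actual team size, on the assumption that num_threads(NUM_THREADS) is a request and fewer threads may be delivered.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NUM_THREADS 3        /* same team size as the listing above */

int count;                   /* threads still expected in the current episode */
int globalsense = 1;         /* flipped by the last thread to arrive */

/* One barrier episode. localsense must be private to the calling thread. */
void barrier_wait(int nthreads, int *localsense)
{
    int remaining;
    int sense;

    assert(nthreads > 0);
    *localsense = !*localsense;        /* the value this episode waits for */

    #pragma omp atomic capture seq_cst
    remaining = --count;               /* decrement and read in one atomic step */

    if (remaining == 0) {
        /* Last arriver: reset count first, then release the waiters. */
        #pragma omp atomic write seq_cst
        count = nthreads;
        #pragma omp atomic write seq_cst
        globalsense = *localsense;
    } else {
        /* Waiters spin on the sense flag, never on count. */
        do {
            #pragma omp atomic read seq_cst
            sense = globalsense;
        } while (sense != *localsense);
    }
}

/*
 * Hypothetical harness: reuses the barrier `rounds` times and returns how
 * many times a thread got past it before the whole team had arrived.
 */
int run_rounds(int rounds)
{
    int arrived = 0;
    int errors = 0;

    #pragma omp parallel num_threads(NUM_THREADS) shared(arrived, errors)
    {
        int nthreads = 1;              /* fallback when built without OpenMP */
        int localsense = 1;
        int seen;
#ifdef _OPENMP
        nthreads = omp_get_num_threads();
#endif
        /* One thread initializes the barrier; single ends with an implicit barrier. */
        #pragma omp single
        {
            count = nthreads;
            globalsense = 1;
        }

        for (int r = 0; r < rounds; r++) {
            #pragma omp atomic update seq_cst
            arrived++;

            barrier_wait(nthreads, &localsense);

            /* Every thread must have checked in for round r by now. */
            #pragma omp atomic read seq_cst
            seen = arrived;
            if (seen < nthreads * (r + 1)) {
                #pragma omp atomic update
                errors++;
            }
        }
    }
    return errors;
}
```

Calling run_rounds(20) from a main function and checking that it returns 0 exercises repeated use of the barrier. Because count is reset before globalsense is flipped, a thread that races ahead into the next episode always decrements the fresh count, and a slow waiter only ever looks at the sense flag.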