Not a Computer Scientist, But…

Bit Reversal in Concurrent Data Structures

2025-03-16T00:00:00+00:00

Combing through old papers on concurrent data structures, I was delighted to see the bit reversal subroutine being used in 2 of the designs in ingenious ways. In this post I will try to capture the essence of them.

Bit reversal? Concurrent data structures?

Bit reversal is where we take an int (say, 64 bits) and reverse the order of its bits, e.g. reversing 8 bits may look like 00110101 -> 10101100. x86 has no dedicated instruction for this (some other architectures do, e.g. ARM’s RBIT), but it can be implemented relatively efficiently in a few ways, e.g. by masking + shifting, bswap (the instruction for swapping bytes), lookup tables etc.
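As a rough illustration of the masking + shifting idea, here is a minimal Python sketch. The function names are my own, and the 8-bit width is chosen only to match the example above; a 64-bit version would use the same pattern with wider masks and more rounds.

```python
def reverse_bits(x: int, width: int = 8) -> int:
    """Reverse the lowest `width` bits of x, one bit at a time."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (x & 1)  # append the lowest bit of x to result
        x >>= 1
    return result


def reverse_bits8(x: int) -> int:
    """Reverse 8 bits with three rounds of mask-and-shift swaps."""
    x = ((x & 0xF0) >> 4) | ((x & 0x0F) << 4)  # swap the two nibbles
    x = ((x & 0xCC) >> 2) | ((x & 0x33) << 2)  # swap adjacent pairs of bits
    x = ((x & 0xAA) >> 1) | ((x & 0x55) << 1)  # swap adjacent bits
    return x


print(format(reverse_bits(0b00110101), "08b"))   # 10101100
print(format(reverse_bits8(0b00110101), "08b"))  # 10101100
```

The loop version works for any width; the mask version does a fixed number of steps that grows with the logarithm of the width.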

Concurrent data structures are data structures that are safe to use in shared-memory parallelism. For performance, modern computers don’t guarantee strong memory consistency, so threads that access the same memory locations need to synchronize explicitly, e.g. via mutexes and atomic operations.

Concurrent priority queue by Hunt et al. (1994)

This section covers the paper An Efficient Algorithm for Concurrent Priority Queue Heaps. The problem is: let’s say we want to have a heap (supports inserting an element with a priority and removing the element with the highest priority) and let a bunch of threads access it. The most naive thing to do is probably an array-based binary heap that is put behind a lock. The ith node has children 2i+1 and 2i+2, a parent has higher priority than its children, inserts happen at the bottom and bubble up towards the root, removals move the last element to the root and bubble it down, etc. Any thread that attempts to read/write needs to take the lock, and only one thread can proceed at a time.

What if we put a lock at each node? Before you swap two nodes in a bubble-up process, you’d only need to lock those two nodes. That way, you wouldn’t need to lock the entire data structure, giving other threads a chance to proceed.

Unfortunately this results in high contention. If two threads are inserting at the same time, one will be at node i, and the other will be at node i+1. Chances are they have the same parent, so they immediately start contending for the same lock. Even if they have different parents, their parents’ parents are probably the same, and so on.

This paper’s main insight is: what if after inserting at node i, we don’t insert at node i+1, but somewhere far away in the tree? Imagine the following binary tree:

    (0)
   /   \
 (1)   (2)
 / \   / \
3   4 5   6

Only nodes 0-2 are occupied. Inserting in the order (3, 4, 5, 6) will likely cause high contention. But if we insert in the order (3, 5, 4, 6), then we might avoid some of the most common lock contentions, since neighboring pairs don’t share the same parent.

More generally, when a low bit of the heap size changes, you want to choose a different sub-tree to insert; when a high bit changes, we have inserted a lot of nodes, so it’s fine to choose a closer sub-tree.

What this ends up looking like is: the high bits of the node at which to insert depend on the low bits of the heap size. We could literally implement that by reversing the order of the bits after the leading 1 bit. Numbering the nodes from 1 (i.e. one more than the labels in the tree above) and writing them in binary, the insert order would look like:

1; 10, 11; 100, 110, 101, 111; 1000, 1100, 1010, 1110, 1001, 1101, 1011, 1111…
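The rule can be sketched in a few lines of Python. This is only my reading of the idea, not code from the paper: `insert_position` is a made-up name, and it assumes nodes are numbered from 1, so subtracting 1 gives the labels used in the tree diagram.

```python
def insert_position(size: int) -> int:
    """1-indexed slot for the size-th insertion: keep the leading 1 bit, reverse the rest."""
    assert size >= 1
    low_width = size.bit_length() - 1   # number of bits after the leading 1
    low_bits = size - (1 << low_width)  # those bits as a number
    reversed_low = 0
    for _ in range(low_width):
        reversed_low = (reversed_low << 1) | (low_bits & 1)
        low_bits >>= 1
    return (1 << low_width) | reversed_low


order = [insert_position(n) for n in range(1, 16)]
print([format(pos, "b") for pos in order])

# The bottom level of the 7-node tree, converted to the 0-indexed labels:
bottom_level = [insert_position(n) - 1 for n in range(4, 8)]
print(bottom_level)                               # [3, 5, 4, 6]
print([(node - 1) // 2 for node in bottom_level])  # parents: [1, 2, 1, 2]
```

Consecutive insertions on the bottom level alternate between the two sub-trees, which is exactly the property that reduces contention on parent locks.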

This idea is simple but clever. Unfortunately this paper didn’t quite stand the test of time. Most importantly, there’s still global contention to mutate the heap size and to pop the heap at the root. Skip-list based designs are now considered a better choice.

Split-ordered lists by Shalev and Shavit (2006)

This section covers the paper Split-Ordered Lists: Lock-Free Extensible Hash Tables.

Compared to priority queues, hash tables are undoubtedly more ubiquitous. Generally they use either open or closed hashing: open hashing (a.k.a. separate chaining) means each bucket contains a resizable container (e.g. a linked list), and closed hashing (a.k.a. open addressing) means all data lives in the array and collisions are handled by probing.

A significant challenge in designing a concurrent hash table is the resizing operation. Typically, resizes are done by doubling the size of the array and moving all old data over to the new one. During this migration, no readers/writers can proceed.

This paper asks: what if we don’t move the data into buckets, but move the buckets to point to the data in the right way? If we’re only creating a new index of the same underlying data, then the old index is still valid and can be used in the meanwhile.

More concretely, we want to put all the items in a linked list, so that all the hashes that are equal mod 2^k (for any k) are consecutive in the list. This way, we can maintain a hash table by having an array where the ith bucket points to the first node where hash mod 2^k = i.

It turns out you get this property by ordering the nodes by the bit-reversed hash. Thinking about it more, this isn’t that surprising: first, you want all the nodes whose hash has 0 as its last bit to be in front of the rest; then, within each of those groups, you want the nodes with 0 as the second-to-last bit to be in front…

As a quick example, say we have 8 items with hashes 0-7. Sorted by this order, we have:

000 100 010 110 001 101 011 111
^               ^               | 2 buckets (hash mod 2)
^       ^       ^       ^       | 4 buckets (hash mod 4)
^   ^   ^   ^   ^   ^   ^   ^   | 8 buckets (hash mod 8)

Notice how across different numbers of buckets, the chain of each bucket is still valid, even though the underlying linked list never changed.
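To make the example concrete, here is a small Python sketch that reproduces it. It is not the lock-free structure from the paper: a sorted Python list stands in for the linked list, and `bucket_heads` and `chains_are_contiguous` are names I made up for illustration.

```python
def reverse_bits(x: int, width: int) -> int:
    """Reverse the lowest `width` bits of x."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


WIDTH = 3  # toy hash width, matching the 8-item example
hashes = list(range(8))
split_order = sorted(hashes, key=lambda h: reverse_bits(h, WIDTH))


def bucket_heads(order, num_buckets):
    """Map each bucket (hash mod num_buckets) to the list index of its first node."""
    heads = {}
    for index, h in enumerate(order):
        heads.setdefault(h % num_buckets, index)
    return heads


def chains_are_contiguous(order, num_buckets):
    """True if all hashes of the same bucket sit next to each other in the list."""
    runs = [h % num_buckets for h in order]
    # Collapse consecutive duplicates; a bucket that shows up twice was split apart.
    collapsed = [b for i, b in enumerate(runs) if i == 0 or runs[i - 1] != b]
    return len(collapsed) == len(set(collapsed))


print([format(h, "03b") for h in split_order])
for num_buckets in (2, 4, 8):
    print(num_buckets, bucket_heads(split_order, num_buckets),
          chains_are_contiguous(split_order, num_buckets))
```

Growing the table from 2 to 4 to 8 buckets only adds new head pointers into the same list; nothing in the list itself moves.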

From what I can tell, this design is still not obsolete in 2025, even as new improvements have appeared and computers evolved.

HTB CTF Try Out

2024-10-20T00:00:00+00:00

I just finished all challenges for HTB CTF Try Out, which was my first CTF event. This post serves both as a summary for what I’ve learned and as guides for beginners like myself.

Background

If you’re a total beginner to cyber security like myself, Capture The Flag events are something like puzzles that often involve extracting obscured information, writing bespoke interactive scripts or hacking designated processes to eventually get access to a string, i.e. the flag. It’s a bit like puzzle hunts in that challenges can take many forms, and the difficulty is mostly figuring out what the rules are, rather than following them.

Below, I’ll go through each category. I won’t detail the solutions, only describe general knowledge needed to solve the challenges.

Web: TimeKORP, Flag Command, Labyrinth Linguist

There are 2 flavors of exploits here - client side and server side.

On the client side, I find the Chrome DevTools (source code, console, debugger) more than sufficient to understand what the program does or read data out of memory.

On the server side, there are various types of injections. The relevant ones here are PHP and server-side template injection (SSTI). TimeKORP’s source code was given, which makes it pretty easy as long as you know how to cat a file. Labyrinth Linguist’s source code was given as an encrypted zip. This encrypted zip could be decrypted, but note that the cracker only works for uncompressed plaintext of at least 12 bytes (there is exactly one file in the zip that matches the conditions). You actually need to figure this out for a later challenge. But even if you didn’t, you can still see what entries are in the zip and therefore what SSTI payloads are likely to work. If you have Burp Suite Professional Edition, your life is a lot easier as it just tells you the vulnerability. From there it was still difficult to get remote code execution because none of the top Google results for the payloads worked. The hint here is that in addition to getClassLoader, there is another function with similar functionality that gives you the Java runtime for RCE.

Forensics: An Unusual Sighting, Phreaky

Similar to Labyrinth Linguist, you’re given an encrypted file for Phreaky. Refer to the above to decrypt the zip (John the Ripper wasn’t the right tool). Then Wireshark can be used to reveal the next step, and NetworkMiner can get the job done.

Reversing: LootStash

Just install Ghidra. It’ll be useful later.

Misc: Character, Stop Drop and Roll

LLMs make these coding problems very easy. I did struggle with Stop Drop and Roll a little bit until I changed the approach to use a regex.

Crypto: Dynastic

This is like a Caesar cipher (not sure if there’s a name for this variant).

Hardware: Critical Flight, Debug

You just have to figure out which software to use to open these files. I used KiCad Gerber Viewer and Saleae Logic 2.

Pwn: Getting Started, Labyrinth, Void

Ghidra was very helpful for these, as well as pwntools. For Labyrinth, you need to learn something called Return Oriented Programming. Embarrassingly I had to peek at a solution for Void, but only because I hadn’t read the linked website closely enough. This code snippet basically worked after filling in the details shown by Ghidra.

Closing Thoughts

There are quite a few libraries, tools and concepts to go through even in this “beginner-friendly” CTF. But adding these to my personal toolbox has been quite rewarding.

As a metapoint, it can feel like cheating to get hints by searching for the challenge name online. But all of this is for fun anyway, so if you’re going to have more fun knowing what to do than getting stuck, then that’s more important.

The Monty Hall Game

2024-05-27T00:00:00+00:00

A friend brought up the classic Monty Hall problem. During our discussions, I realized it’s interesting to put the problem under the lens of game theory.

Refresher

You’re on a game show. The host shows you three doors, and you win a car iff you open the right door. After you name your choice (door 1), the host opens door 2, showing that it’s not the right one. Now you’re given the choice to switch to door 3. Should you do it?

Person A says it doesn’t matter, since 1 and 3 are still equally likely to be the right door. Why would eliminating door 2 make that untrue?

Person B says we should switch. Before the host opened door 2, door 1 had a 1/3 chance of being the right one, i.e. there is a 2/3 chance that you’re wrong. If you were wrong, switching would make you right, so door 3 now has a 2/3 chance of being right!

This apparent paradox has caused heated debates throughout the years.

A two-player zero-sum game

It turns out that what you should do depends on what the host was thinking when they opened door 2.

Let us reframe the game as a two-player zero-sum game. The contestant picks a door (name it 1), then the host either opens door 1 or another door (name it 2), showing that it’s not the correct door. Now the contestant can pick one of the two unopened doors.

The contestant’s strategy can be characterized with a single probability p. If the host opens door 1, obviously the contestant can only guess among the other two doors, getting it right 1/2 of the time. If the host opens door 2, the contestant can switch with probability p.

The host’s strategy can also be characterized with a single probability q. If door 1 is correct, then the host doesn’t have any choice but to open one of the remaining two doors at random. Otherwise, the host can open door 1 with probability q, forcing the contestant to guess.

The contestant’s probability of winning would then be:

P(door 1 is right) * P(contestant doesn't switch)
+ P(door 1 is wrong) * P(host opens door 1) * P(contestant guesses right)
+ P(door 1 is wrong) * P(host opens door 2 or 3) * P(contestant switches)
= 1/3 * (1 + p + q - 2pq)

If the host wants to minimize this probability by changing q, then what to do depends on the sign of 1-2p, the coefficient of q. If p < 1/2, the host would want q to be minimized, otherwise maximized. The contestant faces an analogous situation: if q < 1/2, p should be maximized, otherwise minimized. In other words, if p is small, q should be small, which should make p large, which should make q large… The only equilibrium is when p = q = 1/2, which makes the probability of winning 1/2. This intuitively makes sense: either player is just randomly guessing, and the result is the same as flipping a coin.
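The algebra above is easy to double-check with exact fractions. The Python sketch below builds the winning probability from the three terms, compares it with the closed form, and confirms that p = 1/2 or q = 1/2 leaves the other player with nothing to exploit. The function name is my own.

```python
from fractions import Fraction


def win_probability(p, q):
    """Contestant's chance of winning.

    p: probability the contestant switches when the host opens another door.
    q: probability the host opens door 1, given that door 1 is wrong.
    """
    third, half = Fraction(1, 3), Fraction(1, 2)
    stay_and_right = third * (1 - p)                 # door 1 was right, contestant stays
    forced_guess = (1 - third) * q * half            # host opened door 1, coin-flip guess
    switch_and_right = (1 - third) * (1 - q) * p     # door 1 was wrong, contestant switches
    return stay_and_right + forced_guess + switch_and_right


grid = [Fraction(i, 10) for i in range(11)]
half = Fraction(1, 2)

# The three terms add up to the closed form 1/3 * (1 + p + q - 2pq).
print(all(win_probability(p, q) == (1 + p + q - 2 * p * q) / 3 for p in grid for q in grid))

# At p = 1/2 the host's choice of q is irrelevant, and vice versa.
print({win_probability(half, q) for q in grid})
print({win_probability(p, half) for p in grid})
```

Both sets contain only 1/2, which is the equilibrium value of the game.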

The game show, hypothetically

In the actual game show, the host never opens the contestant’s chosen door, i.e. q = 0. By the above analysis, p should be 1, resulting in a winning probability of 2/3. One way to think about this classic result is that the host is playing a bad strategy and getting exploited by the contestant.

But as a thought experiment, what if you’re the first ever contestant, and you don’t know what the host is thinking when you see door 2 being opened?

If q = 0, then always switching wins the prize 2/3 of the time;
but if q = 1, then always switching wins only 1/3 of the time overall;
What if the host only opens the second door if you were correct on the first try? Then switching always loses.
What are the host’s incentives? Does the show want to save on the prizes, or is giving more prizes better for ratings?
What is the average contestant expected to do? Does that affect the design of the game?

How do we stay sane when playing games when the rules are unknown?

When, If Not Now?

2023-04-08T00:00:00+00:00

Everybody procrastinates, and everyone hates it. Much ink has been spilled to teach people how to stop delaying work. Here, I want to offer a logician’s view.

There are certain tasks that I want to do, but there aren’t clear deadlines, for example working on a hobby project, reading a book, or learning how to play a new song. I might write those down in my notes, thinking to myself: I’ll get to them when I get some free time.

But when I finally have some free time, I start to find excuses. I’m a bit tired right now. Maybe in an hour. Why not tomorrow?

The thing is, if at time t, I have the chance to do a thing, but I delay that to t+1, then by induction I am never going to do it. In other words, if I’m ever going to do it, it might as well be now.

It is always now or never.

This Sudoku Must Be Solvable

2023-02-04T00:00:00+00:00

Unique Rectangle

In this post I’ll share my favorite Sudoku trick, called the Unique Rectangle. It is really clever and comes up in actual puzzles. I’ve been solving the New York Times hard Sudoku every night for years, and I’ve used this trick on maybe 10-20% of them.

Imagine you’re solving a Sudoku and you’ve penciled in:

     1   2   3
   +===+===+===+
A  |12 |12 | ? |
   +---+---+---+
B  | ? | ? | ? |
   +---+---+---+
C  | ? | ? | ? |
   +===+===+===+
D  |12 |123| ? |
   +---+---+---+

You’ve eliminated all possibilities except 1 and 2 for grids A1, A2 and D1, and you know D2 can only be 1, 2 or 3.

The Unique Rectangle rule says that D2 must be 3.

“Why?” You object. “There are only 3 rules in Sudoku - all 9 3x3 blocks, 9 rows and 9 columns must all contain numbers 1-9. If D2 is 1 or 2, we can still fill in A1, A2 and D1 without violating the rules!”

You are right that the normal constraints of Sudoku aren’t enough. This trick relies on a leap of faith: we assume that the puzzle has a unique solution. This is true for all valid Sudokus: if there is more than one solution, then the puzzle will not be solvable using logic, and is therefore invalid. (Using the fact that the puzzle is solvable to solve the puzzle might feel like cheating, but I have no personal issues with it.)

In the above example, if D2 isn’t 3, we have 2 possibilities:

(I)       1   2   3     (II)      1   2   3
        +===+===+===+           +===+===+===+
     A  | 1 | 2 | ? |        A  | 2 | 1 | ? |
        +---+---+---+           +---+---+---+
     ...                     ...
        +===+===+===+           +===+===+===+
     D  | 2 | 1 | ? |        D  | 1 | 2 | ? |
        +---+---+---+           +---+---+---+

The only constraints that can be used to solve these 4 grids are columns 1 and 2, rows A and D and the 3x3 blocks, but in all these constraints, (I) and (II) are indistinguishable, and we are left with an unsolvable puzzle. So D2 has to be 3.

More generally, the Unique Rectangle rule states that if you have 4 grids in 2 3x3 blocks forming a rectangle, and 3 of them have only the same 2 choices, then the fourth grid cannot be either of those choices.

There are some extensions to this trick. One is chaining which I have also used:

     1   2   3
   +===+===+===+
A  |12 |12 | ? |
   +---+---+---+
...
   +===+===+===+
D  |23 |23 | ? |
   +---+---+---+
...
   +===+===+===+
G  |13 |134| ? |    G2 cannot be 1 or 3
   +---+---+---+

Another one that I recently figured out during solves:

     1   2   3
   +===+===+===+
A  |12 |12 | ? |
   +---+---+---+
B  | ? | ? | ? |
   +---+---+---+
C  | ? | ? | ? |
   +===+===+===+
D  |123|123|34 |
   +---+---+---+

Either D1 or D2 must be 3, so we know D3 must be 4.

Here is a website with other advanced Sudoku techniques. I have not found the other tricks to be as interesting or as applicable to NYT puzzles.

Leap of Faith

This sort of “if there is a solution, it must be this” thinking comes up in other occasions as well. Here are a few that I can immediately think of.

In minesweeper, it is very common to run into unsolvable boards, but the same logic can still apply. In this example:

     1   2   3   4
   +===+===+===+===+ ...
A  |   |   | 1 | 0 | 
   +---+---+---+---+
B  |   |   | 2 | 1 |
   +---+---+---+---+
C  | 1 | 2 | |>| 1 |
   +---+---+---+---+
D  | 0 | 1 | 1 | 1 |
   +---+---+---+---+
...

There are three possibilities:

(I)            (II)          (III)
+===+===+ ...  +===+===+ ... +===+===+ ...
| X |   |      |   | X |     |   |   |
+---+---+      +---+---+     +---+---+
|   | X |      | X |   |     |   | X |
+---+---+      +---+---+     +---+---+
...            ...           ...            

When you finish the rest of the board, you can see how many mines are left. If it’s 1, you know it must be (III); if it’s 2, you can’t tell between (I) and (II). Either way, clicking on A2 or B1 is never wrong: it is safe in (I) and (III), and in (II) you would have had to guess anyway. So you can do it without solving the rest of the board. (In minesweeper, if you must guess, it is better to guess earlier so that you don’t waste time on an unsolvable game.)

Another example is Alice at the Convention of Logicians. Everything on this wiki page is worth a read, so I won’t bother explaining the puzzle here.

I’ve also found that this line of thinking is useful in solving math puzzles in general, or even problems that aren’t necessarily designed to be solvable. It is like a less blasphemous form of Pascal’s wager – if you’re right, then great, otherwise it doesn’t matter.

Left Shoot, Right Shoot: A Rock Paper Scissors Variant

2022-09-19T21:31:00+00:00

In Hong Kong, there’s a variant of rock paper scissors called 左一拳右一拳, which can be roughly translated as “left shoot, right shoot”. In this post, I’ll walk through how it works and how to play optimally.

Rules

There are three steps in this game. First, both players play rock paper scissors with one hand. Second, both players play rock paper scissors with the other hand. Third, both players take back one hand, and the winner is determined by comparing the remaining hands using normal rock paper scissors rules.

Let’s run through a quick example. Say players A/B play ✊/🖐, then 🖐/✌. Now since A can only pick between ✊ and 🖐, if B keeps 🖐 and retracts ✌, B will never lose. If A anticipates B to play 🖐 and also plays 🖐, then they will tie.

But if B anticipates that, B can sometimes pick ✌ which beats 🖐. But if A anticipates that, maybe A will also sometimes play ✊… In true rock paper scissors spirit, there always seems to be a better strategy.

Strategy

But of course, all games have optimal strategies, when we allow strategies to incorporate randomness. The optimal play is:

Pick anything for the first hand with equal probability;
Follow your opponent’s first hand for your second hand with 2/3 probability, unless that’s the same as your first hand, in which case pick the one that beats that;
Keep the hand that both players have in common with 2/3 probability, unless both players have the same two choices, in which case pick the one that doesn’t lose.

Below, let’s go through an outline of the math involved. First, we have to define what both players are maximizing.

Objective

It might not be immediately obvious that it is nontrivial to define the goal of the game - of course you want to win instead of lose! But this is not always the case. For example, some people might really want to avoid losing, instead of trying too hard to win.

Rock paper scissors is commonly played as a way to fairly decide a binary outcome between two people (e.g. who gets to pick the restaurant). It is reasonable to apply the same to this game, meaning players will repeat the game until a winner is determined, allowing no ties. Playing optimally then means maximizing the probability of winning in a repeated game.

Since the game is fair, in the event of a tie, your chance of winning is still 1/2. So, we’re maximizing P(win)+P(tie)*1/2, which is the same as minimizing P(lose)+P(tie)*1/2. Equivalently we can put those together and maximize P(win)-P(lose) for each game. In other words, we can pretend the loser pays $x to the winner, and maximize the expected value of winnings.

Analysis

First we can analyze step 3 - both players have to pick one out of two given choices.

Let’s run through the boring cases quickly. Boring case 1: either player has the same choice for both hands (dumb). Boring case 2: both players have the same two choices (the optimal pick is obvious).

Now to analyze the remaining case where both players have different choices. Let’s say A has ✊, 🖐 and B has 🖐, ✌ to pick from. Let’s say the winner of the game wins $3, the loser loses $3. We have four outcomes after both players take back one hand:

A \ B	🖐	✌
✊	-3 \ 3	3 \ -3
🖐	0 \ 0	-3 \ 3

Since this is a zero sum game, to maximize your winning, you want to make your opponent’s best option as bad as possible. This happens when both of their options are equally bad.

When A (B) plays 🖐 with 2/3 probability, B (A) gets $1 (-1) in expectation no matter which hand they take back. Hence, the optimal strategy for step 3 is to pick the option that can lead to a tie 2/3 of the time (🖐 in this case).
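The indifference claim can be checked directly. In this Python sketch the payoff dictionary is the table above written from A’s point of view, with hand names spelled out instead of emoji; the function name is my own.

```python
from fractions import Fraction

# Payoff to A from the table above: (A's hand, B's hand) -> dollars won by A.
PAYOFF_TO_A = {
    ("rock", "paper"): -3,
    ("rock", "scissors"): 3,
    ("paper", "paper"): 0,
    ("paper", "scissors"): -3,
}


def expected_payoff_to_a(a_keeps_paper, b_keeps_paper):
    """A keeps paper with the first probability (else rock); B keeps paper with the second (else scissors)."""
    a_mix = {"paper": a_keeps_paper, "rock": 1 - a_keeps_paper}
    b_mix = {"paper": b_keeps_paper, "scissors": 1 - b_keeps_paper}
    return sum(pa * pb * PAYOFF_TO_A[(a, b)]
               for a, pa in a_mix.items()
               for b, pb in b_mix.items())


two_thirds = Fraction(2, 3)

# A mixes 2/3 paper: B's two pure choices give A the same payoff.
print(expected_payoff_to_a(two_thirds, 1), expected_payoff_to_a(two_thirds, 0))  # -1 -1

# B mixes 2/3 paper: A's two pure choices give A the same payoff.
print(expected_payoff_to_a(1, two_thirds), expected_payoff_to_a(0, two_thirds))  # -1 -1
```

A’s payoff of -1 is B’s payoff of +1, matching the $1 (-1) figures above.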

Now that we’ve established step 3’s strategy, let’s do the same to step 2. Say A picked ✊ and B picked 🖐, and both have to pick their second hand. Since we already worked out the expected value for all possibilities at step 3, we can again tabulate the expected value of all scenarios for A and B here.

A \ B	🖐 , ✊	🖐 , ✌
✊ , ✌	-1 \ 1	1 \ -1
✊ , 🖐	0 \ 0	-1 \ 1

If you paid attention, you’ll realize that this is just the previous table from step 3 but with winnings scaled down by 1/3. This means that at step 2, we’re playing the exact same game! So the optimal strategy must also be the same - we play the option that leads to a tie with probability 2/3.
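The step 2 table can also be derived mechanically rather than by inspection. The Python sketch below solves each possible step 3 sub-game as a 2x2 zero-sum game and tabulates the values. The helper names are mine, and `value_2x2` uses the textbook formula for 2x2 games without a saddle point, which is a standard result rather than something taken from this post.

```python
from fractions import Fraction

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def duel(a, b):
    """+1 if hand a beats hand b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def value_2x2(m):
    """Value to the row player of a 2x2 zero-sum game with payoff matrix m."""
    maximin = max(min(row) for row in m)
    minimax = min(max(m[0][j], m[1][j]) for j in range(2))
    if maximin == minimax:  # a pure-strategy saddle point exists
        return Fraction(maximin)
    (a, b), (c, d) = m
    return Fraction(a * d - b * c) / Fraction(a + d - b - c)  # mixed-strategy value


def step3_value(a_hands, b_hands):
    """Value to A of step 3, with $3 at stake, given both players' two hands."""
    return value_2x2([[3 * duel(a, b) for b in b_hands] for a in a_hands])


# A opened with rock, B opened with paper; each now picks a second hand.
a_options = [("rock", "scissors"), ("rock", "paper")]
b_options = [("paper", "rock"), ("paper", "scissors")]
step2_table = [[step3_value(a, b) for b in b_options] for a in a_options]

print(step2_table)             # values to A: -1, 1 / 0, -1
print(value_2x2(step2_table))  # value of step 2 to A: -1/3
```

The table comes out as the step 3 payoffs divided by 3, so the same 2/3 mix is optimal again.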

Summary

As a bonus fact, we can calculate the probability of a tie. If both players end up with the same two choices after step 2, the game must end in a tie; otherwise there’s still a 2/3*2/3 chance of a tie. This yields 1/3 + 2/3 * (4/9 + 5/9 * 4/9) = 193/243 = 79.4% chance of a tie.
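The same arithmetic written out with exact fractions, as a quick sanity check (the variable names are mine):

```python
from fractions import Fraction

same_first_hand = Fraction(1, 3)        # both players open with the same hand
both_follow = Fraction(2, 3) ** 2       # both pick the tie-leading second hand
step3_tie = Fraction(2, 3) ** 2         # both keep the hand they have in common

p_tie = same_first_hand + (1 - same_first_hand) * (
    both_follow + (1 - both_follow) * step3_tie
)

print(p_tie)                    # 193/243
print(round(float(p_tie), 3))   # 0.794
```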

This concludes the analysis of the game. In Cantonese, both players would say something like “Xxx, Xxx, xxxxXxx” (inscrutable Chinese) where the uppercase letters indicate when the steps happen. I wonder how this gameplay can be translated into English exactly.

Unexpected Chinese Remainder Theorem

2022-05-28T00:00:00+00:00

Last week, we had an incident at work. Both the bug itself and the debugging process were mildly interesting, and I’ll describe both briefly below, and discuss some lessons.

The Setup and the Incident

We have a system that basically subscribes to a whole bunch of data, does a bunch of computations on them, and publishes output in real time. The computations are split into tasks identifiable by unique names. For both CPU & memory reasons, the system spawns up to a few dozen workers (Linux processes) across a bunch of computers, and assigns each task to one process by the hash of its name, modulo the number of workers.

For redundancy and latency, we run a few replicas of the whole thing across multiple data centers, all doing roughly the same computations.

One day, during a routine system upgrade, all replicas crashed one after another. This got us into panic mode.

The Debugging Process

To be clear, this is bad - this is what we specifically tried to prevent by running replicas and staggering their upgrade schedules.

To find out the root cause, the first thing as always is to inspect the logs. There was a single error message saying which worker crashed first, and the exception that crashed it. The exception suggested that one of the values that came from a data subscription was too large and caused a buffer overflow.

OK, that’s something, but we have hundreds of thousands of data subscriptions, so we need more cleverness to narrow it down so we can find the problematic data.

Well, we know the worker number is X out of N total workers from the logs. If we get a list of all data subscriptions and their task names (a data subscription is also considered a task), then we can compute the hash of each name and see which ones are running on worker X. This will narrow it down by a factor of N, which is on the order of 50. That is not nearly enough.

But it happens that due to whatever reason, some replicas run with a different number of workers. That means we can gather a bunch of X_i and N_i pairs, and narrow down the set of suspected tasks further.

If you have a few equations of the form A mod N_i = X_i, you can merge them together to get A mod N* = X*, where N* is the LCM of all N_i, using the Chinese Remainder Theorem. The larger we can make N*, the fewer tasks will satisfy the equation, and the better chance we have to pin down exactly the one task we’re looking for.

So we gathered 4 pairs of X and N, and ended up shrinking the number of suspected tasks by a factor of a few thousand, leaving us with only a few dozen options. Poring over the task names one by one, we finally found the one subscription that caused the crashes.
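Here is a minimal Python sketch of that filtering. Everything concrete in it is made up for illustration: the task names, the worker counts, and the use of crc32 as the hash (the real system’s hash is not specified here). `merge` combines two congruences with a small search rather than the extended Euclidean algorithm, which is fine when N is around 50.

```python
from math import gcd
from zlib import crc32


def merge(first, second):
    """Combine x = r1 (mod n1) and x = r2 (mod n2) into one congruence mod lcm(n1, n2)."""
    (r1, n1), (r2, n2) = first, second
    g = gcd(n1, n2)
    if (r2 - r1) % g != 0:
        return None  # the two observations contradict each other
    lcm = n1 // g * n2
    # Some k in [0, n2 / g) makes r1 + n1 * k agree with r2 modulo n2.
    for k in range(n2 // g):
        candidate = r1 + n1 * k
        if candidate % n2 == r2 % n2:
            return (candidate % lcm, lcm)


def task_hash(name: str) -> int:
    """Stand-in for the real (unspecified) task-name hash."""
    return crc32(name.encode())


tasks = [f"task-{i}" for i in range(100_000)]  # hypothetical task names
culprit = "task-4242"                          # hypothetical offender
worker_counts = [48, 50, 51, 53]               # hypothetical N_i, one per replica

# What the logs would tell us: the crashing worker index X_i on each replica.
observations = [(task_hash(culprit) % n, n) for n in worker_counts]

combined = observations[0]
for observation in observations[1:]:
    combined = merge(combined, observation)

residue, modulus = combined
suspects = [t for t in tasks if task_hash(t) % modulus == residue]
print(modulus, len(suspects), culprit in suspects)
```

With these made-up worker counts the combined modulus is 1,081,200, far larger than any single N_i, so very few task names survive the filter.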

The Bug

The bug itself is fairly simple, but the mechanism in which it crashed all replicas is a bit subtle.

There was a recent code change that changes the behavior when the system gets erroneous values from data subscriptions. In the past, when a worker gets an error, it just passes the error value to downstream computations. The code change was to append metadata to these errors to help track down where they came from.

This change seems innocent enough, but the issue manifests when the system is configured to subscribe to data published by itself. Let’s say such a self loop exists in a task. When this task first computes, it subscribes to data that hasn’t been published yet, so it will result in an error. In the old code, this error will then be fed in to the task again, but the output will not change. However, in the new code, each round of feedback leads to a bigger error value due to the additional metadata, eventually overflowing buffers and crashing the process and also clients consuming the value.

This also explains why despite rolling out the new executable to only a subset of replicas, all of them crashed. When subscribing to a data, you have to specify some sort of url, and this url points to one of the replicas. In other words, only one replica has the self loop, and other replicas are consumers of the output of the loop. So when the loop is completed in the roll process, all replicas will crash upon receiving the large value, regardless of whether they contain the code change.

Reflections

This incident didn’t end up causing too much headache because it was fixable with a rollback. Either way, it’s a good exercise to think through it to learn the maximum amount of lessons out of it.

Pinning down the issue this time required some amount of luck. In particular, we had replicas running with different numbers of workers. This did not have to be the case, and it even seems undesirable to have different environments. One change we could make here is to change the scheme that maps task names to workers. We could hash the task name and the replica ID together (i.e. use the ID to salt the hash). This way, we don’t have to rely on the numbers of workers being coprime with each other to narrow down tasks using worker IDs.
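A rough Python sketch of the salting idea, with every name made up for illustration (replica IDs, task names, and crc32 as the hash). All replicas run the same number of workers here, yet each replica’s crashed worker still gives an independent filter.

```python
from zlib import crc32

NUM_WORKERS = 50  # hypothetical: identical on every replica


def worker_for(task: str, replica_id: str) -> int:
    """Salt the task name with the replica ID before hashing (crc32 is a stand-in hash)."""
    return crc32(f"{replica_id}/{task}".encode()) % NUM_WORKERS


tasks = [f"task-{i}" for i in range(20_000)]  # hypothetical task names
culprit = "task-4242"                         # hypothetical offender
replicas = ["replica-a", "replica-b", "replica-c", "replica-d"]

# What the logs would tell us: which worker crashed on each replica.
crashed = {replica: worker_for(culprit, replica) for replica in replicas}

suspects = [
    task for task in tasks
    if all(worker_for(task, replica) == worker for replica, worker in crashed.items())
]
print(len(suspects), culprit in suspects)
```

If the salted hashes behave independently, each extra replica should cut the suspect list by roughly another factor of NUM_WORKERS.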

This incident is not the first time that snowballing error values caused hiccups. I’ve also heard stories where parsing and appending to error values led to accidentally quadratic time complexity. Perhaps we should be a bit careful when dealing with error values, because they can sometimes be unexpectedly large. (I’m not sure how much this makes sense in various programming languages; some languages might not have the concept of an error value object that can be manipulated at runtime.)

Another thing that is less clear is that perhaps we could just outright ban self loops, as these are probably just footguns. But this might or might not be reasonable depending on the actual situation.

One might also be tempted to think that instead of relying on clever filtering based on hashes, we should just improve the error message in the logs to show exactly what caused the crash. I think practically this would not have helped in this case. Sometimes you just don’t know where the system could crash - if we had anticipated it, we would’ve fixed it. Wrapping every single part of code with error tagging just seems excessive and infeasible.

In the end, I didn’t think anyone made a mistake in the process, and there was not much we could’ve done to avoid it. Testing couldn’t have caught it because the loop would only exist in production, due to all testing systems also subscribing to the url that points to production; this bug was hard to anticipate in code review; and careful deployment wouldn’t have prevented it.

Sometimes, incidents are just a cost of business, even if they happen in production.

A Tale of Two Zeros

2022-04-07T00:00:00+00:00

One day I came across some code that looked like this (paraphrased):

let sign__old f =
  assert (Float.is_finite f);
  if String.is_prefix (Float.to_string f) ~prefix:"-"
  then `Negative
  else `Nonnegative

Naturally, I replaced it with the following:

let sign__new f =
  assert (Float.is_finite f);
  if Float.is_negative f then `Negative else `Nonnegative

A few days later, this caused an issue in prod. How is it possible? These two functions are obviously equivalent, right?

It turns out there is exactly one edge case for which these two functions behave differently: -0., aka negative float zero.

Positive vs Negative Zero

In the IEEE floating point standard, numbers are represented as sign and magnitude. This means it is technically possible to have both a positive and a negative zero. While these two values are numerically equal, both are treated as valid floats, and they behave differently when passed into different functions.

In this case, sign__new sees negative zero as `Nonnegative, because it is not numerically smaller than zero, despite having a negative sign. On the other hand, Float.to_string (-0.) produces "-0.", so sign__old thinks it is `Negative.

I think it’s likely uncommon that the existence of negative zero leads to bugs in code, because typically programs see these two values as having the same behavior. In my case, the code is constructing an AST that represents the float value. The old code produces a unary negation applied to a positive zero immediate, while the new code produces a negative zero immediate. This change caused an exception in prod because the code generation and parsing process no longer round-trips.

The first time I learned about negative zero, I thought it was a misfeature. But it actually makes sense, considering infinities are also signed. With all four values representable, we can have 1/-inf = -0, 1/-0 = -inf and so on, which is nice.
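The same distinction is easy to poke at in other languages. The sketch below uses Python rather than the OCaml of the snippets above, and the two function names are mine. Note that Python prints negative zero as "-0.0" (not "-0."), and it raises ZeroDivisionError for 1.0 / -0.0, so only the 1/-inf half of the identities is shown.

```python
import math


def sign_by_comparison(f: float) -> str:
    """Numeric test, like comparing against zero: negative zero counts as nonnegative."""
    return "negative" if f < 0.0 else "nonnegative"


def sign_by_sign_bit(f: float) -> str:
    """Sign-bit test via copysign: negative zero counts as negative."""
    return "negative" if math.copysign(1.0, f) < 0.0 else "nonnegative"


neg_zero = -0.0
print(neg_zero == 0.0)               # True
print(sign_by_comparison(neg_zero))  # nonnegative
print(sign_by_sign_bit(neg_zero))    # negative
print(str(neg_zero))                 # -0.0
print(1.0 / -math.inf)               # -0.0
```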

With the bug identified, the fix is easy: use Float.ieee_negative instead of Float.is_negative.

If you enjoyed this post, try another: Precisely Compare Ints and Floats

Building Multiplayer Minesweeper: Easy If You Know How

2022-03-20T12:15:02+00:00

I spent the past few weekends building a multiplayer minesweeper game on the web. It’s now live, so get a nerdy friend and try it out! In this post, I’ll list out the things that I picked up along the way. I started roughly from scratch, so if you also have little background but want to do something similar, you can use this post as a starting point.

Overall, building it was easy once I knew how, but figuring out how was not easy.

Objectives

The project is to build a web game. The game play is like regular minesweeper, except that there are 2 players mining the same map, and the goal is to race to flag 50 mines out of 99 instead of clearing the whole map. When a player clicks on a mine, it is a free point for the opponent; and when a player flags a grid that isn’t a mine, they get frozen for a few seconds. When a flag is correctly placed, it shows up on both clients, but cleared grids are only visible locally.

Project objectives in descending order of importance:

Finish the project
Do it right

Stack

Ultimately, the deliverables are a front end web app and a websocket back end to handle the real time messages between front end clients. For both front and back end, there are many ways to build them.

After some research, I settled on vue.js (Vue 3) for the client, and NestJS for the server. None of the decisions here, or below, are obvious by any means. I picked the stack based on the following criteria.

Popularity. Solutions to common problems can be googled, and tooling is probably better.
Nice APIs. Static typing, concise syntax, modularity, etc.
Easy (for me) to learn. This is important for actually finishing the project.

Some metrics that I did not care about at all: performance, security and bloat.

Here are some of the other choices I considered:

Native apps instead of web. This is a no go because few people want to install apps these days, especially on the desktop, and performance is extremely unimportant given how simple the game is.
OCaml client+server, using bonsai (incr_dom, js_of_ocaml) for front end. I am most proficient in OCaml and I like the language. But frankly, the popularity checkbox is extremely unchecked, and the syntax isn’t even nice. Unless you already work in an OCaml repo, using OCaml for web clients is pretty hard to justify.
React client. I mean, you can’t go wrong with react, since it’s the most popular choice. But I find Vue simpler to use and more opinionated. In this case, Vue having a designated way to do things is a plus, since I’m here to build a game, not to form opinions about how to build a game.
Other client frameworks like angular, svelte, solid, etc. I didn’t seriously consider these, since they seem to be less popular than React and Vue.
Other languages for the server, e.g. python, go. For simplicity, I think using the same language for both the client and the server is good, and in this case that language is TypeScript. So I didn’t seriously consider these.
Not using a back end framework. NestJS is designed to have opinions on how to do things, which again is good when I don’t have opinions.

Here is a grab bag of things that I used.

VS Code
socket.io for websocket communication in both client and server
Tailwind CSS
TypeScript for both client and server
Font Awesome for icons
Google Fonts
Vuex (to persist player name)

Things that weren’t obvious to me at first:

For servers to be able to push updates to clients, either you need a websocket connection between the two, or you use server-sent events (the EventSource API). I didn’t look closely at the latter, as I believe it is much less used.
There are actually 2 web servers: one to serve the front end through a GET request, and one to serve the websocket back end. It is possible to serve both from the same server, but as far as I can tell people don’t do that.
When you click on a “link” that changes the url, e.g. from https://onemistakes.click/foo to https://onemistakes.click/bar, it is not actually going to a new website. Instead, it just runs whatever javascript you want it to run to show another view, load resources, etc. This url change can also be programmatically triggered. This is how single page apps, or SPAs, work.

Design

The server handles:

Creating/joining game rooms
Generating new game map
Message relaying between clients
Tracking which client flagged/bombed a grid first

And the client does the rest.

The client tracks the state of each grid, which can be one of the following: unclicked, clicked, flagged by local player before server ack, flagged by either player after server ack, bombed by either player.
Flags before server ack are displayed as flagged by the local user, but ignored in score counting. This ensures the user sees no lag and there is no race for deciding the winner.
The client requests a new game automatically a few seconds after the game ends.
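The client-side state tracking can be sketched roughly as follows. This is a Python illustration rather than the project's TypeScript, and every name in it (CellState, ClientBoard, on_server_flag and so on) is hypothetical. It only counts confirmed flags toward the score and leaves out the free point for a bombed mine.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]


class CellState(Enum):
    UNCLICKED = auto()
    CLICKED = auto()
    FLAG_PENDING = auto()  # flagged locally, server has not acknowledged yet
    FLAGGED = auto()       # flag confirmed by the server, for either player
    BOMBED = auto()        # mine clicked by either player


@dataclass
class ClientBoard:
    local_player: str
    states: Dict[Coord, CellState] = field(default_factory=dict)
    owners: Dict[Coord, str] = field(default_factory=dict)

    def state(self, cell: Coord) -> CellState:
        return self.states.get(cell, CellState.UNCLICKED)

    def flag_locally(self, cell: Coord) -> None:
        # Show the flag right away so the user sees no lag.
        if self.state(cell) is CellState.UNCLICKED:
            self.states[cell] = CellState.FLAG_PENDING

    def on_server_flag(self, cell: Coord, player: str) -> None:
        # The server decided who flagged first; this overrides a pending flag.
        self.states[cell] = CellState.FLAGGED
        self.owners[cell] = player

    def displayed_owner(self, cell: Coord) -> Optional[str]:
        # Pending flags are drawn as the local player's flags.
        if self.state(cell) is CellState.FLAG_PENDING:
            return self.local_player
        return self.owners.get(cell)

    def score(self, player: str) -> int:
        # Only server-acknowledged flags count, so pending flags are ignored.
        return sum(
            1
            for cell, state in self.states.items()
            if state is CellState.FLAGGED and self.owners.get(cell) == player
        )
```

If the opponent's flag on the same grid reaches the server first, on_server_flag simply replaces the pending local flag, which is the only "rollback" this design needs.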

This design is similar to rollback netcode, although greatly simplified because the game is very simple. To explain what that is, we first have to explore a few alternatives.

A multiplayer game is like a distributed state machine. You have a state, and all players provide inputs to change it. There are a few designs that achieve this.

1. Server as the only master: Clients send user inputs to the server, server sends back the current state for display.
2. Designated client as the only master: Server only relays messages between the master client and other clients. Only the master client can change the state and disseminate it to other clients.
3. All clients run the state machine, and proceed only when all inputs from all clients arrive. (This is called deterministic lockstep.)
4. All clients run the state machine, and guess the inputs from other clients before they arrive. When they arrive, replay the state machine to correct wrong guesses. (This is called rollback netcode.)

1 is perhaps the simplest. But it suffers from two issues: players experience a lag between making an action and getting a response due to waiting for the server, and the server, which is a shared resource, has to do more work. 2 and 3 also have the issue of lag.

Even though there is no explicit state machine, the design explained above is like rollback netcode, because by displaying local flags immediately, we are basically guessing that the opponent did nothing unless we later learn otherwise. Since the game is simple, using the above state tracking logic greatly simplifies the code, while achieving the same behavior as rollback netcode.

Deployment

After a few weekends, I got the game working on localhost. Excluding boilerplate, the client took around 800 lines and the server under 200. But deployment is another inevitable battle.

I ended up pushing both the client and server code to GitHub private repos. I linked the server repo to AWS CodePipeline and Elastic Beanstalk, linked the client repo to AWS Amplify and got a domain name from Route 53 to sign SSL certs. Now I just have to git push my code and everything works automatically, which is nice.

In my previous hosting experience, I rented an EC2 instance and handled ssh, scp, cron jobs, Let’s Encrypt and so on by hand. This time, once I figured out which AWS services I needed, setting them up was much less painful than doing it all manually. Again, easy once you know how, because there are so many AWS products and just knowing which ones are relevant is already a bunch of work.

There were a few more issues I had to deal with at this stage.

Client needs to talk to different servers depending on prod/dev mode. This was solved using environment variables.
AWS instance ran out of memory building the docker image for the server, because building takes more memory than running it. This was solved by adding a prebuild hook to add swap space to the instance.
Client couldn’t talk to server unless the server has https. This was solved by buying a domain name from AWS and signing a cert for the load balancer of the back end. For this reason I have to use a load balancer even though there can only be one server (otherwise clients may not see each other).

I did not look into AWS competitors, as I was already an AWS customer, and the popular sentiment seems to be that AWS is still the leader in cloud services. Self hosting was a nonstarter, because there is no concern about data integrity (the server persists nothing on disk), and I only care about building the game, not everything else that runs it.

Future

When I set up the repo and deployment flow, I made sure that I would not have to do this again for the next game. When I build the next little thing, I should be able to just write some more code, test it locally, push to GitHub, and then I’m done.

Now I just need an idea…

Searching for Sets: Jaccard Index and MinHash

2022-01-02T19:59:25+00:00

Here’s a well-written intro to audio fingerprinting. One part of the article contains a clever trick that seems generally useful and interesting to think about. I will attempt to quickly describe the problem below and explain the solution.

The Problem: Fingerprints of Sparse Sets

To summarize the article in an extremely lossy fashion, the problem is that we want to take an audio clip, and find similar clips in a database. To do so, we can convert each clip into low dimensional “fingerprints”, and utilize nearest neighbor algorithms to find clips with the most similar fingerprints. Given any clip, one can cut it into small segments, run FFT to get spectrograms, then binarize the resulting images to get bit vectors. But even with, say, 128 x 32 = 4096 bits per fingerprint, it’s still too many to run nearest neighbors. How can we further reduce dimensions to facilitate faster matching?

It is important to note that in this context, the 0s and the 1s in the bit vectors aren’t symmetric. Instead, it is far more useful to think of the bit vectors as sets of 1s. Imagine if clip A has a C note and clip B has a D note. You would think they’re completely different because they don’t share any common notes; it would be silly to say they’re mostly the same because they have a lot of common missing notes.

If you think about it, finding similar sets is a general problem: you have a universe of unique items (pixels in the spectrogram), you have many sparse sets of them (spectrograms), and you wish to reduce each set’s dimensionality while preserving pairwise similarity. It’s just like comparing people’s book lists, movie lists, or interests.

The Trick: MinHash

First, we need to define a metric of similarity between two sets. It seems like a natural definition would be: the size of the intersection divided by the size of the union. If both sets are equal, you get 1; if both sets share no common elements, you get 0. It turns out this is called the Jaccard Index.

Here’s the interesting part: say you have two n-bit vectors. If you randomly permute both vectors in the same way, and then find the index of the first 1 bit (this is called the MinHash), then the probability that the two indices are equal is exactly equal to their Jaccard Index. Let’s go through a quick example before explaining why this works.

A = 0101 0001
B = 0011 0001

Random permutation 1:
abcdefgh becomes -> daefbcgh (1st position -> 2nd position, 2nd -> 5th, etc)
A becomes -> 1000 1001, first 1: 1
B becomes -> 1000 0101, first 1: 1, equal to A's

Random permutation 2:
abcdefgh becomes -> bcadfehg
A becomes -> 1001 0010, first 1: 1
B becomes -> 0101 0010, first 1: 2, different from A's

In the above example, the Jaccard Index is 2/4 (intersection / union), which means we should expect the test to return “equal” 50% of the time. So, why is this true?

First, note that positions where both vectors have a 0 will not affect the result of the test. Such a position can never hold the first 1 of either vector, and if it lands before the first 1s after permutation, it shifts both indices by the same amount. Therefore, we can safely drop them from the inputs without changing the result.

A drops matching 0s -> 1011
B drops matching 0s -> 0111

After dropping matching zeros, we are left with matching 1s or differing bits. It is now easy to see that the first 1’s index will be equal iff an index with matching 1s is selected as the first digit in the random permutation, hence the probability will be equal to the Jaccard Index.
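The argument can be checked by brute force on the example above. The following Python sketch (helper names are assumptions) enumerates all 8! permutations of the positions and counts how often the first 1 lands on the same index in both vectors.

```python
from fractions import Fraction
from itertools import permutations

A = "01010001"
B = "00110001"


def jaccard(a: str, b: str) -> Fraction:
    # Size of the intersection of 1s divided by the size of the union of 1s.
    both = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    either = sum(1 for x, y in zip(a, b) if x == "1" or y == "1")
    return Fraction(both, either)


def first_one(bits: str, order) -> int:
    # order[new_index] is the original position that ends up at new_index.
    # Returns the 0-based index of the first 1 after permuting.
    for new_index, old_index in enumerate(order):
        if bits[old_index] == "1":
            return new_index
    raise ValueError("vector has no 1 bits")


total = 0
matches = 0
for order in permutations(range(len(A))):
    total += 1
    if first_one(A, order) == first_one(B, order):
        matches += 1

match_probability = Fraction(matches, total)
print(match_probability, jaccard(A, B))
```

Both printed values are 1/2, matching the 2/4 computed by hand.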

In a universe of n things, any n-permutation will yield a MinHash function. Now, we just have to precompute a bunch of these functions, and apply them to each bit vector to get the fingerprints. Given a pair of fingerprints, we just have to count the number of matching MinHash outputs to get an estimate of the similarity. One cool observation is that more hashes only give you more accurate similarity estimates, which means even when the universe becomes larger, or more sets are added, you still don’t really have to linearly scale up the total fingerprint size.
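Here is a minimal Python sketch of that fingerprinting step, using explicit random permutations as described above. The function names, the seed and the choice of 2000 hashes are all assumptions made for illustration, and sets are represented by the indices of their 1 bits.

```python
import random
from typing import List, Optional, Sequence, Set


def make_permutations(universe_size: int, num_hashes: int, seed: int = 0) -> List[List[int]]:
    # Precompute one random ordering of the universe per MinHash function.
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        order = list(range(universe_size))
        rng.shuffle(order)
        perms.append(order)
    return perms


def fingerprint(items: Set[int], perms: Sequence[Sequence[int]]) -> List[Optional[int]]:
    # For each ordering, record the position of the first item that is in the set
    # (None if the set is empty).
    return [
        next((pos for pos, item in enumerate(order) if item in items), None)
        for order in perms
    ]


def estimate_similarity(fp_a: Sequence[Optional[int]], fp_b: Sequence[Optional[int]]) -> float:
    # Fraction of MinHash outputs that agree; this estimates the Jaccard Index.
    matches = sum(1 for x, y in zip(fp_a, fp_b) if x is not None and x == y)
    return matches / len(fp_a)


perms = make_permutations(universe_size=8, num_hashes=2000)
set_a = {1, 3, 7}  # A = 0101 0001
set_b = {2, 3, 7}  # B = 0011 0001
estimate = estimate_similarity(fingerprint(set_a, perms), fingerprint(set_b, perms))
print(estimate)
```

With 2000 hashes the estimate should land close to the true value of 0.5; fewer hashes give a shorter fingerprint at the cost of a noisier estimate.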

Given the fingerprints, we still have to figure out how to search efficiently, but that’s another complicated subject for another day.