Precisely Compare Ints and Floats

2021-12-21 2 minute read

Here’s a seemingly trivial task that I ran into recently - given a 64 bit int i and a 64 bit floating point f, how can we tell which one is larger?

Well, duh! In a language with implicit type casting, this is hardly even a question. Just do i < f and it magically works, right?

This is almost correct, but the issue is that by turning the int into a float, we are dropping precision for large integers. This is because only 52 bits of the float is used for representing the “mantissa”, i.e. the binary digits after the “1.”; in other words, we can only keep 53 binary significant figures in a float. So if you have an int 253+1, the closest float will be 253, so naively your code will think the values are equal, while one is actually numerically larger than the other. How hard is it to do this comparison exactly?

Let’s say we’re writing a compare function that returns a positive number when the int is larger, a negative number if the float is larger, and 0 if they’re equal. And for simplicity let’s assume we already have a compare function for ints and floats respectively. Here’s a seemingly clever way to do it.

function compare_int_float(int i, float f)
  f_cmp = compare_float(int_to_float(i), f)
  if f_cmp != 0
    return f_cmp
  else
    return compare_int(i, float_to_int(f))

There are a few observations here. I’ll just state them for now, and will explain them in comments in the final version of the pseudocode.

If the float comparison doesn’t say the numbers are equal, then we can trust the result.
If they are “float equal”, then f must be numerically an integer. Therefore, we can compare them as if they were ints.

But actually this code has a bug. Can you spot it?

The bug is that for certain inputs, this function can raise, specifically through calling float_to_int(f) on an out of bounds f. This happens when f = 2**63 and i rounds to f when converted to a float. Below is the final pseudocode:

function compare_int_float(int i, float f)
  f_cmp = compare_float(int_to_float(i), f)
  if f_cmp != 0
    (* If i rounds to a float less/greater than f,
       i must be less/greater than f, because otherwise
       f would be a float that is closer to i,
       and i should have rounded to f. *)
    return f_cmp
  else if f = 2**63
  then
    (* Large integers round up to 2**63, which is larger than
       max int, 2**63-1. We need to handle this case, otherwise
       float_to_int will raise. *)
    return -1
  else
    (* When i is converted to a float, its significant digits
       can be dropped. Regardless, it will still be an integer,
       so f (which is equal to i rounded) is also an integer.
       Therefore we can turn f into an int and compare exactly. *)
    return compare_int(i, float_to_int(f))