$\texttt{NaN}$? Nah.

This post first appeared 17 March 2023.

Let’s talk about floats.

Floating point numbers are complicated. They are meant to represent “any number”, by which we probably mean any real number. However, this is of course impossible, since there are only $2^{64}$ possible values that can be stored in a 64-bit float, and the number of reals is uncountably infinite.

To solve this, we represent a subset of rational numbers and declare the job done. For good measure, we throw in $\pm \infty$ and the concept of “not-a-number” or NaN. This lets us shrug off computations like sqrt(-3) as “not-a-number” and 1 / 0 as $+\infty$.
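If you want to see those two escape hatches in action, here is a quick sanity check in Rust, whose f64 follows IEEE 754 (more on that standard shortly); this is just a sketch, nothing q-specific yet:

fn main() {
    // The square root of a negative number has no real answer, so we get NaN back...
    assert!((-3.0_f64).sqrt().is_nan());
    // ...while dividing a positive float by zero gives +infinity instead of an error.
    assert_eq!(1.0_f64 / 0.0, f64::INFINITY);
}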

The exact format of floats and the numbers they can represent is unfortunately outside the scope of this post, but these two extra concepts introduce a whole slew of problems. I’ll mention just one, and then take a look at how q “solves” it.

Partial and Total Orders

If we want to compare things in a totally ordered set like the reals, we have exactly two options: $$ \text{either}\quad x \le y \quad\text{or}\quad x \ge y. $$ If both hold then $x=y$. However, that can be a bit cumbersome and so when we implement this in code, we might consider equality a special case:

fn compare(x, y) -> Ordering {...};

enum Ordering {
    Less,
    Equal,
    Greater,
}

Don’t take my word for it: that’s straight from the Rust standard library.

However, if we only have a partial order there is a third option: $$ \text{possibly}\quad x \le y \quad\text{or}\quad x \ge y. $$ Again, if both hold then $x=y$, but now there is also the possibility that neither holds. Note that all total orderings are partial orderings (but not the other way around).

This is what happens with floats, for reasons I’ll mention in a moment. When we code this up, we get something like[1]:

fn p_compare(x, y) -> PartialOrdering {...};

enum PartialOrdering {
    Less,
    Equal,
    Greater,
    Uncomparable,
}

However, this suffers from one fatal flaw: what we really want is to be able to write var_1 < var_2 and get back true or false. For this reason, most sensible languages define operators <, >, ≤, ≥, which take in two floats and return a boolean[2]. The expression x < y turns into p_compare(x, y) == Less, and similarly for the others[3].
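In Rust, for example, this correspondence is visible directly: the comparison operators on floats agree with partial_cmp. A minimal sketch (nothing to do with q yet):

use std::cmp::Ordering;

fn main() {
    // x < y is, in effect, "the partial comparison came back as Less".
    let (x, y) = (1.0_f64, 2.0_f64);
    assert_eq!(x.partial_cmp(&y), Some(Ordering::Less));
    assert!(x < y);
    // ...and likewise y >= x is "Greater or Equal came back".
    assert_eq!(y.partial_cmp(&x), Some(Ordering::Greater));
    assert!(y >= x);
}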

The reason this is important to us is NaN. How can we compare floats when one of them might not be a number?[4]

The solution, as you may have guessed, is to declare that the floats are not totally ordered, but partially ordered[5]. In particular, we say that NaN is not smaller than any number, nor is it bigger than any number, nor is it equal to any number. Here is a table taken verbatim from the relevant part of the Wikipedia entry. Note that x is any float, including NaN itself.

Test     NaN ≥ x   NaN ≤ x   NaN > x   NaN < x   NaN = x   NaN ≠ x
Result   False     False     False     False     False     True

The same rules apply when x is infinite, and the infinities otherwise sit at the extremes of the ordering. As a corollary, inf < x is false for every float x, while inf >= x is true exactly when x is not NaN.
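We can sanity-check all of this in Rust, which follows the IEEE comparison rules; the sketch below is just the table restated as asserts, plus the corollary about infinity:

fn main() {
    let nan = f64::NAN;
    // The table, spelled out: NaN compares false against everything
    // (including itself) under every operator except !=.
    for &x in &[1.0, f64::INFINITY, f64::NEG_INFINITY, f64::NAN] {
        assert!(!(nan >= x) && !(nan <= x));
        assert!(!(nan > x) && !(nan < x));
        assert!(!(nan == x) && (nan != x));
    }
    // The corollary: inf < x is false for every x...
    assert!(!(f64::INFINITY < f64::MAX) && !(f64::INFINITY < f64::INFINITY));
    assert!(!(f64::INFINITY < f64::NAN));
    // ...but inf >= x only holds when x is not NaN.
    assert!(f64::INFINITY >= f64::MAX);
    assert!(!(f64::INFINITY >= f64::NAN));
}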

One consequence is that in most languages we could check whether a float is NaN by checking that it is not bounded above by infinity, but most often we write

>>> def is_nan(x):
...     return x != x
>>> is_nan( float('nan') )
True

as NaN is the only float that is not equal to itself.
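The same two checks look like this in Rust (the names is_nan and is_nan_alt are mine; f64 of course already ships a real is_nan method):

// NaN is the only float that is not equal to itself...
fn is_nan(x: f64) -> bool {
    x != x
}

// ...and the only float that is not bounded above by +infinity.
fn is_nan_alt(x: f64) -> bool {
    !(x <= f64::INFINITY)
}

fn main() {
    assert!(is_nan(f64::NAN) && is_nan_alt(f64::NAN));
    assert!(!is_nan(1.0) && !is_nan_alt(1.0));
    assert!(!is_nan_alt(f64::INFINITY)); // +inf is bounded above by itself
}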

Who Says?

I’ve been making a lot of grand statements about NaN, claiming that “most languages” do this, and that “this is how it is done”. What weight do my assertions actually carry?

Well, in 1985 the IEEE[6] decided that too many people were using too many different standards for digital representations of numbers, and published IEEE 754. This document carefully detailed layouts (in bits) and standard operations for floating point numbers.

It also detailed how those operations should behave with the “interesting” values such as NaN (which is actually not one, but many different bit patterns!) and infinities.

For example 1 / 0 = ∞ and 1 / -0 = -∞.

Yes, there are two zeros.

Yes, that doesn’t make sense mathematically.

It also has more sensible requirements like ∞ + 3.14 = ∞ and 3.14 + NaN = NaN, and, most importantly, it specifies the table of comparisons given above. Finally, it specifies that in almost all cases NaNs propagate, which is to say that if a numerical operation involves a NaN then the output should be a NaN.

In this way, errors in calculations don’t result in hardware faults, but neither are they silently swallowed.
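These requirements are easy to poke at in any language that follows the spec; here they are as Rust asserts (again just a sketch of the standard’s behaviour, nothing q-specific):

fn main() {
    // The two zeros and their infinities:
    assert_eq!(1.0 / 0.0_f64, f64::INFINITY);
    assert_eq!(1.0 / -0.0_f64, f64::NEG_INFINITY);
    assert!(0.0_f64 == -0.0_f64); // and yet the two zeros compare equal

    // The more sensible requirements, including propagation:
    assert_eq!(f64::INFINITY + 3.14, f64::INFINITY);
    assert!((3.14 + f64::NAN).is_nan());
}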

Now, not all languages follow the rules exactly and not all FPUs do exactly what the standard recommends, which is one of the reasons why the same calculation made on different computers might have different outcomes.

However, almost all languages expose floats to the user in the same way, even if they don’t represent or calculate with them identically.

Enter q

In q, NaN is denoted by 0n and infinity by 0w[7], and the language has some… idiosyncrasies. Let’s take them one by one.

Addition

We start small:

q) 0n + 1
0n              / Yay!

q) 0n + 1 + 2
0n              / Yay!

q) sum 0n 1 2
3               / Heh?

This one at least has an explanation. The fact that 0n = 0n + 1[8] is what we’d expect[9]. The keyword sum, however, like its cousins avg, min and max, ignores nulls when aggregating.

And here we reach an important philosophical point. In q, NaN means “missing data”. So of course, if you sum over a bunch of data including some missing data, you get the sum of all the data that is there. This is something that other database-oriented languages agree with:

SQL> WITH test_data (a, b) as (
...    SELECT * FROM (VALUES
...           ('alpha', 1),
...           ('bravo', 2),
...           ('charl', NULL),
...           ('delta', 3)
...         ) t
...  )
...  SELECT
...    SUM(b) AS b_total
...  FROM test_data

b_total
-------
6

Now, one may say that we should really write sum 0^x if we want to ignore nulls (here 0^ means “fill nulls with 0”), but this gets tricky with things like avg, where we then have to divide by the number of non-null entries. Without null-ignoring aggregation we’d have to write something like

avg: { sum[0^x] % count where not null x}

All-in-all, I think this is OK, as long as we remember that sum is slightly different to +.
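For comparison, here is roughly the same null-ignoring average outside of q; a hand-rolled Rust sketch (avg_ignore_nan is my name for it) that treats NaN as “missing”:

// Sum what is present and divide by the count of non-missing entries,
// mirroring the q definition above.
fn avg_ignore_nan(xs: &[f64]) -> f64 {
    let present: Vec<f64> = xs.iter().copied().filter(|x| !x.is_nan()).collect();
    present.iter().sum::<f64>() / present.len() as f64
}

fn main() {
    // Like q's avg 0n 1 2, this gives 1.5 rather than 1.0 (or NaN).
    assert_eq!(avg_ignore_nan(&[f64::NAN, 1.0, 2.0]), 1.5);
}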

Of course, it is not OK at all that sum stops ignoring nulls as soon as the lists are nested[10]:

q) sum (1f;2f)
3f
q) sum (0n;4f)
4f
q) sum ((1f;2f);(0n;4f))
0n 6

Moving on.

Comparisons

Things start to go a bit off the rails when we compare floats in q. The infinities work almost as expected, as this line from Q for Mortals[11] describes.

The float infinities perform exactly as they should in arithmetic and comparison operations, since they are required to do so by the IEEE spec.

However, the NaNs do not conform to spec at all (recall that 0b and 1b are false and true respectively):

q) 0n < 0n
0b         / Yay!

q) 0n = 0n
1b         / Uhhh

q) 0n < 0w
1b         / Yay!  Wait...

q) 0n < -0w
1b         / Oh dear

q) 0n > -0w
0b         / Well at least it's consistent

q) 0n < 1
1b         / ¯\_(ツ)_/¯

It seems that, to q, NaN is a single value (equal to itself) and it is smaller than every other float.

Two Gotchas

The problem with this is that it introduces gotchas. The first is just by virtue of q behaving differently to other languages. This makes it particularly difficult when working on projects using multiple languages and passing data between them.

The other gotcha is arithmetic. Suppose we have a list of data and we want to clamp it between 0 and 1. A natural piece of q code might be[12]:

0|1&list

However, what if some of the data are missing? We’d represent this with NaNs, and the invocation above replaces them with 0, since the NaNs are replaced by the “bigger” 0. More confusingly, they aren’t capped at 1, because they are “smaller” than 1.

Here’s another example that makes this more explicit:

make_non_positive: {0&x}   / Works as expected
make_non_negative: {0|x}   / Replaces NaNs with 0!

My Take on Why This Happens

The first thing to note is that q came out after IEEE 754. In fact k, the language on which it is based, also came out after IEEE 754. Its predecessor A+ did not. Perhaps this is a strange quirk carried over from there?

However, I think this quirk in float handling is more subtle, and has to do with how q represents integers.

See, q took a look at the values ±∞ and NaN and thought “shouldn’t there be integer versions of these?” And I absolutely agree. We could write this as[13]:

enum ThickInt {
  Finite(i64),
  Inf,
  NegInf,
  NaN,
}

The thing about float NaNs, though, is that they take up exactly the same space as a normal float, whereas our enum above takes up at least 65 bits. We could fix this by using 63 bits for the finite numbers and the final bit for signalling, but then we either have to do odd bit packing or sacrifice $2^{63}$ numbers.
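You can see the cost directly in Rust: the i64 payload uses all 64 bits, so the discriminant needs bits of its own and the enum ends up bigger than a plain float (16 bytes on typical targets):

#[allow(dead_code)]
enum ThickInt {
    Finite(i64),
    Inf,
    NegInf,
    NaN,
}

fn main() {
    // A float NaN is free: it is just another 64-bit pattern.
    assert_eq!(std::mem::size_of::<f64>(), 8);
    // The tagged enum cannot fit in 64 bits, since Finite already uses them all.
    assert!(std::mem::size_of::<ThickInt>() > 8);
}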

Q solves this in what I think is the correct way by defining constants (note the use of upper case to distinguish from floats)

 0W :  9223372036854775807
-0W : -9223372036854775807
 0N : -9223372036854775808

Note that 0W is the largest value expressible in a two’s-complement 64-bit integer and 0N is the smallest.

This is great and even behaves exactly how you’d expect. We can use ^ to fill in default values for nulls, and -0W ≤ x ≤ 0W holds for all finite x. It even respects the addition rules.

q) 0W > 17
1b          / Yay!
q) 0N + 45
0N          / Yay!
q) sum 0N 1 2
3           / We are used to this by now
q) -0W - 1
0N          / Um... ok...
q) 0W + 1
0N          / You know what, that's consistent. Also adding finite
            / things to infinity maybe _should_ be NaN?
q) 0W + 2
-0W         / Oh no...

Yes, that’s our good old friend integer overflow[14].
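For anyone who wants to reproduce that wraparound outside of q, here is a minimal Rust sketch; wrapping_add and wrapping_sub make the overflow explicit, since a plain + would panic in a debug build:

fn main() {
    // q's 0W and 0N are just the extreme i64 values.
    assert_eq!(i64::MAX, 9_223_372_036_854_775_807);       //  0W
    assert_eq!(i64::MIN, -9_223_372_036_854_775_807 - 1);  //  0N

    // Two's-complement wraparound reproduces the session above:
    assert_eq!(i64::MAX.wrapping_add(1), i64::MIN);        // 0W + 1  ->  0N
    assert_eq!(i64::MAX.wrapping_add(2), i64::MIN + 1);    // 0W + 2  -> -0W
    assert_eq!((i64::MIN + 1).wrapping_sub(1), i64::MIN);  // -0W - 1 ->  0N
}

Let’s leave that can of worms and turn to the other surprising fact: 0N ≤ x for all integers x.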

And this, I think, is the point. In an attempt to introduce infinities and NaNs to integers (laudable), q “has” to have 0N as the minimal integer (an acceptable compromise)[15]. However, then in order to be consistent (a good idea), q declares float NaN to be smaller than all other floats (IMHO a terrible idea).

Postscript

I have to admit, even after having rewritten this post from scratch a couple of times, checked all the claims in the code blocks and racked my brain, I’m still left with the troubling feeling I’m missing something.

I’ve left out a bunch of stuff[16] in order to make this a readable post, but if I’ve missed the obvious reason why q does things this way, please drop me an email!


  1. This isn’t how Rust does it though. Rust uses an Option<Ordering>, which boils down to the same thing. ↩︎

  2. Yes, I know everyone writes <= not ≤, but that’s because ASCII got there first. We have unicode now. We should use it. ↩︎

  3. Note that even if we have a total order, we use the partial comparison function to implement <, ≤, > and ≥. This is because partial orders are strictly more general. ↩︎

  4. Note the infinities don’t break the total ordering of the floats: it’s only the NaNs that are a problem. ↩︎

  5. This is a source of much confusion to folks starting out in rust, since f64 only implements PartialOrd and not Ord. There’s an explanation here. The annoying thing is that this means that lists of floats can’t be sorted with my_list.sort(), but have to have a custom comparator. ↩︎

  6. Institute of Electrical and Electronics Engineers ↩︎

  7. Because w looks like ∞. Seriously. ↩︎

  8. Written in Yoda form because q is evaluated RTL so 0n = 0n + 1 is the boolean 1b while 1 + 0n = 0n is the integer 2. We’ll get to that in a moment. ↩︎

  9. And indeed lets us do cool things. ↩︎

  10. To the q aficionados wincing at my use of brackets and ;, just remember there are people who aren’t as familiar with the language reading this… ↩︎

  11. This is the standard introductory text to the language. As introductory texts go, it’s surprisingly readable and detailed. However, the fact that we mere mortals must learn from the “q Gods” as they are known should tell you all you need to know about the community. ↩︎

  12. Here & is ‘min’ and | is ‘max’. Remember to read from right to left! ↩︎

  13. Note I’m getting rid of the +0 and -0 distinction because… well because. ↩︎

  14. Can we just appreciate how the wiki page for integer overflow uses a picture of an odometer ticking from 999999 to 000000 to illustrate its point. /me chef’s kiss ↩︎

  15. The alternative is to have separate logic that checks for infinities. The argument then might be that this makes expressions like 0 < very_long_list slow to compute, but elementary operations are already slow if 1 + very_long_list propagates integer NaN[17]. ↩︎

  16. The fact that other language specs don’t explicitly require IEEE 754 to be followed (q isn’t a spec), the odd way q handles integer NaN so neatly, the funny business about 0n = 0n, etc. ↩︎

  17. Is this a good time to mention that long_list_1 - long_list_2 is almost exactly twice as slow as long_list_1 + long_list_2 because it is evaluated as long_list_1 + neg[long_list_2]? ↩︎