Font-based digit grouping

Struggling to read absurdly large numbers in terminal windows many years ago, I wondered how I might apply some logic in the terminal to subtly bunch together groups of three digits as a form of thousand separator. Eventually it occurred to me to try doing it in the font with ligature rules (initially as a joke, but then I looked into it) and it turned out Numderline was a thing which already did that.

But I didn’t get along with underlines for digit grouping. So I spent some time hacking around, tweaking it, and adding hexadecimal support (that is, things starting with 0x grouping by fours), and generally making a huge mess, and struggling with bugs¹ I couldn’t overcome.

So I started again.

Why in the font?

The pragmatic answer is because I can usually insert a font as a solution where the software doesn’t already do what I want.

The more ambitious answer is that it does better at abstracting the problem out to the presentation layer. The application doesn’t have to worry about implementation details of presenting thousand separators and trying to ensure that copy-paste operations aren’t impeded by the added characters. It just has to relay the right configuration to the font, and then render that font the way it asks to be rendered.

And I want to reiterate, the underlying text does not have to be modified. There’s no confusion between a string with this or that thousand separator and a string which is meant to be parsed by another tool. With one caveat about the decimal separator they’re the same thing. Even if the application bungles the locale settings, the underlying text is not going to get any worse, and is still bare-bones copy-pasteable.

Why not in the font?

It’s not designed for that?
It’s a solution using made-up codes which are not a part of any standard.
Basically no fonts support it so it’s rarely going to work, and even if you provide a known-working font the user should have the ability to override that font, and theirs probably won’t support it.
My feeling (untested) is that the complexity of the tables I’ve added to achieve this can’t be very efficient.
In a terminal emulator the cursor placement is a bit off; and the spacing of numbers are interfered with by spurious details like the edge of the screen, the position of the cursor, and their interactions with implementation details of each terminal.
In terminals and text entry boxes, the digits can dance around as you type, which can be disconcerting. Or as I’ve observed with Chrome, they don’t update properly while you’re typing except for sometimes when you don’t expect it, but even then only partially.

What I have now

I have a font patcher which modifies a font to group sets of three digits into thousands (threes digits only; sorry rest of the world), four hexadecimal digits into 16-bit words, and five fractional digits into whatever unit 10e-5 is. I don’t like that last one but it seems to be a convention.

But also, since I found that most (not all) terminals support it, and CSS also supports it, I put everything under the control of so-called “font features” to make a few things configurable without re-patching the font.

Like this:

dgsp dgap dgco dgdo dgdc dghx

font-feature-settings: "dgsp";

sans
1234.12	0x1337
12345.12	0x12deadbeef
1234567.12	code1234
123.1234567	1,000000

mono
1234.12	0x1337
12345.12	0x12deadbeef
1234567.12	code1234
123.1234567	1,000000

In monospaced mode grouping involves moving digits closer together so the group occupies the same space as before even after the addition of the digit grouping separator. In proportional mode the digit grouping makes the number a bit wider.

How it works

By default my new version uses GPOS rules to move the digits together for monospaced applications, rather than inventing new glyphs (mostly) like the old one did, and it uses FontForge’s rule generation directly because that lets me generate a reversesub rule without crashing (though that interface also has its own bugs).

But it turns out some terminals don’t work with GPOS, so there’s a separate mode to duplicate glyphs at different positions instead.

The process involves inserting a lot of table-based rules into the font; first to mark out all the parts of the text containing digits, then more rules to replace that markup with spacing glyphs at the proper intervals, then more rules to remove all the cruft, then more rules to change some of the spacing glyphs into different shapes. If it’s a monospaced font then more rules shuffle things around a bit to make numbers occupy the same space as if there were no thousand separators.

These rules are selectively enabled by different features, but the rules must always run in the same order. Within a lookup there are bail-out rules to enable exclusion of specific patterns (there’s no [^a-z]xyz, so you need a test beforehand which bails out if it matches [a-z]xyz), and these bail-outs don’t reach across lookups so you can advance to the next logical block that way. If one block wants to prevent another from running it can instead temporarily poison the text to prevent a subsequent rule from matching it, and then a block at the end to mop that up again.

For monospaced fonts, which are the ones I most care about, the glyphs are shifted sideways to make room for the space without the group taking up more space overall. There are a few possible ways to shift things around:

move digits around the desired gap outwards
move digits surrounded by gaps towards the middle
move digits towards the [implicit] decimal point

So far I have only implemented the last two. The last one is the one that moves glyphs the furthest, which can introduce clipping problems on terminals, but it’s the most reliable in terms of getting the expected spacing. Re-spacing around or between separators causes uneven gap sizes or uneven spacing within a group of digits, which might look a bit weird.

How to use the result

The common way of stuffing extra ligature rules into a font is to put them under the calt feature, which is on by default and you don’t have to think about how to enable it. That’s not what I did. Instead, in various applications in various places where you choose your font, you should also have the option of specifying a bunch of other features by FourCC or by common names. The Iosevka font, for example, offers extensive customisation in this space.

In CSS, for example, if using the patched font then spaces could be enabled with:

font-feature-settings: "dgsp";

Change that to dgco for comma separators, dgap for apostrophes, dgdo for dots.

I’ve also added dghx, to force the grouping of hex strings without any prefix as hexadecimal, and to group them appropriately. This might be useful when you have CSS markup on a column of data which is definitely hexadecimal. It’s not a thing you would want to enable by default in a terminal emulator.

The dgdc feature, to interpret comma as the decimal separator, is in a similar situation; in most programming languages I can think of trying to use a decimal comma would be problematic, but if you know the data you have is like that then you can tag it as such to get the intended spacing.

The way these are enabled in a terminal emulator varies, but there’s typically a way to do it in most of the terminals which support ligatures.

Doing it properly

Fonts live complicated lives dealing with a lot of different expectations from different languages. Digit grouping doesn’t seem to be an exception.

After reading some documentation my understanding is that even though fonts have the means to tie configuration items to different languages and scripts, the design intent of OpenType is that the application is still on the hook for signalling all the locale-specific behaviours to the font, so that the font can then deliver the right tables to the shaper.

If one were to make a concerted effort on this there would be a bunch more rules for all the different spacing rules around the world, and feature names to enable each of them.

As far as I can tell, the process would involve reserving a bunch of different “features” identifying different conventions, for which I can (and already did) just go ahead and make up my own. If a serious effort was worthwhile then things would need to be more structured and complete than what I’ve cobbled together, though.

But I’ve entirely abandoned any notion of handling various conventions. It’s all much too complex and while I do not enjoy telling anybody to conform to a monoculture I really don’t have the energy for much beyond one international standard. Although I’m not sure what there is in the way of international standards around fraction digit grouping. I just followed Wikipedia but I would prefer sixes or threes myself, to continue the pattern of aligning with SI prefixes (in fact, looking at the relevant wikipedia page, they break from their own convention and use threes on that table!).

So that’s where I stopped. Because I have what I want and I’m not sure anybody else in the world cares that much.

Things I guess could be done:

Apply similar logic to different scripts which have their own digits and have their own group separators.
Include features to specify different grouping choices for whole numbers, whole hexadecimal numbers, and fractional numbers.
Include features that describe independent separator glyphs for whole and fractional parts of numbers.

see for example the original Numderline mention of such). Somebody helpfully posted a comment on my own bug on that matter, but I don’t need to worry about that anymore. ↩