Chipaca’s Miscellanea

en español

U+1F929 GRINNING FACE WITH STAR EYES from Google’s Noto Emoji font, via ImageMagick and xbmbraille, as rendered by the kitty terminal.

XBM❤braille

There’s not always a good reason to play with stuff, but stuff sometimes begs to be played with. In this case, Unicode’s braille block gives you a (chonky) monochrome pixel-addressable screen on the terminal. And XBM gives you a very straightforward monochrome bitmap representation, so you can play with both things with minimal friction.

Braille in Unicode

There are Braille patterns in Unicode, from (U+2800 BRAILLE PATTERN BLANK) through (U+28FF BRAILLE PATTERN DOTS-12345678). Outside of -BLANK, each character is named according to which dots are present, and the dots are numbered from 1 to 8, in the obvious way:

14
25
36
78

the name is always sorted (so it’s DOTS-167, not DOTS-176 even though 7 is in the first column).

These beg to be used as the building blocks for a canvas, or at least a pixel-addressable screen. All you have to do is get a monochrome bitmap, and map the pixels onto the right characters.

XBM

You probably have some XBM files on your Linux desktop.
They’re text.
They look suspiciously like C code.
If you’re writing C you probably #include them directly, embedding them like a boss.

: locate .xbm | tail -n1
/usr/share/themes/Syscrash/openbox-3/max_toggled.xbm
: !! | xargs cat
locate .xbm | tail -n1 | xargs cat
#define max_toggled_width 6
#define max_toggled_height 6
static unsigned char max_toggled_bits[] = {
   0x3c, 0x27, 0x25, 0x3d, 0x11, 0x1f };

There are some slight variations (e.g. some versions don’t say unsigned), and some of them are designed for use as pointers so they have a hotspot, but you get the idea. If you’ve ever looked into XPM files, those are even neater to hack around with because in those you can see the image directly (if you squint a little), but that would be wasteful for our usecase (XPMs are 8-bit paletted), so we’re sticking with XBMs.

From that wikipedia article,

XBM image data consists of a line of pixel values stored in a static array. Because a single bit represents each pixel (0 for white or 1 for black), each byte in the array contains the information for eight pixels, with the upper left pixel in the bitmap represented by the low bit of the first byte in the array. If the image width does not match a multiple of 8, the extra bits in the last byte of each row are ignored.

That “0 for white or 1 for black” is backwards to what you might expect, but in practice it’ll depend on whether your terminal is dark-on-light or vice versa.

Pixel streams

So let’s assume we have a stream of bytes that correspond to the bitmapped pixels, horizontally. We’ll need to look at four lines at a time, and for each quadruple of interleaved bytes we consume we should be producing four Unicode characters (assuming the image’s width is a multiple of 8). The first two bits of each byte quadruple determine the first Unicode character, the second two bits the second character, and so on. We’re obviously going to need a little table to look this up, but if the four bytes coming in are a, b, c, and d, then the table lookup will be something like I particularly appreciate how the ability to write binary literals makes it easier to understand what’s going on.

table[(a&0b00000011)<<0+(b&0b00000011)<<2+(c&0b00000011)<<4+(d&0b00000011)<<6],
table[(a&0b00001100)>>2+(b&0b00001100)<<0+(c&0b00001100)<<2+(d&0b00001100)<<4],
table[(a&0b00110000)>>4+(b&0b00110000)>>2+(c&0b00110000)<<0+(d&0b00110000)<<2],
table[(a&0b11000000)>>6+(b&0b11000000)>>4+(c&0b11000000)>>2+(d&0b11000000)<<0]

Building the table is straightforward if we can look characters up by name. That’s easily done in Python See go#32937 about doing this in Go. :

from unicodedata import lookup

table = []
for i in range(255):
    # REMEMBER 0 is 'on'
    dots = []
    if (i & 0b00000001) == 0:
        dots.append('1')
    if (i & 0b00000010) == 0:
        dots.append('4')
    if (i & 0b00000100) == 0:
        dots.append('2')
    if (i & 0b00001000) == 0:
        dots.append('5')
    if (i & 0b00010000) == 0:
        dots.append('3')
    if (i & 0b00100000) == 0:
        dots.append('6')
    if (i & 0b01000000) == 0:
        dots.append('7')
    if (i & 0b10000000) == 0:
        dots.append('8')
    dots.sort()
    table.append(lookup("BRAILLE PATTERN DOTS-" + "".join(dots)))
table.append(lookup("BRAILLE PATTERN BLANK"))

and you can print the table out, align it properly and then visually inspect it for correctness,

'⣿', '⣾', '⣷', '⣶', '⣽', '⣼', '⣵', '⣴', '⣯', '⣮', '⣧', '⣦', '⣭', '⣬', '⣥', '⣤',
'⣻', '⣺', '⣳', '⣲', '⣹', '⣸', '⣱', '⣰', '⣫', '⣪', '⣣', '⣢', '⣩', '⣨', '⣡', '⣠',
'⣟', '⣞', '⣗', '⣖', '⣝', '⣜', '⣕', '⣔', '⣏', '⣎', '⣇', '⣆', '⣍', '⣌', '⣅', '⣄',
'⣛', '⣚', '⣓', '⣒', '⣙', '⣘', '⣑', '⣐', '⣋', '⣊', '⣃', '⣂', '⣉', '⣈', '⣁', '⣀',
'⢿', '⢾', '⢷', '⢶', '⢽', '⢼', '⢵', '⢴', '⢯', '⢮', '⢧', '⢦', '⢭', '⢬', '⢥', '⢤',
'⢻', '⢺', '⢳', '⢲', '⢹', '⢸', '⢱', '⢰', '⢫', '⢪', '⢣', '⢢', '⢩', '⢨', '⢡', '⢠',
'⢟', '⢞', '⢗', '⢖', '⢝', '⢜', '⢕', '⢔', '⢏', '⢎', '⢇', '⢆', '⢍', '⢌', '⢅', '⢄',
'⢛', '⢚', '⢓', '⢒', '⢙', '⢘', '⢑', '⢐', '⢋', '⢊', '⢃', '⢂', '⢉', '⢈', '⢁', '⢀',
'⡿', '⡾', '⡷', '⡶', '⡽', '⡼', '⡵', '⡴', '⡯', '⡮', '⡧', '⡦', '⡭', '⡬', '⡥', '⡤',
'⡻', '⡺', '⡳', '⡲', '⡹', '⡸', '⡱', '⡰', '⡫', '⡪', '⡣', '⡢', '⡩', '⡨', '⡡', '⡠',
'⡟', '⡞', '⡗', '⡖', '⡝', '⡜', '⡕', '⡔', '⡏', '⡎', '⡇', '⡆', '⡍', '⡌', '⡅', '⡄',
'⡛', '⡚', '⡓', '⡒', '⡙', '⡘', '⡑', '⡐', '⡋', '⡊', '⡃', '⡂', '⡉', '⡈', '⡁', '⡀',
'⠿', '⠾', '⠷', '⠶', '⠽', '⠼', '⠵', '⠴', '⠯', '⠮', '⠧', '⠦', '⠭', '⠬', '⠥', '⠤',
'⠻', '⠺', '⠳', '⠲', '⠹', '⠸', '⠱', '⠰', '⠫', '⠪', '⠣', '⠢', '⠩', '⠨', '⠡', '⠠',
'⠟', '⠞', '⠗', '⠖', '⠝', '⠜', '⠕', '⠔', '⠏', '⠎', '⠇', '⠆', '⠍', '⠌', '⠅', '⠄',
'⠛', '⠚', '⠓', '⠒', '⠙', '⠘', '⠑', '⠐', '⠋', '⠊', '⠃', '⠂', '⠉', '⠈', '⠁', '⠀'

looking good so far.

In Go, this could be a []rune; if we store it as a string constant we wouldn’t be able to index it directly because each rune would be stored as three bytes—but we could do a bit of maths and save a bit of space. We’ll have to measure that at some point. Might as well do it now. Compare

const brailidx = "<... a string made by subtracting 10240 from each of those runes above ...>"

func bit2brail(b uint8) rune {
	return 10240 + rune(conv_be[b])
}

with

var brailidx = []rune{ /* ... those runes ... */ }

func bit2brail(b uint8) rune {
	return brailidx[b]
}

the first one is ~40% faster:

BenchmarkStringAndMath-12    	4848230	      246.2 ns/op
BenchmarkRuneSlice-12        	3539121	      347.3 ns/op

Looking at the assembly output go build -gcflags -S . you can notice some wasted cycles checking bounds of the rune slice, which can be avoided by changing it to an array,

var brailidx = [...]rune{ /* ... those runes ... */ }

which gets them within 20%:

BenchmarkStringAndMath-12    	4914315	      239.4 ns/op
BenchmarkRuneSlice-12        	4165580	      289.2 ns/op

I like that in the string version the data is immutable, but I don’t like that it’s a lot more unreadable. Can we get the best of both worlds?

const brailidx = `
⣿ ⣾ ⣷ ⣶ ⣽ ⣼ ⣵ ⣴ ⣯ ⣮ ⣧ ⣦ ⣭ ⣬ ⣥ ⣤ ⣻ ⣺ ⣳ ⣲ ⣹ ⣸ ⣱ ⣰ ⣫ ⣪ ⣣ ⣢ ⣩ ⣨ ⣡ ⣠
... etc ...`

func bit2brail(b uint8) rune {
	return []rune(brailidx[4*int(b)+1 : 4*(int(b)+1)])[0]
}

this does unfortunately pay the price of that []rune conversion and those pesky bound checks are back,

BenchmarkRuneConst-12        	 546738	     2148 ns/op

and it’s actually faster to do it explicitly,

func bit2brail(b uint8) rune {
	n := 4*int(b) + 1
	if n < 0 || n > len(brailidxr) {
		return '⠀'
	}
	r, _ := utf8.DecodeRuneInString(brailidxr[n:])
	return r
}

but even then it’s the worst option:

BenchmarkStringAndMath-12    	4997421	      245.6 ns/op
BenchmarkRuneSlice-12        	4116244	      290.7 ns/op
BenchmarkRuneConst-12        	1530085	      791.0 ns/op

Regular expressions

Now we just need a way to load an XBM file into something useful. The quickest way for me is to write what we know into a regular expression.

We’ll probably want to come back and revisit it once everything is working and make an actual parser that normal people can understand, but for now just

(?sm)^#define\s+\S+_width\s+(\d+)\s*$
#define\s+\S+_height\s+(\d+)\s*$
(?:#define\s+\S+_x_hot\s+\d+$
#define\s+\S+_y_hot\s+\d+$
)?static\s+(?:unsigned\s+)?char\s+(\S+)_bits\s*\[\]\s*=\s*{\s*$
((?:\s*0x[[:xdigit:]]{2}\s*,)*\s*0x[[:xdigit:]]{1,2})\s*,?\s*\}

should work to load the whole file, and a simple 0x[[:xdigit:]]{1,2} to then pull out the hex digits. This will fail for older XBMs (“X10” format) where the data is in 16-bit numbers instead of 8-bit shown here, but I haven’t found any of these “in the wild” outside of explicitly asking The Gimp to create one so I’m not worried at this point.

Future work

Using ImageMagick it’s easy to procedurally create XBMs with individual pixels turned on:

: convert -size 4x4 xc:black -fill white -draw "point 1,2" XBM:
#define _width 4
#define _height 4
static char _bits[] = {
  0x0F, 0x0F, 0x0D, 0x0F, };

and so you can create a little mapping to test that each bit in an XBM is picked up properly, and another to test that it is output as the right braille character. This’ll let you dig into the weird and wonderful corner cases, e.g. what happens with a 3×3 or a 7×7 XBM.

Also from here you can look into fuzz testing the XBM reader, prior to moving away from the regexp.

Or, I dunno, just have fun.

: go install chipaca.com/xbmbraille@latest
: convert +dither -font Noto-Emoji -pointsize 64 label:🤩 -trim XBM:- | xbmbraille -n -
⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣤⣴⣶⠶⠶⠶⠶⢶⣶⣤⣤⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⢀⣴⡾⠟⠋⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠛⠿⣶⣄⡀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⣠⣾⠟⣿⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣴⡟⢿⣦⡀⠀⠀⠀⠀
⠀⠀⢀⣼⠟⠁⢠⣿⣿⣦⢀⣀⣀⣀⠀⠀⠀⠀⢀⣀⣀⣀⢀⣾⣿⣿⠀⠙⢿⣆⠀⠀⠀
⠀⢀⣾⣯⣤⣴⣾⣿⣿⣿⣿⣿⣿⠟⠀⠀⠀⠀⠈⢿⣿⣿⣿⣿⣿⣿⣶⣦⣤⣿⣧⠀⠀
⠀⣾⠏⠙⠻⣿⣿⣿⣿⣿⣿⣿⠁⠀⠀⠀⠀⠀⠀⠀⢹⣿⣿⣿⣿⣿⣿⡿⠟⠉⢻⣇⠀
⢸⡟⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣇⠀⠀⠀⠀⠀⠀⢀⣾⣿⣿⣿⣿⣿⡏⠀⠀⠀⠈⣿⡀
⣿⡇⠀⠀⠀⠀⣿⠟⠋⠀⠈⠙⠛⠀⠀⠀⠀⠀⠀⠘⠛⠉⠁⠀⠙⢿⡇⠀⠀⠀⠀⢿⡇
⣿⡇⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣸⡇
⢻⡇⠀⠀⠀⢀⣤⣤⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣀⣠⣤⣄⠀⠀⠀⠀⣿⡇
⠸⣿⠀⠀⠀⠘⣧⣄⣈⣉⡉⠉⠛⠛⠛⠛⠛⠛⠛⠛⠋⠉⢉⣉⣀⣠⡿⠀⠀⠀⢰⡿⠀
⠀⢹⣧⠀⠀⠀⠙⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠁⠀⠀⢠⣿⠃⠀
⠀⠀⠹⣷⡀⠀⠀⠀⠙⠿⣿⣿⡿⠟⠛⠋⠉⠛⠛⠿⣿⣿⡿⠟⠉⠀⠀⠀⣠⡿⠃⠀⠀
⠀⠀⠀⠘⢿⣦⡀⠀⠀⠀⠀⠉⠛⠓⠶⠶⠶⠶⠖⠛⠋⠁⠀⠀⠀⠀⢀⣼⠟⠁⠀⠀⠀
⠀⠀⠀⠀⠀⠙⠿⣦⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⣾⠟⠁⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠈⠙⠻⢶⣦⣤⣄⣀⣀⣀⣀⣀⣠⣤⣶⠾⠟⠋⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠛⠛⠛⠉⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀