Wednesday, January 21, 2015

QR codes on the BBC Micro, Part 4 of 6: Prototyping in C

In which I wrap my head around algebra and write code for myself, not the machine.

The preliminary feasibility study turned out positive, the hardware was sufficiently cooperating, so I was cautiously optimistic that this thing was possible. Time to learn what QR codes are all about.

The QR code standard is behind a paywall at ISO, but it's trivial to find online elsewhere, and before long I had a 124-page document on my screen. I began to read. For an introduction that's easier to digest, I recommend this guide at Thonky.com.

So, there are various sizes of QR code. I already knew that. The standard talked about various alphabets, including numeric, alphanumeric and Kanji, but fortunately, raw bytes are also among the options. Then it describes in detail where all the standard parts go: the big corner squares, the smaller inset squares, and the barely noticeable timing patterns along the top and left. So far so good, I can code that.

Then it starts getting complicated. The stream of codeword bits has to be snaked through the matrix and in between the other parts in a very particular way: left-up-right-up until you hit the top, then left-down-right-down until you get back to the bottom. Sometimes not all these modules are part of the timing or alignment patterns, and you need to skip them. At the leftmost timing pattern, we skip the entire column. A lot of fiddly edge cases, but doable.

As a final step, we have to apply a XOR mask to all data bits, to make sure no patterns appear that look like the timing and alignment patterns and might throw off the QR code reader. The standard specifies eight different masks, and a QR encoder is supposed to try them all. Then it should choose the one that minimizes some penalty computed from patterns that appear in the output. I decided to just go with a single type of mask first, the one that was easiest to compute, and hope for the best.

But the really hard part is actually generating the codeword stream. The data is broken up into blocks, and for each block we have to compute a set of error correction codewords. In the case of the 40-L QR code, there are 19 blocks of 118 codewords, and 6 blocks of 119 codewords each; for each of these blocks, we have to compute 30 error correction codewords. The error correction codewords are simply the coefficients of the remainder polynomial when we divide the data polynomial by a particular 30th-degree generator polynomial with coefficients in the GF(256) Galois field. Wait, what? The words reminded me of a class on error correction codes I took in university, but I had all but forgotten the details. How do you do polynomial long division? What is a generator polynomial? What is a Galois field?

I won't explain it all in detail here, but again, Thonky.com has you covered. To summarize: polynomial long division is just like regular long division; if you divide by a polynomial of degree n, you get a remainder polynomial of degree n ­– 1, at most. The generator polynomial can be calculated somehow, but I didn't need to know, because it was listed in an appendix. This particular Galois field is a field with 256 elements, and if we represent those by 8-bit numbers, addition can be done by the XOR (exclusive or) operation. For multiplication, we have a log and antilog table, which let us do multiplication as addition, preceded and followed by table lookups. Nice!

So, time for some code. I decided that this stuff was complicated enough that I wouldn't be able to learn it, re-learn 6502 assembly, wrestle with memory limits, and debug the code at the same time, so I wanted to build a prototype on the PC in C99 instead. It was a simple matter of programming (ahem) that I spent some free evenings and weekends on. I wrote it very carefully, in the same way I'd have to do on the Beeb, checking output manually at every step. This would have been perfect for TDD (test-driven development), but I wouldn't have that luxury on the Beeb due to memory constraints, so I refrained from adding too many tests.

A few weeks later I had about 350 lines of C code that took on its standard input the ASCII text of the book I'm currently reading, and produced a PNG file that looked like this:

Output of the C program
The code is up on GitHub if you want to take a look. You can see some repeating patterns; this is because it's ASCII text, where not all bits are equally well-trodden. But now again that big question… will it scan? Let's see!

ZXing Barcode Scanner recognizing the above QR code
Oooh yeah! Hole in one! Victory!

So, how big is this code? Remember that the BBC Micro has only 32 kB of RAM, about 16 of which we can actually use in the screen mode we're targeting. I estimated that I'd need about 5 kB of scratch memory to generate the QR code. The C source weighs in at 11.5 kB, almost exactly the space we'd have left on the Beeb, but of course that doesn't mean much. The binary program on the Beeb might be larger because of the limited instruction set and lack of standard library, or smaller because it doesn't contain semicolons and whitespace.

A better comparison would be the compiled C code; the .o file is only 8.7 kB. But it would also contain headers and such, a symbol table, maybe even debugging symbols; how much executable code is actually in there? We can find out using objdump -d, which gives a nice disassembly of the executable code, next to the raw bytes so I could count them with a simple shell pipeline:

$ objdump -d qr.o|egrep '^ *[a-f0-9]*:'|cut -c11-32|wc -w
4762

So that's 4.7 kB of x86-64 code, with its weird variable-length instruction set. On average, these instructions were 3.7 bytes long, whereas the 6502 instructions on the Beeb are 1-3 bytes each (instruction plus up to 16 bits of operand), so on average probably about half the size. That was the good news. The other good news was that gcc probably didn't produce the smallest code possible. The bad news was that an x86 processor can do 32-bit arithmetic, whereas on the Beeb you have to hand-roll everything that's more complicated than 8-bit addition. Even 16-bit addition and multiplication are not supported by the hardware. So the comparison could go either way.

So why not cross-compile this C code to target the 6502 processor, and type it into the Beeb? Maybe I could have. But I felt it would be more trouble than it's worth; one typo and the program is ruined, but if I hadn't written the assembly myself, there'd be no way I'd be able to debug it. Also, I wasn't sure that a compiler targeting such an old CPU would be any good; the only one I tried didn't even get past the parsing phase, and choked on my C99-style variable declarations. I also had some experience with writing tiny code and felt up to the challenge.

No comments: