Debugging your space probe

Years ago we were building a tracker for commercial vehicles. The hardware was an ARM7 CPU with a GPS receiver and a GPRS modem, running uClinux.

We ran into a tough bug in the initial application startup process. The program that read from the GPS and sent location updates to the network was failing. When it did, the console stopped working, so we could not see what was going on. Writing to a log file gave the same result.

This is unfortunately common for embedded systems. For normal programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a normal Tuesday, and your only debugging option may be staring at the code and thinking hard.

This board had no Ethernet and only three serial ports: one for the console, one hard-wired to the GPS and one for the cellular modem. The ROM was almost full (it had a whopping 2 MB of flash: 1 MB for the Linux kernel, 750 KB for apps and 250 KB for storage). The lack of an MMU meant no shared libraries, so every binary was statically linked and huge. We couldn't install much else to help us.

A colleague came up with the idea of running gdb (the text mode debugger) over the cellular network. It took multiple tries due to packet loss and high latency, but eventually we got a stack backtrace. It turned out printf() was failing when it tried to print the latitude and longitude from the GPS, which are floating point numbers.

One rule for normal programming is that if you think there is a compiler bug, you are wrong -- it's a bug in your code. In this case, a few hours of debugging and Googling five-year-old mailing list posts turned up a patch to gcc (never applied upstream) that fixed an ARM7 bug affecting uClibc.
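For what it's worth, when floating point printf() is broken or too expensive on a small target, a common workaround is to keep coordinates as scaled integers and print them with integer conversions only. This is not what we did (we applied the gcc patch); it is just a minimal sketch of that alternative. The helper name format_microdegrees and the idea that the position is already available as integer microdegrees are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from the original project): format a coordinate
 * held as integer microdegrees (degrees * 1,000,000) using only integer
 * printf conversions, so no floating point formatting code is involved.
 * Returns whatever snprintf returns. */
int format_microdegrees(int32_t microdeg, char *buf, size_t len)
{
    /* Use a wider type for the magnitude so negating INT32_MIN is safe. */
    int64_t v = microdeg;
    const char *sign = "";

    if (v < 0) {
        sign = "-";
        v = -v;
    }

    long whole = (long)(v / 1000000); /* integer degrees */
    long frac = (long)(v % 1000000);  /* six decimal places */

    /* %06ld zero-pads the fraction so 500 becomes "000500". */
    return snprintf(buf, len, "%s%ld.%06ld", sign, whole, frac);
}
```

Calling format_microdegrees(-33865143, buf, sizeof buf) fills buf with "-33.865143". The sign is handled separately so that small negative values such as -500 microdegrees keep their minus sign even though the whole-degree part is zero.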

This made me think of how the folks who make the space probes debug their problems. If you can't be an astronaut, at least you can be a programmer, right? :-)