2020-12-29

Testing software -- a myth?

#hardware

#software

The other day I had a long conversation with B, a very low-level software person at his day job. The conversation strongly reminded me of a few episodes from my own day job. So I put forward this bold statement:

You cannot test just software!

Not me, not you, not anyone else.

But why? I hear you ask. You (the software developer) can write unit tests and integration tests and put all the fabulous build and test infrastructure to good use, can you not? Yes, of course you can. And at my day job they actually do. Which is good!

But grumpy old me leans back with a sarcastic grin saying: That is not going to help. Software cannot live in a vacuum, let alone execute.

In my role as a "systems integrator" I am often the one who puts a load of compiled software together onto a brand new circuit board:

All of this is necessary, but not sufficient. It also takes electricity, cables, connectors, funny peripheral devices and the like. And if I did everything ok and if the karma is right and if the goddesses of computing are in a benevolent mood, then the whole load might burst to life, albeit computing life.

Then the other half of my work starts: making sure that all the peripherals are actually available and talking to my kernel/application. This part is never seen by my beloved project managers :)

Any evidence? I hear you ask. Yepp:

I2C

So there is an i2c-bus on the board. Yes, yes! And there is a separate controller on that bus. Can I /see/ it? With something like i2cdetect? Nope. But it's there! Promised by the hardware guy. Yes, yes, it is there. However: first I have to "output enable" the i2c level shifter. How nice that I can actually read schematics. And after that I have to talk to an i2c multiplexer. I have done this before. But why is it playing dead? Turns out that /someone/ decided to use another multiplexer, with an external reset. So far ok, I need to assert and release the reset line. But it is still misbehaving? Turns out that the new multiplexer features a completely different addressing scheme. Ok, reading that datasheet, changing my code a little ... victory!!!
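A minimal sketch of why a "completely different addressing scheme" silently breaks working code. The actual parts are not named here, so both schemes below, the function names and the channel count of eight are assumptions for illustration: one scheme uses one bit per channel, the other an enable bit plus a channel number.

```python
def control_byte_bitmask(channel: int) -> int:
    """Scheme A (assumed): one bit per channel, bit n selects channel n."""
    if not 0 <= channel <= 7:
        raise ValueError("channel must be 0..7")
    return 1 << channel


def control_byte_indexed(channel: int) -> int:
    """Scheme B (assumed): bit 3 enables the mux, bits 2..0 hold the channel number."""
    if not 0 <= channel <= 7:
        raise ValueError("channel must be 0..7")
    return 0x08 | channel


if __name__ == "__main__":
    # Print both encodings side by side for every channel.
    for ch in range(8):
        a = control_byte_bitmask(ch)
        b = control_byte_indexed(ch)
        print(f"channel {ch}: bitmask 0x{a:02x}, indexed 0x{b:02x}")
```

Under these assumptions the byte 0x08 means channel 3 in one scheme and channel 0 in the other. The old code still gets its ACK from the multiplexer, it just ends up on the wrong branch of the bus.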

So back to that extra controller. I can address it, it will assert the ACK, and after that the i2c-bus blocks. Completely. Only power-cycling the whole board will release the lock. Turns out that this controller is just a wee bit picky about being talked to. My desk neighbour found a magical incantation of a few bytes which makes the controller answer with the version number of its firmware. Except the returned number looked crook. Turns out, of course, that the controller had not been programmed m(

So, the better part of a week later, all of this stuff works. On to new adventures:

UART

The board has a serial console. And during the first half of this job I managed to bring it to life. The board has two connectors to reach this serial interface electrically: the one that I used all the time, and the other, which is the only one available after the board has been properly mounted in its designated housing. Needless to say, the proper cable/connector had to be hunted down, it had a broken solder joint on the improvised other end, and it was of course unclear which of the two soldered-on connectors was the serial console.

And also needless to say: It didn't work. Not a wee bit. So back to adventure mode: Different board? No dice. Different connector board? No dice either. The cable worked, the workstation saw a serial interface, but not for all the gold in the universe would it see characters going across or anything. There were more entertaining phenomena along this path, but in the end it turned out: there were two resistors missing on the connector board. Why these weren't there, I have no clue. A kind soul added the resistors, and the connection worked immediately. BUT OF COURSE it was all the fault of /my/ Linux, as my beloved colleagues call it.

More UART

There was another phenomenon. The new cpu modules would work, the old ones ceased to boot. Turns out that the 3.3V regulator would not start, because that rail was powered by the external USB serial connector. But the maximum allowable current from there was not enough to power the board up fully. Turned out that the component group of this connector had been copied from elsewhere and sported a naming conflict with the rest of the board. Sigh.

Frequencies

There was the awkward situation that the first prototype board on my desk /once in a long while/ would not boot up when switched on. For a PC at home, no problem. For a machine that has to run 24x7? No way! I could not get to grips with it, and for some strange reason, no one was interested. The problem moved out of sight for a lengthy while, but we all know that this kind of thing strikes again at the least convenient moment.

The second round of prototype boards came to my desk. And unbelievably, the goddesses of computing had a very nice treat for me: one of them new boards /never/ booted up. The others booted in like half of the attempts. That was good! No kidding! Because it forced everyone else to notice :)

Long story short: one of the configurable frequencies within the cpu/fpga conglomerate was out of spec. After that had been fixed, all the boards came to life every time without a hitch. This thing took like two weeks on and off and involved three people. Needless to say, this was not in any project plan.

More Frequencies

So now that it had been re-discovered that fiddling with the frequency setup might lead to interesting results, someone decided to change the crystal, from which all the frequencies were derived, from 50MHz to 33MHz. By virtue of configuration this would allow the system to increase its overall speed by a whopping 2.5%! Yes! Unbelievable, isn't it?

So I set out to try a reworked board, and at first sight everything was nice. Except a day later I found out that ethernet had ceased to function. Nothing at all. Not even the second stage boot loader was happy. So the next day, the hardware person hooked a piece of his more fancy equipment to it. He almost immediately spotted that the clock driving the ethernet chip was not 25MHz, but more like 16.7MHz. Tough luck! Turned out that the 50MHz crystal's clock was divided using a simple gate to generate the 25MHz clock for the ethernet chip. Old crystal back. Everything nice.
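The arithmetic behind that measurement, as a small sketch. It assumes the "simple gate" is a fixed divide-by-two (which matches 50MHz in, 25MHz out) and that the "33MHz" crystal is nominally 33.33MHz (which would match the measured 16.7MHz). The function names and the tolerance are made up for illustration.

```python
ETHERNET_CLOCK_MHZ = 25.0  # what the ethernet chip expects, per the text above


def divided_clock_mhz(crystal_mhz: float, divider: int = 2) -> float:
    """Clock after a fixed divider (assumed: divide-by-two)."""
    return crystal_mhz / divider


def ethernet_clock_ok(crystal_mhz: float, tolerance_mhz: float = 0.01) -> bool:
    """True if the divided clock lands on the expected ethernet clock."""
    return abs(divided_clock_mhz(crystal_mhz) - ETHERNET_CLOCK_MHZ) <= tolerance_mhz


if __name__ == "__main__":
    # 33.33 is an assumed nominal value for the "33MHz" crystal.
    for crystal in (50.0, 33.33):
        clock = divided_clock_mhz(crystal)
        verdict = "ok" if ethernet_clock_ok(crystal) else "ethernet dead"
        print(f"{crystal:.2f} MHz crystal -> {clock:.2f} MHz: {verdict}")
```

A fixed divider does not care what the rest of the frequency configuration does: change the crystal, and every clock derived this way changes with it.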

Conclusions
