This page is about using serial TNCs with Linux Packet Radio[1].
For the majority of people, it appears that serial TNCs work great with Linux. Many people use `kissattach` to use them in KISS mode, and this seems to work just out of the box for almost everyone. It did for me (KR0L) with my Kenwood TS-2000 and standard 16550A UART on my Core 2 Quad machine.
I'm going to go into a lot of detail here about a particular problem I had -- and also the thought process for troubleshooting. This may be more detail than you need, but it should perhaps help illuminate the problem.
I set up another system -- a laptop with a USB-to-serial converter and an old Kantronics KPC-3+ TNC. The computer is talking to the TNC at 19200bps and the TNC is running in KISS mode. So several things are different from my working setup: the use of a USB-to-serial converter, the make and model of TNC, and the serial interface baud rate (19200 instead of 9600, though the over-the-air baud rate is 1200 in both instances.)
I initially noticed a problem almost immediately. I tried telnetting from my main station to this new one over VHF. Initially things started to work fine, but partway through the session setup I stopped receiving packets from the KPC-3+ TNC. The first question was: was there something weird with the TCP setup causing this? So I placed an AX.25 call to it and had the same problem, ruling out TCP.
I then went over to the problem machine itself and ran `axlisten`, which showed incoming packets just fine but no outgoing packets. I ran `axcall` on that machine, which should have immediately caused an outgoing packet, but still nothing appeared. I restarted the system and it worked fine for a while.
I also observed that `stty` would sometimes hang after this problem occurred, and once the system even had a kernel panic.
This made me suspect a flow control problem from the start. Here's what flow control means. Let's say you have a 19200bps link to your TNC, but your TNC can only transmit at 1200bps. (This is also a common situation with a modem.) Your TNC or modem probably has some sort of internal transmit buffer, but it's going to be small. When it gets full, it needs to tell the PC to stop sending for a bit so that it doesn't lose characters.
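To put rough numbers on this, here is a minimal Python sketch of how quickly such a buffer fills. The function name is made up for illustration. The estimate assumes 10 bits per byte on the serial side (8N1 framing) and an idealized 8 bits per byte over the air, ignoring AX.25 framing overhead and channel access delays, so a real TNC would fill up even sooner.

```python
def seconds_until_full(buffer_bytes, serial_bps, radio_bps):
    """Estimate how long a TNC transmit buffer lasts under sustained input.

    Assumptions: 8N1 serial framing (10 bits per byte on the wire) and
    8 bits per byte over the air, with no protocol overhead.
    """
    bytes_in_per_s = serial_bps / 10   # start bit + 8 data bits + stop bit
    bytes_out_per_s = radio_bps / 8    # idealized; real AX.25 is slower
    net_fill_rate = bytes_in_per_s - bytes_out_per_s
    if net_fill_rate <= 0:
        return float("inf")            # the radio side keeps up
    return buffer_bytes / net_fill_rate
```

Calling `seconds_until_full(1024, 19200, 1200)` with a hypothetical 1024-byte buffer (not a KPC-3+ specification) gives well under one second of sustained sending before the TNC has to either signal the PC or drop data.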
Unfortunately, there is no single standard way to accomplish this on a serial link. The two most common ways are called XON/XOFF, or software flow control, and RTS/CTS, or hardware flow control. XON/XOFF works by having the device send a Ctrl-S (XOFF) when it needs the remote end to stop transmitting, and a Ctrl-Q (XON) when it's ready for it to resume. The advantage of this is that it doesn't require any additional signaling cabling. The disadvantages are many. First, it takes a certain time to transmit a Ctrl-S or Ctrl-Q, so the other end may keep sending in the meantime. Moreover, what if the application in question needs to send one of those characters as part of its data? Sending a binary file, for instance, is almost guaranteed to send a Ctrl-S at some point. Without careful avoidance of this, it can result in a connection appearing to hang (lock up, or deadlock) due to XON/XOFF processing.
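The "almost guaranteed" claim about binary data can be made concrete. The sketch below is illustrative only: the function names are made up, and the probability estimate assumes the payload bytes are uniformly random, which real files only approximate.

```python
XON = 0x11   # Ctrl-Q
XOFF = 0x13  # Ctrl-S


def flow_control_offsets(payload):
    """Return (offset, name) pairs for every XON/XOFF byte in payload."""
    names = {XON: "XON", XOFF: "XOFF"}
    return [(i, names[b]) for i, b in enumerate(payload) if b in names]


def chance_of_xoff(length):
    """Probability that `length` uniformly random bytes include an XOFF."""
    return 1 - (255 / 256) ** length
```

Under that assumption, a payload of just 1024 bytes already has roughly a 98% chance of containing at least one Ctrl-S, and `flow_control_offsets` can be used to check a specific payload for the offending bytes.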
RTS/CTS involves signaling pins on the serial connector. Each side drives one of the two pins: it applies voltage when it's able to receive and removes it when it isn't. The computer signals on RTS, and the TNC or modem signals on CTS. This is elegant in that it is out of band from the data, so there are no issues with handling special characters. It also is virtually instantaneous. But it's not supported by some older hardware or non-standard cabling. In practice, people prefer to use RTS/CTS whenever possible.
So my problem smells, in part, of XON/XOFF flow control deadlock. That is, the PC perhaps received a Ctrl-S (XOFF) from the TNC and never got a corresponding Ctrl-Q. That could happen for a few reasons. One reason could be that the packets sent to the TNC over the air contained a Ctrl-S character. Another could be that the TNC sent an XOFF but for whatever reason the XON was never received (or never sent by the TNC).
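The deadlock scenario can be modeled in a few lines. This is a toy Python model, not the kernel's tty code: the class name is made up, and it only captures the two behaviors that matter here, namely that a receiver honoring XON/XOFF swallows those bytes instead of passing them up as data, and that an XOFF with no following XON leaves the transmitter stopped indefinitely.

```python
XON = 0x11   # Ctrl-Q: resume
XOFF = 0x13  # Ctrl-S: stop


class XonXoffGate:
    """Toy model of a transmitter that honors XON/XOFF from its peer."""

    def __init__(self):
        self.stopped = False

    def receive(self, data):
        """Process incoming bytes; return only the ones passed up as data."""
        passed = bytearray()
        for byte in data:
            if byte == XOFF:
                self.stopped = True      # stop sending until XON arrives
            elif byte == XON:
                self.stopped = False
            else:
                passed.append(byte)
        return bytes(passed)

    def can_send(self):
        return not self.stopped
```

Feeding this model a received frame whose payload happens to contain 0x13 both corrupts the frame (the byte goes missing) and leaves `can_send()` false until a 0x11 shows up, which matches the symptom of incoming packets still appearing while nothing goes out.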
So, how could that explain `stty` hanging or the kernel panic? Well, a bit of investigation reveals that the USB serial port converter driver appears to handle XON/XOFF either directly in that driver or in the firmware of the converter itself. That could lead to `stty` hanging when the converter is stuck in the stopped state. This could also explain the kernel panic, or the kernel panic could have come from the AX.25 stack being unexpectedly prevented from transmitting for so long.
Either way, more investigation into flow control is warranted.
kissattach doesn't have any documented options about flow control. Strangely, mkiss from the same package has a -h option to enable hardware flow control ("handshaking" according to the mkiss manpage), but it has nothing that disables XON/XOFF (it is possible, though useless, to run both at the same time.)
It was time to look at the source code.
In `ax25-tools/kiss/mkiss.c`, `hwflag` is set to true if hardware flow control is requested. Then it calls:
tty_raw(tty->fd, hwflag);
`tty_raw` itself is defined in libax25 as, partially:
if (tcgetattr(fd, &term) == -1) {
        ....
}
term.c_cc[VMIN] = 1;
term.c_cc[VTIME] = 0;
term.c_iflag = IGNBRK | IGNPAR;
term.c_oflag = 0;
term.c_lflag = 0;
#ifdef CIBAUD
term.c_cflag = (term.c_cflag & (CBAUD | CIBAUD)) | CREAD | CS8 | CLOCAL;
#else
term.c_cflag = (term.c_cflag & CBAUD) | CREAD | CS8 | CLOCAL;
#endif
if (hwflag)
        term.c_cflag |= CRTSCTS;
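Note, by the way, that kissattach never calls this function at all. Also note that nothing in libax25 or ax25-tools ever mentions XON/XOFF by name. That said, `tty_raw` assigns `c_iflag` outright instead of OR-ing bits into it, so the `IXON` and `IXOFF` bits are cleared as a side effect: mkiss does end up disabling XON/XOFF on the port, whether or not hardware flow control is requested. kissattach, since it skips `tty_raw`, would not get that side effect.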
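Whether XON/XOFF handling survives those assignments can be checked directly. The following is a minimal sketch in Python, whose `termios` module exposes the same flag names; `tty_raw_like` and `software_flow_control_enabled` are hypothetical names made up for this illustration and are not part of libax25. It works on the attribute list that `termios.tcgetattr()` returns, so it can be tried without touching a real port. Because `c_iflag` is assigned rather than OR-ed, `IXON` and `IXOFF` do not survive the call.

```python
import termios

# Indexes into the list returned by termios.tcgetattr().
IFLAG, OFLAG, CFLAG, LFLAG, ISPEED, OSPEED, CC = range(7)


def tty_raw_like(attrs, hwflag=False):
    """Return a copy of attrs changed the way the quoted tty_raw code does it.

    This is an illustrative mirror, not libax25 itself. Python keeps the
    line speed in separate list fields, so the CBAUD masking is omitted.
    """
    new = list(attrs)
    new[CC] = list(attrs[CC])
    new[CC][termios.VMIN] = 1
    new[CC][termios.VTIME] = 0
    new[IFLAG] = termios.IGNBRK | termios.IGNPAR   # assignment, not |=
    new[OFLAG] = 0
    new[LFLAG] = 0
    new[CFLAG] = termios.CREAD | termios.CS8 | termios.CLOCAL
    if hwflag:
        new[CFLAG] |= termios.CRTSCTS
    return new


def software_flow_control_enabled(attrs):
    """True if either IXON or IXOFF is set in the input flags."""
    return bool(attrs[IFLAG] & (termios.IXON | termios.IXOFF))
```

To apply the same settings to a real port, the result can be passed back with `termios.tcsetattr(fd, termios.TCSANOW, tty_raw_like(termios.tcgetattr(fd)))`, where `fd` is an open file descriptor for the serial device.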
The second piece of the puzzle is what the TNC is doing. According to the Kantronics manual, the `XFLOW` option defines whether software flow control will be used, and it defaults to ON. The TNC I was using had just been reset to factory defaults, so of course this was ON over there. The manual says that in "transparent mode", `TRFLOW` and `TXFLOW` can override that. KISS is different from transparent mode, so there is no apparent overriding from there. So it may have been sending XOFF characters as flow control, although the documentation did not specifically address its use in KISS.
The documentation did state that "The TNC always uses hardware flow control". There is no documented option to disable it, and no documentation that it's disabled in KISS. (However, there is some cause to doubt the accuracy of both statements in the KISS context, though they may still prove accurate.)
Thus far, we have a suspicion that XON/XOFF flow control could be a problem. The PC may be seeing XOFF characters generated by the TNC, or possibly passed through in other packets. So what does the KISS specification[2] say about it? We're particularly interested in what it says about flow control in general, and also what steps it may take to prevent incoming Ctrl-S characters from being presented directly as such to the PC.
2: http://www.ax25.net/kiss.aspx
On the first point, we see:
Essentially, KISS says that there is to be no flow control; if buffers are exceeded, packets just get dropped, but the underlying AX.25 protocol already has other ways to deal with that.
So what about XON/XOFF characters in particular?
The "handshaking signals" here are XON and XOFF. Another concern is what if a Ctrl-S naturally occurs in the data coming in. KISS defines ways of dealing with four characters that have special meaning in section 2. However, Ctrl-S (0x13) is not among them.
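A short sketch of KISS framing shows why a Ctrl-S in the payload reaches the serial line untouched. The byte values for FEND, FESC, TFEND and TFESC are the ones the KISS specification defines; the function names are made up for this illustration, and the frame builder only covers the simple case of a data frame on port 0.

```python
FEND = 0xC0    # frame end
FESC = 0xDB    # frame escape
TFEND = 0xDC   # transposed frame end
TFESC = 0xDD   # transposed frame escape


def kiss_escape(payload):
    """Escape FEND and FESC in payload; every other byte passes unchanged."""
    out = bytearray()
    for byte in payload:
        if byte == FEND:
            out += bytes([FESC, TFEND])
        elif byte == FESC:
            out += bytes([FESC, TFESC])
        else:
            out.append(byte)           # includes 0x11 (XON) and 0x13 (XOFF)
    return bytes(out)


def kiss_data_frame(payload):
    """Wrap payload as a KISS data frame for port 0 (command byte 0x00)."""
    return bytes([FEND, 0x00]) + kiss_escape(payload) + bytes([FEND])
```

Only 0xC0 and 0xDB are ever rewritten, so a frame carrying 0x13 arrives at the PC's serial port with that byte intact, where a tty that still has `ixon` set will treat it as a stop request.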
Therefore:
I did Google about this, and found one other report of the problem. That solution[3], however, disabled RTS/CTS, and reported it helped the problem -- but only a little bit.
3: http://fettechnologies.com/tncx.html
So far, we've established that the TNC has XON/XOFF turned on, and thus may be generating these characters in KISS mode. What is the PC side doing?
# stty -a -F /dev/ttyUSB0
speed 19200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc -ixany -imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke
Notice in there the `-crtscts`. The dash means it's disabled, though that's probably because I did that. Note also `ixon -ixoff`. Reading the stty manpage shows that that means it is processing XON/XOFF on incoming traffic but is not generating XON/XOFF on outgoing traffic. (It is unlikely that the PC would do so anyhow, since its processing speed is so far higher than 1200bps).
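The same three flags can also be read programmatically instead of being picked out of the `stty -a` output by eye. This is a minimal sketch using Python's standard `termios` module; the function names are made up for illustration, and `/dev/ttyUSB0` is simply the device path used above.

```python
import os
import termios


def flow_control_summary(attrs):
    """Summarize flow-control flags from a termios.tcgetattr() result."""
    iflag, cflag = attrs[0], attrs[2]
    return {
        "ixon": bool(iflag & termios.IXON),       # obey XON/XOFF we receive
        "ixoff": bool(iflag & termios.IXOFF),     # send XON/XOFF ourselves
        "crtscts": bool(cflag & termios.CRTSCTS), # RTS/CTS hardware handshake
    }


def read_flow_control(device="/dev/ttyUSB0"):
    """Open a serial device and report its current flow-control flags."""
    fd = os.open(device, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        return flow_control_summary(termios.tcgetattr(fd))
    finally:
        os.close(fd)
```

For the port state shown above, `read_flow_control()` would be expected to report `ixon` as true and both `ixoff` and `crtscts` as false.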
So looking at the above evidence, I think it makes sense that I need to:
Set `XFLOW OFF` in the TNC to make sure it doesn't insert XON/XOFF characters
Completely disable XON/XOFF on the PC side
Experiment with turning on hardware flow control on the PC side
Why enable hardware flow control, even though the KISS spec says there is to be none? Well, there's a chance that this could avoid the occasional dropped packet and help throughput. The danger is how big the buffer on the PC may be; bufferbloat[4] could turn out to be an even bigger problem. My first attempt, though, will be to turn it on.
So, how to do that? Just add this line to my startup script:
`stty -F /dev/ttyUSB0 raw crtscts`
The `raw` option includes `-ixon -ixoff`, which disables XON/XOFF handling, as well as disabling handling of other special control characters. This doesn't set parity; maybe `-parenb -parodd cs8` should be added, but `stty -a` shows that these are already set in my instance.
After doing the above, I found that things worked better, but not perfectly. Disabling RTS/CTS as well and switching to 9600bps fixed it. I suspect a buggy 19200bps implementation in the Edgeport USB-to-serial converter.
There is apparently a serious bug[5] in the KPC-3 and KPC-3+ KISS mode that may be at the root of this. As an experimental measure, I have dropped the serial interface rate from 9600 to 1200 to try to address it. Perhaps it was not the Edgeport with the buggy 19200bps implementation, but actually the Kantronics.
5: http://blog.aprs.fi/2011/03/kantronics-kpc3-considered-harmful.html
--------------------------------------------------------------------------------
Before proceeding, start with the Packet Radio[7] page.
(c) 2022-2024 John Goerzen