RISKS-LIST: RISKS-FORUM Digest  Wednesday 28 February 1990  Volume 9 : Issue 72

        FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Clients cross about crossed wires (David Sherman)
  100-year-old can drive four years without test (David Sherman)
  Some comments on the Airbus (Robert Dorsett, Martyn Thomas)
  Re: Problems/risks due to programming language (Bruce Hamilton)
  Comments on programmer error (Geoffrey Welsh)
  "Goto considered harmful" considered harmful (Brad Templeton)
  lockd (Caveh Jalali)
  Re: Railroad interlocking systems (J.A.Hunter via Brian Randell)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in
good taste, objective, coherent, concise, and nonrepetitious.  Diversity is
welcome.  CONTRIBUTIONS to RISKS@CSL.SRI.COM, with relevant, substantive
"Subject:" line (otherwise they may be ignored).  REQUESTS to
RISKS-Request@CSL.SRI.COM.
TO FTP VOL i ISSUE j:  ftp CRVAX.sri.com, login anonymous (any non-null
password), cd sys$user2:[risks], get risks-i.j.  Vol summaries are now in
risks-i.0 (j=0).

----------------------------------------------------------------------

Date: Mon, 26 Feb 90 16:47:29 EST
From: David Sherman
Subject: Clients cross about crossed wires

Toronto Star, February 25, 1990:

BROCKVILLE (CP) - Ma Bell got a wrong number last month -- 17,000 times.
That is the number of long-distance calls incorrectly charged to telephone
numbers in Iroquois, Ont.  A Bell Canada official has described the mistake,
blamed on computer error, as the company's biggest foul-up in years.  During
the billing period from Jan. 12 to Feb. 16, calls made by residents of
Athens, Ont. were charged to customers in the Iroquois exchange.  [...]
"I've never seen anything like this in my seven years as area manager," said
Ron Kristiansen, customer services manager for Bell's Brockville, Kingston
and Belleville exchange areas.  Some Iroquois customers reported they had
been charged for up to 40 long-distance calls they had not made, more than
doubling their monthly phone bills.

------------------------------

Date: Mon, 26 Feb 90 07:55:31 EST
From: David Sherman
Subject: 100-year-old can drive four years without test

Toronto Star, February 26, 1990:

"I think it's stupid -- the whole damn thing," grumbled Charles Narraway.
The Nepean, Ont. [suburb of Ottawa - DS] resident will be 100 years old in
March, and that seems to have turned up a bug in the transportation
ministry's computer.  For 20 years, Narraway has had to take an annual road
test to get his driver's license renewed.  And for 20 years he passed it the
first time, every time.  But this year he got a license, good for four
years, without a test.  Narraway's driver's license shows his date of birth
as 90-03-04, and he figures the computers have tacked the wrong century on
to the front....

   [Some of you will recall the story of the man whose insurance rates
   tripled (high-risk) when he turned 101.  The computer truncated...  PGN]

------------------------------

Date: Mon, 26 Feb 90 23:28:54 -0600
From: rdd@walt.cc.utexas.edu (Robert Dorsett)
Subject: Some comments on the Airbus

Dave Morton wrote in RISKS-9.65:

> We talked about the Airbus crash and it seems that in pilot circles it was
> attributed to the fact that the older Airbus machines had Rolls-Royce
> engines.  It seems that when given full power they reacted about 3 seconds
> faster than those on the A320.  In his opinion these three seconds were
> vital.

However, it's a totally different category of engine (CFM-56 vs. large-fan).
Different airplanes require different approaches, and "type conversion"
training is usually very thorough.  The pilots in question also had 300-odd
hours in type (each), 10,000+ hours of experience (each, distributed among
different aircraft types), and the captain (who was flying) was the chief of
A320 training for Air France.  What was noteworthy about the Mulhouse-Habsheim
crash was the lack of formal cockpit procedures and the lack of flight-crew
experience in the type of maneuver performed.  It was almost certainly pilot
error.

> a role in his judgement.  On a side note the pilot also mentioned that the
> 757, which also has the fly by wire system, sometimes "hangs".  The only
> way to get the electronics to function again is to power cycle the lot.

That may be so, but it's not possible to extend the lesson to the A320,
since the designs of the two systems are totally different (the fundamental
*design* philosophy as regards pilot access to the electrical system is
different, too).  And as a small note, the flight augmentation system of the
757 is in no way comparable to the A320's fly-by-wire system (or its
associated software protections).  The former is dedicated to damping
aerodynamic instability; the latter introduces new *control laws* and
appears to be intended to work around problems involved in the use of
side-stick controllers.

George Michaelson wrote in RISKS-9.69:

> I find it hard to raise any possible risks in technology transfer to
> developing countries (does that label apply to India?) given the overtones
> of chauvinism if not downright racism

This sentiment bugged me -- light-footing it around safety-critical issues
because they might reflect poorly on the policies of developing nations.
It bugged me enough to write a rather long-winded response (which I'll be
glad to email to anyone who's interested in my ramblings).  Suffice it to
say that maintenance *is* a critical issue among all airlines operating new
equipment, and that the situation is *particularly* serious in developing
nations.

Udo Voges wrote in RISKS-9.70:

> have decided to require an extended basic training in India.  The pilot
> S. S. Gopujkar was probably acting as an instructor during the accident
> flight last Wednesday (14 Feb 90 - uv), despite the fact that he didn't
> have the required qualification.  This would explain why he asked the
> control tower to make a manual landing on sight.

The problems with this account are:

1. No airline that I'm aware of performs simulated in-flight systems
   failures on revenue flights.

2. Airbus proper regards "manual" (electric horizontal stabilizer trim and
   manual rudder) backup as a last-ditch emergency system.  Numerous reports
   indicate that its operation isn't even part of the standard Airbus
   training curriculum.  It's not something that a marginal pilot is likely
   to be playing with.  Marginal pilots tend to hide *within* automation, to
   become systems managers rather than pilots.

3. It's highly unusual for a flight crew to consult with an air traffic
   control facility on the operation of an on-board system (unless, of
   course, the pilot expected to crash).

The account sounds highly suspect.  On closer examination, the terms admit
other readings -- perhaps "manual" means hand-flying it (with protections)
to the ground, rather than tracking an ILS, and the tower wished to confirm
that the ILS wouldn't be used, or something.  I don't know.
But note that if this IS the case, the A320's "safety" features should
prevent precisely this sort of crash -- the pilot should be able to fly down
to the surface, jerk back on the stick, do the most violent maneuvers, etc.
-- all without crashing the airplane (that is, above 100', beneath which the
protections disappear).

> So again it is - only - human malfunctioning (insufficient training,
> bad management procedures, financial reasons ?).   Udo Voges

But then the question is: assuming that the systems ARE working properly,
when does it stop being "human error" and become poor ergonomics?  RISKS has
traditionally concentrated on software and hardware reliability issues,
which are important in themselves.  But don't forget that the very way the
pilot INTERACTS with the aircraft is also a safety issue, and the A320 has
thrown the lessons painfully learned over the past 80 years right out the
window.

Indeed, on a more general scale, the A320 is only the most extreme example
of a disturbing trend.  If one studies the different cockpit designs of
airplanes introduced since 1981 (all of which use different display formats
on CRT-based flight and engine instrumentation), one starts to recall the
absolute lack of standardization that was rampant during the 1920's, 30's,
and 40's.  When one considers the problems this will cause in training, and
factors in pilot gullibility (perhaps believing what the manufacturer says
about the safety features of the airplane -- I find it quite amusing that
the pilot opponents of excessive automation or fly-by-wire are being branded
with '50's-mentality "anti-progress" labels), I suggest that we can probably
expect the same sort of safety records.

------------------------------

Date: Tue, 27 Feb 90 16:51:23 BST
From: Martyn Thomas
Subject: A320 sidesticks

The A320 sidestick layout is asymmetric -- the captain flies with the left
hand, the first officer with the right.  This means that when the first
officer is flying in the left-hand seat -- as will happen from time to time,
e.g., for training -- there is the added unfamiliarity of using the other
hand.

Martyn Thomas, Praxis plc, 20 Manvers Street, Bath BA1 1PX UK.

   [It is pessimal when the captain is right-handed and the first officer is
   left-handed and *both* are flying with the *wrong* hand.  But the
   switching back and forth must undoubtedly be confusing.  PGN]

------------------------------

Date: 26 Feb 90 17:48:40 PST (Monday)
From: Bruce Hamilton
Subject: Re: Problems/risks due to programming language

I'm incredulous at the number of replies seeming to say, "No language
prohibits all errors, so we might as well use C."  Some 25 years ago, PL/I
had named scopes.  You can say something like

    Foo: BEGIN
      ...
      EXIT Foo;
      ...
    END Foo;

and the EXIT has a clearly defined scope, irrespective of how many
intermediate levels of nesting there are.

--Bruce   213/333-8075

------------------------------

Date: Wed, 28 Feb 90 11:46:17 EST
From: root@zswamp.FIDONET.ORG (Geoffrey Welsh)
Subject: Comments on programmer error by Clifford Johnson & others

In RISKS 9.71 there is a general discussion of programmer error and its
inevitability.  I think that the entire discussion can be summarized as
follows (sorry, I don't know who originally wrote this):

   The tendency to err which programmers have been noticed to share with
   other human beings has often been treated as if it were an awkwardness
   attendant upon programming's adolescence, which (like acne) would
   disappear with the craft's coming of age.  It has proved otherwise.

For years I avoided `C' because I was literally intimidated by the ease with
which one can confuse oneself, such as:

    #define square(x) ( x * x )
    [...]
    b = square( a++ );
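To spell out the trap, here is a minimal sketch (the helper function and the
variable values are invented purely for illustration): the preprocessor
substitutes the argument textually, so the side effect happens twice.

    #include <stdio.h>

    #define square(x) ( x * x )      /* the macro as posted above         */

    static int square_i(int x)       /* a function: its argument is       */
    {                                /* evaluated exactly once            */
        return x * x;
    }

    int main(void)
    {
        int a = 3;

        /* b = square( a++ );  expands textually to  b = ( a++ * a++ );
           'a' is modified twice with no intervening sequence point, so
           both the result and the final value of 'a' are undefined.      */

        int b = square_i( a++ );     /* well-defined: b == 9, a becomes 4 */
        printf("a=%d b=%d\n", a, b);
        return 0;
    }

The usual advice -- parenthesize the macro as ((x) * (x)) -- fixes
precedence surprises but still evaluates x twice; only a function (or simply
never passing expressions with side effects to macros) removes the
double-evaluation hazard.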
Eventually a wise friend pointed out that no language can hope to
second-guess your intentions and point out possible errors in logic, and
that few programmers could maintain their sanity near a compiler that did
so.  Since then I've developed a few rules of thumb, such as not modifying
the contents of more than one variable per line wherever possible (a natural
outgrowth of having done some assembler), and I have been able to put out
occasional `C' code that is neither more nor less buggy on average than what
I write in any other language -- though that doesn't say much!

Geoff Welsh, 602-66 Mooregate Crescent, Kitchener, Ontario N2M 5E6 CANADA

------------------------------

Date: Wed, 28 Feb 90 14:42:03 EST
From: brad@looking.on.ca (Brad Templeton)
Subject: "Goto considered harmful" considered harmful

This is a most unusual RISK.  That famous paper put the poor GOTO so out of
vogue that programmers are actually afraid of using it -- they will make
their programs more complex and harder to understand in order to avoid using
the dreaded goto.  They do this out of fear of rejection.  Some great
programmer will read their code, see the "goto" and go tsk-tsk.  Horrors.

Because C has no multi-level break or continue, the use of a goto to do loop
exception handling detracts in no way from the "structuredness" of the
program.  But the fear of the stigma of goto has led programmers to bugs, it
seems.  Yet another example of a RISK due to fear of RISKS.
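For concreteness, this is the kind of use meant -- a sketch only, with data,
values, and label names invented for the example:

    #include <stdio.h>

    int main(void)
    {
        int m[3][3] = { {1,2,3}, {4,5,6}, {7,8,9} };
        int want = 6, row, col;

        /* A plain 'break' would leave only the inner loop, so the goto
           stands in for the multi-level break that C lacks: one forward
           jump to a single, clearly named exit point.                    */
        for (row = 0; row < 3; row++)
            for (col = 0; col < 3; col++)
                if (m[row][col] == want)
                    goto found;

        printf("%d not found\n", want);
        return 1;

    found:
        printf("%d found at row %d, column %d\n", want, row, col);
        return 0;
    }

The usual alternatives -- a "done" flag tested in both loop conditions, or
contorting the code into extra functions purely to get a multi-level exit --
are exactly the kind of added complexity described above.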
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario  519/884-7473

------------------------------

Date: Mon, 26 Feb 90 13:34:05 -0800
From: Caveh Jalali
Subject: lockd

To lock is to block!  Sometimes.

Due to a bug in SunOS 4.0.3, lockd (/usr/etc/rpc.lockd) will occasionally
dump core on NFS servers.  This happens silently and goes unnoticed until
some NFS client attempts to lock a file on that server.  As a result, a
process requesting a lock on a file will block in disk wait indefinitely
(`D' state reported by ps) until lockd is restarted (manually) on the NFS
server for that file.

The quickest way to detect this is to check for a core file in /.  "file
/core" will provide the name of the process which is responsible for the
core file.  Now, quickly check whether that process is still around using
"ps".  If it is not, it is probably necessary to restart the service
manually.  In the case of lockd, it is sufficient to do

    cd / ; /usr/etc/rpc.lockd

to restart/restore the service.  The "cd" makes sure the next core dump will
go in / again, which is where we "expect" it.

- Caveh Jalali

   [Our MM -- but not RISKS' -- was out of commission as a result of the
   lockd problem.  The LOCK-MESS MONSTER STRIKES AGAIN.  BY THE WAY, we are
   backing off to a `less advanced' (old standard) SENDMAIL.  It will
   probably be ready for the next issue of RISKS, so please stand by for a
   different set of horror tales.  We had a dilly yesterday when SENDMAIL
   suddenly started spawning processes like crazy and had to be killed.  PGN]

------------------------------

Date: Tue, 16 Jan 90 11:44:35 BST
From: Brian Randell
Subject: Re: Railroad interlocking systems

A short while ago I passed a message I'd seen on the net (in RISKS?) about
railway interlocking to one of my colleagues here at Newcastle who is very
interested in such topics.  He gave me permission to forward his reply to
RISKS, so here it is.

Brian Randell

=======

From: J.A.Hunter@newcastle.ac.uk

With reference to:

> Date: Fri, 5 Jan 90 09:49:21 CST
> From: Douglas W. Jones
> Subject: Railroad interlocking systems
> [...]
> I suspect that there are useful parallels between the history of railroad
> interlocking machines and the more recent developments in safety critical
> digital control systems.  An error in the logical design of an
> interlocking machine could easily go undetected until it caused a train
> wreck, and I wonder if old cases involving railroad interlocking machines
> might provide useful precedents for many of the software liability
> questions that have been raised recently.

There's one case in British railroad history of an interlocking frame being
in service for many years and then (apparently) causing a train wreck.  It
happened on February 14th, 1928 at Paragon Street station, Hull, in the East
Riding of Yorkshire.  The area had a large and complex track layout
controlling several main and secondary lines and freight routes to the local
dockyards.  On that day two trains met head on, one having been incorrectly
routed onto the wrong track.  The signal box (control tower) in question
would have had around two hundred levers and had three signalmen in charge.

To understand the accident it is necessary to realise that signalling safety
depends on more than just the interlocking frame.  There are trackside
devices that "prove" the position of points and signals and prevent, for
instance, a signal being cleared if the switch it refers to has not been set
appropriately, with both blades in position and locked.  Additional
equipment detects the presence of a train to prevent a switch being moved
under it.  These days this is done using electrical track circuits, in which
the presence of a train is detected by the short-circuit through the axles
from one rail to the other, but in mechanical systems this was not always
present and various mechanical detectors were used.  One of these was a
treadle fitted near to switches; depressed by the flanges of the vehicles'
wheels, it locked the switch in place until cleared.

In the Hull accident, the signalmen were concerned not to delay a
late-running train, and one of them reset the signal cleared for train A
after the locomotive and three coaches had passed it.  At this point the
locomotive was still thirty feet from the trackside treadle which would lock
the next switch during passage.  During the time the train took to reach the
treadle, less than a second, it had effectively disappeared from the logic
of the interlocking frame.  During this "window" one of the other signalmen
set the route for train B.  Working from the situation found after the
accident, it was possible to show that this second signalman had mistakenly
pulled levers 95 and 96 instead of the correct 96 and 97.  Train A was then
routed into the path of train B.  The interlocking that would have prevented
this had been disabled because the first signalman had been too quick in
cancelling the cleared signal, releasing the interlock on the switch, and
the trackside detection of the train had not yet been triggered.

Clearly this type of accident requires a human presence.  The two signalmen
represent co-operating processes who failed to obey the rules on (human)
interlocks, and the machinery failed to limit their actions.  I suspect that
many aspects of railroad safety are probably of greater interest to
psychologists than to computer scientists.
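In computing terms the failure was a race between those two co-operating
processes.  Purely as an illustration -- this toy model is nothing like a
real interlocking frame, and every name and value in it is invented -- the
window looked roughly like this:

    #include <stdio.h>

    /* Toy model of the Hull window: a switch may be moved only while no
       interlock holds it.  Signalman 1 releases the signal interlock as
       soon as he puts the signal back; the treadle interlock is not yet
       set because train A has not reached the treadle.  In that gap
       signalman 2 can move the switch under the approaching train.       */

    struct railway_switch {
        int held_by_signal;   /* 1 while a cleared signal locks the route */
        int held_by_treadle;  /* 1 while a train is on the treadle        */
        int position;         /* 0 = straight route, 1 = diverging route  */
    };

    static int try_to_move(struct railway_switch *s, int new_position)
    {
        if (s->held_by_signal || s->held_by_treadle)
            return 0;                        /* the interlocking refuses */
        s->position = new_position;
        return 1;
    }

    int main(void)
    {
        struct railway_switch sw = { 1, 0, 0 };  /* signal cleared for
                                                    approaching train A   */

        sw.held_by_signal = 0;   /* signalman 1 puts the signal back early */

        /* Train A is still thirty feet short of the treadle, so
           sw.held_by_treadle is still 0: briefly, nothing protects
           the route.                                                     */
        if (try_to_move(&sw, 1)) /* signalman 2 pulls the wrong levers     */
            printf("switch moved under an approaching train\n");

        sw.held_by_treadle = 1;  /* too late: train A reaches the treadle  */
        return 0;
    }

That the protection depended on the order and timing of two independent
operators' actions is what makes the parallel with concurrent software so
apt.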
Railroad interlocking frames may represent some of the most reliable
computer programs ever constructed, but they are not immune from misuse by
the "user".

In present-day railroad signalling systems, many more of the operations are
automatic and detection systems are more complete.  Modern British practice
requires that once a route has been set, a time delay of about one minute is
enforced by the software before a conflicting route can be set.  That such
systems are not perfect and still rely on human vigilance was shown by the
Clapham (South London) accident late in 1988, where a faulty track-circuit
train detector "lost" a train standing at a signal, allowing an automatic
system to route another train into the rear of it.

A good source book for information on British railroad safety is:

   "Red for Danger", by L.T.C. Rolt, published by David & Charles Inc.
   (North Pomfret, Vermont 05053), ISBN 0-7153-8362-0

My copy is a 1982 fourth edition, but the book is still in print.  The
information on the Hull accident above is my precis from a more detailed
study within that publication.

Phone: +44 91 222 8226          Computing Laboratory
Fax:   +44 91 222 8232          The University of Newcastle upon Tyne, England

------------------------------

End of RISKS-FORUM Digest 9.72
************************