Subject: RISKS DIGEST 11.14 REPLY-TO: risks@csl.sri.com RISKS-LIST: RISKS-FORUM Digest Wednesday 20 February 1991 Volume 11 : Issue 14 FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator Contents: Another Computer Application fiasco story (Christopher Allen) Software failures and safety of American (Red Cross) blood supply (Rob James) MD-11 computer problems (Steve Bellovin) More on very reliable systems (Anthony E. Siegman) Re: Predicting system reliability (Henry Spencer) Broadcast LANs and data sensitivity (Henry Spencer) Re: software warranties (re: SysV bug) (Brian Kantor, David Lamb, Steve Eddins, Peter da Silva, Henry Spencer, Flint Pellett) The RISKS Forum is moderated. Contributions should be relevant, sound, in good taste, objective, coherent, concise, and nonrepetitious. Diversity is welcome. CONTRIBUTIONS to RISKS@CSL.SRI.COM, with relevant, substantive "Subject:" line. Others ignored! REQUESTS to RISKS-Request@CSL.SRI.COM. FTP VOL i ISSUE j: ftp CRVAX.sri.comlogin anonymousAnyNonNullPW CD RISKS:GET RISKS-i.j (where i=1 to 11, j is always TWO digits. Vol i summaries in j=00; "dir risks-*.*" gives directory; "bye" logs out. ALL CONTRIBUTIONS CONSIDERED AS PERSONAL COMMENTS; USUAL DISCLAIMERS APPLY. Relevant contributions may appear in the RISKS section of regular issues of ACM SIGSOFT's SOFTWARE ENGINEERING NOTES, unless you state otherwise. ---------------------------------------------------------------------- Date: 20 Feb 91 11:09:43 EST From: Christopher Allen Subject: Another Computer Application fiasco story The following copied from the (Manchester, UK) Guardian Weekly, Feb 17, 1991: High-tech blackout Whitehall's auditors are unable to verify Foreign Office spending last year because of a breakdown in the main computer controlling the accounts. John Bourn, the Comptroller and Auditor General, has refused to approve the Foreign Office accounts, covering embassies, Nato, the United Nations, military aid, the BBC World Service, and the British Council, because the ministry cannot produce accurate evidence of the spending. At one stage auditors found discrepancies totalling 458 million pounds between the money granted to the Foreign Office and their own records. After extensive checking the auditors were left with 26 million pounds of imbalances in accounts for embassies, external relations, the BBC and the British Council. The trouble began when it was decided to replace its six-year-old computer system with a new high-technology version. Ministers allocated 560 thousand pounds for the scheme, but ended up paying 937 thousand pounds employing a software company, Memory Computers, which failed to deliver on time and went into liquidation just after it did deliver. Meanwhile, a hard disc shattered inside the old computer, destroying all the information, and leaving officials to rely on the new untried system. Within months it started shutting down unexpectedly, and inexplicably posting money to the wrong accounts. All the bookkeeping staff left and their replacements were not able to familiarise themselves properly with the system to prevent further errors. A consultant from the bankrupt software company is now working for the FO at a salary of 53 thousand pounds a year to try to solve the problems. MPs on the Commons Public Accounts Committee are to summon Sir Patrick Wight, permanent secretary at the FO, to explain the mess. [FO=Foreign Office, MP=Member of Parliament, 1 pound ~ $2] ------------------------------ Date: Wed, 20 Feb 1991 12:51 EST From: JAMESRC@QUCDN.QueensU.CA Subject: Software failures and safety of American (Red Cross) blood supply Recently "60 minutes" presented a report on the American Red Cross and on the accidental release of untested donations to transfusion centres. The FDA documented >2000 such cases, although (to my knowledge) no documented cases of transfusion associated infection have yet been reported in the literature. Some of the failures were associated with software problems in the various regional AmCross centres, but that is as much as I know. If anyone has further information on these (apparent) software failures, I would appreciate additional information. Rob James, Department of Community Health and Epidemiology, Queen's University Kingston, Ontario, Canada K7l 2M1 613-542-3696 ------------------------------ Date: Wed, 20 Feb 91 15:50:02 EST From: smb@ulysses.att.com Subject: MD-11 computer problems According to the AP, American Airlines has suspended flight certification tests on a new MD-11 plane because of ``computer problems''. That, and some other problems with fuel economy, may lead the airline to refuse delivery of a second MD-11. No technical details on the computer problems were given in the article; does anyone on RISKS know more? ------------------------------ Date: Wed, 20 Feb 91 12:13:10 PST From: siegman@sierra.Stanford.EDU (Anthony E. Siegman) Subject: More on very reliable systems Anyone concerned with the subject of multiple correlated failures in systems with very reliable individual components should look back at the incident some years ago when a United Airlines jet lost all three engines simultaneous in flight over the Caribbean south of Miami. As best I recall, a mechanic servicing the plane had made the same mistake, leaving out a needed O-ring (!), on the oil pressure sensors in all three engines. He did this because the stockroom clerk, who normally installed the O-rings on the sensors before handing them to the mechanic, was temporarily away, so the mechanic went behind the counter and got the sensors himself. This incident seemed to have multiple classic elements: 1. Minor change in procedures had major consequences. 2. The problem was really a false alarm, i.e., the oil pressures were OK, just the sensor indications were wrong. 3. Confident claims that multiple jet engine failures were totally improbable proved completely wrong. Oh, they did get one (or two?) of the engines restated, just in the nick of time, however, and limped into Miami. ------------------------------ Date: Tue, 19 Feb 91 23:40:50 EST From: henry@zoo.toronto.edu Subject: Re: Predicting system reliability >Argument 2: A system as complex as SDI can never be evaluated in a way which >would give reasonable grounds for claiming that it would work correctly when >deployed. Of course, just what constitutes "reasonable grounds" is itself something that should be part of the specifications, and it is something that may have to be justified. None of the complex systems designed for fighting nuclear wars -- including the ones whose supposed efficacy has preserved the peace of the planet for circa 40 years -- has *ever* been evaluated in such a way (i.e. under nuclear attack!). The design of test criteria for such systems all too easily becomes a second-order way of cooking up fallacious "proofs" that the system is anywhere from trivial to impossible. Our lives depend frequently on systems that cannot possibly be tested to the "reasonable confidence" point before they are used... if you interpret that to imply reasonable confidence under the worst conceivable conditions. No airliner is ever tested in six-sigma turbulence. No building is tested in once-per-century wind loads. Space-shuttle payload limits are based on landing weight for the "Return To Launch Site" abort mode... a procedure that has never been tried and which some astronauts doubt is survivable. Operating-system kernel software is rarely stress-tested under truly severe operational loads. (As an example of that last, one of the major RISC-processor manufacturers does massive simulation of new designs, to the point where their machine rooms go from fairly quiet to furiously active at the time in late evening when the night's batch of simulation jobs fire up. This sudden huge surge in load has, I'm told, had a serendipitous side effect: at least once it uncovered the existence of very short "interrupt windows" in kernel code, where erroneous assumptions about the atomicity of operations caused system failures only if an interrupt struck in a window about a hundred nanoseconds long. (Specifically, the programmers had assumed that incrementing an integer in memory was an atomic operation, which is sort of true on single-processor CISCs but is rarely true on multiprocessors or RISC systems.) The code containing this botch is now theoretically obsolete, but it was in wide production use before the problem was discovered and is probably still in use here and there.) In traditional engineering, it is routine to assess worst-case behavior based on extrapolation from less severe testing. A demand that the worst case be tested is often a disguised call for a system's cancellation, since such testing is seldom feasible for large systems. The proper consideration is not whether we can safely extrapolate from less severe tests, because we must rely on such extrapolation and we already do; the questions are how best to do such extrapolation and what form of testing must be done to permit confident extrapolation. Henry Spencer at U of Toronto Zoology utzoo!henry ------------------------------ Date: Tue, 19 Feb 91 23:45:37 EST From: henry@zoo.toronto.edu Subject: Broadcast LANs and data sensitivity >How much of your data on the average network is really a security issue? >I work here at Boeing and, at least in my area, sensitive data is not >kept on the network ... Unfortunately, this is probably a definition of "sensitive" which is too narrow to be really applicable. How much of your area's data could be posted on a bulletin board in the bus station without upsetting Boeing or one of its customers? Probably not much. Even material which is not "sensitive" in a military sense is often of commercial value or significant to privacy. Henry Spencer at U of Toronto Zoology utzoo!henry ------------------------------ Date: 20 Feb 91 06:00:25 GMT From: brian@ucsd.Edu (Brian Kantor) Subject: Re: software warranties (was: the SysV security bug) >From: rick@pavlov.ssctr.bcm.tmc.edu (Richard H. Miller) >[As an aside, I have a difficult time understanding why a person who does not >pay for software maintenance expects to have bugs fixed. If you choose to >not pay for the service, then don't expect the vendor to fix it for free.] I do expect the vendor to fix it for free. When I pay for software, I do so under the assumption that it would perform as specified, not that it sorta might work kinda like what the manual said. When you buy a device (say, a car) you are granted a warrantee that it's free of defects, and will remain so for some length of time that is predicated (disregarding, for the moment, marketing factors) on the designed length of time it's to operate before it wears out. Software does not wear out, although it can become obsolete, so I would maintain that a piece of software which is found, at any time, to not perform as specified at time of purchase is defective and must be remedied by the vendor. A manufacturer must remedy the errors of his employees. That you hire peoples' mistakes when you hire their talents is one of the RISKS of doing business. A company which is selling defective or adulterated materials should be made to replace those materials with goods of the quality advertised. I see no reason that software should be exempt from this basic principle of fair dealing in business. Vendors who require paid maintenance agreements to repair faults where their software does not perform as specified are ripping off their customers. I believe that it is unjust enrichment. Does it seem fair to you that the product you bought does not work as specified and could not have been tested by the maker of the product or he would have found out that it didn't so work? Not only have you been burned by buying something that is broken, but you've had to act as the manufacturer's unpaid quality control department, and they want you to pay for the privilege of getting a working item! Recall the definition of chutzpah - the man who kills his parents then begs mercy from the court because he is now an orphan. Now while I've overstated this for effect, I would really like people to think about software warrantees, before the courts do it for us. Ethics ARE important: a professional should strive to be "an ornament to his profession." - Brian ------------------------------ Date: 20 Feb 91 14:07:18 GMT From: dalamb@avi.umiacs.umd.edu (David Lamb) Subject: serious bug in SVR3.2 gives root access easily (RISKS-11.13) A *bug* is a defect in the product. Why *shouldn't* the vendor fix it for free? Enhancements, I'll agree, ought to be paid for. Maybe this is diverging from what RISKS ought to talk about, but...What are the risks to society of fostering an attitude that the vendor has no responsibility for defects like the security bug mentioned? "Due to a design defect, your 1986 Model X car blows up if hit from behind in an accident. What, you didn't buy the $4000/year maintenance agreement?** Sorry, no free recall for you. Buy a 1991 Model X." **25% of purchace price per year: a software "maintenance" price I've seen a few times. David Alex Lamb ------------------------------ Date: Wed, 20 Feb 91 14:37:22 GMT From: eddins@uicbert.eecs.uic.edu (Steve Eddins) Subject: Charging for bug fixes (was: serious bug in SVR3.2 ... RISKS-11.10) I'll be the first to admit that I don't understand the economics of the software industry and the in's and out's of software maintenance. I'm just a customer. However, as a customer, I expect that when I pay for any product it will work as advertised. If it doesn't work, and the vendor wants to charge me more money to make it work, I will cease purchasing from that vendor. Steve Eddins University of Illinois at Chicago, EECS Dept., M/C 154, 1120 SEO Bldg, Box 4348, Chicago, IL 60680 (312) 996-5771 ------------------------------ Date: Wed, 20 Feb 1991 13:34:26 GMT From: peter@taronga.hackercorp.com (Peter da Silva) Subject: Re: RISKS DIGEST 11.13 Poster #1: > [As an aside, I have a difficult time understanding why a person who does not > pay for software maintenance expects to have bugs fixed. If you choose to > not pay for the service, then don't expect the vendor to fix it for free.] Why not? The bug is the vendor's responsibility. Imagine Ford selling cars with doors that unlocked if you hit them in the right place. Do you think they would get away with only providing fixes to people with maintainance agreements? Poster #2: > Because these systems are so 'cheap' (you can build a nice workstation for > significantly less than $1000), Could you explain how? My own system was more than that and I bought it from a friend for a phenomenal price. The operating system alone is a significant part of the cost. I've seen 386SX platforms for $875. Add $375 for the cheapest runtime only 2-user UNIX I've seen, and you're already at $1250. Add $300 for the RAM and as much or more for the bigger hard drive, and you're in the area of $2000. Now, $2000 I might be able to believe. ------------------------------ Date: Wed, 20 Feb 91 13:12:44 EST From: henry@zoo.toronto.edu Subject: Re: serious bug in SVR3.2 gives root access easily (RISKS-11.10) As I understand it, the people in question are not demanding free software maintenance. They are demanding that when they pay good money for software, they get software that meets specifications at least to the extent of being free of catastrophic flaws, and that if this cannot be assured at time of purchase, some minimal effort is made to assure it when flaws are found. Henry Spencer at U of Toronto Zoology ------------------------------ Date: 20 Feb 91 16:31:20 GMT From: flint@gistdev.gist.com (Flint Pellett) Subject: Re: serious bug in SVR3.2 gives root access easily (RISKS-11.10) I would say that there is no reason software should be any different than anything else you buy. If you buy a new appliance, you get a guarantee for 90 days against defects in materials and workmanship, and when you buy software you ought to get a similar guarantee. A bug like this is a defect in the workmanship, and if you are still covered by the guarantee, you ought to get it fixed free. But if you are past the 90 days (or however long) and you didn't buy a maintenance contract, then you are (and ought to be) just as much out of luck as if you didn't buy a maintenance contract on your fridge. Software buyers are likely to start paying attention to how long the guarantee is for, and not buy from companies with really short guarantee periods. However, there is another category of safety related bugs which have always been in a seperate class: If the gas tank in your Pinto tends to explode, or the car tends to shift itself into gear by itself, Ford does a recall (often because the Govt. ordered it to) and fixes it at their expense. The laws about automobile recalls are a lot stricter than the ones on Software Developers, but if software companies can't clean up their act themselves, then what we are going to end up with is a big Federal bureaucracy monitoring stuff like this just like the one that monitors the automakers. This particular bug is pretty clearly a "safety related" bug, in that failure to fix it could result in substantial losses by customers. I believe the software vendors also have a responsibility to let their customers know about it too, and need to mail something out to all their users they know about that either gives them the fix or tells them how to get it, just like the recall notices that you get if something is wrong with your car that could affect your safety. If too many vendors decide that it will cost too much now to act responsibly, then they're going to end up having to deal with a bunch of federal regulators who will make them act responsibly, and it will end up costing them (and us software buyers) a lot more. Flint Pellett, Global Information Systems Technology, Inc., 1800 Woodfield Drive, Savoy, IL 61874 (217) 352-1165 uunet!gistdev!flint ------------------------------ End of RISKS-FORUM Digest 11.14 ************************