25-Feb-86 00:57:30-PST,11142;000000000000
Mail-From: NEUMANN created at 25-Feb-86 00:55:53
Date: Tue 25 Feb 86 00:55:53-PST
From: RISKS FORUM (Peter G. Neumann, Coordinator)
Subject: RISKS-2.15
Sender: NEUMANN@SRI-CSL.ARPA
To: RISKS-LIST@SRI-CSL.ARPA

RISKS-LIST: RISKS-FORUM Digest, Tuesday, 25 Feb 1986  Volume 2 : Issue 15

FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS
  ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Software Safety Survey (Nancy Leveson)
  Titanic Effect (Nancy Leveson)
  F-18 spin accident (Henry Spencer)
  Space shuttle problems (Brad Davis)
  Misdirected modems (Matt Bishop)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in
good taste, objective, coherent, concise, nonrepetitious.  Diversity is
welcome.  (Contributions to RISKS@SRI-CSL.ARPA, Requests to
RISKS-Request@SRI-CSL.ARPA.)
(Back issues Vol i Issue j stored in SRI-CSL:RISKS-i.j.  Vol 1: MAXj=45)

----------------------------------------------------------------------

Received: from ICSD.UCI.EDU ... Mon 24 Feb 86 18:59:38-PST  [DELAYED!]
Date: 22 Feb 86 12:20:49 PST (Sat)
From: Nancy Leveson
To: RISKS@SRI-CSL.ARPA
Subject: Software Safety Survey

I have been interested in safety for the past five years and have just
completed a long-term project to write a survey of software safety.  It
includes sections on whether there is a problem (probably not in doubt to
those who already read RISKS), why there is a problem, the implications for
software engineering research, the relationship of software safety to
software reliability and security research, a definition of software
safety, a brief survey of relevant aspects of system safety, and a
description of software safety techniques and research issues (requirements
analysis, verification and validation of safety, assessment, software
design, and human factors).

Although good software engineering techniques will undoubtedly add to the
safety of software, this is not a software engineering survey -- the
emphasis is on needed additions and changes to current software engineering
techniques and research, and on new procedures which have special relevance
to safety.  It is also a technical rather than a political document
(although a few ethical and political issues are mentioned in the
conclusions).

The paper is currently in the form of a technical report, although it has
been submitted to Computing Surveys (chosen primarily because of the size
of the document -- about 60 pages).  I will be glad to send out a
reasonable number of copies in exchange for any comments which might help
me to improve it.  Comments on what you like and think is correct or
helpful would be nice, along with the complaints.  If you would like a
copy, send your regular mail address (not e-mail) to me: nancy@uci.edu
(after March 4 my e-mail address is changing to nancy@ics.uci.edu).

Nancy Leveson
ICS Dept.
University of California, Irvine

------------------------------

Received: from ICSD.UCI.EDU ... Mon 24 Feb 86 19:00:53-PST  [ALSO DELAYED]
Date: 22 Feb 86 12:21:30 PST (Sat)
From: Nancy Leveson
To: RISKS@SRI-CSL.ARPA
Subject: Titanic Effect

In Peter Neumann's latest SEN column, he mentions the Titanic Effect
without an explanation of why it occurs.  [Actually, JAN Lee introduced it,
without attribution, in RISKS-1.21: The severity with which a system fails
is directly proportional to the intensity of the designer's belief that it
cannot.  PGN]  I would like to suggest a hypothesis.
The Titanic effect is essentially the statement that the worst accidents
often occur in systems which are thought to be completely safe.  The
Titanic was thought to be so safe that normal safety procedures, such as
having an adequate number of lifeboats, were neglected, with the result
that many more lives were lost than might have been necessary.

The lesson is an important one because it goes back to the problems of
using quantitative assessment techniques.  Quantitative risk assessment can
provide insight and understanding and allow comparison of alternatives.
Probabilistic approaches have merit in that the necessity to calculate very
low probability numbers forces on the analyst a discipline that requires
studying the system in great detail.  But there is also the danger of
placing implicit belief in the accuracy of a calculated number.  That is,
it is easy to place too much emphasis on the models and forget the many
assumptions which are implied.  The models can also never capture all the
factors, such as quality of life, that are important in a problem.  (See
Morgan, "Probing the Question of Technology-Induced Risk," IEEE Spectrum,
Nov. 1981.)

Getting back to the Titanic, certain assumptions were made in the analysis
that did not hold in practice.  For example, the ship was built to stay
afloat if four or fewer of the sixteen water-tight compartments (spaces
below the waterline) were flooded.  There had never previously been an
incident in which more than four compartments of a ship were damaged, so
this assumption was considered reasonable.  Unfortunately, the iceberg
ruptured five spaces.  It can be argued that the assumptions were the best
possible given the state of knowledge at that time.  The mistake was in
placing too much faith in the assumptions and the models and not taking
measures in case they were incorrect (such as accepting the added cost of
putting an adequate number of lifeboats on board).

The Titanic is not an isolated example.  Safety devices (such as sensors in
solid-rocket boosters or software safety analysis and design procedures)
cost -- usually in terms of dollars and performance.  There are often
attempts to get around them by using models which show that the hazards are
so negligible that the cost is unjustified.  On the other hand, too much
safety can make the system unusable or unprofitable, which is not the
answer either.

The Titanic provides an important lesson for those of us involved in
building safety-critical computer systems.  Our current models for software
reliability make a large number of assumptions which may be unrealistic or
may just not hold for particular systems.  This is also true, to a lesser
extent, of the hardware reliability models.  Much effort is frequently
diverted to proving theoretically that a system meets a stipulated level of
risk when the effort could much more profitably be applied to eliminating,
minimizing, and controlling hazards.  I have seen a great deal of effort
spent on trying to prove that a system which contains software has two or
three more "9's" after the decimal point in the reliability models, when
the effort and resources might have been more effective if applied to using
sophisticated software engineering and software safety techniques.
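As a purely illustrative aside -- with probabilities invented solely for
the arithmetic, not taken from any real Titanic or reliability analysis --
the short Python sketch below shows how a few extra "9's" computed under an
independence assumption can vanish the moment a single common-cause event,
such as an iceberg opening several compartments at once, falls outside the
model:

# Illustrative sketch only: all probabilities below are hypothetical.
# Model: a ship survives if at most 4 of its 16 compartments flood.
# Assumption being tested: compartment floodings are independent events.

from math import comb

P_FLOOD = 0.01      # assumed per-compartment flooding probability (made up)
N = 16              # number of water-tight compartments
SURVIVABLE = 4      # ship stays afloat with at most this many flooded

# Probability of losing the ship if floodings really were independent:
p_loss_independent = sum(
    comb(N, k) * P_FLOOD**k * (1 - P_FLOOD)**(N - k)
    for k in range(SURVIVABLE + 1, N + 1)
)

# A common-cause event (one iceberg opening five adjacent compartments)
# bypasses the independence assumption entirely; a single such event
# already exceeds the survivable limit.
P_ICEBERG = 1e-3    # hypothetical chance of such an event per voyage
p_loss_common_cause = P_ICEBERG

print(f"loss probability, independence assumed  : {p_loss_independent:.1e}")
print(f"loss probability, one common-cause event: {p_loss_common_cause:.1e}")
# Roughly 4e-07 versus 1e-03: the extra "9's" in the first figure are an
# artifact of the modeling assumption, not a property of the ship.

The particular numbers mean nothing; the point is that the gap between the
two results is created entirely by an assumption the model cannot check for
itself.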
Nancy Leveson
ICS Dept.
University of California, Irvine

------------------------------

Date: Tue, 25 Feb 86 02:44:45 EST
From: ihnp4!utzoo!henry@seismo.CSS.GOV
To: risks@sri-csl.arpa
Subject: F-18 spin accident

Recent reading of a book on the F-18 turned up a couple of details on the
spin accident that might be of interest; I don't think these were part of
the original reports.

(For those who haven't heard about this before...  The US Navy's F-18
fighter is heavily computerized, including "fly by wire" controls in which
the computers always mediate between the pilot's controls and the aircraft.
One thing the software does is to limit control-surface movements to within
safe ranges, so the pilot cannot accidentally break the aircraft in combat
maneuvering.  The 12th prototype was lost when it got into a peculiar type
of spin and the software did not give the pilot enough control authority
for recovery.  The pilot ejected safely.  After investigation, the software
was modified.)

Detail number one has something to say about the problems of exhaustive
testing: even after the problem was known to exist, it took 110 attempts to
duplicate the spin!

Detail number two is that the spin was *not* unrecoverable.  The spin-test
F-18 was equipped with an anti-spin chute just in case, but the pilot who
first duplicated the spin discovered that he could recover without the
chute by setting the outer-side engine to flight idle and the inner-side
engine to full afterburner.  The original pilot could have done this, had
he thought of it.  This might strengthen the case for giving flight-control
software more authority, so that such unorthodox substitutes for
ineffective or damaged controls could be employed automatically.

Reference: Bill Gunston, "F/A-18 Hornet", Ian Allan 1985, p. 43.  Gunston
does comment that apart from this one strange spin mode, the F-18 is
probably the most spin-proof fighter in existence.
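As a purely illustrative aside, the suggestion above -- letting the
flight-control software recognize the spin and apply the differential-
thrust recovery itself -- might look roughly like the Python sketch below.
Every name, threshold, and limit in it is hypothetical; it is not a
description of the actual F-18 control laws.

# Minimal, hypothetical sketch of the idea discussed above: a fly-by-wire
# limiter that normally clamps pilot commands to a safe range, plus an
# automatic version of the unorthodox differential-thrust spin recovery.
# None of the names or numbers come from the real F-18 flight software.

def clamp(value, low, high):
    """Limit a commanded control-surface deflection to its permitted range."""
    return max(low, min(high, value))

def flight_control_step(pilot_cmd_deg, yaw_rate_dps, airspeed_kt):
    """Return (surface_cmd_deg, left_throttle, right_throttle), throttles 0..1."""
    # Normal case: pass the pilot's command through the safety limiter
    # (hypothetical +/-20 degree structural limit).
    surface_cmd = clamp(pilot_cmd_deg, -20.0, 20.0)

    # Crude, hypothetical spin recognition: large sustained yaw rate
    # combined with low airspeed.
    in_spin = abs(yaw_rate_dps) > 60.0 and airspeed_kt < 120.0
    if not in_spin:
        return surface_cmd, 0.8, 0.8          # symmetric thrust

    # Automatic analogue of the manual recovery described above:
    # outer-side engine to flight idle, inner-side engine to full power.
    # In a spin to the right, the right engine is the inner one.
    if yaw_rate_dps > 0:
        left_throttle, right_throttle = 0.0, 1.0
    else:
        left_throttle, right_throttle = 1.0, 0.0
    return surface_cmd, left_throttle, right_throttle

# Example: a developed right-hand spin at low airspeed.
print(flight_control_step(pilot_cmd_deg=35.0, yaw_rate_dps=90.0, airspeed_kt=80.0))
# -> (20.0, 0.0, 1.0): surface command limited, outer (left) engine idled,
#    inner (right) engine commanded to full power.

The sketch only shows the shape of the idea; the hard design question is
when such a mode should be allowed to override the pilot.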
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry

------------------------------

Date: Mon, 24 Feb 86 19:53:39 MST
From: b-davis@utah-cs.arpa (Brad Davis)
To: RISKS@sri-csl.arpa
Subject: Re: Space shuttle problems (a comment on risks in general)

If the current speculation about the shuttle is true (that seals on the
solid-fuel rockets failed because of the cold and that the Thiokol
engineers asked for a delay because of their worries), then we should look
more at the humans in any decision chain.  Most of the major failures that
I can recall were due to humans overriding the expert system (whether it
was an electronic, mechanical, or human expert system) or just messing
things up in the first place (like the UPL power generator that was
connected to the power grid backwards and made a real big electric motor
for a short time).

Brad Davis
{ihnp4, decvax, seismo}!utah-cs!b-davis
b-davis@utah-cs.ARPA
One drunk driver can ruin your whole day.

------------------------------

Date: 24 Feb 1986 2221-PST (Monday)
From: Matt Bishop
To: risks@sri-csl.ARPA
Subject: Re: Misdirected modems

Reminds me of something I read at 7 SOSP.  Someone got the bright idea of
collecting computer horror stories (an excellent idea, by the way!), and
one of them involved one computer calling another.  The connection suddenly
quit working after about a year.  The system people got curious and hooked
an audio device to the telephone line.  They then told the computer to
contact its counterpart.  They heard the computer dial, the phone at the
other end go off hook, and the computer send its whistling tones indicating
it had something to say.  From the other end came the words, "Martha, it's
that nut with the whistle again."  Problem solved.

Matt

    [Thanks for the anonymous plug.  It was I who anthologized the
     anecdotes for 7 SOSP, and they all appeared in ACM Software
     Engineering Notes Vol 5 no 1 (January 1980) -- as noted at the very
     bottom of my disaster list in RISKS-2.1!  PGN]

------------------------------

End of RISKS-FORUM Digest
************************
-------