17-Mar-86 19:25:11-PST,12624;000000000000 Mail-From: NEUMANN created at 17-Mar-86 19:23:21 Date: Mon 17 Mar 86 19:23:21-PST From: RISKS FORUM (Peter G. Neumann, Coordinator) Subject: RISKS-2.29 Sender: NEUMANN@SRI-CSL.ARPA To: RISKS-LIST@SRI-CSL.ARPA RISKS-LIST: RISKS-FORUM Digest, Monday, 17 Mar 1986 Volume 2 : Issue 29 FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator Contents: Commission vs. Omission (Martin J. Moore plus an example from Dave Parnas) A Stitch in Time (Jagan Jagannathan) Clockenspiel (Jim Horning) Cordless phones (Chris Koenigsberg) Money talks (Dirk Grunwald, date correction from Matthew Kruk) [Non]computerized train wreck (Mark Brader) On-line Safety Database (Ken Dymond) The RISKS Forum is moderated. Contributions should be relevant, sound, in good taste, objective, coherent, concise, nonrepetitious. Diversity is welcome. (Contributions to RISKS@SRI-CSL.ARPA, Requests to RISKS-Request@SRI-CSL.ARPA.) (Back issues Vol i Issue j stored in SRI-CSL:RISKS-i.j. Vol 1: MAXj=45) ---------------------------------------------------------------------- Received: from eglin-vax.ARPA ... Mon 17 Mar 86 17:26:49-PST Date: 0 0 00:00:00 CDT From: "MARTIN J. MOORE" Subject: Commission vs. Omission To: "risks" Dave Parnas's points regarding the shuttle destruct system are well taken. The policy, stated informally, was that "it better work if we need it -- but it absolutely better NOT 'work' when we DON'T need it" which generated the extreme emphasis on preventing what Dave calls "risks of commission." I feel that the risk of commission on the destruct system is extremely small, while the risk of omission is somewhat higher, although still small. During validation testing and in every pre-launch checkout, we performed "exhaustive" checks -- "exhaustive" meaning that we tried every combination of [(2 central computers) * (6 remote sites) * (2 computers per site) * (2 transmitters per site) * (2 comm paths to each site) * (2 possible commands in various sequences)]. Yeah, this takes a *LONG* time (with practice, we got it down to several hours if everything went smooth.) On one occasion during validation testing, we did find a software error which only manifested on a particular (central computer/comm path/remote computer/unusual command sequence) combination. Exhaustive tests *are* necessary. I have often wondered why the emphasis was to prevent errors of commission over errors of omission (not to say that we wanted either kind, but errors of commission were definitely considered to be worse!). An erroneous destruct would cost the lives of the flight crew, loss of the Orbiter, and possibly damage on the ground if it occurred early in the flight (e.g., windows blown out, etc.) An erroneous non-destruct, in the worst case (if the ET were to detonate near the crowded spectator area on the NASA causeway), could cause the loss of TENS OF THOUSANDS of lives. Certainly this is worse than an erroneous destruct. I believe there may be a subconscious feeling that an erroneous destruct means the difference between a success and a disaster, while an erroneous non-destruct means the difference between a disaster and a worse disaster. Subjectively, that difference is not as great as the first, although objectively it may be much greater. Martin Moore [By the way, Dave Parnas suggested the following example to illustrate his message in RISKS-2.28:] "Consider elevators. Consider how much easier it is to prevent the floor indicator from saying "13" than to assure that the floor indicator will always give the actual floor that the elevator is on. The risk of indicating "13" can be gotten acceptably low by eliminating "13" from the set of indicator lights. The risk of indicating an incorrect floor or not indicating the current floor is much harder to eliminate." [Dave Parnas] ------------------------------ Date: Mon 17 Mar 86 11:43:53-PST From: JAGAN@SRI-CSL.ARPA Subject: A Stitch in Time To: Neumann@SRI-CSL.ARPA [As it now turns out, the reboot occurred just moments BEFORE I logged in Sunday night. Here are some further details. PGN] This is the probable sequence of events that led us back in time on CSLA: 1. A power glitch (late night SUNDAY) caused the F4 to hard boot. 2. During a hard boot, the TIME is retrieved from eleven independent sources (which are assumed to be correct!) 3. One of these sources had the incorrect time of some warm day in 1972 causing the average to be wrongly computed resulting in Dec 6th/1985. Suggestion: 1. Change the statistical measure from MEAN to something less sensitive to one or two abnormal times; for example the average of the 5th, 6th, and 7th largest times. [IT IS ABSOLUTELY INCREDIBLE THAT UNSAFE ALGORITHMS continue to be used. This problem is as old as the hills. Statisticians routinely throw out the absurd values before computing the mean. Dorothy Denning pointed out the pun in their terminology (applicable to Byzantine agreement algorithms, where you don't trust anyone): the OUT-LIERS are really the OUT-LIARS. EVEN WORSE, Jagan points out that if the clock had been accidentally set INTO THE FUTURE, things could also get very sticky. We also have a problem of nonunique clock readings during the hour at 2AM when Daylight Savings Time ends. A good time to be asleep. PGN] [Here is some more background.] Date: Mon 17 Mar 86 12:37:37-PST From: Mark Lottor Subject: [Louis A. Mamakos : time] To: Jagan@SRI-CSL.ARPA This was just to verify that the problem was on the remote system and not some local problem... --------------- Date: Mon, 17 Mar 86 15:31:14 EST From: Louis A. Mamakos To: MKL@sri-nic.ARPA In-Reply-To: Mark Lottor's message of Mon 17 Mar 86 11:34:41-PST Subject: time Yes, I can verify that it was indeed the clock (actually the host the clock was on) that was screwed up. It it unfortunate that there is no way to get the current year out of the WWVB clock. There was work being done in the computer room, which reset our LSI-11/73 host, which subsequently got confused. Sorry about the problem. Louis A. Mamakos WA3YMH Internet: louie@TRANTOR.UMD.EDU University of Maryland, Computer Science Center - Systems Programming ------------------------------ From: horning@decwrl.DEC.COM (Jim Horning) Date: 17 Mar 1986 1436-PST (Monday) To: RISKS@SRI-CSL.ARPA Subject: Clockenspiel Your errant clock reminds me of something that happened at Stanford in the mid-sixties. I was apparently one of the first users of the 360/67 to run a job that started on one day and finished on the next. When the statement for my account arrived in the mail, I had quite a job convincing my wife that the huge figure (to a graduate student couple) was nothing to worry about: It was a CREDIT resulting from a job that was charged for minus 23 hours and 58 minutes! The Xerox Alto operating system had a compiled-in reasonableness check on the date and time. When it started up, if the local clock wasn't "reasonable," it sent a request over the Ethernet and put "Date and Time Unknown" in the banner. Well, you guessed it: The day came when the (correct) time from the time server was no longer "reasonable," and therefore couldn't be corrected by appeal to the time server.... Jim H. ------------------------------ Date: 17 Mar 1986 11:48-EST From: Chris.Koenigsberg@G.CS.CMU.EDU To: RISKS FORUM (Peter G. Neumann, Coordinator) Subject: RISKS re: Cordless phones My roommate has a cordless phone and it goes on the blink every few weeks. All the phones in the house stop working. When you pick any one up, all you get is a very loud static sound and you can't dial out. I have learned that I can fix this problem by sneaking in his room and unplugging the cradle for his cordless phone. A visitor in the house was very frightened one night when she was left alone and though someone had cut the phone lines or something. It was the cordless phone on the blink again. Chris Koenigsberg ckk@g.cs.cmu.edu , or ckk%andrew@pt.cs.cmu.edu {harvard,seismo,topaz,ucbvax}!g.cs.cmu.edu!ckk (412)268-8526 office, (412)362-6422 home Center for Design of Educational Computing Carnegie-Mellon U. Pgh, Pa. 15213 ------------------------------ Date: Mon, 17 Mar 86 16:44:15 CST From: grunwald@b.CS.UIUC.EDU (Dirk Grunwald) To: risks@sri-csl.arpa Subject: re: money talks I read the 'money talks' article with great amusement. One of the risks to society which is worth talking about is the risk of using inappropriate, or downright silly, technology. Talking money would appear to be such a waste of resources. Certainly some other method of denomination descrimination could be devised for the visually impared. Rasied lettering, coinage instead of paper money, different sized paper money, different paper stock. But talking money? dirk grunwald university of illinois ------------------------------ Date: Mon, 17 Mar 86 09:07:32 PST From: Matthew_Kruk%UBC.MAILNET@MIT-MULTICS.ARPA To: RISKS@SRI-CSL.ARPA Subject: Money Talks Correction to my previous message: The date of the article should be March 14th (Friday). ------------------------------ Date: Mon, 17 Mar 86 19:04:03 EST From: ihnp4!utzoo!lsuc!msb@seismo.CSS.GOV To: ihnp4!seismo!sri-csl.arpa!RISKS@seismo.CSS.GOV Subject: [Non]computerized train wreck Me: > Not only was this NOT a case of computer malfunction, but indeed, a more > fully computerized system (with cab signalling and automatic train stopping) > would probably have prevented the accident. PGN: > [... Thus, even though it appears NOT to be a computer problem, > we discover that the computer could have done better! But, of course, > don't blame the computer system. Blame the people who specified, > designed, and implemented it -- not JUST the train operator(s). PGN] You sound more critical than I meant to be. The cost of equipping all major railways with cab signalling and the like would be considerable, to say the least. While such installations certainly do exist, especially on busy high-speed lines, the "centralized traffic control" in use on the route in question is probably much more common. Are you calling on all railways to upgrade their signaling systems long before they are life-expired, every time something somewhat better comes along? Mark Brader (ihnp4!utzoo!lsuc!msb and ...!dciem!msb are both me.) [One would hope that new improvements do not always require everything to be thrown out. Long ago we discovered the advantages of software solutions over hardware solutions. But when human lives are at stake, safer systems may be worth the price of upgrading equipment. I think that the incredible escalation of law-suit awards and of rates for malpractice and liability insurance may provide some new incentives. PGN] ------------------------------ Date: 17 Mar 86 15:14:00 EST From: "DYMOND, KEN" Subject: On-line Safety Database To: "risks" Our Library Bulletin (and as a frequent user I'd have to say that the NBS has one of the best technical libraries going) for February contained a notice that Pergamon Infoline (evidently a supplier of such services) is offering a new online database service, SAFETY: "SAFETY, produced by Cambridge Scientific Abstracts, provides broad interdisciplinary coverage of safety, including industrial, transportation, environmental, and medical safety. This database indexes journals, books, reports, patents, and proceedings published in 1981 or later." If someone on the list uses this database, please let us know how well it covers computer and software safety. Ken Dymond National Bureau of Standards ------------------------------ End of RISKS-FORUM Digest ************************ -------