Subject: RISKS DIGEST 11.08
REPLY-TO: risks@csl.sri.com

RISKS-LIST: RISKS-FORUM Digest  Wednesday 13 February 1991  Volume 11 : Issue 08

        FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  News of His Death Much Exaggerated (Jeff Johnson)
  Prison terms for airline computer ticketing fraud (Rodney Hoffman)
  PWR system "abandoned owing to technical problems" (Martyn Thomas)
  Risks of having a sister (Robyn A Grunberg, Charles Meo)
  Re: Study links leukemia to power lines, TV's (Steve Bellovin)
  Re: Predicting System Reliability... (Jay Elinsky, Tanner Andrews,
    Martyn Thomas, Paul Ammann)

 The RISKS Forum is moderated.  Contributions should be relevant, sound, in
 good taste, objective, coherent, concise, and nonrepetitious.  Diversity is
 welcome.  CONTRIBUTIONS to RISKS@CSL.SRI.COM, with relevant, substantive
 "Subject:" line.  Others ignored!  REQUESTS to RISKS-Request@CSL.SRI.COM.
 FTP VOL i ISSUE j: ftp CRVAX.sri.com, login anonymous, AnyNonNullPW,
 CD RISKS:, GET RISKS-i.j (where i=1 to 11, j is always TWO digits).
 Vol i summaries in j=00; "dir risks-*.*" gives directory; "bye" logs out.
 ALL CONTRIBUTIONS CONSIDERED AS PERSONAL COMMENTS; USUAL DISCLAIMERS APPLY.
 Relevant contributions may appear in the RISKS section of regular issues of
 ACM SIGSOFT's SOFTWARE ENGINEERING NOTES, unless you state otherwise.

----------------------------------------------------------------------

Date: Tue, 12 Feb 91 14:15:18 PST
From: Jeff Johnson
Subject: News of His Death Much Exaggerated

The San Francisco Chronicle (11 Feb 91) has on the second page a photo of a
man pointing to the Vietnam War Memorial wall in Washington, D.C.  The
caption reads:

"Vietnam veteran Eugene J. Toni of suburban Virginia pointed to his name on
the Vietnam Memorial in Washington yesterday.  Toni, a 41-year-old former
Army sergeant, is one of 14 Americans who can find their own names carved in
black granite among the 58,175 dead and missing in the war.  Toni was listed
because a wrong number was typed into a computer."

JJ, HP Labs

------------------------------

Date: Wed, 13 Feb 1991 09:50:06 PST
From: Rodney Hoffman
Subject: Prison terms for airline computer ticketing fraud

In RISKS 7.72, I summarized a 'Wall Street Journal' article about a travel
agency employee charged with breaking into American Airlines' computer
reservations system for fraud.  I believe this recent item is the conclusion
of that case:

'Los Angeles Times', 11 Feb. '91:  TRAVEL AGENTS SENTENCED: Their federal
terms ranged from nearly two years to four years in prison for running a
scheme to defraud American Airlines of frequent-flier tickets totaling $1.3
million between 1986 and 1987.  Through a computer reservation terminal at
North Ranch Travel Agency in Woodland Hills (CA), the three men changed
American Airlines' records on frequent fliers, crediting fictitious accounts
with miles flown by legitimate passengers not enrolled in the frequent-flier
program.  The defendants then used the miles to apply for free flights, sold
them for profit or gave them to friends and family.  They were convicted
after a trial last year.  (Case No. 90-409.  Sentencing Feb. 5)

------------------------------

Date: Wed, 13 Feb 91 13:25:04 GMT
From: Martyn Thomas
Subject: PWR system "abandoned owing to technical problems"

The following story is from Nucleonics Week (pubs: McGraw-Hill) Vol 32 No 1
(Jan 3 1991) and No 2 (Jan 10 1991).
Electricite de France (EDF) has decided in principle to abandon the
Controbloc P20 decentralised plant supervisory computer system developed by
Cegelec for EDF's new N4 Pressurised Water Reactor (PWR) series, because of
major difficulties in perfecting the new product, according to EDF officials.
EDF does not yet know [as of Jan 3rd] what it can use in place of the P20 to
control the N4 reactors, the first of which is nearly completed.

[They were meeting to decide the way forward on January 25th.  Options
include trying to salvage parts of the P20, or reverting to the N20 system
used to control the earlier P4 series of reactors {the numbering seems
maximally confusing}.  Unfortunately, the P20 data acquisition and control
uses dual LANs, called Controbus, whereas the N20 uses cables.  If they fall
back to the N20, they will have to design miles of cables into the reactor
to replace the LANs.]

A Cegelec official described the P20 as "the most ambitious system you could
imagine".  It has distributed control and monitoring, programmable logic
controllers, and 32-bit microprocessors.  The N20 used 8-bit microprocessors.

Cegelec blame EDF reorganisations for the cancellation, but EDF's engineering
and construction division say that the problems were strictly technical.
According to Pierre Bacher, the division's president, the failure to achieve
sufficient capacity to process the mass of acquired reactor data with the
original P20 architecture had led to "increasingly complex software programs"
with "increasingly numerous interactions between subsystems".  The complexity
apparently grew to the point where modification became difficult and there
was fear that the system could never be qualified [which I take to mean
certified for use].

According to the report, "Ontario Hydro faced a similar situation at its
Darlington station, in which proving the safety effectiveness of a
sophisticated computerized shutdown system delayed startup of the first unit
through much of 1989.  Last year, faced with regulatory complaints that the
software was too difficult to adapt to operating changes, Hydro decided to
replace it altogether".  [I hope that Dave Parnas or Nancy Leveson can fill
in the details here.]

Of particular interest to UK RISKS readers is the fact that the P20 system
is on order for the Sizewell B PWR (due to load fuel in November 1993, and
the only remaining scheduled PWR in the UK nuclear power programme).  The
P20 "is to be applied less to safety systems at Sizewell than was planned on
the N4", the report says.  [Sizewell has a separate shutdown system, although
there are rumours that all is not well with it.]

There is a fully computerised N4 control room designed to go with the P20
system.  If the P20 cannot be salvaged, presumably this will be abandoned
too.

[There is more detail in the two reports, which I recommend interested
readers acquire.]

Martyn Thomas, Praxis plc, 20 Manvers Street, Bath BA1 1PX UK.
Tel: +44-225-444700.   Email: mct@praxis.co.uk

------------------------------

Date: 12 Feb 91 02:32:26 GMT
From: rag@yarra.oz.au (Robyn A Grunberg)
Subject: Risks of having a sister

On Thursday 7th February, I arrived home from an interstate trip to find a
letter in the mail stating that my driver's license had been cancelled for 6
months.  The cancellation took effect from January 15th, 1991 and was to
continue until July 15th 1991.  The cancellation was due to my driving a car
while exceeding the state limit of .05% alcohol in my bloodstream.
This interested me greatly, as I had not been breathalysed nor blood tested
on (or even near) the day in question, stated on the notice as December 17th
1990.

The following day I approached VICROADS, who had sent me the notice.  I
explained to them that I had not committed the offence, and the clerk called
up the details of the charge.  My name was listed, as well as my license
number; however, the registration number of the car involved in the incident
was not the registration number of my car.  The clerk suggested I fill out a
Statutory Declaration and file that (along with the notice) with them so
that the department could place the matter under investigation.

I then went to the Police Station, where I obtained a Statutory Declaration
and had it witnessed.  I also asked if the officer could check and see whose
car was involved, as it wasn't mine.  The officer checked the records and
returned to tell me that the car belonged to my sister, who is unlicensed.
He also explained that I was able to drive as long as I carried the Stat Dec
with me at all times.

Unfortunately, my license expired *that day*, so I then had to approach
VICROADS and try to get them to reissue my license.  The clerk would not
reissue my license, as it was currently under cancellation.  I showed him the
Stat Dec, which was no use to him (or me) at all; he could not reissue the
license until the matter was resolved.  He suggested I continue driving with
the Stat Dec.  I would not accept this statement from him and asked that he
put in writing the fact that I had attempted to renew my license and he had
refused to reissue it.  He would not put it in writing, and suggested I speak
to his supervisor.  So here I am without a license, waiting for the matter to
be heard.

It would appear that my sister was breathalysed and gave my details when
asked who she was.  The car, she explained, she had borrowed from her sister
Michealle, which the police accepted in good faith.  As far as the police are
concerned, all you need do is state your name, address and birthdate (which
she did); the police will accept this and demand that you show your license
at a later date.  Unfortunately, they also went ahead and cancelled my
license without any proof that she was who she claimed to be, as she has not
produced the license at any stage.

------------------------------

Date: 13 Feb 91 04:17:20 GMT
From: cm@yarra.oz.au (Charles Meo)
Subject: Re: Risks of having a sister

For non-Australians it is worth pointing out that, under unique (and
unsuccessfully opposed) legislation, the burden of proof has been reversed:
police are empowered to record a conviction _without_ any judicial process
whatever, and the driver is then obliged to prove his or her _innocence_ in
the matter.  This has enabled local police to generate enormous government
revenues, as many traffic infringements are now handled in this way.

I do not know of any other civilised country that would allow this (the
spirits of the old prison governors are alive and well in our seats of
government!), and of course, when this law is translated into computer
systems with _no_ safeguards, the situation Robyn has described can happen
easily.

C. Meo  #6512441/L  (Turn to the right!)

------------------------------

Date: Sat, 09 Feb 91 23:46:22 EST
From: smb@ulysses.att.com
Subject: Re: Study links leukemia to power lines, TV's

The original AP story was considerably longer, and included many more
qualifiers.
As noted in the excerpt your paper ran, this study will receive very careful
scrutiny because it was sponsored by an industry group.  The methodology has
been described as somewhat suspect by the Electric Power Research Institute,
though the University describes the findings as significant.

The parents of children being treated for leukemia were quizzed about their
child's activities; their responses were compared with those of a control
group.  (The article did not say how the control group was selected.)
Unfortunately, memory plays funny tricks; in an era where newspapers often
seem to feature the carcinogen of the week, parents of such children may --
and I stress the word ``may'' -- be more likely to recall suspect behavior
patterns.  (For example, the article noted a correlation to home use of
pesticides, and to paternal exposure to spray paint at work during the
pregnancy.)

More troubling, the objective measurements taken don't seem to agree with
the incidence of disease.  For example, bedroom electric field strength
measurements did not differ between the two groups, though since the
measurements used were 24-hour averages, there may have been differing
peaks.  Similarly, there is no particularly obvious reason to suspect
black-and-white TVs; according to the article, the researchers
``speculated'' that such sets might be older, and hence might not meet
current standards.  (If we're guessing, I'd guess that such TVs are smaller,
and hence would be watched from a closer distance.)  No statistically
significant correlation was found with use of electric blankets or hair
curlers; the former, at least, would (as I recall) contradict other studies.

The study itself has not been released, and will not be, pending peer review
and publication in a refereed journal.  But a precis was released by the
university and by the sponsors.

		--Steve Bellovin

------------------------------

Date: Sun, 10 Feb 91 00:16:56 EST
From: "Jay Elinsky"
Subject: Re: Predicting System Reliability...

Re Brad Knowles' statement that disk drives are tested in parallel, and
Bruce Hamilton's rejoinder that mechanical systems can't be tested in
parallel: Just as the airframes in Bruce's example are presumably
pressurized and depressurized at a much higher rate than occurs in actual
takeoff-landing cycles, disk drives can presumably be tested at a higher
duty cycle than they see in actual use.  That is, the manufacturer can keep
the heads thrashing around continually, unlike the typical drive on a
desktop computer.  I don't know how one would accelerate a test on a
mainframe disk drive that perhaps does thrash around 24 hours a day, nor do
I know if it's possible to accelerate the testing of the platter bearings,
which are spinning 24 hours a day even on powered-up but otherwise idle
machines.  So, I assume (and I'm no M.E. either) that parallel testing is
combined with tests that tend to accelerate wear of components where
possible.

Jay Elinsky, IBM T.J. Watson Research Center, Yorktown Heights, NY

------------------------------

Date: Sun, 10 Feb 91 9:51:25 EST
From: Dr. Tanner Andrews
Subject: Re: Building very reliable systems

) The theory here is that running 100 units for 100 hours gives you
) the same information as running one unit for 10000 hours.

The theory is crocked.  Suppose the device builds heat slowly.  The actual
behavior:
    100 hours: a little warm
    200 hours: case is softening
    250 hours: case melts
    257 hours: catches fire
The times and failure modes will vary, depending on the type of device in
question.

...!{bikini.cis.ufl.edu allegra uunet!cdin-1}!ki4pv!tanner
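[Editorial illustration, not part of the above posting: a toy Monte Carlo
sketch of the point, using an invented Weibull wear-out model.  With a
failure mode that only appears after roughly 250 hours of accumulated
stress, one hundred units run for 100 hours each reveal essentially nothing,
while a single unit run for 10,000 hours fails almost surely.]

# A minimal sketch (invented numbers): compare the failures observed in a
# "100 units x 100 hours" parallel test with a single 10,000-hour run, when
# lifetimes follow a wear-out (Weibull, shape > 1) distribution centred
# around 250 hours.
import random

random.seed(1)

def weibull_life(scale=260.0, shape=8.0):
    # One time-to-failure in hours; shape > 1 means wear-out behaviour.
    return random.weibullvariate(scale, shape)

def parallel_test(units=100, hours_each=100, trials=2000):
    # Mean number of failures seen when many units each run briefly.
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(units) if weibull_life() <= hours_each)
    return total / trials

def long_run_test(hours=10_000, trials=2000):
    # Fraction of trials in which one long-running unit fails.
    return sum(1 for _ in range(trials) if weibull_life() <= hours) / trials

print("100 units x 100 h, mean failures observed :", parallel_test())
print("1 unit x 10,000 h, failure probability    :", long_run_test())
# Typical output: close to 0 for the parallel test, essentially 1.0 for the
# long run -- the wear-out mode never has time to appear at 100 hours/unit.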
------------------------------

Date: Mon, 11 Feb 91 11:47:59 GMT
From: Martyn Thomas
Subject: Re: building very reliable systems

Jerry Leichter writes:
.....
: 2. Functional decomposition of the system into a number of modules
:    such that failure can occur only when ALL the modules fail.
:
: Now, the criticism of technique 2 is that the multiplication of failure
: probabilities is only valid when the failures of the different modules are
: known to be uncorrelated.
.....
: So, is technique 2 worthless?  By no means: It's just often misapplied.  To
: use it, you need to establish (by formal techniques, testing, experience in
: the field) not just the failure rates for the individual modules, but an
: upper bound on failure correlation among modules.  This is by no means
: impossible to accomplish.
...........
: That's not to say that some still-unknown variant of n-version programming
: can't be made to work.  In fact, I'd guess that it can be, though it won't
: be easy - and I certainly wouldn't want to propose a mechanism.  If so, then
: software systems to which we can reasonably ascribe "1 in 10^9" failure
: probabilities should be quite buildable.

[I have extracted the elements from Jerry's article that I want to disagree
with.  I thought the article as a whole was a very valuable contribution to
the discussion.  I apologise in advance if I have distorted his argument by
selective quotation.]

How can we have confidence that the means by which we have combined the
n-versions (for example, the voting logic) has a failure probability below
1 in 10^9?

How can we be sure that our analysis of the upper bound on failure
correlation among modules is accurate?  How accurate does it need to be -
does it need to have a probability of less than 1 in 10^9 that it is grossly
wrong?  (By "grossly wrong" I mean wrong enough to invalidate the calculation
that the overall system meets the "1 in 10^9" figure.)  This would seem
impossible.  Consider, for example, the probability that the common
specification is wrong.

I also have a question for statisticians: if we are attempting to build a
system "to which we can reasonably ascribe a 1 in 10^9 failure probability",
what *confidence level* should we aim for, if we are using statistical
methods?  Does it make sense to be satisfied with 99% confidence of 1 in
10^9?  Or should we aim for 99.9999999%?  (I hope the answer isn't simply
"it depends what you mean by 'reasonably'".  I am looking for guidance on how
the failure probability and the confidence levels interact in practical use.)

(I suspect that I am missing some contributions to this discussion.  I would
be grateful if anyone following up would also copy me by email.)

Martyn Thomas, Praxis plc, 20 Manvers Street, Bath BA1 1PX UK.
Tel: +44-225-444700.   Email: mct@praxis.co.uk
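[Editorial note, not part of the above posting: for the statistical question,
the standard zero-failure calculation for a constant failure rate gives one
way to see how the target rate and the confidence level interact.  A minimal
sketch follows; the numbers come directly from the formula.]

# If T failure-free hours are observed, the one-sided upper confidence bound
# on a constant hourly failure rate at confidence C is  -ln(1 - C) / T.
# Demonstrating a rate of 1e-9 per hour therefore needs
# T >= -ln(1 - C) / 1e-9 failure-free hours of operation or test.
import math

TARGET_RATE = 1e-9                      # failures per hour

for confidence in (0.99, 0.999999999):
    required_hours = -math.log(1.0 - confidence) / TARGET_RATE
    print(f"confidence {confidence:.9f}: {required_hours:.3g} hours "
          f"(~{required_hours / 8760:.3g} device-years)")

# confidence 0.990000000: 4.61e+09 hours (~5.26e+05 device-years)
# confidence 0.999999999: 2.07e+10 hours (~2.37e+06 device-years)
# Moving from 99% to 99.9999999% confidence costs "only" a factor of about
# 4.5 in test time; the dominant problem is the 1e-9 target itself.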
------------------------------

Date: Mon, 11 Feb 91 13:22:03 -0500
From: pammann@gmuvax2.gmu.edu (Paul Ammann)
Subject: Re: Building very reliable systems (Jerry Leichter, RISKS-11.07)

> 1. Testing (whether by explicit test in a lab or by actual use in
>    the field) of very large numbers of copies of the system
> 2. Functional decomposition of the system into a number of modules
>    such that failure can occur only when ALL the modules fail.

The first technique assesses performance directly, and can be applied to any
system, regardless of its construction.  As Jerry points out, various
assumptions must be made about the environment in which the testing takes
place.  The second technique estimates performance from a predictive model.

> [...] To use [NVP], you need to establish (by formal techniques, testing,
> experience in the field) not just the failure rates for the individual
> modules, but an upper bound on failure correlation among modules.

The Eckhardt and Lee model (TSE Dec 1985) makes it clear that performance
prediction is much more difficult.  To evaluate a particular type of system,
one must know what fraction of the components are expected to fail over the
entire distribution of inputs.  The exact data is, from a practical point of
view, impossible to collect.  Unfortunately, minor variations in the data
result in radically different estimates of performance.  For a specific
system, it is not clear (to me, anyway) what an appropriate "upper bound of
failure correlation among modules" would be, let alone how one would obtain
it.

> In fact, techniques 1 and 2 are fundamentally the same thing: One cuts the
> world "vertically" between many complete copies of the same system; the
> other cuts the system itself "horizontally" across its components.  The
> same two issues - reliability of the individual slices; independence of
> failure modes - occurs in both cases.

I am uncomfortable with merging the issues of direct measurement with those
of indirect estimation.  The difficulties in 1 are primarily system issues;
details of the various components are by and large irrelevant.  In technique
2 the major issue is the failure relationship between components.

> Either technique can be used to get believable failure estimates in the
> 1 in 10^8 (or even better) range.  Such estimates are never easy to obtain
> - but they ARE possible.  Rejecting them out of hand is as much a risk as
> accepting them at face value.

I am unaware of any application of NVP in which it has been (believably)
demonstrated that components of modest failure probability (say 1 in 10^4)
can be used to generate a system with a very low failure probability (say
1 in 10^8).  The relatively scant empirical evidence indicates that NVP
might be good for an order of magnitude or so (which may be great, depending
upon the system).  However, there are no guarantees; in certain
circumstances, NVP may well be worse than the use of a single component.

The real issue is economic: could better systems be built by applying
development resources to other technique(s)?  There are strong views on both
sides of the question.
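[Editorial illustration, not part of the above posting, with invented
numbers: a small calculation in the spirit of the Eckhardt/Lee model, showing
how a shared "hard part" of the input space can make a two-version system
fall far short of the naive independence prediction.]

# Two versions, each failing rarely on "easy" inputs and more often on a
# small set of "hard" inputs.  The only coupling between the versions is
# that they find the same inputs hard.  All numbers are invented.
P_HARD = 0.001        # fraction of inputs that are intrinsically hard
P_FAIL_EASY = 1e-5    # per-version failure probability on easy inputs
P_FAIL_HARD = 0.05    # per-version failure probability on hard inputs

# Marginal failure probability of a single version over all inputs.
p_single = (1 - P_HARD) * P_FAIL_EASY + P_HARD * P_FAIL_HARD

# Naive prediction: treat the two versions as independent everywhere.
p_pair_independent = p_single ** 2

# Actual probability that both versions fail on a randomly chosen input.
p_pair_actual = (1 - P_HARD) * P_FAIL_EASY ** 2 + P_HARD * P_FAIL_HARD ** 2

print(f"single version          : {p_single:.1e}")
print(f"pair, independence model: {p_pair_independent:.1e}")
print(f"pair, shared hard inputs: {p_pair_actual:.1e}")
# Output: about 6.0e-05, 3.6e-09 and 2.5e-06 respectively -- the pair is
# roughly 700 times worse than the independence model predicts, even though
# neither version "knows" anything about the other.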
(As a final aside, there are randomized algorithms that, for certain
well-behaved problems, *can* justifiably employ an independence model to
obtain very low system failure probabilities.  However, these techniques are
not in the domain of NVP.)

-- Paul Ammann: pammann@gmuvax2.gmu.edu  (703) 764-4664
-- George Mason University, Fairfax VA

------------------------------

End of RISKS-FORUM Digest 11.08
************************