RISKS-LIST: RISKS-FORUM Digest  Saturday 20 January 1990  Volume 9 : Issue 61

FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS
ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Shortage of RISKS but no shortage of risks -- the week in review (PGN)
  AT&T Failure (Bill Murray, Jim Horning)
  Risks of Voicemail systems that expect a human at the other end (R. Aminzade)
  Risks of vote counting (Alayne McGregor)
  Risks of supermarket checkout scanners (David Marks)
  European R&D in Road Transportation (Brian Randell)
  Old habits die hard (Dave Horsfall)

The RISKS Forum is moderated. Contributions should be relevant, sound, in good taste, objective, coherent, concise, and nonrepetitious. Diversity is welcome. CONTRIBUTIONS to RISKS@CSL.SRI.COM, with relevant, substantive "Subject:" line (otherwise they may be ignored). REQUESTS to RISKS-Request@CSL.SRI.COM. TO FTP VOL i ISSUE j: ftp CRVAX.sri.com, login anonymous, AnyNonNullPW, cd sys$user2:[risks], get risks-i.j . Vol summaries now in risks-i.0 (j=0)

----------------------------------------------------------------------

Date: Fri, 19 Jan 1990 15:14:08 PST
From: "Peter G. Neumann"
Subject: Shortage of RISKS but no shortage of risks

Well, it was a very strange week for me to be off-line, not to be able to report timely commentaries on the AT&T long-distance breakdown, several glitches aboard the shuttle Columbia, the Internet Worm trial of Robert T. Morris, and the San Jose CA indictment of three young computer people. On the other hand, most everything was well covered in the media, which had a field-week (= 5 field-days?).

The exact cause of the 15 Jan 90 AT&T slowdown is apparently still not known, although the result involved the propagation of bogus status information describing the partial outage of one network node, due to a software flaw.
Apparently this information propagated to other nodes, and was amplified in turn by each of them (because of the same program flaw?), creating the `illusion' of a major outage. It should be noted that AT&T's service record up until that time had been exceptionally good, reflecting a fundamental concern for very high system availability.

This outage immediately brought to mind (1) the article by Eric Rosen (Vulnerabilities of Network Control Protocols, ACM Software Engineering Notes, vol 6, no 1, January 1981) on the October 1980 four-hour collapse of the Arpanet, as a result of accidentally propagated bogus status messages that could not be garbage collected, and (2) the possibility of intentional insertion of a malicious effect. The latter possibility has been discounted by AT&T, but I observe somewhat tangentially that if an effect (e.g., a fault mode or vulnerability) can be triggered accidentally, in many cases it could alternatively have been caused intentionally. This was indeed the case in the Arpanet collapse, which was completely accidental.

The shuttle glitches included the spurious sounding of various alarms as well as the sudden rolling over of the spacecraft, among others.

The Morris trial awaits the final summations, the jury's decision, and the verdict, probably Monday or Tuesday.

The San Jose indictment involves three people accused varyingly of certain malicious computer-related and/or telephone-related security violations. This case will presumably drag on for a long time.

In each of these cases, RISKS will look forward to some definitive, nonspeculative reports, i.e., first-hand analyses by people involved, rather than just press clippings. The article by Eric Rosen on the Arpanet outage, noted above, and the article by Jack R. Garman, on the first shuttle synchronization problem, The Bug Heard 'Round the World, Software Engineering Notes, vol 6 no 5, October 1981, are superb examples of what I have in mind.
Those of you who haven't seen those articles really should. They are absolutely required reading for all RISKS folks.

Our sendmail time-out multiple-copy problem may or may not still exist (I believe all known fixes and some other hopeful improvements have been installed). However, several of you suffered multiple copies of RISKS-9.59, apparently because of a gateway glitch caused by someone's overflowing mailbox. (I continue to further subpartition the mailing list, in an attempt to isolate or minimize the problem, but even with small sublists it seems to continue sporadically.) I apologize for the frustration that results from multiple copies. My own frustration is most considerable when I cannot do anything about the problem.

Above all, the difficulty of getting concurrent processing programs flaw-free is illustrated by almost all of the above-mentioned cases: the AT&T slowdown, the Internet Worm, the first shuttle synchronization problem, and the Arpanet collapse.

PGN

------------------------------

Date: Wed, 17 Jan 90 09:37 EST
From: WHMurray.Catwalk@DOCKMASTER.NCSC.MIL [WHMurray@DOCKMASTER.ARPA]
Subject: AT&T Failure

The assertion by AT&T, "in an effort to allay customer fears about the network's reliability," that the outage was "traced to a single computer program," not only fails to reassure me, but alarms me greatly. It suggests a serious failure on their part to understand the nature of the problem.

While the proximate cause of this problem was an error in the design or implementation of a single program, the actual cause is a system that is unable to isolate failing components, and indeed that is specifically designed to propagate the failures in such a way as to cause the failure of the entire system.

This is the second event in as many years to demonstrate the failure of the new telephone system to cope with the challenges that confront it.
If, on the day before the Hinsdale fire, you had asked me if the failure of a single central office could cause the loss of all long distance service to 350,000 subscribers, I would have said "No, we do not build telephone systems that way. You might lose access to one carrier, but you would retain access to the others." Never would I have imagined that Illinois Bell, only partially in response to the equal access requirements, would centralize all access to all carriers in a single unattended central office.

Likewise, if on Sunday, you had asked me if a change, error, or manipulation of a single program in a single switch could bring down the entire AT&T network, I would have been happy to reassure you that such could not happen. AT&T has been the leader in teaching the rest of the world how to avoid such failures. While an authorized programmer, or even a hacker, might be able to affect a single switch, the system is specifically designed to prevent the effect from propagating. Little did I know; little would I have believed.

If AT&T management actually believes its press releases, if they are not simply propaganda designed to comfort the sheep, then it truly bodes ill for our world economy. Of course, part of the difficulty with such propaganda is that you might yourself forget that that was what it was. In a large organization like AT&T, it is likely that at least some of your employees will be taken in.

I recognize that there are limits to our ability to identify and isolate failing components, that at some point further attempts to do so become self-defeating. If AT&T were claiming that this event was, similar to the second great Northeastern blackout, caused by, or even in spite of, such measures, I would be less alarmed. What concerns me is the pretense that such a failure is so anomalous that special precautions or design considerations are not indicated.
William Hugh Murray, Ernst & Young

------------------------------

Date: 16 Jan 1990 1108-PST (Tuesday)
From: horning@decpa.pa.dec.com (Jim Horning)
Subject: Telephone malpractice?

Did you have trouble completing long-distance calls yesterday? Maybe you should sue AT&T for malpractice. Consider the following Congressional testimony (on the topic of SDI software) from Solomon J. Buchsbaum, Executive Vice President, Customer Systems, AT&T Bell Laboratories, December 3, 1985:

Some critics have specifically questioned if it is possible to generate great quantities of {\it error-free} software for the system, and to ensure that it is, indeed, {\it error-free software.} This is the wrong question. ... Software is always part of a larger system that includes hardware, communications, data, procedures, and people. The right question, as well as the key issue, is the broader one of whether the total BM/C3 system can be designed to be robust and resilient in a changing and error-prone environment. The key, then, is not whether the software contains errors, but how the whole system compensates for such errors as well as for possible subsystem failures. ...

Can such a large, robust, and resilient system be designed--and not only designed, but built, tested, deployed, operated and further evolved and improved? I believe the answer is yes. I seem confident of this answer because most if not all of the essential attributes of the BM/C3 system have, I believe, been demonstrated in comparable terrestrial systems. The system most applicable to the issues at hand is the U.S. Public Telecommunications Network. ...

There are three keys to achieving high reliability, availability, maintainability, and adaptability. The first is the use of distributed architectures both for the entire network and for major systems within the network. The approach compartmentalizes crucial functions in modules throughout the country ...
The second key is the use of redundancy, again both in the entire network and in the component systems. And the third key is the coupling together, the integration, of all the component systems by means of well-specified, well-controlled interfaces.

The network as a whole is much more reliable than its individual components. That's because the network is designed to be fault tolerant. It continuously and automatically checks its own condition. When a problem is detected, it isolates the faulty component, so that the network can continue to function using a substitute or redundant component. For high availability, the public telecommunications network is designed to work at its specified level of performance even when some of its component elements are unavailable. ...

... This approach not only reduces software complexity. It also permits the fullest use of software as a strength, enhancing network flexibility and resiliency.

Perhaps Dr. Buchsbaum envisioned an SDI that might for a significant fraction of a day tell over half the incoming ICBMs "We can't handle you right now, please keep trying"? Perhaps Dr. Teller thinks that the problem would go away if we just gave everyone a Brilliant Telephone?

Jim H.

------------------------------

Date: Thu, 18 Jan 90 08:24:18 EST
From: r.aminzade@lynx.northeastern.edu
Subject: Risks of Voicemail systems that expect a human at the other end

Last night my car had a dead battery (I left the lights on -- something that a very simple piece of digital circuitry could have prevented, but I digress), so I called AAA road service. I noted that they had installed a new digital routing system for phone calls. "If you are cancelling a service call, press 1; if this is an inquiry about an existing service call, press 2; if this is a new service call, press 3." All well and good, except that when I finally reached a real operator, she informed me that the tow truck would arrive "within 90 minutes."
In less than the proposed hour and a half I managed to beg jumper cables off of an innocent passerby and get the car started, so I decided to call AAA and cancel the service call. I dialed, pressed 1 as instructed, and waited. The reader should realize that my car was illegally parked (this is Boston), running (I wasn't going to get stuck with a dead battery again!), and had the keys in the ignition. I was not patient.

I waited about four minutes, then tried again. Same result. I was now out of dimes, but I noticed that the AAA machine began its message with "we will accept your collect call..." so I decided to call collect.

Surprise! I discovered that New England Telephone had just installed _its_ digital system for collect calls. It is quite sophisticated, using some kind of voice recognition circuit. The caller dials the usual 0-(phone number), and then is asked "If you wish to make a collect call, press 1...If you wish to..." Then the recording asks "please say your name." The intended recipient of the collect call then gets a call that begins "Will you accept a collect call from "

I knew what was coming, but I didn't want to miss this experience. I gave my name as something like "Russell, Goddammit!," and NET's machine began asking AAA's machine if it would accept a collect call (which it had already, plain to the human ear, said it _would_ accept) from "Russell Goddammit!". Ms. NET (why are these always female voices?) kept telling Ms. AAA "I'm sorry, I don't understand you, please answer yes or no," but Ms. AAA went blithely on with her shpiel, instructing Ms. NET which buttons to push.

I stood at the phone (car still running...machines nattering away at each other) wondering who could do this episode justice. Kafka? Orwell? Groucho?

I was sure that one machine or the other would eventually give up and turn things over to a human being, but I finally decided to dial a human operator, and subject the poor woman to a stream of abuse.
She connected me to AAA, where I punched 3 (rather than the appropriate but obviously malfunctioning 1), and subjected yet another underpaid clerk to my wrath.

------------------------------

Date: Fri, 19 Jan 90 14:40:17 EST
From: alayne@gandalf.UUCP (Alayne McGregor)
Subject: risks of vote counting

High-tech vote counting unreliable, city decides

In the wake of a chaotic election last year, the first Canadian city to use sophisticated electronic-voting machines is going back to counting votes the old-fashioned way.

By a 13-3 vote, Toronto politicians decided yesterday to sell more than $1.6-million worth of optical scanning machines bought for the November, 1988, municipal election. That race was marred by a recount brought about by the discovery that 1,408 ballots had been improperly handled by the machines.

More than a year later, the election is still not over. On Monday, a court-ordered recount is to be held in the Wards 3 and 4 separate school trustee race. [In Ontario, separate == Roman Catholic.]

The council's decision flew in the face of a task force proposal, put forward before Council yesterday, to double the number of electronic voting machines, to 500 from 250, in time for the next election. It would have cost an additional $1.6-million.

Not only are the machines more efficient, they are more accurate and provide quicker results at the end of the day, said Martin Silva, who headed an elections task force set up last year. "There's nothing wrong with the machines," Mr. Silva said in an interview. He attributed the election problems to the previous council's decision to buy only half the number of machines needed.

Toronto was the first Canadian city to use the optical scanning machines, in a by-election before the 1988 election, said an official in the city clerk's department. It is also the first to get rid of them, he said. Other cities using the same machines include North York and Vancouver.
A similar system is also used in Scarborough and Etobicoke, said Robert Clark, director of legislative services in the Toronto clerk's department.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

A few comments:

From my experiences in working for candidates in many elections (municipal, provincial, and federal), I would say that it's not just optical scanning machines that get confused by ballots. Manual counting depends on the quality of the people doing the counting. Given that poll clerks and Deputy Returning Officers (the people who actually count the votes on election day) are patronage appointments (in Canada), I'm not too hopeful.

And there are the confused rules: Here in Ottawa, we had endless recounts in one city ward election (first one candidate was declared the winner, then the other). It finally had to be settled by a new election (won convincingly by one candidate, thank heavens), after the last recount declared a tie. And what was one of the big issues in the recounts: whether a "happy face" beside a candidate was a vote for that candidate!

------------------------------

Date: 19 Jan 90 16:33:23 GMT
From: djm408@tijc02.uucp (David Marks)
Subject: Risks of supermarket checkout scanners

The following appeared in the BUSINESS BRIEFING section of the 22 January 1990 issue of INSIGHT magazine:

"Supermarket shoppers would be wise to keep a watchful eye as their purchases are passed over the checkout scanning systems used in most stores, says the New York State Department of Agriculture. The agency reported last month that inspections of 33 stores with electronic checkout showed that all but one overcharged their customers. 'It is in the shoppers' best interest to check that the prices they are being charged are those that are being advertised,' says department spokesman Gerald Moore. The problem, he adds, is that 'scanners are only as accurate as the people who program them.'
The department contends that stores are not reprogramming the computerized price scanners frequently enough to reflect sales and other price changes. Inspectors posing as customers and purchasing some 150 items in each store were overcharged for 3 percent on average. The department kept no tally of whether inspectors were undercharged.

Retailers contend that just as many pricing errors are made in favor of the customer and that overall accuracy of scanning systems is superior to conventional cashier checkout. `Consumers would not accept the technology if they felt it was inaccurate,' says Peter Larkin, spokesman for the Food Marketing Institute, a grocery trade group.

To ease consumer anxiety about the technology, many supermarkets offer a guarantee that backs up the accuracy with product giveaways for items consumers find overpriced by the scanner. Whether customers come out even in the end is not the issue, according to Moore. `It shouldn't be a crapshoot. People should be charged what is advertised.'"

It seems to me that consumers accept the technology not because they feel it is accurate, but because it makes checkout faster. Also, where I live Kroger gives your money back on items that are not charged the price marked on the shelf. However, this still puts the burden on the customer. In the "old days," you could check the price marked on the product against the one rung up. Now you have to write down the price on the shelf and compare it at the checkout. When there are lots of people impatiently waiting in line behind you, are you really going to do that?

David J. Marks M/S 3520 Texas Instruments, Johnson City, TN.
37605

------------------------------

Date: Thu, 18 Jan 90 17:58:04 BST
From: Brian Randell
Subject: European R&D in Road Transportation

Today's Guardian Newspaper contains an article by Rex Malik, a well-known UK commentator on the computer scene, entitled "Every Move You Make ...", discussing possible implications of two current European R&D initiatives concerning the use of computers in road transportation, DRIVE and PROMETHEUS. As I understand it, DRIVE is an EC (European Community) initiative that has been largely motivated by concerns about social and environmental impacts of road traffic, whereas PROMETHEUS is a EUREKA project (which means that it is collaboratively sponsored by various European national governments directly) that is backed mainly by the automotive manufacturers.

Brian Randell, Computing Laboratory, University of Newcastle upon Tyne, UK

PS: Some years ago Malik authored a book entitled "And Tomorrow .. The World? Inside IBM" (Millington, London, 1975), which - how shall I put it - redressed the somewhat uncritical view of IBM to be found in Thomas J. Watson Jr's "A Business And Its Beliefs: The Ideas That Helped Build IBM" (McGraw-Hill, New York 1963). I recommend both - but only if read together! (This "recommendation" is not intended to have any specific explicit relevance to RISKS :-)

=====

Contracts for the EC's Drive (Dedicated Road Infrastructure for Vehicle Safety in Europe) programme preliminary phase were awarded earlier this year.... ...

Drive is a major EC pre-competitive research and development programme. It is almost a cousin of the EC's Esprit, and it is likely to have radical consequences. ...
Drive is an attempt to work out how to use information technology to create the infra-structure needed to help reduce the EC's road accident-related death and injury figures, (currently around 55,000 of the former and 350,000 of the latter), reduce environmental pollution and improve road traffic efficiency by enmeshing the road system in a web of IT. That's the radical bit.

The parallel Prometheus programme aims to add electronics - automatic guidance and navigation systems - to vehicles, creating "smart cars" permanently linked to the Drive traffic environment management control systems. The long-term aim is to transform driving and driving conditions. But when put in the context of other trends, these technical developments may have different connotations.

The initial Drive contracts are for the specification phase. Proposal T416, for example, is for a "Black Box", a Vehicle Journey Data Recorder - an improved digital version of the tachographs which monitor the driving times of professional drivers. But T416 goes much further. It aims to produce a record for non-professional drivers giving the exact trajectory of a vehicle during the last 1,000 meters preceding an accident.

This system is likely to operate within a road management environment in which vehicles will be linked at all times via a radio telecommunications network which will provide information about routes and traffic conditions - which, in the longer term, may over-ride systems of driver control. It could effectively turn vehicles into part of instantly assembled or disassembled what-you-might-call electronically-coupled trains. It could also provide a running record that we lawfully behave by recording in real time where and who we are when we leave our own private world and enter society's.

This traffic management environment will operate across national frontiers. You thought you could get away? You thought wrong.
Orwell may well have been right in intent even if he wasn't with the date, technology or politics. We are, with the best of motives - what after all could be a better motive than the saving of lives in massive numbers? - and for the greatest good of the greatest number (where have I heard that before?) creating the conditions which underlie the Orwellian nightmare. Whether we operate them in the Orwellian way is, of course, a different issue. ...

------------------------------

Date: Thu, 18 Jan 90 09:15:42 est
From: Dave Horsfall
Subject: Old habits die hard

Taken from the "Sydney Morning Herald" 15 Jan 90:

``A [Sydney] reader recalls his time in Zimbabwe, when computer setting was installed at the country's main commercial printers. A supervisor from the hot-metal printing days had always used a mallet to jog the linotype machines back into action, and found that old habits die hard. The result? A technician flown in from Johannesburg to repair a badly bruised computer.''

[ Not so much a risk to the public from computers as a risk to the computer from the public, I guess ]

Dave Horsfall (VK2KFU), Alcatel STC Australia, dave@stcns3.stc.oz.AU

------------------------------

End of RISKS-FORUM Digest 9.61
************************