precedence: bulk
Subject: Risks Digest 20.49

RISKS-LIST: Risks-Forum Digest  Wednesday 21 July 1999  Volume 20 : Issue 49

   FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

***** See last item for further information, disclaimers, caveats, etc. *****
This issue is archived at and at ftp.sri.com/risks/ .

  Contents:
Intercom hang-up caused 1997 train collision (Mark Brader)
Computer-based patient monitor problems: improvements still needed (John Doyle)
Statistical errors in medicine (John Doyle)
Centaur/Milstar Software Error (Peter B. Ladkin)
Small problem escalates into major disruption (Doug Moore)
Computer startup circuits (M. Simon)
Netcom partial e-mail outage (Keith A Rhodes)
junkfilter vs. xxx.lanl.gov (Thomas Roessler)
"Bright Light" POP-based spam filtering: a bad idea (Lauren Weinstein)
E-mail attachments and local names (Avi Rubin)
Ab van Poortvliet: Risks, Disasters, and Management (PGN)
REVIEW: "The Mythical Man-Month", Frederick P. Brooks Jr. (Rob Slade)
Abridged info on RISKS (comp.risks)

----------------------------------------------------------------------

Date: Fri, 16 Jul 1999 00:40:38 -0400 (EDT)
From: msbrader@interlog.com
Subject: Intercom hang-up caused 1997 train collision

On November 19, 1997, two GO commuter trains collided at low speed in Toronto's central station (Union Station). One was stationary, with 800+ people on board, of whom 50+ were injured enough to be taken to hospital; the other would have been the next train out of the same track, and only two crew members were aboard.

is the official report, and two things it discusses seem worth a write-up here.

The first is the "dwarf signals" used for shunting moves in the station area, where it's not uncommon for trains to be sent onto an already-occupied track.
When a move was authorized, the dwarf signal would show green (15 mph okay) if the track was actually clear AND the movement was in the preferred direction for that track, but otherwise yellow (15 mph max, but must be able to stop within half the distance you can see). This is all right if everyone drives according to the rules, but it means that yellow sometimes does and sometimes doesn't mean that there's a train known to be on the track. And a survey of train crews after the accident revealed that they generally did not know this.

The second and I think more serious problem was the intercom. Commuter trains with diesel locomotives often have the locomotive permanently at one end, and a driving cab in the car at the other end, allowing quick reversal between trips, and that's what GO does. On this occasion the train was making a short reverse move to change tracks, and the engineer (driver) elected to stay in the locomotive and drive from the rear for the second move. The conductor was to keep a lookout ahead from the cab car.

Now, there is a good deal of train radio traffic in the area, and GO trains have an intercom between the two cabs, so the crew members naturally used that instead of their radios. To start a conversation they would have to buzz the other cab for attention and wait for the other person to pick up the handset.

The risk factor is that there was NO indication of when the handset had been HUNG UP again -- and no official rules for intercoms, which could have required someone to say when they were hanging up. The conductor had been on the intercom a little before he sighted the other train, which was still a safe 200 feet away, and he assumed the engineer was still listening -- so when he called for a stop, he didn't buzz first! And then when he realized that the train wasn't braking, he didn't have quite enough time left to correctly decide what it was best to try next.
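
The dwarf-signal rule described above can be written out as a small truth table. This is only a sketch of the rule as summarized in this item, not the railway's actual interlocking logic; the names `Aspect`, `dwarf_aspect`, and `yellow_cases` are made up for illustration.

```python
from enum import Enum


class Aspect(Enum):
    GREEN = "green"    # 15 mph okay
    YELLOW = "yellow"  # 15 mph max, be able to stop within half sight distance


def dwarf_aspect(track_clear: bool, preferred_direction: bool) -> Aspect:
    """Aspect shown once a move is authorized (hypothetical model of the rule)."""
    if track_clear and preferred_direction:
        return Aspect.GREEN
    return Aspect.YELLOW


def yellow_cases() -> list[tuple[bool, bool]]:
    """All (track_clear, preferred_direction) combinations that show yellow."""
    return [
        (clear, preferred)
        for clear in (True, False)
        for preferred in (True, False)
        if dwarf_aspect(clear, preferred) is Aspect.YELLOW
    ]


if __name__ == "__main__":
    for clear, preferred in yellow_cases():
        print(f"yellow: track_clear={clear}, preferred_direction={preferred}")
```

Three of the four input combinations show yellow, and one of those three has a clear track, so the aspect alone cannot tell a crew whether a train is known to be ahead.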

Mark Brader, Toronto  msbrader@interlog.com

------------------------------

Date: Sat, 17 Jul 1999 14:46:39 -0400
From: "Dr D John Doyle"
Subject: Computer-based patient monitor problems: improvements still needed

As an anesthesiologist with 30 years of computer experience I sometimes give thought to the design of the computer-based instrumentation on which my patients' lives depend. For example, when I put a patient to sleep for heart surgery, I insert monitoring lines to follow cardiac filling pressures, cardiac output pressures and other parameters such as the electrical pacing and conduction properties of the heart. Usually five or more signals are displayed in real-time on a high-resolution display. This data is just as important to us as flight-related data is to an aircraft pilot, and forms the basis on deciding, for example, which drugs we administer to get out of clinical crises.

Regretfully, personal clinical experience has led me to realize that much needs to be done to make patient monitors trustworthy and ergonomically sensible. Two examples illustrate the point.

I know of two computer-based monitor designs that occasionally "reboot" in the middle of surgery for no apparent reason (possibly due to electromagnetic interference from the electrosurgical apparatus, or even nearby cellular phones, but possibly also software-related), with the result that the patient is at increased risk during the reboot period (where you are "flying blind"). The situation is especially frustrating when the pressure-signal-related zero offset information is lost on reboot and the pressure transducers must all be rezeroed!

Also, another monitoring system I use frequently will annunciate an "asystole" (cardiac arrest) alarm whenever the electrocardiogram signal falls below a certain amplitude. The fact that normal cardiac pressures are still being generated is ignored by the alarm management software, resulting in an obviously wrong diagnosis.
This kind of haphazard and ill-conceived alarm arrangement is the reason why many of my colleagues globally disable all alarms at the beginning of surgery, so they can concentrate on taking care of the patient rather than following up on countless false alarms.

So much needs to be done!

D. John Doyle MD PhD FRCPC, Associate Professor, Department of Anesthesia, University of Toronto  djdoyle@home.com  http://doyle.ibme.utoronto.ca

------------------------------

Date: Tue, 20 Jul 1999 10:52:22 PDT
From: "John Doyle"
Subject: Statistical errors in medicine

Since physicians are not in general particularly inclined towards mathematics, some individuals have expressed concern that statistical or other analytical errors may sometimes creep into medical research reports as a result. It would appear that, in fact, this is truly a problem. Several studies of statistical errors present in medical journals have been undertaken by statistical experts; their results suggest that statistical errors are frequent in published medical research reports, even in "prestigious" journals such as The New England Journal of Medicine.

For example, in a British study [1] papers published in the British Journal of Psychiatry in 1993 were reviewed. Sixty-five (40% of 164) papers containing numerical results contained statistical errors. While many errors were not serious, some were viewed to be sufficiently problematic to cast doubt on the conclusions. The author concluded that the statistical error rate was "unacceptably high."

Similarly, in an American study [2] the authors reviewed all papers included in the Clinical Articles section and transactions of societies sections of the January through June 1994 issues of the American Journal of Obstetrics and Gynecology (volume 170, numbers 1 to 6).
The authors concluded: "The lack of complete and detailed listings of applied statistics made it difficult to assess the appropriateness of more than half the studies examined, suggesting a need for more detailed guidelines as to the listing of statistical procedures used. Despite this fact, nearly one third of the articles contained examples of statistics used inappropriately."

More recently, a study on the misuse of correlation and regression in three top-rated medical journals found serious problems [3]. The authors noted: "Fifteen categories of errors were identified of which eight were important or common. These included: failure to define clearly the relevant sample number; the display of potentially misleading scatterplots; attachment of unwarranted importance to significance levels; and the omission of confidence intervals for correlation coefficients and around regression lines."

I suspect that one problem may be the availability of easy-to-use statistical software that makes no requirement that the user actually understand the underlying principles behind the tests employed.

In any event, it would appear that the general standard of statistics in medical journals is shabby. Perhaps special emphasis should be given to the necessity for medical journals to have proper statistical refereeing of submitted papers. Indeed, some journals, embarrassed by reports such as these, are doing exactly that.

References

[1] McGuigan SM. The use of statistics in the British Journal of Psychiatry. Br J Psychiatry 1995 Nov;167(5):683-8
[2] Welch GE 2nd, Gabbe SG. Review of statistics usage in the American Journal of Obstetrics and Gynecology. Am J Obstet Gynecol 1996 Nov;175(5):1138-41
[3] Porter AM. Misuse of correlation and regression in three medical journals. J R Soc Med 1999 Mar;92(3):123-8

D. John Doyle MD PhD, Department of Anesthesia, University of Toronto  djdoyle@home.com  http://doyle.ibme.utoronto.ca

------------------------------

Date: Tue, 22 Jun 1999 18:43:22 +0200
From: "Peter B. Ladkin"
Subject: Centaur/Milstar Software Error (Re: RISKS-20.36 and 39)

As reported in RISKS-20.36 (PGN) and 20.39 (Rhodes), software failed in a Centaur upper stage of a Titan IVB rocket, launched from Cape Canaveral on 30 Apr 1999, resulting in the loss of an $800-million Milstar satellite.

*Aviation Week* (21 Jun 1999, p21) writes that Air Force officials said that the error was in the Centaur's attitude control system software. The inertial nav unit `perceived' a zero roll rate, which was incorrect, `creating attitude errors'. The attitude control system tried to correct attitude, but the incorrect software parameter `prevented the system from orienting the stage properly'. Attitude control system fuel ran out. The USAF and LM don't yet know how the software error escaped detection during verification.

Let me try to interpret this. I'm afraid I won't do very well. All these control system bits, software or hardware, have inputs and outputs. An inertial nav unit (INU) calculates position only, using information from the dynamics history of the vehicle. To say it `perceived' a zero roll rate is to say either that something fed it zeros instead of correct information about roll, or that it gave information out about a roll rate of zero, or maybe both. To say that this `created attitude errors' is to say that the control system responsible for controlling the attitude (ACS) failed (`attitude errors'), and to ascribe this to faulty input from the INU (`created'). Since the ACS tried to `correct' this, it must mean that although the ACS was working on faulty information from the INU, it obtained other input (from somewhere else?)
or calculated the information internally that contradicted this faulty information, and resolved the conflict in favor of this new information, firing the attitude control thrusters (I think that's what such things are called). But apparently the faulty info from the INU still exerted an influence (`incorrect .. parameter prevented the [ACS] from orienting the stage properly').

As far as I understand such things, when a sensor/info conflict is detected, a system will act on the conflict by attempting to identify (or, if not, simply arbitrate) the faulty information and lock it out. Or maybe, if it's really sophisticated, use Byzantine resolution techniques, if ways have been found of implementing these efficiently now. I don't know of any system design which, having identified a conflict, proceeds to use both pieces of conflicting information further, unless it's trying Byzantine resolution techniques, which require four participants at least. I don't see the four here, so this story, even though sparse, still doesn't make sense to me yet. And I'm loath to ascribe truth to a story which doesn't even make sense. But I guess I'm grateful to AvWeek for trying.

Prof. Peter Ladkin, Univ. Bielefeld, Technische Fakultaet
D-33501 Bielefeld GERMANY  +49 (0)521 106-5326/5325
http://www.rvs.uni-bielefeld.de

------------------------------

Date: Sun, 18 Jul 1999 16:51:28 -0400
From: dougmoore@ibm.net
Subject: Small problem escalates into major disruption

A very small fire at a major Bell switching centre caused at least 113,000 phone lines to go out in Toronto's downtown business section, including lines to poison information centres and paramedics, and stopped bank machines, electronic credit card systems, Interac, many computerized traffic lights, nearly $1 billion of electronic stock exchange trading, and several major data networks and internet portals, and disrupted phone services as far away as Ottawa, Halifax, and Chicago.
911 emergency lines had reduced capacity but did stay up.

The fire was caused by a technician dropping a rod in a single switching chamber, which started to fry or melt, and smoke filled the area and sprinklers went off. (Later reports say it was electric arcs, not fire.)

Power in that one part of the building was lost --- putting some phones out of service. Employees were unable to get close to turn the power back on in that area. (There was no remote power switch to do it?)

According to reports in the Toronto Star, the emergency power generator system was not designed to kick in unless power was lost to the entire building, and other floors still had commercial power. (Poor planning?)

According to reports, a telephone company employee cut power to the whole building so as to get the emergency power to kick in, which it did, but fifteen minutes later the telephone switching system ran out of power. (Insufficient emergency power generating capacity?)

Now the entire facility was out for another four and a half hours.

Follow-up, Mon, 19 Jul 1999 06:14:03 -0400

According to a report in the *Toronto Star*, Bell's repairs to its switching facility that failed in downtown Toronto could not be completed since the building is kept locked during the weekend. (Isn't security that prevents restoration of service not in fact security?)

In what is said to be a separate incident, about 960,000 residents of a very nearby city found themselves cut off from 911 police emergency service. Some phones that tried to call the number were "frozen". After about 9 hours, Bell managed to "call forward" 911 calls to the non-emergency police number, and three hours later full 911 service was restored. The cause of the failure is, apparently, unknown.

------------------------------

Date: Fri, 16 Jul 1999 18:41:57 -0700
From: "M. Simon"
Subject: Computer startup circuits

Starting up a computer system with all its necessary hardware in a known state is a notoriously difficult problem.
There are so many opportunities for errors of omission. There are many and difficult interactions of hardware and software. Because the circuits themselves are very simple the work is usually given to novices.

The real problem is that so few engineers are cross trained. Few hardware engineers know software and vice versa. I blame this on unnecessarily complicated software compilers. Minimum competency in C takes six to twelve months. I can teach software competency in my favorite language in 4 to 8 hours. (I did it with some engineers in my mfg. plant about three weeks ago, so don't tell me I am smoking something.)

If you are interested in the name of the language e-mail me. I am not interested in starting a general language flame war.

Simon

------------------------------

Date: Fri, 16 Jul 1999 08:50:48 -0500
From: "Keith A Rhodes"
Subject: Netcom partial e-mail outage

For about 24 hours after 7pm on 14 Jul 1999, Netcom users with user names beginning with a, d, f, h, i, k, l, n, q, r, u, or v were unable to read stored e-mail, because of a hardware failure in a file server in Netcom's San Jose data center (now part of Mindspring). On 10 May 1999, an outage affected only users with names starting with the letter "d". [Source: ZDNN (http://www.zdnet.com/zdnn/), PGN-ed]

------------------------------

Date: Mon, 19 Jul 1999 18:06:58 +0200
From: Thomas Roessler
Subject: junkfilter vs. xxx.lanl.gov

This is essentially a "for the record" contribution on a well-known risk: False positives with content filtering.

Junkfilter (see http://www.pobox.com/~gsutter/junkfilter/) is a rather sophisticated set of procmail rules and regular expression lists designed to catch spam mail. I've been using it for a couple of months now, and it does a nice job in preventing at least some of that "make money fast" stuff from coming through to me, while usually keeping false positives negligible.

Today, I started to obviously miss important messages on an internal mailing list.
It turned out that junkfilter's body check module triggered on the string "http://xxx", as in "http://xxx.lanl.gov/abs/quant-ph/9603004". This URL was contained in one list member's new .signature.

The underlying problem is, in this case, drawing conclusions from a (partial) URL on the content referred to by this URL. (One may also criticize drawing conclusions from content referred to in e-mail messages to such messages qualifying as "spam". Anyway, it usually works. ;-)

------------------------------

Date: Tue, 20 Jul 99 11:16 PDT
From: lauren@vortex.com (Lauren Weinstein)
Subject: "Bright Light" POP-based spam filtering: a bad idea

Greetings. Bright Light Technologies (http://www.brightlight.com), which sports an impressive list of technology partners and investors, has introduced a new "free" service to users of POP-based e-mail (previously Bright Light has apparently mainly worked through ISPs) that attempts to filter out most unsolicited e-mail (SPAM) before it reaches the user. They do this by trying to detect spam flowing around the net and then applying filtering rules. Rejected messages are pushed aside and can be viewed later if the user wishes, and lists of rejected messages are made available.

I'm a long time spam-fighter myself--I maintain a public spam blocking list at http://www.vortex.com. I'm more than willing to declare the concept of trying to filter out spam (so long as there aren't too many false positives) to be a good one. Unfortunately, the method chosen by Bright Light for end-user POP use is a potentially major invasion of privacy--ironic in light of Bright Light's written statements that they want to "avoid the appearance of violating email privacy" (exact quote).

The problem doesn't take a master's degree in Internet engineering to understand. To use their new POP service, you have to route ALL of your inbound e-mail through Bright Light servers. Your POP account accesses Bright Light, then they log in to your ISP to pick up your mail.
It passes through Bright Light, and then to you.

From both a Privacy and Risks standpoint, it's hard to imagine a system more primed for potential trouble. Any centralization of e-mail handling systems in this manner, funneling in e-mail from numerous ISPs, represents an immense target for all manner of mischief--possibly even more attractive to problems than the largest individual ISPs. Systems failures and overloading can happen. Hackers can target the facilities.

And of course, the concentration of e-mail traffic could make Bright Light the recipient of choice for legal actions, by those seeking to track or access e-mail messages for any number of purposes (an increasingly popular legal maneuver, as you probably know). The requirement to provide such information could occur regardless of how little (or how much) of users' e-mail is "normally" stored on disk at the service (as opposed to passing through) in the course of routine operations. With this service, users now have two entities with which they have to entrust their e-mail--their "real" ISP, and Bright Light.

Bright Light has other products that send spam filtering rules directly to ISPs with the spam blocking applied at the ISP level--such services don't present these same concerns. The fundamental problem with the new service, aimed at individual POP e-mail users, is having the full text of users' total incoming e-mail passing through a centralized third party e-mail service outside of the users' direct control or affiliation.

This isn't rocket science--it should be obvious that this sort of centralization of actual e-mail traffic flow is exactly the *wrong* direction to be moving in. I'd recommend thinking long and hard before participating, as an end-user, in any third party service that asks you to route all of your incoming e-mail through them. Even with the best of intentions (and I assume these on the part of Bright Light), and even with a "free" service, the price is much too high.

My RealAudio "Vortex Daily Reality Report & Unreality Trivia Quiz" for 7/19/99 (see below for URL to the archive of segments) was devoted to this topic.

Take care, all.

Lauren Weinstein ; Moderator, PRIVACY Forum ; Member, ACM CCPP; Host, "Vortex Daily Reality Report & Unreality Trivia Quiz"

------------------------------

Date: Tue, 20 Jul 1999 17:46:58 GMT
From: rubin@research.att.com (Avi Rubin)
Subject: E-mail attachments and local names

I just had a mildly embarrassing incident because of a Netscape mail "feature" when sending attachments. I'm sure that other mailers do this as well. When attaching a file to an e-mail message, the local name of the file is included in the attachment. So, the recipient gets to see the contents of the file, as you would hope, but also the local name that you gave it. Imagine several possible embarrassing scenarios:

  Attachment: our.most.gullable.client.invoice.doc
  Attachment: proposal.with.doctored.data.xls
  Etc.

It seems that the user should be able to rename the attachment for the purposes of the message or that the file should just be called attachment1, attachment2, ...

Avi Rubin

------------------------------

Date: Thu, 15 Jul 99 17:29:07 PDT
From: "Peter G. Neumann"
Subject: Ab van Poortvliet: Risks, Disasters, and Management

A really interesting book on risks in transportation:

  Risks, Disasters, and Management: Understanding the Management of High Risks and Its Consequences for Passenger Safety
  by Ab van Poortvliet
  Eburon Publishers, P.O. Box 2867, Delft, The Netherlands, 1999

------------------------------

Date: Fri, 16 Jul 1999 08:33:03 -0800
From: "Rob Slade, doting grandpa of Ryan and Trevor"
Subject: REVIEW: "The Mythical Man-Month", Frederick P. Brooks Jr.

BKMYMAMO.RVW 990502

"The Mythical Man-Month", Frederick P. Brooks Jr., 1995, 0-201-83595-9, U$24.69
%A Frederick P. Brooks Jr.
%C P.O. Box 520, 26 Prince Andrew Place, Don Mills, Ontario M3C 2T8
%D 1995
%G 0-201-83595-9
%I Addison-Wesley Publishing Co.
%O U$24.69 416-447-5101 fax: 416-443-0948 bkexpress@aw.com
%P 308 p.
%T "The Mythical Man-Month: 20th Anniversary Edition"

Brooks' work on software management is a classic, but, even with a quarter million copies in print, it is more regarded than read. A great many can quote Brooks' Law ("Adding manpower to a late software project makes it later") without knowing that it was Brooks who said it. Which is a pity. I can fully agree with the promotional literature that says Brooks' work is "timeless." On the "influential" side, however, I have my doubts. If Brooks truly had the influence he deserved, we would have fewer late projects.

Brooks was originally writing from his experiences as project manager for IBM's System/360, and later with OS/360. It is remarkable how well his ideas have stood the test of time. While today we would long for an operating system that was "bloated" to a mere 400K in size (and I still haven't figured out the "tables" that keep being referred to), the concepts outlined for project management are as sound today as when first set down. And the objections raised against sound management and documentation were as silly in 1975 as they are today.

Chapter one outlines the joy of programming, and any creative work, and how tragically that joy can be drowned in the crush of a badly managed project. Brooks shows very clearly how work can, and cannot, be partitioned in chapter two. A very interesting method of structuring a project team is given in chapter three, slightly weakened by the ship and surgical analogies which do not fully hold up in programming: software project teams are never faced with immediate life and death decisions. The problem of abstracting all "interesting" work to team leaders is examined in chapter four. Chapter five notes a tendency to try to overcompensate for prior missed opportunities, leading to software bloat. Team communication is looked at two ways in chapters six and seven.
Project estimation, in chapter eight, is shown to be a poorly practiced art. Software bloat is revisited in chapter nine, and documentation in ten. Chapter eleven notes something we have lost sight of in our reliance on demo software: it isn't meant to be a mere GUI shell, but a working system that we make all the mistakes on. The need for tools, outlined in chapter twelve, is well accepted today, but the insistence on common tools would pull more than a few programmers up short. Modularity, promoted in chapter thirteen, is also praised today--but not always used. Chapter fourteen has a very solid grasp of the reasons for project slip. Documentation, of the right sort, is reviewed in chapter fifteen, including an attack on the venerable flowchart.

The preceding chapters are all essentially unchanged from the original 1975 edition. Chapter sixteen is Brooks' "No Silver Bullet" paper, wherein he opined (in 1986) that programming was an inherently complex and difficult task, and that no development in the next ten years would provide a productivity increase on the level of an order of magnitude. (It is interesting that Java was unleashed upon the world at the end of that ten year projection, but, three years later, it hasn't opened any programming floodgates either.) Brooks points out objections to "NSB" in chapter seventeen, and answers them. The first edition is recast in point form in chapter eighteen. Chapter nineteen analyses what was right and enduring in the initial material, and what was wrong in the first place.

An enduring classic that deserves to be read and re-read.

copyright Robert M. Slade, 1999  BKMYMAMO.RVW 990502

rslade@vcn.bc.ca  rslade@sprint.ca  slade@victoria.tc.ca  p1@canada.com
http://victoria.tc.ca/techrev or http://sun.soci.niu.edu/~rslade

------------------------------

Date: 23 Sep 1998 (LAST-MODIFIED)
From: RISKS-request@csl.sri.com
Subject: Abridged info on RISKS (comp.risks)

The RISKS Forum is a MODERATED digest.
Its Usenet equivalent is comp.risks.
=> SUBSCRIPTIONS: PLEASE read RISKS as a newsgroup (comp.risks or equivalent) if possible and convenient for you. Alternatively, via majordomo, SEND DIRECT E-MAIL REQUESTS to with one-line,
   SUBSCRIBE (or UNSUBSCRIBE) [with net address if different from FROM:] or
   INFO [for unabridged version of RISKS information]
 .MIL users should contact (Dennis Rears).
 .UK users should contact .
=> The INFO file (submissions, default disclaimers, archive sites, copyright policy, PRIVACY digests, etc.) is also obtainable from
   http://www.CSL.sri.com/risksinfo.html
   ftp://www.CSL.sri.com/pub/risks.info
 The full info file will appear now and then in future issues. *** All contributors are assumed to have read the full info file for guidelines. ***
=> SUBMISSIONS: to risks@CSL.sri.com with meaningful SUBJECT: line.
=> ARCHIVES are available: ftp://ftp.sri.com/risks or
   ftp ftp.sri.com<CR>login anonymous<CR>[YourNetAddress]<CR>cd risks
   [volume-summary issues are in risks-*.00]
   [back volumes have their own subdirectories, e.g., "cd 19" for volume 19]
 or http://catless.ncl.ac.uk/Risks/VL.IS.html [i.e., VoLume, ISsue].
 PostScript copy of PGN's comprehensive historical summary of one liners: illustrative.PS at ftp.sri.com/risks .

------------------------------

End of RISKS-FORUM Digest 20.49
************************