precedence: bulk
Subject: RISKS DIGEST 19.54

RISKS-LIST: Risks-Forum Digest  Saturday 10 January 1998  Volume 19 : Issue 54

   FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

***** See last item for further information, disclaimers, caveats, etc. *****

  Contents:
China Imposes New Controls on Internet Access (Edupage)
Risks of too-friendly browsers (Russell Aminzade)
British Prisoners to Fix Y2K Problem (Winn Schwartau)
GPS Jamming (Marcus L. Rowland)
Microsoft(TM) Car (Mark C. Langston)
Re: What really happened on Mars? by Glenn Reeves (Mike Jones)
Re: Priority Inversion and early Unix (Greg Rose)
System and Software Safety in Critical Systems - survey (Jonathan Bowen)
Formal methods in industrial critical systems, call for papers
  (Diego Latella)
Abridged info on RISKS (comp.risks)

----------------------------------------------------------------------

Date: Thu, 8 Jan 1998 13:15:53 -0500
From: Edupage Editors
Subject: China Imposes New Controls on Internet Access

New rules against "defaming government agencies," spreading pornography and
violence, and revealing state secrets have been imposed by the Chinese
government.  The rules, which are said particularly to target Internet
users, call for criminal punishment and fines of up to $1,800 for Internet
providers and users who are found to have spread "harmful" information or
leaked state secrets via the Internet.  In announcing the rules, China's
assistant minister for public security noted that Internet links had
increased China's cultural and scientific exchanges around the world, but
that "the connection has also brought about some security problems,
including manufacturing and publicizing harmful information, as well as
leaking state secrets."
(Chronicle of Higher Education, 9 Jan 1998; Edupage, 8 Jan 1998)

------------------------------

Date: Fri, 9 Jan 1998 08:22:02 -0500
From: Russell Aminzade
Subject: Risks of too-friendly browsers

I had a job recently developing a site for an intranet for a division of a
big (Fortune 100) computer company.  As usual, I was working with staff
members to create the kind of web pages they wanted.  One of the things they
wanted was a page of links to related materials.

We were on a tight schedule, and I wanted to present them with a prototype
of the page to evaluate before all the information was available, so I
created the "links" pages with some dummy URLs.  Throughout the website, I
had used the same "placeholder" for missing information so I could later do
a global search across the site to make sure all the placeholders had been
removed.  I used "xxxx" for a placeholder, and "xxxx.htm" for the missing
URLs.

My assumption was that, since "xxxx.htm" was an improperly formed URL (no
http://, and there was no local file called xxxx.htm), the browser would
simply present an error to users when that link was clicked on.

I never realized how friendly Netscape can be about resolving improperly
formed URLs.  It accepted xxxx.htm, and went looking for a site called
www.xxxx.com.  Ooops.  I got a frantic phone call from the company asking if
I knew that I had included links to pornographic sites on my prototype.

The moral: if you must use a placeholder for URLs, include a real, dummy
page on your site, and point those URLs to it.

Russell Aminzade - Computing As If People Matter - 802-351-4357
aminzade@sover.net

------------------------------

Date: Fri, 09 Jan 1998 11:52:14 -0500
From: Winn@infowar.com
Subject: British Prisoners to Fix Y2K Problem

Is it just me, or do I smell a RISK here?  Winn Schwartau

ICL is considering using computer-qualified British prison inmates to fix
computers susceptible to the Y2K problem.
The CEO of Greenwich MeanTime suggested that by 1999, ``using prisoners will
seem like a good idea and even using spotty 14-year-olds will seem
reasonable.''  The head of an IT workers' union called this inappropriate at
a time when ICL is already seeking to reduce its workforce.  [ICL Looks To
Prisoners To Tackle Year 2000 Problem, Andrew Craig, TechWeb 8 Jan 1998, PGN
Stark Abstracting]

Winn Schwartau  V: 813.393.6600  http://www.infowar.com

------------------------------

Date: Fri, 9 Jan 1998 22:29:14 +0000
From: "Marcus L. Rowland"
Subject: GPS Jamming

*New Scientist* (8 Jan 1998, http://www.newscientist.com) included an
article saying that a Russian company called Aviaconversia was offering a
4-watt GPS/Glonass jammer for less than $4000 at the September Moscow Air
Show.  It says that it could stop civilian aircraft locking onto GPS signals
over a 200 km radius; military aircraft would be harder to jam, but a more
powerful unit could be built.

The risks (terrorism etc.) are fairly obvious, and it's mentioned that it
would probably be easy to build one even if this company's product is
somehow removed from the market.

Marcus L. Rowland  http://www.ffutures.demon.co.uk/

------------------------------

Date: Fri, 09 Jan 98 10:01:21 -0600
From: fugue@dura.spc.uchicago.edu
Subject: Microsoft(TM) Car

According to an 8 Jan 1998 UPI newswire story, Microsoft Corp. will be
working in conjunction with Ford, Volkswagen, and Nissan to install Windows
CE-based PCs in the dashboards of new automobiles.

Quoting from the article, these 'auto PCs' would "provide voice-activated
wireless communications, electronic mail, games and climate control."

Visteon Automotive Systems is working with Ford to be the first to roll out
these new Auto PCs in a production model.  According to Visteon, current car
owners will be able to upgrade to an Auto PC by Summer 1998.
The article goes on to quote Mike Evans, a Ford Truck official, as saying
the new Auto PCs will allow drivers to be "more productive in their
vehicles."

The RISK?  Anyone remember that old joke about "...if Microsoft made cars"?
Well, it looks like it's just around the corner.  And integration of engine
functions with Windows can't be far behind.  I can imagine the tech support
conversations now:

(tech) "Good morning, MS tech support.  How may I help you?"
(driver) "My PC is frozen, and it's showing a blue screen!  Help!"
(tech) "Do you have your install CD handy?"
(driver) "Of course not!  I'm stuck on I-90, 200 miles from anything!
         And now my car refuses to start!"
(tech) "Well, let's see what we can do for you.  Are you sure you've been
       using Microsoft(TM) Gas?"
...

Mark C. Langston

------------------------------

Date: Fri, 9 Jan 1998 14:13:58 -0800
From: Mike Jones
Subject: Re: What really happened on Mars? by Glenn Reeves (RISKS-19.49)

> Date: Monday, December 15, 1997 10:28 AM
> From: Glenn E Reeves
> Subject: Re: [Fwd: FW: What really happened on Mars?]
>
> What really happened on Mars ?
>
>By now most of you have read Mike's (mbj@microsoft.com) summary of Dave
>Wilner's comments given at the IEEE Real-Time Systems Symposium. I don't
>know Mike and I didn't attend the symposium (though I really wish I had now)
>and I have not talked to Dave Wilner since before the talk. However, I did
>lead the software team for the Mars Pathfinder spacecraft. So, instead of
>trying to find out what was said I will just tell you what happened. You
>can make your own judgments.
>
>I sent this message out to everyone who was a recipient of Mike's original
>that I had an e-mail address for. Please pass it on to anyone you sent the
>first one to. Mike, I hope you will post this wherever you posted the
>original.
>
>Since I want to make sure the problem is clearly understood I need to step
>through each of the areas which contributed to the problem.
>
>THE HARDWARE
>
>The simplified view of the Mars Pathfinder hardware architecture looks like
>this. A single CPU controls the spacecraft. It resides on a VME bus which
>also contains interface cards for the radio, the camera, and an interface to
>a 1553 bus. The 1553 bus connects to two places : The "cruise stage" part
>of the spacecraft and the "lander" part of the spacecraft. The hardware on
>the cruise part of the spacecraft controls thrusters, valves, a sun sensor,
>and a star scanner. The hardware on the lander part provides an interface
>to accelerometers, a radar altimeter, and an instrument for meteorological
>science known as the ASI/MET. The hardware which we used to interface to
>the 1553 bus (at both ends) was inherited from the Cassini spacecraft. This
>hardware came with a specific paradigm for its usage : the software will
>schedule activity at an 8 Hz rate. This **feature** dictated the
>architecture of the software which controls both the 1553 bus and the
>devices attached to it.
>
>THE SOFTWARE ARCHITECTURE
>
>The software to control the 1553 bus and the attached instruments was
>implemented as two tasks. The first task controlled the setup of
>transactions on the 1553 bus (called the bus scheduler or bc_sched task) and
>the second task handled the collection of the transaction results i.e. the
>data. The second task is referred to as the bc_dist (for distribution)
>task. A typical timeline for the bus activity for a single cycle is shown
>below. It is not to scale. This cycle was constantly repeated.
>
>     |< ------------- .125 seconds ------------------------>|
>
>     |<***************|                    |********|   |**>|
>
>                      |<- bc_dist active ->|   bc_sched active
>     |< -bus active ->|                             |<->|
>
>
> ----|----------------|--------------------|--------|---|---|-------
>     t1               t2                   t3       t4  t5  t1
>
>The *** are periods when tasks other than the ones listed are executing.
>Yes, there is some idle time.
>
>t1 - bus hardware starts via hardware control on the 8 Hz boundary.
>The transactions for this cycle had been set up by the previous execution of
>the bc_sched task.
>t2 - 1553 traffic is complete and the bc_dist task is awakened.
>t3 - bc_dist task has completed all of the data distribution
>t4 - bc_sched task is awakened to set up transactions for the next cycle
>t5 - bc_sched activity is complete
>
>The bc_sched and bc_dist tasks check each cycle to be sure that the other
>had completed its execution. The bc_sched task is the highest priority task
>in the system (except for the vxWorks "tExec" task). The bc_dist is third
>highest (a task controlling the entry and landing is second). All of the
>tasks which perform other spacecraft functions are lower. Science
>functions, such as imaging, image compression, and the ASI/MET task are
>still lower.
>
>Data is collected from devices connected to the 1553 bus only when they are
>powered. Most of the tasks in the system that access the information
>collected over the 1553 do so via a double buffered shared memory mechanism
>into which the bc_dist task places the latest data. The exception to this
>is the ASI/MET task which is delivered its information via an interprocess
>communication mechanism (IPC). The IPC mechanism uses the vxWorks pipe()
>facility. Tasks wait on one or more IPC "queues" for messages to arrive.
>Tasks use the select() mechanism to wait for message arrival. Multiple
>queues are used when both high and lower priority messages are required.
>Most of the IPC traffic in the system is not for the delivery of real-time
>data. However, again, the exception to this is the use of the IPC mechanism
>with the ASI/MET task. The cause of the reset on Mars was in the use and
>configuration of the IPC mechanism.
>
>THE FAILURE
>
>The failure was identified by the spacecraft as a failure of the bc_dist
>task to complete its execution before the bc_sched task started. The
>reaction to this by the spacecraft was to reset the computer.
>This reset reinitializes all of the hardware and software. It also
>terminates the execution of the current ground commanded activities. No
>science or engineering data is lost that has already been collected (the
>data in RAM is recovered so long as power is not lost). However, the
>remainder of the activities for that day were not accomplished until the
>next day.
>
>The failure turned out to be a case of priority inversion (how we discovered
>this and how we fixed it are covered later). The higher priority bc_dist
>task was blocked by the much lower priority ASI/MET task that was holding a
>shared resource. The ASI/MET task had acquired this resource and then been
>preempted by several of the medium priority tasks. When the bc_sched task
>was activated, to setup the transactions for the next 1553 bus cycle, it
>detected that the bc_dist task had not completed its execution. The
>resource that caused this problem was a mutual exclusion semaphore used
>within the select() mechanism to control access to the list of file
>descriptors that the select() mechanism was to wait on.
>
>The select mechanism creates a mutual exclusion semaphore to protect the
>"wait list" of file descriptors for those devices which support select. The
>vxWorks pipe() mechanism is such a device and the IPC mechanism we used is
>based on using pipes. The ASI/MET task had called select, which had called
>pipeIoctl(), which had called selNodeAdd(), which was in the process of
>giving the mutex semaphore. The ASI/MET task was preempted and semGive()
>was not completed. Several medium priority tasks ran until the bc_dist task
>was activated. The bc_dist task attempted to send the newest ASI/MET data
>via the IPC mechanism which called pipeWrite(). pipeWrite() blocked, taking
>the mutex semaphore. More of the medium priority tasks ran, still not
>allowing the ASI/MET task to run, until the bc_sched task was awakened.
>At that point, the bc_sched task determined that the bc_dist task had not
>completed its cycle (a hard deadline in the system) and declared the error
>that initiated the reset.
>
>HOW WE FOUND IT
>
>The software that flies on Mars Pathfinder has several debug features within
>it that are used in the lab but are not used on the flight spacecraft (not
>used because some of them produce more information than we can send back to
>Earth). These features were not "fortuitously" left enabled but remain in
>the software by design. We strongly believe in the "test what you fly and
>fly what you test" philosophy.
>
>One of these tools is a trace/log facility which was originally developed to
>find a bug in an early version of the vxWorks port (Wind River ported
>vxWorks to the RS6000 processor for us for this mission). This trace/log
>facility was built by David Cummings who was one of the software engineers
>on the task. Lisa Stanley, of Wind River, took this facility and
>instrumented the pipe services, msgQ services, interrupt handling, select
>services, and the tExec task. The facility initializes at startup and
>continues to collect data (in ring buffers) until told to stop. The
>facility produces a voluminous dump of information when asked.
>
>After the problem occurred on Mars we did run the same set of activities
>over and over again in the lab. The bc_sched was already coded so as to
>stop the trace/log collection and dump the data (even though we knew we
>could not get the dump in flight) for this error. So, when we went into the
>lab to test it we did not have to change the software.
>
>In less than 18 hours we were able to cause the problem to occur. Once we
>were able to reproduce the failure the priority inversion problem was
>obvious.
>
>HOW WAS THE PROBLEM CORRECTED
>
>Once we understood the problem the fix appeared obvious : change the
>creation flags for the semaphore so as to enable the priority inheritance.
>The Wind River folks, for many of their services, supply global
>configuration variables for parameters such as the "options" parameter for
>the semMCreate used by the select service (although this is not documented
>and those who do not have vxWorks source code or have not studied the source
>code might be unaware of this feature). However, the fix is not so obvious
>for several reasons :
>
>1) The code for this is in the selectLib() and is common for all device
>creations. When you change this global variable all of the select
>semaphores created after that point will be created with the new options.
>There was no easy way in our initialization logic to only modify the
>semaphore associated with the pipe used for bc_dist task to ASI/MET task
>communications.
>
>2) If we make this change, and it is applied on a global basis, how will
>this change the behavior of the rest of the system ?
>
>3) The priority inheritance option was deliberately left out by Wind River
>in the default selectLib() service for optimum performance. How will
>performance degrade if we turn the priority inheritance on ?
>
>4) Was there some intrinsic behavior of the select mechanism itself that
>would change if the priority inheritance was enabled ?
>
>We did end up modifying the global variable to include the priority
>inheritance. This corrected the problem. We asked Wind River to analyze the
>potential impacts for (3) and (4). They concluded that the performance
>impact would be minimal and that the behavior of select() would not change
>so long as there was always only one task waiting for any particular file
>descriptor. This is true in our system. I believe that the debate at Wind
>River still continues on whether the priority inheritance option should be
>on as the default. For (1) and (2) the change did alter the characteristics
>of all of the select semaphores. We concluded, both by analysis and test,
>that there was no adverse behavior.
>We tested the system extensively before we changed the software on the
>spacecraft.
>
>HOW WE CHANGED THE SOFTWARE ON THE SPACECRAFT
>
>No, we did not use the vxWorks shell to change the software (although the
>shell is usable on the spacecraft). The process of "patching" the software
>on the spacecraft is a specialized process. It involves sending the
>differences between what you have onboard and what you want (and have on
>Earth) to the spacecraft. Custom software on the spacecraft (with a whole
>bunch of validation) modifies the onboard copy. If you want more info you
>can send me e-mail.
>
>WHY DIDN'T WE CATCH IT BEFORE LAUNCH ?
>
>The problem would only manifest itself when ASI/MET data was being collected
>and intermediate tasks were heavily loaded. Our before launch testing was
>limited to the "best case" high data rates and science activities. The fact
>that data rates from the surface were higher than anticipated and the amount
>of science activities proportionally greater served to aggravate the
>problem. We did not expect nor test the "better than we could have ever
>imagined" case.
>
>HUMAN NATURE, DEADLINE PRESSURES
>
>We did see the problem before landing but could not get it to repeat when we
>tried to track it down. It was not forgotten nor was it deemed unimportant.
>
>Yes, we were concentrating heavily on the entry and landing software. Yes,
>we considered this problem lower priority. Yes, we would have liked to have
>everything perfect before landing. However, I don't see any problem here
>other than we ran out of time to get the lower priority issues completed.
>
>We did have one other thing on our side; we knew how robust our system was
>because that is the way we designed it.
>
>We knew that if this problem occurred we would reset. We built in
>mechanisms to recover the current activity so that there would be no
>interruptions in the science data (although this wasn't used until later in
>the landed mission).
>We built in the ability (and tested it) to go through multiple resets while
>we were going through the Martian atmosphere. We designed the software to
>recover from radiation induced errors in the memory or the processor. The
>spacecraft would have even done a 60 day mission on its own, including
>deploying the rover, if the radio receiver had broken when we landed. There
>are a large number of safeguards in the system to ensure robust, continued
>operation in the event of a failure of this type. These safeguards allowed
>us to designate problems of this nature as lower priority.
>
>We had our priorities right.
>
>ANALYSIS AND LESSONS
>
>Did we (the JPL team) make an error in assuming how the select/pipe
>mechanism would work ? Yes, probably. But there was no conscious decision
>to not have the priority inheritance enabled. We just missed it. There are
>several other places in the flight software where similar protection is
>required for critical data structures and the semaphores do have priority
>inversion protection. A good lesson when you fly COTS stuff - make sure you
>know how it works.
>
>Mike is quite correct in saying that we could not have figured this out
>**ever** if we did not have the tools to give us the insight. We built many
>of the tools into the software for exactly this type of problem. We always
>planned to leave them in. In fact, the shell (and the stdout stream) were
>very useful the entire mission. If you want more detail send me a note.
>
>SETTING THE RECORD STRAIGHT
>
>First, I want to make sure that everyone understands how I feel in regard to
>Wind River. These folks did a fantastic job for us. They were enthusiastic
>and supported us when we came to them and asked them to do an affordable
>port of vxWorks. They delivered the alpha version in 3 months. When we had
>a problem they put some of the brightest engineers I have ever worked with
>on the problem. Our communication with them was fantastic.
>If they had not done such a professional job the Mars Pathfinder mission
>would not have been the success that it is.
>
>Second, Dave Wilner did talk to me about this problem before he gave his
>talk. I could not find my notes where I had detailed the description of the
>problem. So, I winged it and I sure did get it wrong. Sorry Dave.
>
>ACKNOWLEDGMENTS
>
>First, thanks to Mike for writing a very nice description of the talk. I
>think I have had probably 400 people send me copies. You gave me the push
>to write the part of the Mars Pathfinder End-of-Mission report that I had
>been procrastinating doing.
>
>Special thanks to Steve Stolper for helping me do this. The biggest thanks
>should go to the software team that I had the privilege of leading and whose
>expertise allowed us to succeed: Pam Yoshioka, Dave Cummings, Don Meyer,
>Karl Schneider, Greg Welz, Rick Achatz, Kim Gostelow, Dave Smyth,
>Steve Stolper. Also, Miguel San Martin, Sam Sirlin, Brian Lazara (WRS),
>Mike Deliman (WRS), Lisa Stanley (WRS)
>
>Glenn Reeves, Mars Pathfinder Flight Software Cognizant Engineer
>glenn.e.reeves@jpl.nasa.gov

------------------------------

Date: Fri, 09 Jan 1998 08:11:19 +1100
From: Greg Rose
Subject: Re: Priority Inversion and early Unix

"Fred B. Schneider", in "What really happened on Mars Rover Pathfinder
(Mike Jones, R-19.49)" R-19.53, mentions Lampson and Redell's 1980 paper.

John Lions' 1977 book "Commentary on UNIX 6th Edition", which contains the
source code for this early version of Unix, and an extremely lucid
examination of that code, has recently been republished (Peer-to-Peer
Communications, ISBN 1-57398-013-7).  Lions (p8-4) comments:

  "Some critical sections of code are executed by interrupt handlers.
  To protect other sections of code whose outcome may be affected by the
  handling of certain interrupts, the processor priority is raised
  temporarily high enough before the critical section is entered to delay
  such interrupts until it is safe, when the processor priority is reduced
  again.  There are of course a number of conventions which interrupt
  handling code should observe ..."

The code to which this refers was of the form:

    oldpri = spl7();
    /* critical region here */
    spl(oldpri);

The processor priorities of the PDP-11 were allowed to dictate this
structure to some extent; priority 7 was the highest, and disabled all
interrupts.  There was no attempt to set the priority to the lowest
necessary, as the lack of different levels (0, 4, 5, 6, 7 for device
interrupts) meant that 7 was almost always the lowest such level, although
in some cases spl6() was used analogously, that being the clock interrupt
priority.  Most local critical regions, such as device drivers, used the
simpler:

    spl5();
    ...
    spl0();

(Here I have used 5 as an example of a known interrupt priority for this
device.  All "normal" code ran at a processor priority level of 0, with a
separate software priority and context switching mechanism.)

Clearly Thompson and Ritchie had thought about and found a way to solve the
priority inversion problem for the limited case of Unix on a single CPU.
Their code predates Lions' commentary by a couple of years at least (1975 or
earlier); I don't know exactly which version of Unix introduced that
construction.
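
The raise-then-restore pattern described above can be modelled in user
space.  What follows is only a minimal sketch in Python, not Unix code: the
Processor class, its method names, and the demo() function are hypothetical
and invented for illustration.  The only things taken from the text are the
idea of raising the processor priority around a critical region and the use
of level 6 for the clock and level 7 as the highest level.

```python
class Processor:
    """Toy single-CPU model with PDP-11-style interrupt priority levels."""

    def __init__(self):
        self.priority = 0   # ordinary code runs at level 0
        self.pending = []   # (level, handler) requests currently masked
        self.log = []       # order in which things actually ran

    def spl(self, level):
        """Set the processor priority and return the previous value."""
        old = self.priority
        self.priority = level
        self._deliver_pending()   # lowering the level may unmask requests
        return old

    def interrupt(self, level, handler):
        """A device requests an interrupt; it is taken only if unmasked."""
        if level > self.priority:
            self._run(level, handler)
            self._deliver_pending()
        else:
            self.pending.append((level, handler))

    def _run(self, level, handler):
        saved = self.priority
        self.priority = level     # the handler runs at its own level
        handler(self)
        self.priority = saved

    def _deliver_pending(self):
        # Take the highest-level unmasked request first, until none remain.
        while True:
            ready = [p for p in self.pending if p[0] > self.priority]
            if not ready:
                return
            item = max(ready, key=lambda p: p[0])
            self.pending.remove(item)
            self._run(*item)


def demo(protect):
    """Update a shared counter while a clock interrupt arrives mid-update."""
    cpu = Processor()
    shared = {"count": 0}

    def clock_handler(cpu):
        shared["count"] += 100
        cpu.log.append("clock interrupt")

    if protect:
        old = cpu.spl(7)              # corresponds to: oldpri = spl7();
    snapshot = shared["count"]        # critical region: read ...
    cpu.interrupt(6, clock_handler)   # clock tick arrives mid-region
    cpu.log.append("critical section")
    shared["count"] = snapshot + 1    # ... modify, write
    if protect:
        cpu.spl(old)                  # corresponds to: spl(oldpri);
    return cpu.log, shared["count"]
```

In this model demo(True) holds the clock tick pending until the priority is
restored, so both updates survive and the count ends at 101.  demo(False)
lets the handler run between the read and the write, and its update is
overwritten, leaving the count at 1.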

Greg Rose, QUALCOMM Australia, 6 Kingston Avenue, Mortlake NSW 2137 Australia
http://people.qualcomm.com/ggr/  ggr@qualcomm.com  +61-2-9743 4646

------------------------------

Date: Thu, 8 Jan 1998 15:09:40 GMT
From: J.P.Bowen@reading.ac.uk (Jonathan Bowen)
Subject: System and Software Safety in Critical Systems - survey

The following Technical Report may be of interest to RISKS readers:

  System and Software Safety in Critical Systems, Ulla Isaksen, Jonathan
  Bowen and Nimal Nissanke.  Technical Report RUCS/97/TR/062/A, Department
  of Computer Science, The University of Reading, UK, 1997.
  URL: ftp://ftp.cs.reading.ac.uk/pub/formal/jpb/scs-survey.ps.Z
  (compressed PostScript format - omitting the trailing ".Z" gives
  uncompressed PostScript)

Jonathan Bowen, Univ. of Reading, CompSci, Whiteknights, PO Box 225, Reading,
Berks RG6 6AY, England  +44-118-931-6544  http://www.cs.rdg.ac.uk/people/jpb/

------------------------------

Date: Wed, 7 Jan 1998 14:53:27 +0100 (MET)
From: Diego Latella
Subject: Formal methods in industrial critical systems, call for papers

               Journal of Science of Computer Programming
                Editor-in-Chief: Prof. Michel Sintzoff
                           SPECIAL ISSUE ON
    The Application of Formal Methods in Industrial Critical Systems
    Guest Editors: Jorge Cuellar, Stefania Gnesi, Diego Latella

The Journal of Science of Computer Programming has planned a special issue
on the use of formal methods in industry for the development of critical
systems.  The aim of this special issue is to provide a forum for people
interested in the development and use of formal methods in industry.  In
particular, this special issue should create a link between scientists
active in the area of formal methods who are willing to communicate their
experience of the real industrial usage of these methods and people from
industry who are interested in including formal methods in their
development methodologies.

This special issue is promoted by the Working Group on Formal Methods for
Industrial Critical Systems of the European Research Consortium on
Informatics and Mathematics (ERCIM - http://www-ercim.inria.fr/).

Deadline for submission: 30 Apr 1998.  For instructions, contact S. Gnesi,
CNR - Ist. Elaborazione dell'Informazione, Via S. Maria 46, I56126 Pisa -
ITALY, phone +39-50-593489, fax +39-50-554342  gnesi@iei.pi.cnr.it

------------------------------

Date: 1 Apr 1997 (LAST-MODIFIED)
From: RISKS-request@csl.sri.com
Subject: Abridged info on RISKS (comp.risks)

 The RISKS Forum is a MODERATED digest.  Its Usenet equivalent is comp.risks.
=> SUBSCRIPTIONS: PLEASE read RISKS as a newsgroup (comp.risks or equivalent)
 if possible and convenient for you.  Or use Bitnet LISTSERV.  Alternatively,
 (via majordomo) DIRECT REQUESTS to with one-line,
   SUBSCRIBE (or UNSUBSCRIBE) [with net address if different from FROM:] or
   INFO     [for unabridged version of RISKS information]
=> The INFO file (submissions, default disclaimers, archive sites, .mil/.uk
 subscribers, copyright policy, PRIVACY digests, etc.) is also obtainable from
 http://www.CSL.sri.com/risksinfo.html  ftp://www.CSL.sri.com/pub/risks.info
 The full info file will appear now and then in future issues.  *** All
 contributors are assumed to have read the full info file for guidelines. ***
=> SUBMISSIONS: to risks@CSL.sri.com with meaningful SUBJECT: line.
=> ARCHIVES are available: ftp://ftp.sri.com/risks or
 ftp ftp.sri.com<CR>login anonymous<CR>[YourNetAddress]<CR>cd risks
   [volume-summary issues are in risks-*.00]
   [back volumes have their own subdirectories, e.g., "cd 18" for volume 18]
 or http://catless.ncl.ac.uk/Risks/VL.IS.html      [i.e., VoLume, ISsue].
 The ftp.sri.com site risks directory also contains the most recent
 PostScript copy of PGN's comprehensive historical summary of one liners:
   get illustrative.PS

------------------------------

End of RISKS-FORUM Digest 19.54
************************