Subject: RISKS DIGEST 10.37
REPLY-TO: risks@csl.sri.com

RISKS-LIST: RISKS-FORUM Digest  Thursday 13 September 1990  Volume 10 : Issue 37

        FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Expert system in the loop (Martyn Thomas)
  Re: Railway Safe Working - large analogue systems (Dave Parnas)
  Re: Analog vs digital reliability (Rob Sartin, David Murphy)
  The need for software certification (John H. Whitehouse)
  ZIP code correcting software (Richard W. Meyer)
  Software Bugs "stay fixed"? (Jeff Jacobs)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in
good taste, objective, coherent, concise, and nonrepetitious.  Diversity is
welcome.  CONTRIBUTIONS to RISKS@CSL.SRI.COM, with relevant, substantive
"Subject:" line (otherwise they may be ignored).  REQUESTS to
RISKS-Request@CSL.SRI.COM.  TO FTP VOL i ISSUE j: ftp CRVAX.sri.com, login
anonymous, any non-null password, cd sys$user2:[risks], GET RISKS-i.j
(j is TWO digits).  Vol summaries in risks-i.00 (j=0); "dir risks-*.*" gives
directory; bye logs out.  ALL CONTRIBUTIONS ARE CONSIDERED AS PERSONAL
COMMENTS; USUAL DISCLAIMERS APPLY.  The most relevant contributions may
appear in the RISKS section of regular issues of ACM SIGSOFT's SOFTWARE
ENGINEERING NOTES, unless you state otherwise.

----------------------------------------------------------------------

Date: Thu, 13 Sep 90 18:23:50 BST
From: Martyn Thomas
Subject: Expert system in the loop

According to Electronics Weekly (Sept 12th, p2):

"Ferranti will study for MoD the feasibility of integrating a knowledge-based
expert system into naval command systems, to advise commanders in battle.

"The system would reduce the risk of mistakes [sic] because battle situations
are too complex for a command team to appreciate properly in the short time
available. [...] If a trial system is built, it will be installed in a Type-23
frigate."

So: the battle situation is too complex to understand.  The expert system is
likely to be too complex to understand, too.  The commander is unlikely to
ignore the advice of the expert system, unless it is clearly perverse.  This
means that the decision (say, to launch a weapon) is being taken, in practice,
by the expert system.

Is there no way we can stop this trend towards automated launch systems,
before it becomes completely uncontrollable?  It would be a good subject for
an international treaty, even though it would be hard to find a way to verify
compliance.

The Aegis system on the USS Vincennes led to the death of several hundred
people when a civil Airbus, on a scheduled flight, was shot down in weather
conditions where the aircraft would have been clearly recognisable from the
bridge of the Vincennes through binoculars.  That tragedy was ascribed to the
poor user-interface of Aegis, combined with an atmosphere of eager tension on
board which made a decision to fire more likely.

How can we stop people building ever-more-complex decision-support systems,
and thereby losing their ability to take decisions themselves?

Martyn Thomas, Praxis plc, 20 Manvers Street, Bath BA1 1PX UK.
Tel: +44-225-444700.  Email: mct@praxis.co.uk

------------------------------

Date: Wed, 12 Sep 90 23:24:14 EDT
From: parnas@qucis.queensu.ca (Dave Parnas)
Subject: Re: Railway Safe Working - large analogue systems (Skillicorn, RISKS-10.36)

Although I found the discussion of railway safety mechanisms interesting, the
most interesting question for me was why the systems described would be
considered "analogue".
There seems to be an implicit assumption that anything that is not
computerised is "analogue".  Conventionally, analogue systems are those in
which there is an infinite set of stable states, e.g., systems of springs and
weights, or electrical networks comprising resistors, inductors, capacitors,
etc.  Digital systems are constructed of components that have been designed to
have a relatively small number of stable states, with the transitions between
those states being so rapid that the time spent in other states is negligible.
The most common instances are binary, with two stable states, but other
numbers have been tried.  For example, old telephone switches often used
relays with 10 stable states.  These were actually hybrid systems because,
when the relays were in one of their stable states, analogue signals were
conducted between the two subscribers.

Strictly speaking, digital systems do not exist.  Relays, flip-flops, etc.,
are actually constructed of analogue components; they can be treated as
digital only by ignoring the time in which the circuits are changing between
"stable" states.  However, the circuits are designed so that one can ignore
those times.  The transition time determines the clock rate for those
systems.  Subtle hardware bugs often occur when, because of poor design, the
transition times are ignored when they should not have been.

The description of the railway safety systems makes them sound like digital
systems made of old-fashioned technology.  The little arms that were described
are considered to be either "up" or "down"; one neglects the time in which
they are moving between those states.  The buttons described are either
depressed or released; one neglects the time in which they are between those
two positions.  The switches are designed so that they are either "on" or
"off", and good switches are designed so that we can neglect the times when
they are in an "in between" situation.  It is important not to assume that
"mechanical" is an antonym of "digital".  Looking back at the earliest digital
computers, one finds that they were made of mechanical components not unlike
those described as current railway technology.

If we are going to look for examples of large analogue systems, I think we
have to look elsewhere.  The servomechanisms used in speed control on most
vehicles, and for flight-surface control on aircraft, would seem better
candidates.  Of course, just as the digital components are actually
approximated by analogue elements, those analogue systems can now be
approximated by digital systems.  In those approximations the number of stable
states is so large, and the states are so close in some topology, that one
neglects the fact that the number of states is finite and neglects the "gaps"
between them.  Here too, we have a rich source of subtle bugs.
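
A minimal sketch of this abstraction, in Python (the threshold values are
illustrative, not taken from any real logic family's specification): a reading
counts as a digital "0" or "1" only outside a forbidden band, and a sample
taken mid-transition has no digital value at all.

    # Sketch of the "digital" view of an analogue voltage.  Thresholds
    # are illustrative, roughly TTL-like, not from any real spec.

    V_LOW_MAX  = 0.8   # at or below this, read as logic 0
    V_HIGH_MIN = 2.0   # at or above this, read as logic 1

    def digital_view(volts):
        """Return 0 or 1, or None when the sample falls in the forbidden
        band between the two designed-in stable states."""
        if volts <= V_LOW_MAX:
            return 0
        if volts >= V_HIGH_MIN:
            return 1
        return None   # mid-transition: the digital abstraction does not hold

    # A 0 -> 1 transition sampled too early is neither state:
    for v in (0.2, 1.4, 3.1):
        print(v, "->", digital_view(v))   # 0.2 -> 0, 1.4 -> None, 3.1 -> 1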
------------------------------

Date: Tue, 11 Sep 90 11:37:28 PDT
From: Rob Sartin
Subject: Re: Analog vs digital failure modes and conservation laws

>we'd today call an axiomatic/object-oriented approach: He designed a series
>of basic primitives to manipulate things like money, accounts, and so on. The
>primitives enforced, in a very transparent way, such basic "laws" as "the law
>of conservation of money": Money can be neither created nor destroyed, it

One can only hope that such a system would not (or would, depending on your
views on controlling the economy) be used at the Federal Reserve.

Rob Sartin, Software & Systems Lab, Hewlett-Packard    hplabs!sartin
415-857-7592

------------------------------

Date: Wed, 12 Sep 90 09:15:35 BST
From: David Murphy
Subject: Analog vs digital reliability (Wolitzky, RISKS-10.33)

> Might as well carry this nit-picking one level further.  As long as
> your computer's transistors, capacitors, or whatever rely on electrons,
> photons, or other quantum-mechanical wave/particles with discrete
> states, you are justified in considering them to be digital.  But this
> is all silly -- the implementation is irrelevant.  If you can treat
> the computer as a black box that behaves digitally, why not label it
> as such?

And therein lies the risk.  One cannot always treat `digital' systems as
digital because, as many posters have pointed out, mother nature will
sometimes find ways to let you know your abstraction is unsound.  One of my
favourite examples is Jefferies' ``Bifurcation to Chaos in clocked digital
systems containing autonomous timing elements'' (Phys. Lett. A, Vol 115,
No 3), where a deterministic communications protocol between two simple
`digital' systems is shown to display chaos; this underlines the point that
even if you start with two systems that display nice, finite,
easily-abstracted behaviours, their composition may not.

Another point that is often neglected is that many of the things we assume
exist in the digital world are, in fact, forbidden.  It is completely
impossible to build a fair arbiter or synchroniser, for instance; the axioms
that should hold for such a beast are mutually contradictory; no continuous
system can behave that way.  Thus there will come a time in every digital
design when certain questions will remain unanswered until we move outside
the digital paradigm -- questions like `how fast will it run?' (meaning `how
fast will it run with a failure rate small enough that my customers won't
notice').  There is no doubt that asynchronous design in a clocked digital
paradigm will only work with some constraining assumptions about how quickly
those `asynchronous' signals are actually likely to appear.  And since the
real world is asynchronous -- there is no such thing as an isolated computer
-- this means that any digital design technique is just a way of improving
engineering confidence in the product, not of guaranteeing correctness.
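
The flavour of `a failure rate small enough that my customers won't notice'
can be made concrete with the standard engineering estimate for synchroniser
failure, MTBF = exp(t_r/tau) / (T_w * f_clk * f_data): allowing more time to
resolve makes failure exponentially rarer, but never impossible.  A minimal
sketch (all parameter values below are illustrative, not from any real
device):

    import math

    # Standard estimate of mean time between synchroniser failures.
    # All parameter values here are illustrative, not from a real part.
    tau    = 0.5e-9   # metastability decay time constant (seconds)
    T_w    = 0.1e-9   # metastability aperture/window (seconds)
    f_clk  = 50e6     # clock frequency (Hz)
    f_data = 1e6      # rate of asynchronous input events (Hz)

    def mtbf(resolve_time):
        """Expected seconds between failures, given the time the
        flip-flop is allowed to resolve before its output is sampled."""
        return math.exp(resolve_time / tau) / (T_w * f_clk * f_data)

    # Failure can be made arbitrarily rare, but the rate never reaches zero:
    for t_r in (5e-9, 10e-9, 20e-9):
        print("resolve %2.0f ns -> MTBF %.3g s" % (t_r * 1e9, mtbf(t_r)))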
------------------------------

Date: Wed, 12 Sep 90 22:47:20 -0400
From: al357@cleveland.freenet.edu (John H. Whitehouse)
Subject: The need for software certification

In recent weeks, the RISKS Forum has seen much discussion of software
certification and of the relative risk of digital vs. analog computing.  I
would like to suggest another realm of discussion which relates to the notion
of software certification.  Specifically, I would like to see some discussion
here of the value of certifying software professionals.  In this regard, I
refer to those certification designators administered by the Institute for
Certification of Computer Professionals in Des Plaines, Illinois.

From its inception, our field has never been able to find enough qualified
people to meet the demand.  As a result, we have drafted people from a wide
variety of academic disciplines.  Further, many workers in our field have
never even seen the inside of a university.  In addition, very few
universities offered majors in computer-related disciplines until "recently",
i.e., within the last 25 years.

I am not trying to condemn those people now practicing in this field who lack
a "proper" academic background.  In fact, I will openly state that many people
who do have a computer science major are unable to perform adequately in the
field.  Regardless of background, some people seem to have developed an
adequate understanding of this field whereas others have not.

The fact that the vast majority of managers were either non-technical from
the start or have become non-technical over the years means that probably the
vast majority of people in this field are not receiving evaluations which
reflect the actual quality of their work and the depth of their
understanding.  This cuts two ways.  First, many poor practitioners continue
to survive in the field despite their poor performance.  Second, many
excellent technical people fail to receive proper reward for their
accomplishments.  Hiring is expensive and usually done pretty much in the
blind.  Firing is risk-laden in our litigious society.

It is my contention that the vast majority of software defects are the
product of people who lack understanding of what they are doing.  These
defects present a risk to the public, and the public is not prepared to
assess the relative skill level of software professionals.  For these
reasons, I favor the certification of software professionals.  We have tried
to bring this about for 28 years on a voluntary basis, but those who know
that they could never pass make high-sounding arguments to convince others
that voluntary certification is not a desirable goal.  Academics have not
joined in the debate, since they are generally immune from the problem.  I
would like to hear some discussion on this issue.

   [We have been around this topic before, with Nancy Leveson in RISKS-5.28
   and following discussion in RISKS-5.33, Appendix B of the ACARD report
   noted in RISKS-4.14, John Shore with some appropriate references in
   RISKS-4.78, and earlier discussions.  Perhaps it is time to try again.
   PGN]

------------------------------

Date: Thu, 13 Sep 90 09:43:15 GMT
From: DBQABAA@CFRVM
Subject: ZIP code correcting software

Several administrative departments on our campus are interested in purchasing
software which claims to validate and correct ZIP codes.  This could be used
interactively, to tell an operator that he/she is keying a ZIP code which is
invalid for the given street and city information; it could also be used in
batch, to "clean up" address data.  Apparently the US Post Office maintains
massive tables of address data which various software vendors use as the
basis for these kinds of software packages.

Does anyone have experience with this kind of software?  My concerns would be
as follows:

1. What is the error rate with this process?

2. What happens when additions and changes are made by the Post Office to
   their tables but the vendor has not yet gotten the updates out to the end
   user of the software?  Will the software keep "correcting" a ZIP code
   which is in fact already correct?

I've had personal experiences where some of my mail suddenly shows up with a
changed (and wrong) ZIP code.  After contacting the sender and getting the
ZIP code changed back to what it should be, I've seen it get changed again a
few months later.  It's obvious to me that the senders are running a batch
update on their mailing lists using software that says my ZIP code is wrong.
This seems to me to be part of a trend for people to put their faith in
"error-correcting" software which can't always tell what really needs
correcting.
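
One defensive design, sketched below in Python with a hypothetical stand-in
for the vendor's address table: let the software flag a disagreement and
propose a value, but never let a batch run overwrite a ZIP code silently,
precisely because the vendor's table may itself be stale.

    # Sketch of a conservative address-hygiene policy.  ZIP_TABLE is a
    # hypothetical stand-in for a vendor's (street, city, state) -> ZIP data.
    ZIP_TABLE = {("123 MAIN ST", "TAMPA", "FL"): "33602"}

    def check_zip(street, city, state, zip_code):
        """Return (status, suggestion).  Never mutates the record."""
        key = (street.upper(), city.upper(), state.upper())
        expected = ZIP_TABLE.get(key)
        if expected is None:
            return ("unknown-address", None)   # table gap: change nothing
        if expected == zip_code:
            return ("ok", None)
        return ("mismatch", expected)          # queue for human review

    status, suggestion = check_zip("123 Main St", "Tampa", "FL", "33612")
    print(status, suggestion)   # mismatch 33602 -- reviewed, not auto-applied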
--- Rich Meyer

Richard W. Meyer, University Computing Services, University of South Florida,
Tampa    DBQABAA@CFRVM.BITNET

------------------------------

Date: 12 Sep 90 16:19:14 EDT
From: Jeff Jacobs <76702.456@compuserve.com>
Subject: Software Bugs "stay fixed"? (Steve Smith, RISKS-10.35)

>My experience with "bit rot", where previously solved problems reappear, is
>that they are usually caused by poor configuration control. While most
>systems have CC tools, like sccs on UNIX or CMS on VAX/VMS, getting your
>friendly average programmer to use them is like pulling teeth. When
>management insists on use of the tools, you will find lots of log entries
>of the form:

>Solutions?

The following is a description of a "solution".  (Way back in the early days
of the TDRSS ground station network development...)

During my troubleshooting activities, it quickly became obvious that many of
the "show-stoppers" were not "bugs", but were a combination of procedural,
operational and communication problems.  The large number of different test
configurations, the number of different groups performing testing, and the
technical inexperience of the test and configuration management personnel
resulted in a chaotic situation.  Although a conceptually sound set of
procedures had been drafted, they were manual and paper-based, and were
unable to keep up with the level of activity.  An average of 4 days per month
were lost simply due to errors in editing command files for compiling and
linking.  These were *project* months, where several dozen people would be
waiting for me to resolve a "show-stopper".  I would spend enormous amounts
of time trying to resolve what software was in use, how it was used, etc.,
etc.  Furthermore, it was almost impossible for management to determine the
status of fixes, enhancements and updates to the software.

The situation was quite amenable to automation.  Although my initial proposal
to TRW was turned down, I proceeded to create an initial version for my own
group's use; we were making "quick-fixes" at a rapid rate, and needed the
support that the tool would provide.

The tool, called the "Fix Processor", was completed in six weeks.  It is
effectively an expert system, built using Object Oriented and AI techniques
in a Lisp-like language.  (It was described in a prize-winning paper,
"Utilizing Expert Systems to Improve the Configuration Management Process",
by Sherri Sweetman, George Washington University, for the Project Management
Institute.)

This is **NOT** a "version control" system; it is much more complex.
Software "fixes" (including updates and enhancements) migrated physically.
Initially, all software resided in controlled baselines on development
machines.  As developers made changes to a given task, code was taken from
the baseline and moved to work areas.  As it went through various stages of
testing and acceptance, it transitioned to the physical control of other
groups.  Prior to a delivery to WSGT, all accepted changes would be "rolled"
into a new baseline and the entire process would start over.

The Fix Processor was initially used by the various integration and testing
groups.  Once its effectiveness was proven, it was also propagated to the
development groups, who, although initially opposing it, soon became its
strongest advocates!!!
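
The heart of the tool was enforcing that migration path.  A minimal sketch of
the idea in Python (state names are hypothetical; the real Fix Processor was
an expert system in a Lisp-like language, and did far more than this):

    # Sketch of the fix-migration lifecycle: every move between areas is
    # either refused or logged.  State names here are hypothetical.
    ALLOWED = {
        "baseline":    {"work-area"},               # developer takes code out
        "work-area":   {"integration", "baseline"}, # submit, or abandon fix
        "integration": {"accepted", "work-area"},   # pass tests, or bounce
        "accepted":    {"baseline"},                # rolled into new baseline
    }

    class Fix:
        def __init__(self, task):
            self.task, self.state, self.log = task, "baseline", []

        def transition(self, new_state):
            """Refuse illegal moves; record legal ones for status queries."""
            if new_state not in ALLOWED[self.state]:
                raise ValueError("%s: %s -> %s not allowed"
                                 % (self.task, self.state, new_state))
            self.log.append((self.state, new_state))
            self.state = new_state

    fix = Fix("task-123")   # hypothetical task name
    for step in ("work-area", "integration", "accepted", "baseline"):
        fix.transition(step)
    print(fix.log)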
Features of the Fix Processor included:

- Automated the tedious, time-consuming and error-prone operations required
  to transition a piece of software.  This included creation of new
  directories, copying files, generation of command files, etc., all of which
  formerly had to be performed manually.  This was crucial to the widespread
  acceptance and use of the system.

- Separated compiling and linking of new fixes from actual incorporation into
  testing configurations.  (Tasks had to be compiled and linked prior to
  actual testing.  It was also sometimes necessary to remove a new version of
  a task from a test configuration.)

- An on-line reporting and query facility for determining the status of
  software, usage in test configurations, etc.  This was not only a key
  management tool, it also provided a means of communication between the
  various integration and testing groups.  Problems which formerly took days
  to resolve were literally reduced to minutes.

- Extensive validation of user inputs and requests.

- Support and tracking for special requirements, such as debugging, "quick
  and dirty" fixes, etc.

- Automated collection of all approved changes into a new baseline, a process
  which formerly required more than a week (and was incredibly error-prone).

- Co-ordinated task transitions with the necessary paper trail.  (I
  subsequently revised the paperwork system.)

- Logged all activities and results into a database.

The net result of these changes was a smoothly functioning, manageable
project.  Mammoth turnovers were eliminated.  Software was turned over by the
developers in small, manageable increments on a continuous basis (as opposed
to the previous "here it all is" method, which would take weeks to straighten
out).  Problems were easily identified and quickly solved; productivity and
morale were immeasurably improved, and the project became truly manageable.

Note that one of the key elements in the success of the Fix Processor is that
it made life *easier* for everybody involved (including me).  Note also that
the "automation" helped ensure that problems didn't recur, and it was quite
easy to verify the complete path of software.

Jeffrey M. Jacobs, ConsArt Systems Inc, Technology & Management Consulting
P.O. Box 3016, Manhattan Beach, CA 90266    (213)376-3802

------------------------------

End of RISKS-FORUM Digest 10.37
************************