Subject: RISKS DIGEST 11.30
REPLY-TO: risks@csl.sri.com

RISKS-LIST: RISKS-FORUM Digest  Monday 18 March 1991  Volume 11 : Issue 30

        FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  "The Trigger Effect" and Coke robot... (Dwight D. McKay)
  Strange numbers on your beeper (Esther Filderman)
  Re: "What the laws enforce" [RTM] (TK0JUT1)
  Voice Recognition Experiment (Dave Turner)
  `Sendsys' forgery - denial of service? (Doug Sewell)
  Re: Medical privacy and urine testing (Alan Wexelblat)
  Re: Long-lived bugs (Pete Mellor)
  A cautionary tale [long] (John DeTreville) [On the Midway in a 3-ring SRCus?]

 The RISKS Forum is moderated.  Contributions should be relevant, sound, in
 good taste, objective, coherent, concise, and nonrepetitious.  Diversity is
 welcome.  CONTRIBUTIONS to RISKS@CSL.SRI.COM, with relevant, substantive
 "Subject:" line.  Others ignored!  REQUESTS to RISKS-Request@CSL.SRI.COM.
 For vol i issue j, type "FTP CRVAX.SRI.COM<CR>login anonymous<CR>AnyNonNullPW
 <CR>CD RISKS:<CR>GET RISKS-i.j" (where i=1 to 11, j always TWO digits).
 Vol i summaries in j=00; "dir risks-*.*" gives directory; "bye" logs out.
 ALL CONTRIBUTIONS CONSIDERED AS PERSONAL COMMENTS; USUAL DISCLAIMERS APPLY.
 Relevant contributions may appear in the RISKS section of regular issues of
 ACM SIGSOFT's SOFTWARE ENGINEERING NOTES, unless you state otherwise.

----------------------------------------------------------------------

Date: Fri, 15 Mar 1991 13:33:26 -0500 (EST)
From: "Dwight D. McKay"
Subject: "The Trigger Effect" and Coke robot...

Having just gotten into work after being stranded at home with no power for
two days due to an ice storm here in the Midwest, I am reminded of the
reliance we all place on basic services.  I didn't lose phone service (thank
you, AllTel!) or gas, but I had no electricity.  This meant I had no heat, as
my furnace needs electricity to sense temperature, run the air circulation
fan, and even start the gas burning (it's pilot-light-less).  Our kitchen is
all electric, so that was out, and so on.

Even when power was restored, the ordeal was not over.  How many clocks and
embedded computers do you have around your house?  I had to replace half a
dozen "backup" batteries, reset various devices which have no memory without
power, etc.

A very worthwhile description of this "technology trap" we are placed in by
depending on basic services like electricity is episode 1, "The Trigger
Effect", of James Burke's PBS series "Connections".  It covers in fairly good
detail the sequence of events and problems caused by the 1965 east coast
blackout.  I'd recommend it as a good video for RISKS readers to watch or to
show to others.  The video has started some very interesting conversations
concerning the risks of high technology with everyone I've shown it to.

BTW - Have any of the rest of you seen the drink-dispensing robot Hardee's
has in some stores now?  It appears to be directly tied into the same network
as their cash registers and fills drink orders while the cashier takes your
money.  I can see it now, "Sorry, we cannot give you a drink right now, our
computer is down."  Sigh...

Dwight D. McKay, Purdue University, Engineering Computer Network
(317) 494-3561   mckay@ecn.purdue.edu  --or--  ...rutgers!pur-ee!mckay

------------------------------

Date: Fri, 15 Mar 91 16:29:12 -0500 (EST)
From: Esther Filderman
Subject: Strange numbers on your beeper

The article about the beeper scam reminded me of something that occurred to
me two weeks ago.
When my beeper went off in the middle of a Saturday afternoon I was not fazed
by the strange number that appeared, figuring that it was a coworker calling
from home.  When I called the number I got a real surprise: I reached US
Air's pilot scheduling number!

The person I spoke with told me that the database of beeper numbers was very
out of date.  When I mentioned that I had had my beeper for over six months,
she responded that she had once called a number a year out of date.

Meanwhile, some poor pilot was wondering when her/his next flight was....

Esther C. Filderman, System Manager, Mercury Project,
Computing Services, Carnegie Mellon University   ef1c+@andrew.cmu.edu

------------------------------

Date: Fri, 15 Mar 91 15:47 CST
From: TK0JUT1@NIU.BITNET
Subject: Re: "What the laws enforce" [RTM] (RISKS-11.29)

I rather liked PGN's comment that "there is still a significant gap between
what it is thought the laws enforce and what computer systems actually
enforce."  It's parsimonious and incisive.  I interpreted it to mean simply
that the law has not caught up to changing technology, and old, comfortable
legal metaphors are inappropriately applied to new, qualitatively different
conditions.

Calling simple computer trespass (even if files are perused) a heavy-duty
felony subjecting the offender to many years in prison does not seem
productive.  I may walk on your grass and pick your flowers, even if there is
a prohibitive sign.  But it is unlikely there would be a prosecution (informal
sanctions, yes, but not a prosecution), and if there were, it would be
unlikely to be a highly publicized case subjecting me to federal felony
charges, even though a federal ecological interest might be claimed.

The point seems to be that emerging computer laws are archaic.  Neither those
who write the laws nor those who implement them have a clear understanding of
what is involved or at stake.  When mere possession (not use, but possession)
of "forbidden knowledge" can be a felony (as it is in California), we must
begin to question what the law thinks it's enforcing.  One can oppose
trespass while simultaneously opposing Draconian attempts to stomp on those
who tread tackily in our once-pastoral fields.  And, at the moment, I suggest
that it's law enforcement agents who are the greatest danger to the computer
world, not hackers.  Why?  Because "there is still a significant gap between
what it is thought the laws enforce and what computer systems actually
enforce."   [Thanks!  PGN]

------------------------------

Date: Fri, 15 Mar 91 15:49:44 PST
From: dmturne@ptsfa.pacbell.com (Dave Turner)
Subject: Voice Recognition Experiment

The following was excerpted from comp.dcom.telecom.  Although it appears to
be a legitimate study, the unscrupulous could reap vast rewards.

>The Oregon Graduate Institute of Science and Technology is building a
>huge database of voices as part of a project to develop voice
>recognition for US West directory assistance.
>
>They want to be able to classify sounds according to regional
>differences, and they need thousands of samples of speech to do this.
>
>Call 800-441-1037 (I assume this is nationwide ... it may not be) and
>follow the voice prompts.  They will ask your last name, where you are
>calling from, and where you grew up, and then ask you to pronounce
>several words and recite the alphabet.

This could be used for vocal forgery.  By combining the requested words, the
alphabet and, possibly, numbers, a digital vocabulary could be produced for
everyone who participated in the study.
Once this is available, a "bad guy" could use it to place phone calls using
anyone's digital voice.  If the hardware were fast enough, the called party
could be fooled into believing that he/she is talking to the individual whose
voice is being used.

The addition of credit card numbers and expiration dates for each "voice"
would allow fraud that is hard to dispute; after all, it's your word (voice)
against his.

Including your name, location and other personal information in this study
could be a big mistake.  This sort of risk is made easier by duping people
into providing samples of their voices, but a determined "bad guy" could
obtain the same information by recording an ordinary phone call and
processing the data later.

------------------------------

Date: Friday, 15 Mar 1991 20:25:20 EST
From: Doug Sewell
Subject: `Sendsys' forgery - denial of service?

I don't get many SENDSYS requests in control, so they tend to stick out.
I've also learned by experience that even a limited-hierarchy or
limited-distribution sendsys will result in a disruptive amount of e-mail
(how was I to know that over 400 sites got some of bit.*?  I suspected 100,
tops).  Many of the replies are big (UUNET's was several thousand lines
long), and they trickle in for days.

Having said this, one I got today stuck out as being rather unusual.  Someone
forged a sendsys to rec.aquaria, misc.test, and alt.flame, in the name of one
of the 'celebrities' in those circles.  Distribution was unlimited.  This
type of prank amounts to a significant denial-of-service attack, IMHO.  In
this case, it may also mean bodily injury for the perpetrator, if he's
caught.  (If you want to know who, go look in alt.flame.)

Doug Sewell, Tech Support, Computer Center, Youngstown State University,
Youngstown, OH 44555   doug@ysub.bitnet   doug@ysub.ysu.edu

------------------------------

Date: Mon, 18 Mar 91 13:42:25 est
From: wex@PWS.BULL.COM
Subject: Re: Medical privacy and urine testing (Larry Nathanson, RISKS-11.29)

The issues surrounding urine testing are something I have been researching
heavily for over a year, especially as they began to affect my life more
extensively.  While I generally agree with Nathanson's assertions, he does
make one important error: "drug testing is generally not part of your
record".  This is true some of the time, but misleading.

The discussion revolves around privacy, and one of the concerns about urine
testing is that testing agencies (gov't, companies and the military)
generally require you to sign a form detailing any and all prescription
medications you are taking.  In many cases, the testing agencies require the
testee to produce the actual prescriptions, and may call the prescribing
doctor to confirm their validity.

This information is clearly part of your medical record, and it seems an
invasion of privacy to require the employee to reveal that s/he is taking
{birth control pills, AZT, insulin, anti-depressants, etc.}.  In each case,
access to prescription information reveals an enormous amount of medical
information which is customarily assumed to be private.

--Alan Wexelblat   phone: (508) 294-7485
Bull Worldwide Information Systems   internet: wex@pws.bull.com

------------------------------

Date: Mon, 18 Mar 91 10:30:54 PST
From: Pete Mellor
Subject: Long-lived bugs

Jerry Bakin's item in RISKS-11.29 about the 25-year-old known bug reminded me
of some stories about fairly ancient unknown bugs.

I was told by a colleague, who was a computer engineer, about a UK site which
required its operating system to be enormously reliable.
(They were so highly secret that I was not supposed to know that they
existed, so he couldn't provide much in the way of supporting detail.)  They
had learned the hard way that each new version brought with it its own crop
of new bugs, and so had stayed resolutely out of date for many years.
Running a stable job mix and not updating, they eventually achieved 4 years
of failure-free running.  At the end of that time, a new, serious bug was
discovered.  It had lain dormant all that time.

The Air-Traffic Control system at West Drayton has recently been replaced.
The previous system had been in use for many years.  A software engineer who
had studied this system told us that a new bug was recently discovered in a
piece of COBOL code which had not been changed for 20 years.

Such anecdotes could be dismissed, except that they are supported by careful
research.  E. N. Adams, in "Optimizing preventive service of software
products", IBM Journal of Research and Development, 28(1), pp. 2-14, 1984,
describes investigations into the times to detection of bugs in a widely used
operating system.  He found that over 30% of all bugs reported caused a
failure on average only once every 5000 running years.

Peter Mellor, Centre for Software Reliability, City University,
Northampton Sq., London EC1V 0HB   +44(0)71-253-4399 Ext. 4162/3/1
p.mellor@uk.ac.city (JANET)

------------------------------

Date: Mon, 18 Mar 91 15:59:57 PST
From: jdd@src.dec.com (John DeTreville)
Subject: A cautionary tale [long]   [On the Midway in a 3-ring SRCus?]

This is a cautionary tale about a software failure that RISKS readers might
find interesting.  I wrote down this description soon after the failure, in
as much detail as I could, because it made such an interesting story.  I've
listed some possible lessons at the end, and readers are welcome to add
their own.

Around 5:00 p.m. on Friday, February 16, 1990, much of the distributed
environment at Digital's Systems Research Center (SRC) became unavailable.
Although no machines crashed, most user operations failed with authentication
errors and users could get almost no work done.  Some quick diagnostic work
determined the problem: the contents of the distributed name service had
become corrupted.  Lengthier detective work determined the long sequence of
accidents that caused the corruption.

I should point out to start that at no point during this episode did the name
service itself fail.  The design and implementation of the name service were
both quite solid.  All the failures were elsewhere, although they manifested
themselves in the name service.  SRC is purposely dependent on our
distributed name service because it has numerous practical advantages over
the alternatives, and because it has given us very reliable service over an
extended period.  (Failures of unreliable systems aren't very instructive!)

First, some necessary background.  SRC's research software environment is
called Topaz.  Topaz can run stand-alone or layered on top of Ultrix,
Digital's product version of Unix.  We built Topaz at SRC, and while the
research ideas that we test in Topaz may influence Digital's product
directions, Topaz is not a production system.

Once every year or two, SRC exports snapshots of the Topaz environment to a
few universities that we maintain close ties with.  We collect the components
of an export release, then bring the snapshot up on an isolated testbed and
verify that its elements work together and do not accidentally depend on
anything not in the release.
Part of the information in SRC's name service is the user data traditionally
stored in /etc/passwd.  For administrative convenience, we still maintain
/etc/passwd files, and although Topaz accesses the name service instead of
/etc/passwd, administrative daemons track any changes in /etc/passwd (via a
"dailyUpdate" script).  For example, if users leave Digital, administrative
procedures delete them from /etc/passwd; once dailyUpdate runs, all mention
of them is removed from the name service.

On with the story.  A few months before, Lucille Glassman had built our most
recent export snapshot.  To test it, she put together a testbed environment
in the machine room, using a small VAX named "midway" as an Ultrix-based
Topaz server machine.  The export testbed ran on a small Ethernet
disconnected from SRC's main network.  The testbed environment had its own
name service, and midway had its own /etc/passwd.  Midway's /etc/passwd
wasn't very large--about a dozen users--and so its name service didn't hold
many names.  But that was intentional; it was just a testbed.

Since the testbed environment was disconnected from SRC's main network,
software snapshots were brought over to midway via a disk that was
dual-ported between midway and bigtop, a large Ultrix server machine on SRC's
main network.  The disk appeared in each system's /etc/fstab (the file system
table); it was moved from system to system using the /etc/mount and
/etc/umount commands.  Lucille would mount the disk on bigtop, copy /proj to
the disk (/proj holds the Topaz environment), then unmount it from bigtop and
mount it on midway as /proj.

Later, after the export was completed and the tapes had been sent out,
Richard Schedler and Lucille did some cleanup on midway.  They turned off its
Topaz servers, including the name server.  They also edited midway's crontab,
which runs various commands at various times, not to run dailyUpdate; there
was no need for it.  But they didn't reconnect midway to the network as an
ordinary Ultrix machine; they left it isolated on the testbed network.

Here comes the amusing part.  It turns out that on the version of Ultrix
running on midway, /usr/lib/crontab is a symbolic link to /etc/crontab.  The
cron daemon reads from /usr/lib/crontab, but the file physically resides in
/etc/crontab.  Knowing this, Richard and Lucille edited /etc/crontab to
remove the call to dailyUpdate.

The first thing that went wrong was that, at some point earlier, this
symbolic link had been broken, and /usr/lib/crontab had been replaced with a
copy of /etc/crontab.  Most people at SRC use the Ivy text editor, which runs
as a single server per user, creating a window per file.  Since Ivy runs with
the user's permissions, you can't use it to edit files like /usr/lib/crontab,
which you can't ordinarily write.  Users get around this limitation by
editing copies, then moving the copies back as super-user.  This is an
error-prone operation, and we believe that at some time someone
fumble-fingered the last step.

So when Richard and Lucille edited /etc/crontab, it had no real effect; cron
kept on using the old /usr/lib/crontab.  Every day, midway ran dailyUpdate.
But dailyUpdate tried to run a program from the Topaz environment on /proj,
and the dual-ported disk holding /proj had been claimed by bigtop, so midway
couldn't access it, and dailyUpdate silently failed every day.  Also, midway
was still disconnected from the network.
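The trap is easy to check for after the fact.  The following is only an
illustrative sketch (nothing like it was part of SRC's procedure, and it
assumes an ordinary Bourne shell on the machine in question), showing how
someone about to edit /etc/crontab could first confirm that /usr/lib/crontab
is still a symbolic link rather than a stale private copy:

    # Illustrative only: confirm that cron's copy is still a link to
    # /etc/crontab before assuming that edits to /etc/crontab will take effect.
    if [ -h /usr/lib/crontab ]
    then
        echo "/usr/lib/crontab is a symbolic link; editing /etc/crontab is enough"
    else
        echo "WARNING: /usr/lib/crontab is a separate file; cron will not see" \
             "changes made to /etc/crontab" 1>&2
    fi
    # Comparing the two files would have exposed the stale copy just as well:
    cmp -s /etc/crontab /usr/lib/crontab || echo "WARNING: the two crontabs differ" 1>&2

Had either check been run before the edit, the stale copy would have shown
itself immediately.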
The second thing that went wrong was that midway got reconnected to the
network.  Someone threw a local/remote switch on a DELNI Ethernet connector.
This happened some time in the previous few months.  (A few months after this
writeup circulated internally, we found out what had happened; someone had
been fixing a broken workstation on the testbed network, and tested it by
rejoining the networks rather than moving the workstation's network
connection.)

On Friday, 2/16/90, at 11:00 a.m., SRC had a power failure during heavy
rains.  This was the third thing to go wrong.  When power came back, bigtop
and midway both came up.  Midway came up faster, being a smaller system.
This was the first time in months that midway had booted while bigtop was
down, and midway got to claim the dual-ported disk.

Friday at 5:00 p.m., midway successfully ran dailyUpdate.  It contacted SRC's
name service, and made the name service contents consistent with its
abbreviated /etc/passwd.  Soon afterwards, when the authentication caches
expired on their workstations, most people found themselves unable to do
anything that required authentication: log in, run programs, even log out.
(Richard and Lucille and I didn't notice anything wrong at first, because we
were listed in midway's /etc/passwd.  But Lucille received mail from the name
service saying that root@midway had just made a bunch of changes, and I got
calls from people asking, "What does it mean when it says, `Not owner'?")

So Lucille and I went to the machine room (dodging the person installing some
new locks), and looked at midway's /etc/crontab.  Everything looked fine; no
mention of dailyUpdate there.  (It was much later that we discovered the call
to dailyUpdate was still in /usr/lib/crontab.)  Although we didn't know what
had made dailyUpdate run, Lucille rethrew the DELNI switch to isolate midway
from the network so it couldn't happen again.  (If the person installing new
locks had been a little ahead of schedule, we probably wouldn't have been
able to get into the machine room, since we didn't have keys yet.)

Lucille then ran dailyUpdate against a real copy of /etc/passwd, to get
things to the point where everyone could log in.  She discovered that there's
an upper bound to the number of additions that dailyUpdate can make to the
name service at once.  This had never been a problem before, but it was a
problem now.  (Midway's dailyUpdate didn't have any problem with the same
number of deletions.)  Lucille finally coaxed dailyUpdate to run.
Unfortunately, restoring information isn't as easy as deleting it, and even
with a lot of hand editing, things still weren't great at 7:00 p.m., when
Lucille and I both had to leave.  Richard had left long before, as had Andrew
Birrell, our main name server expert, but Lucille sent them mail explaining
what had happened, and asking whether they could fix it.  Ted Wobber and Andy
Hisgen, two other name server experts, were both out of town for the weekend.

When I got back at 10:30 p.m., I found the mail system was broken, probably
as a result of the name service problems, so Lucille's mail hadn't been
delivered and no one had done anything since 7:00 p.m.  (The file system
holding server logs had also overflowed, because of all the RARP failures
caused by the name server outage.)  By the time I brought the mail system
back up, it seemed too late to phone anyone at home, so, after confirming
that no one else was fixing things at the same time, I started to restore the
name service contents from the previous night's incremental dumps.
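In hindsight, the 5:00 p.m. wipe-out is the kind of thing a crude sanity
check in dailyUpdate might have caught.  The sketch below is purely
illustrative: dailyUpdate had no such guard, and the threshold is an invented
number.  The idea is simply to refuse to reconcile the name service against
an /etc/passwd that looks implausibly small.

    # Illustrative sketch only, not SRC's dailyUpdate: refuse to proceed when
    # /etc/passwd looks implausibly small.  The threshold of 100 entries is an
    # invented number for the example.
    entries=`wc -l < /etc/passwd`
    if [ $entries -lt 100 ]
    then
        echo "dailyUpdate: only $entries entries in /etc/passwd; refusing to run" 1>&2
        exit 1
    fi

A guard like this would not catch a subtly wrong /etc/passwd, but it would
have stopped a dozen-entry testbed file from emptying the site's directory.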
The name servers hold their state in virtual memory, but keep stable copies
in a set of files that they write into a directory from time to time.  I
found backup copies of these files on bigtop from incremental dumps made at
6:00 a.m. Friday.  Fortunately, bigtop's name server had written a full set
of files to disk between 6:00 a.m. Thursday and 6:00 a.m. Friday, or this
wouldn't have worked.  We didn't dump these directories on the other name
servers, named "jumbo" and "srcf4c"; we had figured we didn't need to, since
the contents are replicated.  Even so, extra dumps might have come in handy
if bigtop's dump had been unusable.  We've now started dumping these
directories on jumbo too, just in case.  (I had to do this restore before
6:00 a.m. Saturday, since the incremental dumps are kept on disk, and each
day's incremental dumps overwrite the previous day's.)

So I reconstructed an appropriate directory for bigtop's name server, but
couldn't do the same for jumbo or srcf4c.  I killed all three name servers,
installed the restored state on bigtop, and restarted bigtop's name server.
At this point, SRC had a name service, but it wasn't replicated.  I left the
other name servers down because, had they kept running, the overnight
skulkers would have made the name servers consistent, and bigtop's restored
(older) information would have lost to the more recent, corrupted information
in the other servers.  I sent out a message describing the state of the world
and went home, figuring that things weren't really fixed, but that nothing
bad could happen before I came in Saturday morning.

Saturday morning, SRC had two more power failures during more rain.  Jumbo,
bigtop, and srcf4c all went down.  Srcf4c has an uninterruptible power
supply, but the batteries had probably gone flat, so it went down during each
power failure.  When power was restored, bigtop's name server came back up,
but so did jumbo's and srcf4c's.  I had only killed the running instances of
the other servers, not uninstalled them, since I was tired and thought it
wouldn't matter overnight.  Ha!  Jumbo's server rebooted automatically after
the power came back.  Perhaps as a result of the flaky UPS, srcf4c did not
reboot, but a helpful passer-by rebooted srcf4c by hand.  He hadn't read the
electronic message I'd left, since he couldn't log in, and, in any case,
figured that some inconsistency was better than total unavailability while
waiting for jumbo and bigtop to check their disks and finish booting.
Compounding Saturday morning's confusion, I got to SRC later than I had
planned, not wanting to travel in the rain.

At this point, users had a 2/3 chance of getting their data from a bad name
server, and the bad servers were slowly propagating their contents to the
good one.  Fortunately, I had kept copies of the directory I had
reconstructed on bigtop the night before (plus the contents before I
overwrote it, plus copies of everything else I could find; I had known I was
tired).  Even more fortunately, Andrew and Richard agreed to come in.  We
killed all the servers, reset bigtop's contents and restarted its server,
then Andrew used magic name service commands to erase the name service
replicas on jumbo and srcf4c and create new ones, copying their contents from
bigtop.  And that fixed everything.

Many thanks to everyone at SRC who helped to understand the problem and to
fix it.  Thanks also to Jim Horning, Cynthia Hibbard, and Chris Hanna for
reviewing this writeup.

What were the lessons?  Some might be:

1) Things break Fridays at 5:00 p.m., especially if it's a long weekend.
(Although SRC as a whole didn't get that Monday off for President's Day, many
people weren't back by then.  Perhaps some were trapped in the Sierras after
heavy snows and an avalanche closed the roads.)

2) The name service had been so reliable that there were few experts
available to fix it.  I'm not an expert, but I knew how it worked because I
once released a faulty garbage collector that caused some name servers to
lose data and eventually crash; I had done penance by fixing it.

3) You're always ready to fight the previous war.  When I discovered the name
server problems, my first reaction was that it was another garbage collector
bug (even though the collector had been stable for about a year).
Discovering that garbage collection had nothing to do with the problem wasted
some time.

4) Ivy's inability to edit protected files may not be a big problem on the
average, since those few users for whom this is a problem can work around it,
but the workarounds can be dangerous.  Moreover, these users didn't complain
about this limitation to the Ivy developers; they devised the workarounds on
their own.

5) After midway's /usr/lib/crontab got overwritten with a real file, it's
unfortunate that Richard and Lucille followed the link in their heads and
edited /etc/crontab, instead of editing /usr/lib/crontab and letting midway
follow the link.  Although a very similar situation had occurred two years
earlier, neither one expected it to happen again.

6) SRC's name service allowed only one instance of the name service on the
same network, virtually inviting this sort of collision of namespaces.  Since
then, Digital has developed product-quality name servers without this
limitation, but we were running our own earlier experimental software.  This
limitation was probably a mistake waiting to strike, but it's a sort of
mistake that's commonly made.

7) Although there are plenty of locks on the machine room, someone toggled
the DELNI.  Perhaps some network connectors should also have been unscrewed
(and hidden).  Again, this wouldn't have been a problem if we'd been using
Digital's product software.

8) While the export snapshot was being built, Lucille was very careful to
keep midway isolated from SRC's main network.  Afterwards, she watched midway
for a couple of days, making extra sure that it wasn't exporting its
/etc/passwd contents.  But she didn't watch it for months.  Perhaps she
should have reinstalled Ultrix on midway, deleting all old state.

9) Using dailyUpdate to keep the name service consistent with /etc/passwd
seems cumbersome and error-prone.  We may move toward a scheme where the name
service drives /etc/passwd instead, since even catastrophes like this one
would then not lose information.

10) When I fixed things Friday night, I knew I was tired.  As a result, I was
very careful; I made copies of everything that might be overwritten.  They
might well have been overwritten even if I hadn't been tired, and then I
mightn't have had the copies.

As I said at the beginning, the name service itself did not fail.  However,
some other parts of the environment were not as well thought out, and the end
result was a loss of data held in the name service.  Moreover, the
experimental name server's limitation to one instance per network made it
especially susceptible to failure caused by accidental network
reconfiguration.

John

------------------------------

End of RISKS-FORUM Digest 11.30
************************