Newsgroups: comp.sys.apple2
Path: news.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!moe.ksu.ksu.edu!crcnis1.unl.edu!wupost!howland.reston.ans.net!paladin.american.edu!europa.eng.gtefsd.com!library.ucla.edu!news.mic.ucla.edu!unixg.ubc.ca!acs.ucalgary.ca!sbdocker
From: sbdocker@acs.ucalgary.ca (Sean Brendan Dockery)
Subject: Re: DOS 3.3 Nibble Encoding
Message-ID: <Nov15.045554.34306@acs.ucalgary.ca>
Date: Mon, 15 Nov 1993 04:55:54 GMT
References: <gregorya.752325765@marsh>
Organization: Griffin Software Development
Lines: 392

I didn't reply immediately because I didn't have the time and because
I thought that some Apple // fanatic might have beaten me to it.  :-)

In article <gregorya.752325765@marsh> gregorya@cs.curtin.edu.au
(Andrew Gregory) writes:

| This time I'm also wanting to know how DOS 3.3 and ProDOS 8 encode their
| data into something the disk drive can understand.  My understanding is that
| they use something called '6 and 2' which encodes 256 bytes of 'real' data
| into 342 (or thereabouts) of disk data.  I would  like to know the details
| of that process.

See below.

| I have heard that a book called 'Beneath Apple DOS' would explain all this
| but none of the bookshops here in this backwater of Perth, Western Australia
| carry it.

Beneath Apple DOS and Beneath Apple ProDOS are indeed important books.

| I've tried to figure out the //c disk ROM code without success (some

Hehehehe.  :-)

| Andrew Gregory
| gregorya@lillee.cs.curtin.edu.au

------------------------------------------------------------------------
Disclaimer:  I make no warranties as to the accuracy of the
information below.  I am simply citing from Beneath Apple DOS,
copyright (c) 1981 by Quality Software (RIP).

		  ---------------------------------
		  DATA STRUCTURE ON THE DISK MEDIUM
		  ---------------------------------

Under DOS 3.3 and ProDOS formatted systems (I'm not sure about HFS)
the structure of disk data is the same.

The following is the general format of disk data:

     GAP1 ADDRESS0 GAP2 DATA0 GAP3 ADDRESS1 GAP2 DATA1 GAP3 ...

The fields are as follows:

     GAP1 contains HEX $FF self sync bytes (discussed below.)
     Typically, there are anywhere between 40 and 95.

     ADDRESS0 contains address prologue, volume, track number, sector
     number, checksum, and epilogue fields.  The address prologue has
     the HEX values of D5 AA 96.  The epilogue field has the HEX
     values of DE AA EB.  All of the other fields are two bytes wide
     and subject to 4 and 4 encoding (the simplest kind.)

     GAP2 also contains HEX $FF self sync bytes.  There are 5 to 10
     bytes in this field.

     DATA0 contains data prologue, data, checksum, and epilogue
     fields.  The data prologue has the HEX values of D5 AA AD.  The
     epilogue has the HEX values of DE AA EB.  The data field is
     subject to 6 and 2 encoding and the checksum is a literal value.
     See NOTE below about the length of this field.

     GAP3 contains HEX $FF self sync bytes.  There are 14 to 24 bytes
     in this field.

NOTE:  There are 342 data bytes in a 6 and 2 encoded format; in DOS
3.2.1 and previous, the data field contained 410 bytes because it used
a 5 and 3 encoded format.
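
The 342 and 410 figures follow from simple arithmetic: a 256-byte
sector holds 2048 bits, and each disk byte carries 6 (or 5) of them.
A small Python sketch of that calculation (the function name is made
up for illustration; it is not from Beneath Apple DOS):

```python
import math

SECTOR_BYTES = 256  # size of one sector of "real" data

def encoded_length(payload_bits_per_disk_byte):
    """Disk bytes needed to carry one sector at the given payload width."""
    total_bits = SECTOR_BYTES * 8
    return math.ceil(total_bits / payload_bits_per_disk_byte)

six_and_two = encoded_length(6)     # 6 and 2: six payload bits per disk byte
five_and_three = encoded_length(5)  # 5 and 3: five payload bits per disk byte
print(six_and_two, five_and_three)
```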

NOTE:  Also note that ProDOS (and Apple Pascal, too, I believe) use
two consecutive sectors to form one disk block.  If memory
serves, HFS partitions use a disk block of size 524 (instead of 256 *
2 = 512) bytes--can anyone confirm this?

	    ---------------------------------------------
	    SELF SYNC BYTES: GETTING THE HARDWARE IN SYNC
	    ---------------------------------------------

When you begin a read (or write) operation, it is impossible to tell
if the disk head is at the beginning, middle, or end of a byte.  This
problem requires that some sort of mechanism be in place to let the
software know that it is "in sync" with the hardware.

Self sync bytes are a special kind of byte that helps the software
stay in sync with where it is on the disk.  They are how the
operating system knows that it is not reading data from the middle
of a byte of the above mentioned fields; such a situation would result
in rather bizarre data and would probably be devastating to your system.

The HEX $FF self sync byte differs from a regular HEX $FF byte in that
it is 10 bits long: eight one bits followed by two zero bits.  The
operating system assumes that it has read a
complete byte only when the most significant bit of a byte is set.  On
the other side of the coin, the operating system must ensure that it
writes bytes to disk with the most significant bit set--an encoding
(and sometimes translation) scheme must be applied.

Suppose that you have the following bit stream on a disk:

     0110101110101100111101101110101

As the disk head moves across the stream during a read operation, bits are
shifted into the data latch.  Starting from the left-most position,
bits would be read and the data latch would evolve as follows:

     0000 0000 (initial state)
     0000 0000 (shift the zero bit in)
     0000 0001
     0000 0011
       (...)
     1101 0111 (final identified byte)

NOTE: The byte is NOT 0110 1011 as you might have expected because the
high bit would still be clear; therefore, another bit must be read
(and the data latch shifted left) until the high bit becomes set.  This
is where the need for HEX $FF self sync bytes comes in.
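
The latch behaviour described above can be sketched in a few lines of
Python.  This models only what the text says (shift bits in until the
high bit is set); the function name and the string-of-bits
representation are assumptions made for illustration, not how the
real hardware is programmed.

```python
def read_nibble(bits, start=0):
    """Shift bits into an 8-bit latch until its high bit becomes set.

    Returns (byte, next_position), or (None, position) if the stream
    runs out before a complete byte is seen.
    """
    latch = 0
    pos = start
    while pos < len(bits):
        latch = ((latch << 1) | int(bits[pos])) & 0xFF  # shift next bit in
        pos += 1
        if latch & 0x80:  # high bit set: a complete byte
            return latch, pos
    return None, pos

stream = "0110101110101100111101101110101"
byte, pos = read_nibble(stream)
print(format(byte, "08b"), pos)  # nine bits are consumed, not eight
```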

The bit stream of several consecutive self sync bytes follows:

1111111100111111110011111111001111111100111111110011111111001111111100
     ^
Suppose that the read position started where the caret sign is.  The
data latch would progress to fill with the following byte.

     1110 0111

1111111100111111110011111111001111111100111111110011111111001111111100
             ^
The read position is again where the caret is.  The data latch again:

     1111 1001

Once again:

1111111100111111110011111111001111111100111111110011111111001111111100
                     ^
     1111 1110

And again:

1111111100111111110011111111001111111100111111110011111111001111111100
                             ^
     1111 1111 (remember that a byte isn't a byte without bit 7 set)

This is the turning point; from this moment on, all bytes read into
the data latch will be HEX $FF.  The software is now synchronized with
the starting points of bytes on the disk medium.
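
The walk-through above can be reproduced with a short Python sketch.
It starts reading at the same offset as the first caret (bit 5) and
collects bytes the way the latch is described as working.  The helper
name is hypothetical and the bit stream is modelled as a string purely
for illustration.

```python
def read_nibbles(bits, start, count):
    """Read up to `count` bytes from a bit string, latch-style."""
    out = []
    pos = start
    for _ in range(count):
        latch = 0
        while not latch & 0x80:      # keep shifting until bit 7 is set
            if pos >= len(bits):     # ran out of bits mid-byte
                return out
            latch = ((latch << 1) | int(bits[pos])) & 0xFF
            pos += 1
        out.append(latch)
    return out

sync_stream = "1111111100" * 7       # seven 10-bit self sync bytes
nibbles = read_nibbles(sync_stream, 5, 6)
print([format(n, "02X") for n in nibbles])
```

The first three bytes are misframed; from the fourth onward every byte
comes out as $FF, because the two trailing zero bits of each self sync
byte slip past the latch without completing a byte.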

	    ----------------------------------------------
	    MAKE BYTES CONFORM TO DISK MEDIUM REQUIREMENTS
	    ----------------------------------------------

Realistically, it is rather unlikely that all bytes to be read from
(or written to) the disk medium are going to have their high bit set.
Thus, encoding and decoding data (and sometimes translation) is a
necessary requirement for the operating system.

There are three types of encoding used by all of Apple's operating
systems.  They are 4 and 4, 5 and 3, and 6 and 2.  All encoding
requires the expansion of one byte to at least one byte and two bits.

			   ----------------
			   4 AND 4 ENCODING
			   ----------------

4 and 4 encoded bytes require two bytes (by splitting actual bits
evenly between two bytes) and have the following format:

     1  b7  1  b5  1  b3  1  b1
     1  b6  1  b4  1  b2  1  b0

In order to get a usable data byte, it is necessary to shift the first
byte left one bit cell, set the least significant bit in that byte,
and logically AND the two bytes.

Thus:

     b7   1  b5   1  b3   1  b1   1  (shift and logically OR 00000001)
      1  b6   1  b4   1  b2   1  b0
     ------------------------------
     b7  b6  b5  b4  b3  b2  b1  b0  (logical AND of previous)
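
The same shift, OR, and AND steps as a minimal Python sketch (the
function name is made up for illustration):

```python
def decode_4and4(first, second):
    """Combine two 4 and 4 disk bytes into the original data byte."""
    shifted = ((first << 1) | 0x01) & 0xFF  # b7 1 b5 1 b3 1 b1 1
    return shifted & second                  # b7 b6 b5 b4 b3 b2 b1 b0

# $AA $BB is the 4 and 4 form of $11 (track 17, for example)
print(format(decode_4and4(0xAA, 0xBB), "02X"))
```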

4 and 4 encoding is used only on the ADDRESSx section of disk data in
the volume, track, sector, and checksum fields.  The rationale behind
it is that it is the speediest way to decode data so you can find out
where you are on the disk quickly; other forms of encoding and
decoding require translation last and first, respectively.

Encoding bytes with the 4 and 4 method is accomplished by the
following pseudo-code:

     load original data byte
     shift byte right one position
     logically OR with a bit mask of 10101010
     save as first output byte
     load original data byte
     logically OR with a bit mask of 10101010
     save as second output byte

NOTE:  The checksum field is the 4 and 4 encoded byte of the EXCLUSIVE
OR of the byte representations of the previous fields (volume, track,
and sector.)
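
Putting the pseudo-code and the checksum rule together, a whole
address field could be assembled roughly like this.  It is a sketch
in Python using only the prologue, epilogue, and field order given
earlier in this post; the function and constant names are invented
for the example.

```python
ADDRESS_PROLOGUE = bytes([0xD5, 0xAA, 0x96])
EPILOGUE = bytes([0xDE, 0xAA, 0xEB])

def encode_4and4(value):
    """Return the two disk bytes for one data byte (0..255)."""
    first = (value >> 1) | 0xAA   # 1 b7 1 b5 1 b3 1 b1
    second = value | 0xAA         # 1 b6 1 b4 1 b2 1 b0
    return bytes([first, second])

def build_address_field(volume, track, sector):
    """Prologue, four 4 and 4 encoded fields, then epilogue."""
    checksum = volume ^ track ^ sector
    body = b"".join(
        encode_4and4(v) for v in (volume, track, sector, checksum)
    )
    return ADDRESS_PROLOGUE + body + EPILOGUE

field = build_address_field(254, 17, 0)
print(" ".join(f"{b:02X}" for b in field))
```

Every byte produced has its high bit set, which is the whole point of
the OR with 10101010.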

			   ----------------
			   5 AND 3 ENCODING
			   ----------------

5 and 3 encoding has not been used since DOS 3.2.1.  DOS 3.3 began the
use of the 6 and 2 method; it is more space efficient (16 sectors per
track instead of 13) and makes the 5
and 3 method obsolete.  It is probably not necessary for you to deal
with the 5 and 3 scheme unless you want to write a file system
translator for 13 sector/track DOS disks, so an explanation of 5 and 3
encoding has been omitted.

			   ----------------
			   6 AND 2 ENCODING
			   ----------------

6 and 2 encoding might seem rather complicated, but after digging, it
is not as complex as it might seem to be.

Suppose that you have read in (and translated) the 343 bytes (data
field + checksum) of a sector.  The 342 data bytes go into a buffer
with the beginning of the buffer being represented by byte position
HEX $000 and the end being represented by byte position HEX $155 (the
checksum byte is only used for verification).  The buffer would have
the following appearance:

     $000:  0  0  a7  a6  a5  a4  a3  a2
     $001:  0  0  b7  b6  b5  b4  b3  b2
     $002:  0  0  c7  c6  c5  c4  c3  c2
     $003:  0  0   |   |   |   |   |   |
     $004:  0  0   |   |   |   |   |   |
     $005:  0  0   v   v   v   v   v   v

     $100:  0  0           <       <

     $14F:  0  0   ^   ^   ^   ^   ^   ^
     $150:  0  0   |   |   |   |   |   |
     $151:  0  0   |   |   |   |   |   |
     $152:  0  0   |   |   |   |   |   |
     $153:  0  0   |   |   |   |  c0  c1
     $154:  0  0   |   |   |   |  b0  b1
     $155:  0  0   |   |   |   |  a0  a1

The v's and ^'s mean that the same pattern continues until $0FF and
$100, respectively.  The <'s mean that the pattern continues but is
wrapped around to the next two bit cells at the bottom of the buffer.

NOTE:  The above figure only represents the data AFTER a proper read
(or before a proper write) from (or to) the disk medium.  See the next
section for more details.
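
To make the figure a little more concrete, here is how a single data
byte splits into its six-bit and two-bit pieces, in Python.  This is
only a sketch of the split itself as drawn above (note that the low
two bits appear in swapped order, b0 then b1).  It does not attempt
to reproduce how three two-bit pairs are packed into each byte at the
bottom of the buffer, and the function names are made up.

```python
def split_6and2(value):
    """Split one data byte into (six_bit, two_bit) pieces."""
    six_bit = value >> 2                       # 0 0 b7 b6 b5 b4 b3 b2
    two_bit = ((value & 0x01) << 1) | ((value >> 1) & 0x01)  # b0 b1
    return six_bit, two_bit

def join_6and2(six_bit, two_bit):
    """Inverse of split_6and2: rebuild the original data byte."""
    low = ((two_bit & 0x01) << 1) | ((two_bit >> 1) & 0x01)  # b1 b0
    return (six_bit << 2) | low

six, two = split_6and2(0xB6)   # 1011 0110
print(format(six, "06b"), format(two, "02b"))
```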

	  --------------------------------------------------
	  ACTUAL READING FROM AND WRITING TO THE DISK MEDIUM
	  --------------------------------------------------

In order to validate the integrity of the data in the data field of a
disk sector, a checksum method is used.  The method of acquiring a
checksum requires a rather bizarre reading and writing algorithm--
probably the most difficult portion to grasp of all the information
presented here.

The checksum is acquired by logically EXCLUSIVE ORing bytes in the
encoded buffer.  After the EXCLUSIVE OR, translation of bytes must
take place before writing can begin; this is discussed later.

The EXCLUSIVE ORing algorithm is rather strange in that the indexing
used seems to jump around the buffer.

Here is a chart of what happens during a writing session:

     buffer index                             disk data

     0    \
           > EOR  ---> write translation ---> byte 000
     $155 /

     $155 \
           > EOR  ---> write translation ---> byte 001
     $154 /

     $154 \
           > EOR  ---> write translation ---> byte 002
     $153 /

     (...)

     $101 \
           > EOR  ---> write translation ---> byte 085
     $100 /

     $100 \
           > EOR  ---> write translation ---> byte 086
     $000 /

     $000 \
           > EOR  ---> write translation ---> byte 087
     $001 /

     $001 \
           > EOR  ---> write translation ---> byte 088
     $002 /

     (...)

     $0FD \
           > EOR  ---> write translation ---> byte 340
     $0FE /

     $0FE \
           > EOR  ---> write translation ---> byte 341
     $0FF /

     $0FF  CHKSUM ---> write translation ---> byte 342

Conversely, this is what happens during a read session:

     disk data                                         buffer index

     byte 000 ---> read translation ---> EOR 0    ---> $155
     byte 001 ---> read translation ---> EOR $155 ---> $154
     byte 002 ---> read translation ---> EOR $154 ---> $153
     (...)
     byte 085 ---> read translation ---> EOR $101 ---> $100
     byte 086 ---> read translation ---> EOR $100 ---> $000
     byte 087 ---> read translation ---> EOR $000 ---> $001
     byte 088 ---> read translation ---> EOR $001 ---> $002
     (...)
     byte 340 ---> read translation ---> EOR $0FD ---> $0FE
     byte 341 ---> read translation ---> EOR $0FE ---> $0FF
     byte 342 ---> read translation ---> EOR $0FF ---> 0 if data valid
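
The two charts boil down to a running EXCLUSIVE OR over the buffer,
visited in the order $155 down to $100 and then $000 up to $0FF.  The
following Python sketch follows the charts literally and leaves the
translation step out (values stay in the range $00 to $3F).  The
function names are invented for illustration.

```python
# Buffer indexes in the order the charts visit them (342 entries).
ORDER = list(range(0x155, 0xFF, -1)) + list(range(0x000, 0x100))

def chain_encode(buffer):
    """Turn 342 buffer values into 343 values (342 data + checksum)."""
    if len(buffer) != 342:
        raise ValueError("buffer must hold 342 values")
    out = []
    prev = 0
    for index in ORDER:
        out.append(prev ^ buffer[index])  # EOR with the previous value
        prev = buffer[index]
    out.append(prev)                      # checksum is the $0FF value
    return out

def chain_decode(disk_values):
    """Rebuild the buffer; also report whether the checksum came out 0."""
    if len(disk_values) != 343:
        raise ValueError("expected 343 values")
    buffer = [0] * 342
    prev = 0
    for index, value in zip(ORDER, disk_values):
        prev ^= value
        buffer[index] = prev
    return buffer, (prev ^ disk_values[342]) == 0

sample = [i % 64 for i in range(342)]
decoded, ok = chain_decode(chain_encode(sample))
print(decoded == sample, ok)
```

A single corrupted value throws off every value decoded after it, so
the final EXCLUSIVE OR no longer comes out as zero.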

			  -----------------
			  TRANSLATING BYTES
			  -----------------

Because of the 6 and 2 encoding scheme, the high bit of every byte is
guaranteed to be clear.  But, as the value of each byte is
limited to between HEX $00 and HEX $3F, it is possible to translate each
byte to another byte with its high bit set.

Here is the 6 and 2 write translation table:

     00 = 96   10 = B4   20 = D6   30 = ED
     01 = 97   11 = B5   21 = D7   31 = EE
     02 = 9A   12 = B6   22 = D9   32 = EF
     03 = 9B   13 = B7   23 = DA   33 = F2
     04 = 9D   14 = B9   24 = DB   34 = F3
     05 = 9E   15 = BA   25 = DC   35 = F4
     06 = 9F   16 = BB   26 = DD   36 = F5
     07 = A6   17 = BC   27 = DE   37 = F6
     08 = A7   18 = BD   28 = DF   38 = F7
     09 = AB   19 = BE   29 = E5   39 = F9
     0A = AC   1A = BF   2A = E6   3A = FA
     0B = AD   1B = CB   2B = E7   3B = FB
     0C = AE   1C = CD   2C = E9   3C = FC
     0D = AF   1D = CE   2D = EA   3D = FD
     0E = B2   1E = CF   2E = EB   3E = FE
     0F = B3   1F = D3   2F = EC   3F = FF

The bytes AA and D5 are reserved so that prologue fields can be
located as quickly as possible.

The reverse of this table will give you a read translation table.
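
In Python, the write table and its reverse might be set up as
follows.  This is a sketch; the names are made up.  The reverse
lookup only works if all 64 disk bytes are distinct, which the checks
at the end confirm.

```python
# 6 and 2 write translation table: index $00..$3F -> disk byte
WRITE_TABLE = bytes.fromhex(
    "96979A9B9D9E9FA6A7ABACADAEAFB2B3"
    "B4B5B6B7B9BABBBCBDBEBFCBCDCECFD3"
    "D6D7D9DADBDCDDDEDFE5E6E7E9EAEBEC"
    "EDEEEFF2F3F4F5F6F7F9FAFBFCFDFEFF"
)

# Read translation table: disk byte -> value $00..$3F
READ_TABLE = {disk: value for value, disk in enumerate(WRITE_TABLE)}

def translate_for_write(values):
    """Map six-bit values ($00..$3F) to disk bytes."""
    return bytes(WRITE_TABLE[v] for v in values)

def translate_for_read(disk_bytes):
    """Map disk bytes back to six-bit values (KeyError if invalid)."""
    return [READ_TABLE[b] for b in disk_bytes]

# Sanity checks: 64 distinct bytes, all with the high bit set,
# and neither reserved byte appears in the table.
assert len(set(WRITE_TABLE)) == 64
assert all(b & 0x80 for b in WRITE_TABLE)
assert 0xAA not in READ_TABLE and 0xD5 not in READ_TABLE

print(translate_for_write([0x00, 0x3F]).hex())
```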

		    -----------------------------
		    QUESTIONS I CAN'T ANSWER HERE
		    -----------------------------

It is not explicitly stated in the Beneath Apple DOS manual how
the boot sector is encoded.  For the same drives to boot both 13
sectors/track DOS 3.2.1 disks and 16 sectors/track DOS 3.3 disks, it
is unclear what encoding scheme the firmware would have to support.
Can the Apple IIc and Apple IIGS internal
firmwares even handle booting DOS 3.2.1 disks directly?
------------------------------------------------------------------------

The information presented above is only of concern when working
directly with Apple storage disk media.  Smartport and SCSI drives
don't require (allow?) you to access the hardware directly, so it does
not apply.

I hope that this information is as complete as you need it to be.

Working directly with drive hardware is another task altogether, and I
am not nearly confident enough with that information at this time to
explain it.  Perhaps in the future.  ;-)
-- 
Sean Dockery

dockery@griffin.cuc.ab.ca
sbdocker@acs.ucalgary.ca