Newsgroups: comp.sys.apple2
Path: news.weeg.uiowa.edu!news.uiowa.edu!hobbes.physics.uiowa.edu!moe.ksu.ksu.edu!crcnis1.unl.edu!wupost!howland.reston.ans.net!paladin.american.edu!europa.eng.gtefsd.com!library.ucla.edu!news.mic.ucla.edu!unixg.ubc.ca!acs.ucalgary.ca!sbdocker
From: sbdocker@acs.ucalgary.ca (Sean Brendan Dockery)
Subject: Re: DOS 3.3 Nibble Encoding
Date: Mon, 15 Nov 1993 04:55:54 GMT
Organization: Griffin Software Development
Lines: 392

I didn't reply immediately because I didn't have the time and because I thought that some Apple // fanatic might have beaten me to it. :-)

In article gregorya@cs.curtin.edu.au (Andrew Gregory) writes:
| This time I'm also wanting to know how DOS 3.3 and ProDOS 8 encode their
| data into something the disk drive can understand. My understanding is that
| they use something called '6 and 2' which encodes 256 bytes of 'real' data
| into 342 (or thereabouts) of disk data. I would like to know the details
| of that process.

See below.

| I have heard that a book called 'Beneath Apple DOS' would explain all this
| but none of the bookshops here in this backwater of Perth, Western Australia
| carry it.

Beneath Apple DOS and Beneath Apple ProDOS are indeed important books.

| I've tried to figure out the //c disk ROM code without success (some

Hehehehe. :-)

| Andrew Gregory
| gregorya@lillee.cs.curtin.edu.au

------------------------------------------------------------------------
Disclaimer: I make no warranties as to the accuracy of the information below. I am simply citing from Beneath Apple DOS, copyright (c) 1981 by Quality Software (RIP).

---------------------------------
DATA STRUCTURE ON THE DISK MEDIUM
---------------------------------

Under DOS 3.3 and ProDOS formatted systems (I'm not sure about HFS), the structure of disk data is the same. The following is the general format of disk data:

    GAP1 ADDRESS0 GAP2 DATA0 GAP3 ADDRESS1 GAP2 DATA1 GAP3 ...
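As a rough illustration of this layout, here is a minimal Python sketch that walks the raw disk bytes of one track and reports where each address field and data field begins, using the prologue bytes (D5 AA 96 and D5 AA AD) described below. The function name `find_fields` is hypothetical, and the sketch assumes the track has already been captured as a byte string; it is an illustration of the layout, not a transcription of the DOS RWTS routines, and a real reader would also check epilogues and checksums.

```python
from __future__ import annotations

# Prologue marks from the field descriptions (DOS 3.3 / ProDOS 16-sector format).
ADDRESS_PROLOGUE = bytes([0xD5, 0xAA, 0x96])
DATA_PROLOGUE = bytes([0xD5, 0xAA, 0xAD])


def find_fields(nibbles: bytes) -> list[tuple[str, int]]:
    """Return (kind, offset) for every address or data prologue in a raw track.

    `nibbles` is assumed to hold the disk bytes of one track as they came off
    the data latch (high bit set). `offset` is the index just past the 3-byte
    prologue, i.e. where the volume field or the 6 and 2 data begins.
    Gap bytes ($FF) never match a prologue, so they are skipped naturally.
    """
    fields: list[tuple[str, int]] = []
    i = 0
    while i + 3 <= len(nibbles):
        window = nibbles[i:i + 3]
        if window == ADDRESS_PROLOGUE:
            fields.append(("address", i + 3))
            i += 3
        elif window == DATA_PROLOGUE:
            fields.append(("data", i + 3))
            i += 3
        else:
            i += 1
    return fields


if __name__ == "__main__":
    gap = bytes([0xFF] * 5)
    epilogue = bytes([0xDE, 0xAA, 0xEB])
    # Placeholder field contents (0xAA, 0x96 are stand-ins, not real sector data).
    track = (gap + ADDRESS_PROLOGUE + bytes([0xAA] * 8) + epilogue
             + gap + DATA_PROLOGUE + bytes([0x96] * 343) + epilogue + gap)
    print(find_fields(track))  # [('address', 8), ('data', 27)]
```

The 8 bytes after the address prologue are the four 4 and 4 encoded fields, and the 343 bytes after the data prologue are the 342 data bytes plus the checksum.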
The fields are as follows:

GAP1 contains HEX $FF self sync bytes (discussed below). Typically, there are anywhere between 40 and 95 of them.

ADDRESS0 contains the address prologue, volume, track number, sector number, checksum, and epilogue fields. The address prologue has the HEX values D5 AA 96. The epilogue has the HEX values DE AA EB. Each of the other fields is two bytes wide and subject to 4 and 4 encoding (the simplest kind).

GAP2 also contains HEX $FF self sync bytes. There are 5 to 10 bytes in this field.

DATA0 contains the data prologue, data, checksum, and epilogue fields. The data prologue has the HEX values D5 AA AD. The epilogue has the HEX values DE AA EB. The data field is subject to 6 and 2 encoding, and the checksum is a single extra byte (see below). See the NOTE below about the length of this field.

GAP3 contains HEX $FF self sync bytes. There are 14 to 24 bytes in this field.

NOTE: There are 342 data bytes in the 6 and 2 encoded format; in DOS 3.2.1 and earlier, the data field contained 410 bytes because it used the 5 and 3 encoded format.

NOTE: ProDOS (and Apple Pascal, too, I believe) uses two sectors to form one disk block. If memory serves, HFS partitions use a disk block of 524 bytes (instead of 256 * 2 = 512)--can anyone confirm this?

---------------------------------------------
SELF SYNC BYTES: GETTING THE HARDWARE IN SYNC
---------------------------------------------

When you begin a read (or write) operation, it is impossible to tell whether the disk head is at the beginning, middle, or end of a byte. Some mechanism must therefore be in place to let the software know that it is "in sync" with the hardware. Self sync bytes are a special kind of byte that helps the software stay in sync with where it is on the disk.
They are how the operating system knows that it is not reading data from the middle of a byte of the fields mentioned above; such a situation would result in rather bizarre data and would probably be devastating to your system.

The HEX $FF self sync byte differs from a regular HEX $FF byte in that it is 10 bits long: eight 1 bits followed by two 0 bits. The operating system assumes that it has read a complete byte only when the most significant bit of the data latch is set. On the other side of the coin, the operating system must ensure that every byte it writes to disk has its most significant bit set, so an encoding (and sometimes translation) scheme must be applied.

Suppose that you have the following bit stream on a disk:

    0110101110101100111101101110101

As the disk head moves across the stream during a read operation, bits are shifted into the data latch. Starting from the left-most position, bits would be read and the data latch would evolve as follows:

    0000 0000   (initial state)
    0000 0000   (shift the zero bit in)
    0000 0001
    0000 0011
    (...)
    1101 0111   (final identified byte)

NOTE: The byte is NOT 0110 1011 as you might have expected, because its high bit would still be clear; therefore, another bit must be read (and the data latch shifted left) until the high bit becomes set.

This is where the need for HEX $FF self sync bytes comes in. The bit stream of several consecutive self sync bytes follows:

    1111111100111111110011111111001111111100111111110011111111001111111100
         ^

Suppose that the read position started where the caret is. The data latch would fill with the following byte:

    1110 0111

    1111111100111111110011111111001111111100111111110011111111001111111100
                 ^

The read position is again where the caret is.
The data latch again:

    1111 1001

Once again:

    1111111100111111110011111111001111111100111111110011111111001111111100
                         ^

    1111 1110

And again:

    1111111100111111110011111111001111111100111111110011111111001111111100
                                 ^

    1111 1111   (the leading 0 shifts in harmlessly; remember that a byte isn't a byte without bit 7 set)

This is the turning point; from this moment on, every byte read into the data latch will be HEX $FF. The software is now synchronized with the starting points of bytes on the disk medium.

----------------------------------------------
MAKE BYTES CONFORM TO DISK MEDIUM REQUIREMENTS
----------------------------------------------

Realistically, it is very unlikely that all bytes to be read from (or written to) the disk medium are going to have their high bit set. Thus, encoding and decoding data (and sometimes translation) is a necessary requirement for the operating system. Apple's operating systems use three types of encoding: 4 and 4, 5 and 3, and 6 and 2. Every scheme expands one byte of data into at least one disk byte plus two bits.

----------------
4 AND 4 ENCODING
----------------

4 and 4 encoded bytes require two disk bytes (the data bits are split evenly between them) and have the following format:

    1 b7 1 b5 1 b3 1 b1
    1 b6 1 b4 1 b2 1 b0

To get a usable data byte, shift the first byte left one bit cell, set the least significant bit in that byte, and logically AND the two bytes. Thus:

    b7  1 b5  1 b3  1 b1  1    (first byte shifted left and logically ORed with 00000001)
     1 b6  1 b4  1 b2  1 b0    (second byte)
    -----------------------
    b7 b6 b5 b4 b3 b2 b1 b0    (logical AND of the previous two)

4 and 4 encoding is used only in the ADDRESSx section of disk data, for the volume, track, sector, and checksum fields. The rationale is that it is the fastest kind of data to decode, so you can find out where you are on the disk quickly; the other forms of encoding and decoding require a translation step last and first, respectively.
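The 4 and 4 scheme is small enough to show in full. The following is a minimal Python sketch, assuming the bit layout shown above together with the encoding pseudo-code and the address checksum rule (EXCLUSIVE OR of volume, track, and sector) given just below. The helper names `encode_44`, `decode_44`, `build_address_field`, and `parse_address_field` are hypothetical; they are not taken from DOS.

```python
from __future__ import annotations

ADDRESS_PROLOGUE = bytes([0xD5, 0xAA, 0x96])
ADDRESS_EPILOGUE = bytes([0xDE, 0xAA, 0xEB])


def encode_44(value: int) -> tuple[int, int]:
    """4 and 4 encode one byte into two disk bytes (odd bits, then even bits)."""
    first = (value >> 1) | 0xAA   # 1 b7 1 b5 1 b3 1 b1
    second = value | 0xAA         # 1 b6 1 b4 1 b2 1 b0
    return first, second


def decode_44(first: int, second: int) -> int:
    """Shift the first byte left, set bit 0, then AND with the second byte."""
    return ((first << 1) | 0x01) & second


def build_address_field(volume: int, track: int, sector: int) -> bytes:
    """Hypothetical helper: prologue, four 4 and 4 fields, epilogue."""
    checksum = volume ^ track ^ sector   # EOR of volume, track, and sector
    out = bytearray(ADDRESS_PROLOGUE)
    for field in (volume, track, sector, checksum):
        out.extend(encode_44(field))
    out.extend(ADDRESS_EPILOGUE)
    return bytes(out)


def parse_address_field(raw: bytes) -> tuple[int, int, int]:
    """Hypothetical helper: return (volume, track, sector), checking the checksum."""
    if raw[:3] != ADDRESS_PROLOGUE:
        raise ValueError("missing address prologue D5 AA 96")
    volume, track, sector, checksum = (
        decode_44(raw[3 + 2 * k], raw[4 + 2 * k]) for k in range(4)
    )
    if volume ^ track ^ sector != checksum:
        raise ValueError("address field checksum mismatch")
    return volume, track, sector


if __name__ == "__main__":
    field = build_address_field(254, 17, 0)
    print(field.hex(" "))              # d5 aa 96 ff fe aa bb aa aa ff ef de aa eb
    print(parse_address_field(field))  # (254, 17, 0)
```

For example, volume 254 ($FE) comes out as the disk bytes FF FE, and every byte of the field has its high bit set, as the hardware requires.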
Encoding bytes with the 4 and 4 method is accomplished by the following pseudo-code:

    load original data byte
    shift byte right one position
    logically OR with a bit mask of 10101010
    save as first output byte
    load original data byte
    logically OR with a bit mask of 10101010
    save as second output byte

NOTE: The checksum field is the 4 and 4 encoded form of the EXCLUSIVE OR of the previous fields (volume, track, and sector).

----------------
5 AND 3 ENCODING
----------------

5 and 3 encoding has not been used since DOS 3.2.1. DOS 3.3 introduced the 6 and 2 method; it is more space efficient (16 sectors per track instead of 13) and makes the 5 and 3 method obsolete. You probably won't need to deal with the 5 and 3 scheme unless you want to write a file system translator for 13 sector/track DOS disks, so an explanation of 5 and 3 encoding has been omitted.

----------------
6 AND 2 ENCODING
----------------

6 and 2 encoding might seem rather complicated, but after some digging, it is not as complex as it first appears. Suppose that you have read in (and translated) the 343 bytes (data field + checksum) of a sector. The 342 data bytes end up in a buffer that begins at byte position HEX $000 and ends at byte position HEX $155; the checksum is the 343rd byte and is not stored in the buffer. The first 256 buffer bytes hold the upper six bits of each data byte, and the remaining 86 bytes ($100-$155) hold the two low bits of the data bytes, three pairs per buffer byte (86 x 3 = 258 pairs, enough for 256 data bytes). The buffer would have the following appearance:

    $000: 0 0 a7 a6 a5 a4 a3 a2
    $001: 0 0 b7 b6 b5 b4 b3 b2
    $002: 0 0 c7 c6 c5 c4 c3 c2
    $003: 0 0  |  |  |  |  |  |
    $004: 0 0  |  |  |  |  |  |
    $005: 0 0  v  v  v  v  v  v
      ...
    $100: 0 0              <  <
      ...
    $14F: 0 0  ^  ^  ^  ^  ^  ^
    $150: 0 0  |  |  |  |  |  |
    $151: 0 0  |  |  |  |  |  |
    $152: 0 0  |  |  |  |  |  |
    $153: 0 0  |  |  |  | c0 c1
    $154: 0 0  |  |  |  | b0 b1
    $155: 0 0  |  |  |  | a0 a1

The v's and ^'s mean that the same pattern continues down to $0FF and up to $100, respectively. The <'s mean that the pattern continues but wraps around to the next two bit cells at the bottom of the buffer.

NOTE: The above figure only represents the data AFTER a proper read (or before a proper write) from (or to) the disk medium.
See the next section for more details.

--------------------------------------------------
ACTUAL READING FROM AND WRITING TO THE DISK MEDIUM
--------------------------------------------------

To validate the integrity of the data in the data field of a disk sector, a checksum is used. Computing the checksum requires a rather bizarre reading and writing algorithm, probably the most difficult part of all the information presented here to grasp. The checksum is acquired by EXCLUSIVE ORing bytes in the encoded buffer. After the EXCLUSIVE OR, the bytes must be translated before writing can begin; this is discussed later.

The EXCLUSIVE OR algorithm is rather strange in that the indexing seems to jump around the buffer. Each disk byte (before translation) is the EXCLUSIVE OR of a buffer byte with the buffer byte written just before it, and the final byte written is the last buffer byte ($0FF) itself, which serves as the checksum. Here is a chart of what happens during a write session:

    buffer index                            disk data

       0 \
          > EOR ---> write translation ---> byte 000
    $155 /
    $155 \
          > EOR ---> write translation ---> byte 001
    $154 /
    $154 \
          > EOR ---> write translation ---> byte 002
    $153 /
    (...)
    $101 \
          > EOR ---> write translation ---> byte 085
    $100 /
    $100 \
          > EOR ---> write translation ---> byte 086
    $000 /
    $000 \
          > EOR ---> write translation ---> byte 087
    $001 /
    $001 \
          > EOR ---> write translation ---> byte 088
    $002 /
    (...)
    $0FD \
          > EOR ---> write translation ---> byte 340
    $0FE /
    $0FE \
          > EOR ---> write translation ---> byte 341
    $0FF /
    $0FF CHKSUM ---> write translation ---> byte 342

Conversely, this is what happens during a read session:

    disk data                                     buffer index

    byte 000 ---> read translation ---> EOR    0 ---> $155
    byte 001 ---> read translation ---> EOR $155 ---> $154
    byte 002 ---> read translation ---> EOR $154 ---> $153
    (...)
    byte 085 ---> read translation ---> EOR $101 ---> $100
    byte 086 ---> read translation ---> EOR $100 ---> $000
    byte 087 ---> read translation ---> EOR $000 ---> $001
    byte 088 ---> read translation ---> EOR $001 ---> $002
    (...)
    byte 340 ---> read translation ---> EOR $0FD ---> $0FE
    byte 341 ---> read translation ---> EOR $0FE ---> $0FF
    byte 342 ---> read translation ---> EOR $0FF ---> 0 if data valid

-----------------
TRANSLATING BYTES
-----------------

Because of the 6 and 2 encoding scheme, the two high bits of every buffer byte are guaranteed to be clear (and EXCLUSIVE ORing two such bytes keeps them clear). But, as the value of each byte is limited to between HEX $00 and HEX $3F, it is possible to translate each one to another byte with its high bit set. Here is the 6 and 2 write translation table:

    00 = 96   10 = B4   20 = D6   30 = ED
    01 = 97   11 = B5   21 = D7   31 = EE
    02 = 9A   12 = B6   22 = D9   32 = EF
    03 = 9B   13 = B7   23 = DA   33 = F2
    04 = 9D   14 = B9   24 = DB   34 = F3
    05 = 9E   15 = BA   25 = DC   35 = F4
    06 = 9F   16 = BB   26 = DD   36 = F5
    07 = A6   17 = BC   27 = DE   37 = F6
    08 = A7   18 = BD   28 = DF   38 = F7
    09 = AB   19 = BE   29 = E5   39 = F9
    0A = AC   1A = BF   2A = E6   3A = FA
    0B = AD   1B = CB   2B = E7   3B = FB
    0C = AE   1C = CD   2C = E9   3C = FC
    0D = AF   1D = CE   2D = EA   3D = FD
    0E = B2   1E = CF   2E = EB   3E = FE
    0F = B3   1F = D3   2F = EC   3F = FF

The bytes AA and D5 are reserved (they never appear in the table) so that prologue fields can be located as quickly as possible. The reverse of this table gives you the read translation table.

-----------------------------
QUESTIONS I CAN'T ANSWER HERE
-----------------------------

The Beneath Apple DOS manual does not explicitly state how the boot sector is encoded. If the same drives are to boot both 13 sectors/track DOS 3.2.1 disks and 16 sectors/track DOS 3.3 disks, it is unclear which encoding scheme the firmware would need to handle both. Can the Apple IIc and Apple IIGS internal firmware even boot DOS 3.2.1 disks directly?

------------------------------------------------------------------------
The information presented above only matters when working directly with Apple disk media. SmartPort and SCSI drives don't require (allow?) you to access the hardware directly, so it does not apply to them.
I hope that this information is as complete as you need it to be. Working directly with the drive hardware is another task altogether, and I am not nearly confident enough with that information at this time to explain it. Perhaps in the future. ;-)

--
Sean Dockery
dockery@griffin.cuc.ab.ca
sbdocker@acs.ucalgary.ca