
The comp.sys.apple2 Usenet newsgroup Apple II FAQs originate from
the Ground Apple II archive. Administrator: Steve Nelson

Csa2 FAQs-on-Ground resource file: R011SNDFMTS.TXT

Audio File Formats Guide

Notes: This is a pure Text file which has no Font, Color,
etc. formatting and no set line length.

For best viewing on-line, set browser Word Wrap to ON or
copy to your favorite Text viewer and set Word Wrap.
Ex: On PC use WordPad with Options set to "Wrap to Window".

To correctly view tables and diagrams on a super-res display,
use a mono-spaced Font such as CoPilot or PCMononspaced.

____________________________

AUDIO FILE FORMAT RESOURCE GUIDE (Version 1.1)

by Dave Huizing

1 TABLE OF CONTENTS

2 GENERAL INFORMATION
2.1 Foreword
2.2 Printed Version
2.3 Copyrights
2.4 Disclaimer
2.5 Contributors
3 TX WAVE FORMAT
4 YAMAHA TYPHOON WAVE FILE FORMAT
4.1 DWVW v1.2 compression
4.2 DWVW sample delta bit frame
5 D00
5.1 The D00 header
5.2 The Instrument data
5.3 The SpFX data
5.4 The Arrangement data
5.5 The Sequence data
6 MIDI SAMPLE DUMP STANDARD
6.1 INTRODUCTION
6.2 SPEC: SAMPLE DUMP FORMATS
6.3 SPEC: SAMPLE DUMP MESSAGES
6.4 HANDSHAKING MESSAGES:
6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)
6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)
6.7 SDS OVERVIEW
7 ROL
7.1 Structure of .ROL files
7.2 Notes
8 8SVX
8.1 FORMblock [VHDR]
8.2 FORMblock [BODY]
9 AIFF
10 AU
11 FSM
12 GF1 PATCH
13 S3I
14 UWF
15 WAVE
15.1 RiffBLOCK [data]
15.2 RiffBLOCK [fmt ]
15.3 RiffBLOCK [loop]
16 ZYXEL
17 CREATIVE LABS FILE FORMATS
17.1 Sound Blaster Instrument File Format (SBI)
17.2 Creative Music File Format (CMF)
17.3 The CMF Instrument Block
17.4 The CMF Music Block
17.5 Sound Blaster Instrument Bank File Format (IBK)
18 CREATIVE VOICE (VOC) FILE FORMAT
19 REVISION HISTORY

2 General information

2.1 Foreword

I started compiling this document because I thought there was a need for it. By surfing all around the web I collected these descriptions and brought them together in this document. I plan to keep it updated, so if there is a file format description that is not in this document, or you have any comments on it, please send me an email message at: stallion@worldonline.nl.

Happy developing,

Dave Huizing

2.2 Printed Version

If you need a printed version send an email.

2.3 Copyrights

Only the title and the compilation are copyrighted by Dave Huizing. As far as I know all this information is free for use. See the disclaimer part for more details. All trademarks, technical information and file extensions belong to their respective owners.

2.4 Disclaimer

This document is provided on an as-is basis. The information has been verified as far as possible, but I cannot be held responsible for any problems caused by use or misuse of the information. Although I think it won't happen, I am also not responsible for any damage to any kind of computer system during or after the use of parts of this documentation. Use this document at your own risk.

2.5 Contributors

Dave Huizing, stallion@worldonline.nl
DJ, Producer, DTP designer, etc

muki pakesch, mpakesch@t0.or.at
Maintainer of the TX16W mailinglist

Markus Jönsson, f93-maj@nada.kth.se
Author of the Awave sample converter

3 TX Wave Format

The file consists of a 32-byte header followed by the actual waveform (the first 16 bytes only identify the file type). In C syntax the header would look like this:

char filetype[6] = "LM8953"
char nulls[10]
char dummy_aeg[6]     /* space for the AEG (never mind this) */
char format           /* 0x49 = looped, 0xC9 = non-looped */
char sample_rate      /* 1 = 33 kHz, 2 = 50 kHz, 3 = 16 kHz */
char atc_length[3]    /* I'll get to this... */
char rpt_length[3]
char unused[2]        /* set these to null, to be on the safe side */

The "atc_length" and "rpt_length" fields are quite complex. First of all you should know that there is no such thing as a looping point in a TX wave. Instead a wave is split into two parts, the attack part and the repeat part (of course the actual wave data isn't split, this is just a logical definition). As you might guess, the attack part is played first and the repeat part is looped until the key is released. Each of these parts is limited to a maximum of 128k words in length. That is the reason why waves can't be longer than 256k words (4096 blocks).

The length of a part is stored LSB first (Intel). And only the least significant _bit_ of the third byte (bit 0) is used (representing the most significant bit of the length). Are you confused yet? Then hold your breath. It seems that Yamaha has chosen to squeeze in the sample rate(!) of the wave in the unused _bits_ of these last bytes. Although they already have a separate byte for the sample rate, this isn't enough. I won't go into details on this now (or you would be even more confused). You only need to know that the possible values are:

0x06, 0x52 = 33 kHz
0x10, 0x00 = 50 kHz
0xF6, 0x52 = 16 kHz

(The first value is located in byte three of "atc_length" and the second value is located in byte three of "rpt_length".) To wrap it up, this is the format of the two length fields on a bit level:

             [0]        [1]        [2]
atc_length   AAAAAAAA   BBBBBBBB   DDDDDDDC
rpt_length   EEEEEEEE   FFFFFFFF   HHHHHHHG

A   LSB of the attack length
B   MSB of the attack length (except for one bit)
C   the utterly most significant _bit_ of the attack length
D   the first value of the magic sample rate constant (0x06, 0x10 or 0xF6)
E   LSB of the repeat length
F   MSB of the repeat length (except for one bit)
G   the utterly most significant _bit_ of the repeat length
H   the second value of the magic sample rate constant (0x52, 0x00)
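As a rough illustration of the bit layout above, here is a minimal Python sketch that pulls the 17-bit length and the "magic" rate bits out of the two 3-byte fields. The function names and the RATE_MAGIC lookup table are made up for this example; only the bit positions and the constant pairs come from the description above.

```python
# Hypothetical helper names; only the bit layout and constants are from the text.
RATE_MAGIC = {
    (0x06, 0x52): "33 kHz",
    (0x10, 0x00): "50 kHz",
    (0xF6, 0x52): "16 kHz",
}

def decode_length(field):
    """Split a 3-byte length field into (length in words, magic byte)."""
    b0, b1, b2 = field
    # Bytes 0 and 1 are the low 16 bits (LSB first); bit 0 of byte 2 is bit 16.
    length = b0 | (b1 << 8) | ((b2 & 0x01) << 16)
    # The remaining seven bits of byte 2 carry the sample rate constant.
    magic = b2 & 0xFE
    return length, magic

def decode_tx_lengths(atc_length, rpt_length):
    """Return (attack length, repeat length, rate string or None)."""
    atc, magic1 = decode_length(atc_length)
    rpt, magic2 = decode_length(rpt_length)
    return atc, rpt, RATE_MAGIC.get((magic1, magic2))
```

For instance, decode_tx_lengths(bytes([0x00, 0x00, 0x07]), bytes([0x10, 0x00, 0x52])) gives an attack length of 65536 words, a repeat length of 16 words and "33 kHz".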

Now for the most important (and probably most interesting) part. The waveform data. As you certainly know the TX uses 12-bit sampling resolution, and this requires some kind of encoding if we are not willing to waste one fourth of our disk space. Yamaha has chosen to group the samples two by two, making three bytes of data in the file for each pair. I'll illustrate this on a bit level (as with the lengths above):

AA CD BB

A   MSB of the first sample
B   MSB of the second sample
C   least significant nybble (oh, is that the correct spelling?) of the first sample
D   least significant nybble of the second sample
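The packing above can be undone with a few shifts. This is only a sketch: it assumes that in the middle byte "CD" the nybble C is the high one and D the low one (as the notation suggests), and it returns the raw 12-bit values without deciding whether they are signed, since the text does not say. The function name is made up.

```python
def unpack_tx_samples(data):
    """Turn groups of three bytes (AA CD BB) into pairs of 12-bit values."""
    samples = []
    # Step through complete 3-byte groups; a trailing partial group is ignored.
    for i in range(0, len(data) - 2, 3):
        first_msb, nybbles, second_msb = data[i:i + 3]
        samples.append((first_msb << 4) | (nybbles >> 4))     # AA + C
        samples.append((second_msb << 4) | (nybbles & 0x0F))  # BB + D
    return samples
```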

4 Yamaha Typhoon wave file format

This specification describes the compression algorithm for Typhoon format waves. It does not cover the file format, which is AIFF-C. The documentation for AIFF-C is available at the site ftp.sgi.com in the directory /sgi/aiff-c.9.26.91.ps.Z (compressed Postscript file).

4.1 DWVW v1.2 compression

DWVW was invented in 1991 by Magnus Lidstrom and is copyright 1993 by NuEdge Development. You have the right to use the algorithm freely as long as you make no false claims on its origin. DWVW is a lossless (or bit faithful) compression method for digital audio data. Lossless means that the exact original data will be preserved when compressing and decompressing.

The compression utilizes the fact that the delta between the sample points is generally less than the full dynamic width. Each sample point is subtracted from the previous one and the difference is entropy encoded in a special format. Therefore the compression works best on low frequency sounds with low noise ratio, where the difference between each sample is small.

DWVW can be applied on samples of any bit resolution and with any number of channels. As opposed to AIFF standard, sample bits are not "left justified". Instead the necessary translation should be done when decompressing. Also, while AIFF interleaves multichannel sounds, DWVW doesn't as this complicates compression and decompression.

Each channel follows one another with only a slight break in the bit run. The first delta for each channel should be put at an even 16-bit word position. The encoding stores the delta points with only as many bits as is required (hence the name "variable word width").

Thus, the number of bits used by each delta has to be stored as well. Since this count varies very little we apply a (simpler) delta encoding on this information.

To wrap it up, each compressed sample point consists of two values: the delta from the last sample and the difference in word width of this delta from the last delta (hereby referred to as "the WWM" - the word width modifier).

Even though the word width modifier is stored first in each delta frame we will describe the delta information first. The delta is always stored as an absolute difference (i.e. unsigned) in a variable number of bits. An extra bit follows that tells the sign (if the delta isn't zero). The number of bits required for the delta (i.e. the word width) is decided by the position of the most significant high bit in the absolute value. One bit less than this is actually stored since the first bit is always high.

For instance, the delta 11 (binary 1011) has a required word width of four bits, but only the least significant three bits are stored. A zero delta will have a zero word width and consequently requires neither delta bits nor sign bit. A delta of one will require only a sign bit.

One special case requires attention. A normal two's complement number's lowest negative number is one less than the negated highest positive number. Treating zero as a positive value this gives exactly as many negative as positive numbers. The delta encoding on the other hand does not consider zero to be of any sign and does therefore not include the one extra negative value. If this value is encountered in the delta stream it is encoded as one greater than it actually is (putting it within the expressible range of values).

To distinguish it from the next lowest value one extra bit is inserted after the sign bit. The bit is high for the lowest value and low for the next lowest value.

For example, a 16-bit two's complement number can be -32768. It would be encoded as negative 32767 with an extra high bit. The value -32767 would also be encoded as negative 32767 but with the extra bit low. Of course, only these two values require the extra bit.

The WWM precedes the delta bits. It is encoded as a series of low bits (0) terminated by a high bit (1) (in most cases). The count of low bits tells the modifier amount. If the modifier isn't zero an extra bit follows that tells the modifier sign. A high bit means negative modifier. Word width "wraps" at the used bit resolution (new-width = (original-width + modifier) modulo bit-resolution).

This enables us to go from a small width to a large width by using a negative modifier. Because of this fact a WWM will never need to be larger than the sound bit resolution divided by two (rounded downwards). If the modifier is the maximum the terminating high bit would be superfluous, so in this case it isn't inserted. (However, the sign bit is always included, even if the bit resolution is even.)

For encoding, the current word width and sample value should be initially reset to zero for each channel (the first delta will thus be the sample value). A compressed channel always starts on an even 16-bit word boundary. Notice that the highest possible compression ratio is eight times, i.e. one bit per sample. This occurs when the source is a continuous series of zero samples.

4.2 DWVW sample delta bit frame

0...    WWM is the count of low bits (can be none)
1       terminating high bit (if not max WWM)
ms      WWM sign, high is negative (only on non-zero WWM)
delta   (word width - 1) sample delta bits (only if |delta| > 1)
sb      delta sign bit (only on non-zero delta)
xb      extra bit (only on lowest and next lowest possible delta value)

Some encoding examples (the examples all represent extreme situations with unusually poor compression):

Bit resolution   16
Delta            923 (bin 00000011 10011011)
Current width    1
New width        10
Modifier         -7 (mod 16 = 10)
Yields           0000000 1 1 110011011 0

Bit resolution   12
Delta            -2048 (bin 1000 00000000)
Current width    0
New width        11
Modifier         -1 (mod 12 = 11)
Yields           0 1 1 1111111111 1 1
                 (-2048 is encoded as 2047 with extra bit and negative high)

Bit resolution   8
Delta            -12 (bin 11110100, negated 00001100)
Current width    0
New width        4
Modifier         +4
Yields           0000 0 100 1 (no terminating bit for WWM)
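To make the frame layout a bit more tangible, here is a small Python sketch that encodes a single delta into a string of '0' and '1' characters. It follows the description in this section literally and is not taken from any real DWVW implementation; the function name, the string output and the choice of a positive modifier when both signs would work are assumptions of this example. It reproduces the three examples above.

```python
def encode_dwvw_frame(delta, current_width, resolution):
    """Encode one sample delta; returns (bit string, new word width)."""
    lowest = -(1 << (resolution - 1))
    magnitude = abs(delta)
    extra = ""
    if delta in (lowest, lowest + 1):
        # Lowest and next lowest values share one magnitude; xb tells them apart.
        extra = "1" if delta == lowest else "0"
        magnitude = -(lowest + 1)
    new_width = magnitude.bit_length()

    # Word width modifier, wrapped so that it never exceeds resolution // 2.
    max_modifier = resolution // 2
    modifier = (new_width - current_width) % resolution
    if modifier > max_modifier:
        modifier -= resolution

    bits = "0" * abs(modifier)                 # count of low bits
    if abs(modifier) < max_modifier:
        bits += "1"                            # terminating high bit
    if modifier != 0:
        bits += "1" if modifier < 0 else "0"   # ms: WWM sign
    if new_width > 1:
        bits += format(magnitude, "b")[1:]     # delta without its leading 1
    if magnitude != 0:
        bits += "1" if delta < 0 else "0"      # sb: delta sign
    return bits + extra, new_width
```

A run of zero samples encodes each frame as the single bit "1", which matches the "one bit per sample" remark above.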

5 D00

This part describes the D00 music format (used by the AdLib player v4.01 coded by JCH/Vibrants) in more detail than the docs of EdLib (the respective tracker, also coded by JCH) do. This document assumes that you already own EdLib and have some experience with it. Also, the availability of the EdLib docs as well as of the docs for the player included with EdLib is assumed. You should know some basics about AdLib programming and data formats (byte, word etc.) as well as the EdLib structures (Instruments, SpFX etc.) and with hexadecimal notation.

5.1 The D00 header

A description of the D00 header can be found in the player's docs. So I won't show it again here. But JCH gives very cryptic names to the other file structures, so I'll call them differently:

JCH's names          My names
TPoin tables         Arrangement data
SeqPointer tables    Sequence data
Instrument data      Instrument data
DataInfo text        Song description
Special tables       SpFX data

Also, I should mention that all the pointers to these tables are meant relative to the beginning of the D00 file.

5.2 The Instrument data

The instrument data simply consists of all instruments used in the song. Since the number of instruments is stored nowhere inside the file, loaders should use the start offset of the next structure to determine whether they have read enough data. The data for each instrument consists of 16 bytes, which occur in the same order as the corresponding bytes in the EdLib Instrument table:

xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx
+------------+ +------------+ |  |  |  |  |  |
 Carrier data  Modulator data |  |  |  |  +--+-Unused
                              |  |  |  +Hard restart SR value
                              |  |  +Hard restart timer
                              |  +Fine-tune
                              +AM/FM + Feedback

For the exact meaning of these bytes, read the EdLib manual. Note that in the Carrier and Modulator data the ADSR parts are not stored word-oriented, but byte-oriented. That means, they aren't stored as a word whose High byte is the AD part and whose Low byte is the SR part (although the display in EdLib creates that assumption).

Instead they're simply stored as two bytes of which the first one's the AD part and the second one's the SR part.

5.3 The SpFX data

The SpFX data is stored more or less like the Instrument data, but one single table entry consists of only 8 bytes arranged like this:

xxxx xx xx xx xx xxxx   (note xx's are BYTES and xxxx's are WORDS!)
|    |  |  |  |  |
|    |  |  |  |  +Pointer to next SpFX entry
|    |  |  |  +Duration of SpFX entry in Frames
|    |  |  +Modulator Level add
|    |  +New Modulator level
|    +Note add value
+Instrument to use

Again, to really understand the meaning of these parts, you should read the EdLib docs.

5.4 The Arrangement data

The arrangement data determines which sequence is to be played on which channel at which moment and in which way, if you understand what I mean :) It consists of two parts: The Pointer part and the Data part (I simply call them that way now :). The Pointer part consists of 16 word pointers and one endmark (all endmarks are FFFFh, by the way). Only the first nine pointers are used at the moment: one for each one of the nine AdLib channels. Each one of these nine pointers points to the part of the Data part which belongs to its channel. The Data part consists, as you'd have guessed before, of nine independent arrangement streams. Each one of these streams has the following format:

First comes a word telling the speed of that stream. Since this information is stored at the beginning of EVERY stream, I assume that every channel may have its own unique speed, and EdLib simply doesn't support this.

After that, the real arrangement data is stored. This data is organized like this: If a word below 8000h is read, it's the number of a sequence to be played. In that case, the saved transpose data is used.

But if a word 8XYYh is read, with X and YY being any value, the transpose data is updated to X and YY (see the EdLib docs for information on the meaning of X and YY).

I have found out that the first arrangement entry for an arrangement stream that contains at least one sequence is always such a command to set the internal transpose data. So no default value is required to be loaded into the transpose data before playing. And looping the arrangement stream becomes easier.

If the word FFFFh is read, the arrangement stream has arrived at its looping point. The word following the FFFFh is an offset into the arrangement stream telling at which position the stream should be restarted. If the word FFFEh is read, the arrangement stream has reached its end. Unlike the Loop command (FFFFh), the stream mustn't get restarted but halted. Also, there is no word following the FFFEh command.
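The rules for one arrangement stream could be sketched in Python roughly like this. Two things here are assumptions and not stated above: that the words are stored little-endian (low byte first, as usual on the PC), and that "8XYYh" means exactly the words whose top nybble is 8; anything else at 8000h or above is reported as unknown. The function name and the tuple layout are made up.

```python
import struct

def parse_arrangement_stream(data, offset=0):
    """Return (speed, events) for the stream starting at offset."""
    (speed,) = struct.unpack_from("<H", data, offset)  # assumed little-endian
    pos = offset + 2
    events = []
    while pos + 2 <= len(data):
        (word,) = struct.unpack_from("<H", data, pos)
        pos += 2
        if word == 0xFFFF:
            # Loop command: the next word is the restart offset in the stream.
            (restart,) = struct.unpack_from("<H", data, pos)
            events.append(("loop", restart))
            break
        if word == 0xFFFE:
            events.append(("end",))  # halt, no word follows
            break
        if word < 0x8000:
            events.append(("sequence", word))
        elif (word & 0xF000) == 0x8000:
            events.append(("transpose", (word >> 8) & 0x0F, word & 0xFF))
        else:
            events.append(("unknown", word))
    return speed, events
```

Note that FFFFh and FFFEh have to be tested before the 8XYYh case, since both are above 8000h.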

5.5 The Sequence data

The Sequence data again consists of a pointer part and a data part. But this time these two parts aren't stored in different parts of the file, the data part is stored directly after the pointer part. Therefore, a reference to a specific pattern should be seen as a reference to a word counted from the beginning of the Sequence data.

This word (e.g. the first word for Pattern 0000h) then points to the offset of the actual sequence data inside the file. I hope you got my point... Then, each sequence is stored as follows: Read a word. If its high byte is below 20h, then it's a note. Note that RESTs and HOLDs are also counted as notes. In this case, the low byte can contain the following values:

00h = REST
The high byte tells the number of rests to insert minus one! e.g. a REST with a high byte of 01h means "Two RESTs"

01h - 7Dh = Note
The value of this note byte tells the number of semitones to add to C-0 (e.g. 01h would mean C#0). In this case, the high byte tells the number of HOLDs to insert after the note.

7Fh = HOLD
The high byte tells the number of HOLDs minus one again!

If the high byte is 20h or above, but below 40h, it's a note again, but this time with Tienote switched on. The high byte is used as repetition count again, but don't forget to subtract 20h before evaluating it!!

If the high byte is 40h or above, it's an effect. In this case, the complete word can simply be interpreted like any EdLib effect (set instrument, set volume etc.). See the EdLib docs for a list of them. The note word this effect refers to follows directly after the effect word.

If the read word is FFFFh, it indicates the end of that sequence. In that case, the next sequence to be played should be determined and loaded and the first effect/note of it should be played.
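Here is a small Python sketch of how one sequence word might be classified according to the rules above. It works on an already assembled 16-bit word, so byte order is not an issue here. The function name and the returned dictionary keys are invented for this example, and low bytes that the list above does not mention are simply reported as "unknown".

```python
def classify_sequence_word(word):
    """Classify one 16-bit word of D00 sequence data."""
    if word == 0xFFFF:
        return {"kind": "end"}
    high, low = word >> 8, word & 0xFF
    if high >= 0x40:
        return {"kind": "effect", "value": word}
    tie = high >= 0x20                  # 20h..3Fh: note with Tienote on
    count = high - 0x20 if tie else high
    if low == 0x00:
        return {"kind": "rest", "tie": tie, "repeat": count + 1}
    if low == 0x7F:
        return {"kind": "hold", "tie": tie, "repeat": count + 1}
    if 0x01 <= low <= 0x7D:
        # low is the number of semitones above C-0; count is the HOLDs after it
        return {"kind": "note", "tie": tie, "semitones": low, "holds": count}
    return {"kind": "unknown", "value": word}
```

The FFFFh test has to come first, because its high byte would otherwise make it look like an effect.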

6 MIDI SAMPLE DUMP STANDARD

6.1 INTRODUCTION

The MIDI SDS was adopted in January 1986 by the MIDI Manufacturers Association and the Japanese MIDI Standards Committee. The SDS defines the standard method for transfer of sound sample data between MIDI-equipped devices. Sample dumps may be accomplished with either an 'open loop' or 'closed loop' system.

The open loop method simply involves the straight dump of all sample data from its source to the destination, with no timeouts, packet acknowledgements, or any other form of handshaking, much as in the manner of a sysex bulk dump, usually intiated at the source.

The closed loop method allows the use of handshaking messages between the dump source and destination, and usually places the dump process under the control of the slave, to allow it time to process the incoming data as necessary. As with any standard, it can not be assumed that a device adheres to it unless the accompanying documentation specifically indicates it. Even then, it is best to check its conformity with non-critical data.

6.2 SPEC: SAMPLE DUMP FORMATS

DUMP HEADER: F0 7E cc 01 ss ss ee ff ff ff gg gg gg hh hh hh ii ii ii jj F7

cc
channel number

ss ss
sample number (LSB first)

ee
sample format (number of significant bits; 8->28)

ff ff ff
sample period (1/sample rate) in nanoseconds (LSB first)

gg gg gg
sample length, in words (LSB first)

hh hh hh
sustain loop start point (word number) (LSB first)

ii ii ii
sustain loop end point (word number) (LSB first)

jj
loop type (00:forwards only; 01:alternating)
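A dump header could be picked apart as in the following Python sketch. It assumes that the multi-byte fields hold seven bits per byte (MIDI data bytes always have the top bit clear), with the least significant seven bits first. The function name and dictionary keys are made up for illustration.

```python
def parse_dump_header(msg):
    """Parse a 21-byte SDS DUMP HEADER message into a dictionary."""
    if (len(msg) != 21 or msg[0] != 0xF0 or msg[1] != 0x7E
            or msg[3] != 0x01 or msg[20] != 0xF7):
        raise ValueError("not a sample dump header")

    def u7(data):
        # Combine 7-bit bytes, least significant byte first (assumption).
        value = 0
        for index, byte in enumerate(data):
            value |= (byte & 0x7F) << (7 * index)
        return value

    return {
        "channel": msg[2],
        "sample_number": u7(msg[4:6]),
        "bits": msg[6],
        "period_ns": u7(msg[7:10]),
        "length_words": u7(msg[10:13]),
        "loop_start": u7(msg[13:16]),
        "loop_end": u7(msg[16:19]),
        "loop_type": msg[19],
    }
```

A period of 22676 ns (roughly 44.1 kHz) would arrive as the three bytes 14 31 01.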

DATA PACKET: F0 7E cc 02 kk <120 bytes> mm F7

cc
channel number

kk
running packet count (00->7F)

mm
checksum (XOR of 7E, cc, 02, kk <120 bytes>)

The total size of a data packet is 127 bytes. This is to avoid overflow of the MIDI input buffer of a device that may want to receive an entire packet before processing it. A data packet consists of its own header, a packet number, 120 bytes of data, a checksum, and an EOX. The packet number begins at 00 and increments with each new packet. It resets to 00 after it reaches 7F, and continues counting.

The packet number is used by the receiver to distinguish between a new data packet, or a resend of a previous packet. The packet number is followed by 120 bytes of data, which form 60, 40, or 30 words (MSB first for multiword samples), depending on the length of a single data sample. Each data byte holds seven bits, with the msb in each byte set to 0, in order to conform to the requirements of MIDI data transmission. Information is left justified within the 7-bit bytes, and unused bits are filled with 0. Example: Assume a data point in the memory of a 16-bit sampler, with the value 87E5. In binary, that would be:

1000 0111 1110 0101

and would be encoded as the following MIDI data stream:

01000011 01111001 00100000

The checksum is the running XOR of all the data after the SYSEX byte, up to but not including the checksum itself.
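The packing and the checksum can be tried out with a short Python sketch. The function names are made up; the packing follows the 87E5 example above and the checksum follows the XOR rule just given.

```python
def pack_sample(value, bits):
    """Left-justify a sample of the given width into 7-bit bytes, MSB first."""
    n_bytes = (bits + 6) // 7            # 2, 3 or 4 bytes for 8..28 bits
    shifted = value << (n_bytes * 7 - bits)
    return bytes((shifted >> (7 * i)) & 0x7F for i in reversed(range(n_bytes)))

def build_data_packet(channel, packet_number, payload):
    """Build one 127-byte data packet; short payloads are zero padded."""
    if len(payload) > 120:
        raise ValueError("payload longer than 120 bytes")
    body = bytes([0x7E, channel, 0x02, packet_number & 0x7F])
    body += bytes(payload).ljust(120, b"\x00")
    checksum = 0
    for byte in body:                    # everything after F0, before checksum
        checksum ^= byte
    return bytes([0xF0]) + body + bytes([checksum & 0x7F, 0xF7])
```

pack_sample(0x87E5, 16) yields the three bytes 43 79 20, i.e. 01000011 01111001 00100000 as in the example above.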

6.3 SPEC: SAMPLE DUMP MESSAGES

DUMP REQUEST: F0 7E cc 03 ss ss F7

cc
channel number

ss ss
sample number requested (LSB first)

Upon receiving the request, the sampler checks the sample number to see if it is within legal range. If it is not, the request is ignored. If it is, the sample dump is started. One packet at a time is sent, under control of the handshaking messages outlined below.

6.4 HANDSHAKING MESSAGES:

For all below:
cc
channel number

pp
packet number

Packet numbers are included in the handshaking messages to accommodate machines that have the intelligence to re-transmit specific packets after an entire dump is finished, or if synchronization is lost.

ACK
F0 7E cc 7F pp F7

Means last packet was received correctly (checksum OK, etc), please send next one. Packet number is packet being acknowledged as correct.

NAK
F0 7E cc 7E pp F7

Means last packet not received correctly, please send again. Packet number is packet being rejected.

CANCEL
F0 7E cc 7D pp F7

Means abort dump immediately. Packet number is packet on which abort occurs.

WAIT
F0 7E cc 7C pp F7

Means pause dump indefinitely, until next message is sent. Allows the unit receiving the dump to perform other functions (disk access, etc), before receiving the remainder of the dump. The next message it sends (e.g. ACK, CANCEL) will determine if the dump continues or aborts.
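Since the four handshaking messages differ only in their sub ID, building and recognizing them takes very little code. A minimal Python sketch (the names HANDSHAKE_IDS, build_handshake and parse_handshake are made up):

```python
HANDSHAKE_IDS = {"ACK": 0x7F, "NAK": 0x7E, "CANCEL": 0x7D, "WAIT": 0x7C}
HANDSHAKE_NAMES = {sub_id: name for name, sub_id in HANDSHAKE_IDS.items()}

def build_handshake(name, channel, packet_number):
    """Return the 6-byte message F0 7E cc <sub ID> pp F7."""
    return bytes([0xF0, 0x7E, channel, HANDSHAKE_IDS[name],
                  packet_number & 0x7F, 0xF7])

def parse_handshake(msg):
    """Return (name, channel, packet number), or None if msg is something else."""
    if len(msg) != 6 or msg[0] != 0xF0 or msg[1] != 0x7E or msg[5] != 0xF7:
        return None
    name = HANDSHAKE_NAMES.get(msg[3])
    if name is None:
        return None
    return name, msg[2], msg[4]
```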

6.5 DUMP PROCEDURE: MASTER (DUMP SOURCE)

Once a dump has been requested, either via MIDI or through the front panel, the DUMP HEADER is sent.

After sending the header, the master must time out for at least two seconds, to allow the receiver to decide if it will accept this sample (has enough memory, etc). If it receives a CANCEL within this time, it should abort immediately.

If it receives an ACK, it will start sending packets immediately. If it receives a WAIT, it pauses until another message is received, and then processes that message normally. If nothing is received within the timeout, an open loop is assumed, and the dump starts with the first packet.

After sending each packet, the master should time out for at least 20 milliseconds and watch its MIDI In.

If an ACK is received, it sends the next packet immediately. If it receives an NAK, and the packet number matches the number of the last packet sent, it resends that packet. If the packet numbers don't match, and the device is incapable of sending packets out of order, the NAK will be ignored.

If a WAIT is received, the master should watch its MIDI In port indefinitely for another ACK, NAK, or CANCEL message, which it should then process normally.

If no messages are received within 20 milliseconds of the transmission of a packet, the master may assume an open loop configuration, and send the next packet.

This process continues until there are fewer than 121 data bytes to send. The final packet will still consist of 120 bytes, regardless of how many significant bytes actually remain, and the unused bytes will be filled with zeroes. The receiver should handshake after receiving the last packet.

6.6 DUMP PROCEDURE: SLAVE (DUMP DESTINATION)

When receiving a sample dump, a device should keep a running checksum during reception. If its checksum matches the checksum in the data packet, it will send an ACK and wait for the next packet.

If it does not match, it will send an NAK containing the number of the packet that caused the error, and wait for the next packet. If, after sending an NAK, the packet number of the next packet doesn't match the previous packet number (the one that was NAK'd), and the unit is not capable of accepting packets out of order, the error is ignored and the dump continues as if the checksums had matched.

If a receiver runs out of memory before the dump is completed, it should send a CANCEL to stop the dump.

6.7 SDS OVERVIEW

DUMP DATA FORMAT: DUMP HEADER

Sysex

ID: Universal Non-Real Time

Channel Number

Sub ID: Header

Sample Number (2 bytes, LSB first)

Sample Format

Sample Period (3 bytes, LSB first)

Sample Length (3 bytes, LSB first)

Sustain Loop Start Point (3 bytes, LSB first)

Sustain Loop End Point (3 bytes, LSB first)

Loop Type

Eox

SAMPLE DUMP DATA FORMAT: DATA PACKET

Sysex

ID: Universal Non-Real Time

Channel Number

Sub ID: Data Packet

Packet Number

Sample Data (120 bytes)

Checksum

Eox

SAMPLE DUMP MESSAGES: DUMP REQUEST

Sysex

ID: Universal Non-Real Time

Channel Number

Sub ID: Dump Request

Sample Number (2 bytes, LSB first)

Eox

SAMPLE DUMP MESSAGES: HANDSHAKING FLAGS:

Sysex

ID: Universal Non-Real Time

Channel Number

Sub ID: ACK or NAK or CANCEL or WAIT

Packet Number

Eox

7 ROL

This part contains details of .ROL files used by AdLib and compatible cards on the PC. The format is also used by Visual Composer (TM).

7.1 Structure of .ROL files:

fld #  size (bytes)  type   description
1      2             int    file version, major
2      2             int    file version, minor
3      40            char   unused
4      2             int    ticks per beat
5      2             int    beats per measure
6      2             int    editing scale (Y axis)
7      2             int    editing scale (X axis)
8      1             char   unused
9      1             char   0 = percussive mode
                            1 = melodic mode
10     90            char   unused
11     38            char   filler
12     15            char   filler
13     4             float  basic tempo

Field 14 indicates the number of times to repeat fields 15 and 16:

fld #  size (bytes)  type   description
14     2             int    number of tempo events
15     2             int    time of events, in ticks
16     4             float  tempo multiplier (0.01 - 10.0)

The remaining fields (17 to 34) are to be repeated for each of 11 voices:

fld #  size (bytes)  type   description
17     15            char   filler
18     2             int    time (in ticks) of last note +1

Repeat the next two fields (19 and 20) while the summation of field 20 is less than the value of field 18:

fld #  size (bytes)  type   description
19     2             int    note number:
                            0 => silence
                            12 to 107 => normal note (you must subtract 60
                            to obtain the correct value for the sound driver)
20     2             int    note duration, in ticks
21     15            char   filler

Field 22 indicates the number of times to repeat fields 23 to 26:

fld #  size (bytes)  type   description
22     2             int    number of instrument events
23     2             int    time of events, in ticks
24     9             char   instrument name
25     1             char   filler
26     2             int    unused
27     15            char   filler

Field 28 indicates the number of times to repeat fields 29 and 30:

fld #  size (bytes)  type   description
28     2             int    number of volume events
29     2             int    time of events, in ticks
30     4             float  volume multiplier (0.0 - 1.0)
31     15            char   filler

Field 32 indicates the number of times to repeat fields 33 and 34:

fld #  size (bytes)  type   description
32     2             int    number of pitch events
33     2             int    time of events, in ticks
34     4             float  pitch variation (0.0 - 2.0, nominal is 1.0)

7.2 Notes
Fields #1 and #2 should be set to 0 and 4 respectively. Field #10 should be filled with zeros.
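As an illustration of how the field list translates into code, here is a Python sketch that reads fields 1 to 16 (the fixed header plus the tempo events). It assumes little-endian byte order, 16-bit "int" fields and 32-bit IEEE "float" fields, which is what one would expect on the PC but is not spelled out above. The names are made up, and the per-voice fields (17 to 34) are left out.

```python
import struct

# Fields 1-13 with no padding; 201 bytes in total (assumed little-endian).
ROL_HEADER = struct.Struct("<HH40sHHHHBB90s38s15sf")

def parse_rol_start(data):
    """Parse fields 1-16; returns (header dict, tempo events, next offset)."""
    (major, minor, _unused3, ticks_per_beat, beats_per_measure,
     scale_y, scale_x, _unused8, mode, _unused10,
     _filler11, _filler12, basic_tempo) = ROL_HEADER.unpack_from(data, 0)
    header = {
        "version": (major, minor),
        "ticks_per_beat": ticks_per_beat,
        "beats_per_measure": beats_per_measure,
        "melodic": mode == 1,          # field 9: 0 = percussive, 1 = melodic
        "basic_tempo": basic_tempo,
    }
    offset = ROL_HEADER.size
    (count,) = struct.unpack_from("<H", data, offset)      # field 14
    offset += 2
    tempo_events = []
    for _ in range(count):
        tick, multiplier = struct.unpack_from("<Hf", data, offset)  # 15, 16
        tempo_events.append((tick, multiplier))
        offset += 6
    return header, tempo_events, offset
```

The returned offset points at field 17 of the first voice, so a reader for the remaining fields could continue from there.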

8 8SVX

The 8SVX files are IFF files used for digital audio data. The format of the VHDR block is complete guesswork. These files use Motorola byte order. The 8SVX file format is fixed to 8-bit mono sample data - at least GoldWave does not support saving files in any other format than 8-bit mono.

8.1 FORMblock [VHDR]

This is the sample information block. The normal size is 20 bytes.

OFFSET  Count  TYPE   Description
0000h   1      dword  Sampling rate of digital data in Hz. This count seems not to
                      be too accurate, at least GoldWave v2.0 creates different
                      rates for Wave and 8SVX files.
0004h   4      dword  Other data, unknown

8.2 FORMblock [BODY]

This block contains the raw sample data; maybe the usual IFF compression was used. The details of both the compression and the IFF format are unknown.

9 AIFF

The Audio Interchange File Format files are digital audio files stored in the IFF format; the samples are stored in signed PCM. The header block is [AIFF]; the different subblocks are:

[AUTH]  The author's information (optional)

[COMM]  This record stores information about the sampled data

        OFFSET  Count  TYPE   Description
        0000h   1      word   number of channels or number of instrument samples ???
        0002h   1      dword  Sample length
        0006h   1      dword  lower frequency
        000Ah   1      dword  maximum frequency
        000Dh   1      dword  ???

[MARK]

[NAME]  The name of the instrument / sample

[SSND]  The stored sample data.

10 AU

The AU files are digital audio files used by the Sun and NeXT workstations. Further information wanted.

OFFSET  Count  TYPE   Description
0000h   4      char   ID='.snd'
0004h   1      dword  Offset of start of sample
0008h   1      dword  Length of stored sample
000Ch   1      dword  Sound encoding:
                       1 - 8-bit ISDN u-law,
                       2 - 8-bit linear PCM (REF-PCM),
                       3 - 16-bit linear PCM,
                       4 - 24-bit linear PCM,
                       5 - 32-bit linear PCM,
                       6 - 32-bit IEEE floating point,
                       7 - 64-bit IEEE floating point,
                      23 - 8-bit ISDN u-law compressed (G.721 ADPCM)
0010h   1      dword  Sampling rate
0014h   1      dword  Number of sample channels
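The six header fields can be read in one go, as in this Python sketch. One thing is not stated in the table above: the dwords in AU files are stored big-endian (most significant byte first), which is how the Sun and NeXT machines store them. The function name and dictionary keys are made up.

```python
import struct

AU_HEADER = struct.Struct(">4sIIIII")   # big-endian, 24 bytes

def parse_au_header(data):
    """Parse the fixed part of an AU header into a dictionary."""
    magic, data_offset, data_size, encoding, rate, channels = \
        AU_HEADER.unpack_from(data, 0)
    if magic != b".snd":
        raise ValueError("missing '.snd' ID")
    return {
        "data_offset": data_offset,   # offset of start of sample
        "data_size": data_size,       # length of stored sample
        "encoding": encoding,         # see the encoding list above
        "sample_rate": rate,
        "channels": channels,
    }
```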

11 FSM

The .FSM files are samples to be used for module style music with the Farandole Composer. Currently only samples of up to 64K length are supported, although the header reserves a dword for the sample size.

OFFSET  Count  TYPE   Description
0000h   4      char   ID='FSM',254
0004h   32     char   ASCII name of sample
0024h   3      char   ID=10,13,26
0027h   1      dword  Length of sample (<=64K)
0028h   1      byte   Fine tune value for sample (currently unsupported)
0029h   1      byte   Sample volume (currently unsupported)
002Ah   1      dword  Start of sample loop
002Dh   1      dword  End of sample loop. If the sample is not set to loop (see below)
                      this should be set to the end of the sample.
0032h   1      byte   Sample type bitmapped
                      0 - 8-bit/16-bit sample
                      1-7 - reserved
0033h   1      byte   Loop mode ?bit mapped?
                      0-2 - reserved
                      3 - loop off/loop on
                      4-7 - reserved
0034h   ?      byte   Sample data in signed format

12 GF1 PATCH

The GF1 Patch files are multipart sound files for the Gravis Ultrasound sound card to emulate MIDI sounds in high quality. Each Patch can consist of many samples (for example, a string ensemble consists of Violin, Viola, Cello, Bass) which are played depending on the note to play. A patch can also contain a part to be played before the loop and a part to be played after the tone has been released.

OFFSET  Count  TYPE   Description
0000h   12     char   ID='GF1PATCH110'
000Ch   10     char   Manufacturer ID
0018h   60     char   Description of the contained Instruments or copyright of
                      manufacturer.
0054h   1      byte   Number of instruments in this patch
0055h   1      byte   Number of voices for sample
0056h   1      byte   Number of output channels (1=mono,2=stereo)
0057h   1      word   Number of waveforms
0059h   1      word   Master volume for all samples
005Bh   1      dword  Size of the following data
0060h   36     byte   reserved

Following this header, the instruments with their headers follow. An instrument header contains the name and other data about one instrument contained within the patch.

OFFSET
Count
TYPE
Description

0000h
1
word
Instrument number. ?Maybe the MIDI instrument number?. In the
Gravis patches, this is 0, in other patches, I found random values.

0002h
16
char
ASCII name of the instrument.

0012h
1
dword
Size of the whole instrument in bytes.

0016h
1
byte
Layers. Needed for whatever.

0017h
40
byte
reserved

About the patch, I don't know anything. Maybe somebody could enlighten me. Each patch record has the following format :

OFFSET
Count
TYPE
Description

0000h
7
char
Wave file name

0007h
1
byte
Fractions

0008h
1
dword
Wave size. Size of the wave digital data

000Ch
1
dword
Start of wave loop

0010h
1
dword
End of wave loop

0012h
1
word
Sample rate of the wave

0014h
1
word
Minimum frequency to play the wave

0016h
1
word
Maximum frequency to play the wave

0018h
1
dword
Original sample rate of the wave data

001Ch
1
int
Fine tune value for the wave

001Eh
1
byte
Stereo balance, values unknown**

001Fh
6
byte
Filter envelope rate

0025h
6
byte
Filter envelope offset

002Bh
1
byte
Tremolo sweep

002Ch
1
byte
Tremolo rate

002Dh
1
byte
Tremolo depth

002Fh
1
byte
Vibrato sweep

0030h
1
byte
Vibrato rate

0031h
1
byte
Vibrato depth

0032h
1
byte
Wave data, bitmapped
0 - 8/16 bit wave data
1 - signed/unsigned data
2 - de/enable looping
3 - no/has bidirectional looping
4 - loop forward/backward
5 - Turn envelope sustaining off/on
6 - Dis/Enable filter envelope
7 - reserved

0033h
1
int
Frequency scale, whatever that means

0035h
1
word
Frequency scale factor

0037h
36
byte
Reserved
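
The bitmapped wave data byte can be decoded as sketched below. This is a hedged Python example, not taken from any Gravis documentation: it assumes that in each "a/b" pair of the table the first meaning applies when the bit is clear and the second when it is set. The names GF1_WAVE_FLAGS and decode_gf1_wave_flags are invented for this sketch.

```python
# Meaning of each bit when it is SET (assumption: second half of each "a/b" pair).
GF1_WAVE_FLAGS = [
    "16-bit data",              # bit 0
    "unsigned data",            # bit 1
    "looping enabled",          # bit 2
    "bidirectional looping",    # bit 3
    "loop backward",            # bit 4
    "envelope sustain on",      # bit 5
    "filter envelope enabled",  # bit 6
]

def decode_gf1_wave_flags(value):
    """Return the names of all flags set in the wave data byte (bit 7 is reserved)."""
    return [name for bit, name in enumerate(GF1_WAVE_FLAGS) if value & (1 << bit)]
```

For example, a byte of 05h would decode to 16-bit data with looping enabled.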

13 S3I

This is the Digiplayer/ST3.0 digital sample file format. The sample files include information about the loop of the instrument. The AdLib instruments have another format listed below.

OFFSET
Count
TYPE
Description

0000h
1
byte
ID=01h

0001h
12
char
DOS filename

000Dh
1
byte
reserved (0)

000Eh
1
word
Paragraph offset of the raw sample data from beginning of file.

0010h
1
dword
Sample length in bytes

0014h
1
dword
Start of sample loop

0018h
1
dword
End of sample loop

001Ch
1
byte
Playback volume of sample

001Dh
1
byte
??? "DSK", whatever that means

001Eh
1
byte
Pack type
0 - unpacked
1 - DP30ADPCM 1

001Fh
1
byte
Flags (bitmapped)
0 - loop on/off
1 - stereo sample (length bytes for left channel,
then another length bytes for right channel!)
2 - 16-Bit samples (in Intel byte order)

0020h
1
dword
C2 frequency

0024h
1
dword
reserved

0028h
1
word
reserved

002Ah
1
word
ID=512

002Ch
1
dword
?? Date of last modification ?? (see table 0009)

0030h
28
char
ASCIIZ Sample name

004Ch
4
char
ID='SCRS'

0050h
?
byte
Raw sample data
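
The fields from 000Eh to 0023h can be unpacked in one step. The following Python sketch is only an illustration under stated assumptions: all multi-byte values are taken to be in Intel (little-endian) byte order, and a paragraph is taken to be the DOS unit of 16 bytes. The function name parse_s3i_sample_fields is hypothetical.

```python
import struct

def parse_s3i_sample_fields(header):
    """Decode offsets 000Eh-0023h of a digital sample header."""
    (para, length, loop_start, loop_end,
     volume, _dsk, pack, flags, c2_freq) = struct.unpack_from(
        "<HIIIBBBBI", header, 0x0E
    )
    return {
        "data_offset": para * 16,      # paragraph offset -> byte offset
        "length": length,
        "loop_start": loop_start,
        "loop_end": loop_end,
        "volume": volume,
        "packed": pack != 0,           # 0 = unpacked
        "looped": bool(flags & 0x01),  # bit 0
        "stereo": bool(flags & 0x02),  # bit 1
        "sixteen_bit": bool(flags & 0x04),  # bit 2
        "c2_frequency": c2_freq,
    }
```

For a stereo sample the data size on disk is twice the length field, because the left channel is followed by an equally long right channel.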

Here follows the AdLib instrument format for which I don't know the extension:

OFFSET
Count
TYPE
Description

0000h
1
byte
Instrument type
2 - melodic instrument
3 - bass drum
4 - snare drum
5 - tom tom
6 - cymbal
7 - hihat

0001h
12
char
DOS file name

000Dh
3
byte
reserved

0010h
1
byte
Modulator description (bitmapped)
0-3 - frequency multiplier
4 - scale envelope
5 - sustain
6 - pitch vibrato
7 - volume vibrato

0011h
1
byte
Carrier description (same as modulator)

0012h
1
byte
Modulator miscellaneous (bitmapped)
0-5 - 63-volume
6 - MSB of levelscale
7 - LSB of levelscale

0013h
1
byte
Carrier miscellaneous (same as modulator)

0014h
1
byte
Modulator attack / decay byte (bitmapped)
0-3 - Decay
4-7 - Attack

0015h
1
byte
Carrier attack / decay byte (same as modulator)

0016h
1
byte
Modulator sustain / release byte (bitmapped)
0-3 - Release count
4-7 - 15-Sustain

0017h
1
byte
Carrier sustain / release byte (same as modulator)

0018h
1
byte
Modulator wave select

0019h
1
byte
Carrier wave select

001Ah
1
byte
Modulator feedback byte (bitmapped)
0 - additive synthesis on/off
1-7 - modulation feedback

001Bh
1
byte
reserved

001Ch
1
byte
Instrument playback volume

001Dh
1
byte
??? "DSK"

001Eh
1
word
reserved

0020h
1
dword
C2 frequency

0024h
12
byte
reserved

0030h
28
char
ASCIIZ Instrument name

004Ch
4
char
ID='SCRI'
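
The bitmapped operator bytes can be taken apart as follows. This is a small Python sketch based only on the bit layouts listed in the table; the function names are invented, and reading "63-volume" as "the stored value is 63 minus the volume" is an assumption.

```python
def decode_operator_description(value):
    """Split the description byte (0010h for the modulator, 0011h for the carrier)."""
    return {
        "frequency_multiplier": value & 0x0F,   # bits 0-3
        "scale_envelope": bool(value & 0x10),   # bit 4
        "sustain": bool(value & 0x20),          # bit 5
        "pitch_vibrato": bool(value & 0x40),    # bit 6
        "volume_vibrato": bool(value & 0x80),   # bit 7
    }

def decode_operator_misc(value):
    """Split the miscellaneous byte (0012h for the modulator)."""
    msb = (value >> 6) & 1  # bit 6 holds the MSB of levelscale
    lsb = (value >> 7) & 1  # bit 7 holds the LSB of levelscale
    return {
        "volume": 63 - (value & 0x3F),  # assumption: stored as 63 - volume
        "levelscale": (msb << 1) | lsb,
    }
```

Note that the two levelscale bits are stored in swapped order, so they have to be reassembled rather than simply shifted out.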

14 UWF

The UWF files are sample files used by the UltraTracker. Further information wanted.

OFFSET
Count
TYPE
Description

0000h
32
char
ASCIIZ sample name

0020h
1
char
ID=1Ah

0021h
1
char
ID=10h

0022h
5
char
ID='MUWFB'

0027h
1
char
ID=0

0028h
6
char
Length of sample as ASCII long integer

002Eh
1
word
Length of sample

15 WAVE

The Windows .WAV files are RIFF format files. Some programs expect the fmt block immediately after the RIFF header, so your programs should write it as the first block in the RIFF file. The subblocks for the wave files are:

15.1 RiffBLOCK [data]

This block contains the raw sample data. The necessary information for playback is contained in the [fmt ] block.

15.2 RiffBLOCK [fmt ]

This block contains the data necessary for playback of the sound files. Note the blank after fmt.

OFFSET
Count
TYPE
Description

0000h
1
word
Format tag
1 = PCM (raw sample data)
2 etc. for ADPCM, a-Law, u-Law ...

0002h
1
word
Channels (1=mono,2=stereo,...)

0004h
1
dword
Sampling rate

0008h
1
dword
Average bytes per second (=sampling rate*block alignment)

000Ch
1
word
Block alignment (=channels*bytes per sample)

000Eh
1
word
Bits per sample (8/12/16-bit samples)
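
Reading the [fmt ] block is a single unpack call. The sketch below is a minimal Python illustration, assuming Intel (little-endian) byte order as used by RIFF files; the function names are made up for this example. The helper for the expected data rate applies to uncompressed PCM only, where each second holds sampling rate * channels * bytes per sample bytes.

```python
import struct

def parse_wave_fmt(body):
    """Decode the first 16 bytes of a [fmt ] block."""
    if len(body) < 16:
        raise ValueError("[fmt ] block is too short")
    tag, channels, rate, avg_bytes, block_align, bits = struct.unpack_from(
        "<HHIIHH", body, 0
    )
    return {
        "format_tag": tag,
        "channels": channels,
        "sample_rate": rate,
        "avg_bytes_per_second": avg_bytes,
        "block_align": block_align,
        "bits_per_sample": bits,
    }

def pcm_bytes_per_second(rate, channels, bits):
    """Expected data rate for uncompressed PCM (format tag 1)."""
    return rate * channels * ((bits + 7) // 8)
```

A writer can use pcm_bytes_per_second to fill in the field at 0008h, and a reader can compare it with the stored value as a sanity check.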

15.3 RiffBLOCK [loop]

This block is for looped samples. Very few programs support this block, but if your program changes the wave file, it should preserve any unknown blocks.

OFFSET
Count
TYPE
Description

0000h
1
dword
Start of sample loop

0004h
1
dword
End of sample loop
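
A program that wants to keep unknown blocks intact has to walk the block list generically instead of looking only for the blocks it knows. Here is a minimal Python sketch of such a walk. It assumes the usual RIFF layout (a 12 byte "RIFF"/size/"WAVE" header, then blocks consisting of a 4 character ID, a little-endian dword size and the body, padded to an even length); the function name iter_riff_chunks is hypothetical.

```python
import struct

def iter_riff_chunks(data):
    """Yield (block_id, body) for every sub-block of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        block_id, size = struct.unpack_from("<4sI", data, pos)
        yield block_id, data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)  # skip the pad byte after odd-sized blocks
```

A program that rewrites the file can copy every block it does not recognise unchanged, and decode a [loop] block with struct.unpack("<II", body) to get the start and end of the loop.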

16 ZyXEL

The ZyXEL modems are capable of digitizing speech; the ZFAX software and answering machine software like VoiceConnect store the sampled data in these files. The modems can compress the data down to 19.2k CPS (ADPCM) and 9.6k CPS (CELP). The algorithms for the compression may be found in the ZyxelVoc package by N. Igl, but as the firmware on the modems changes, so might the compression algorithm. Playback on the modem is always possible. Files are identified by the .ZVD and .ZYX extensions.

OFFSET
Count
TYPE
Description

0000h
5
char
ID='ZyXEL'

0005h
1
byte
02h, ??? format tag

0006h
4
byte
reserved

000Ah
1
word
Compression scheme
0 - CELP
1 - 2 bit ADPCM
2 - 3 bit ADPCM

000Ch
4
byte
reserved

0010h
?
????
Raw Data, The voice data is just the data received from U1496
Modem/Fax.

17 Creative Labs File Formats

17.1 Sound Blaster Instrument File Format (SBI)

The SBI format contains the register values for the FM chip to synthesize an instrument.

Offset
Description

00h-03h
Contains id characters "SBI" followed by byte 1Ah

04h-23h
Instrument name, NULL terminated string

24h
Modulator Sound Characteristic (Mult, KSR, EG, VIB, AM)

25h
Carrier Sound Characteristic

26h
Modulator Scaling/Output Level

27h
Carrier Scaling/Output Level

28h
Modulator Attack/Decay

29h
Carrier Attack/Decay

2Ah
Modulator Sustain/Release

2Bh
Carrier Sustain/Release

2Ch
Modulator Wave Select

2Dh
Carrier Wave Select

2Eh
Feedback/Connection

2Fh-33h
Reserved
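
As an illustration, an SBI file can be read like this. The Python sketch below follows the offsets in the table and is not an official Creative Labs routine; the field names in SBI_FIELDS and the function name parse_sbi are made up for the example.

```python
# One name per register byte at offsets 24h-2Eh, in file order.
SBI_FIELDS = [
    "mod_characteristic", "car_characteristic",
    "mod_scaling_output", "car_scaling_output",
    "mod_attack_decay", "car_attack_decay",
    "mod_sustain_release", "car_sustain_release",
    "mod_wave_select", "car_wave_select",
    "feedback_connection",
]

def parse_sbi(data):
    """Return (instrument name, dict of FM register values) from SBI file bytes."""
    if len(data) < 0x2F or data[:4] != b"SBI\x1a":
        raise ValueError("not an SBI file")
    # The name field is 32 bytes long and ends at the first NULL byte.
    name = data[0x04:0x24].split(b"\0", 1)[0].decode("ascii", "replace")
    registers = dict(zip(SBI_FIELDS, data[0x24:0x2F]))
    return name, registers
```

The same field list can be reused for any record that shares the layout of bytes 24h-33h.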

17.2 Creative Music File Format (CMF)

The CMF file format consists of 3 blocks: the header block, the instrument block and the music block.

The CMF Header Block

Offset
Description

00h-03h
Contains id characters "CTMF"

04h-05h
CMF Format Version (MSB = major version, LSB = minor version)

06h-07h
File offset of the instrument block

08h-09h
File offset of the music block

0Ah-0Bh
Clock ticks per quarter note (one beat) default = 120

0Ch-0Dh
Clock ticks per second

0Eh-0Fh
File offset of the music title (0 = none)

10h-11h
File offset of the composer name (0 = none)

12h-13h
File offset of the remarks (0 = none)

14h-23h
Channel-In-Use Table

24h-25h
Number of instruments used

26h-27h
Basic Tempo

28h-?
Title, composer and remarks stored here
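
The fixed part of the header (00h-27h) can be decoded as sketched here. This Python example assumes that all words are stored in Intel (little-endian) order, which the table does not state explicitly; the function name parse_cmf_header is hypothetical, and the Channel-In-Use Table is returned as raw byte values without interpretation.

```python
import struct

def parse_cmf_header(data):
    """Decode bytes 00h-27h of a CMF file (words assumed little-endian)."""
    if len(data) < 0x28 or data[:4] != b"CTMF":
        raise ValueError("not a CMF file")
    (_ident, version, instrument_offset, music_offset,
     ticks_per_quarter, ticks_per_second,
     title_offset, composer_offset, remarks_offset,
     channel_table, instrument_count, tempo) = struct.unpack_from(
        "<4sHHHHHHHH16sHH", data, 0
    )
    return {
        "version": (version >> 8, version & 0xFF),  # (major, minor)
        "instrument_offset": instrument_offset,
        "music_offset": music_offset,
        "ticks_per_quarter": ticks_per_quarter,
        "ticks_per_second": ticks_per_second,
        "title_offset": title_offset,        # 0 = none
        "composer_offset": composer_offset,  # 0 = none
        "remarks_offset": remarks_offset,    # 0 = none
        "channel_table": list(channel_table),
        "instrument_count": instrument_count,
        "tempo": tempo,
    }
```

The instrument and music blocks are then found by seeking to instrument_offset and music_offset.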

17.3 The CMF Instrument Block

The instrument block contains one 16 byte data structure for each instrument in the piece. Each record is of the same format as bytes 24h-33h in the SBI file format.

17.4 The CMF Music Block

The music block adheres to the standard MIDI file format, and can have from 1 to 16 instruments. The PC-GPE file MIDI.TXT contains more information on this file format.

The music block consists of an alternating sequence of time and MIDI event records:

dTime
MIDI Event
dTime
MIDI Event
dTime
MIDI Event
........

dTime (delta Time) is the amount of time before the following MIDI event. MIDI Event is any MIDI channel message.
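
In standard MIDI files a delta time is stored as a variable-length quantity: seven data bits per byte, most significant group first, with the top bit set on every byte except the last. Assuming the CMF music block follows that convention, a dTime can be read with the Python sketch below (the function name read_delta_time is invented for this example).

```python
def read_delta_time(data, pos=0):
    """Read one variable-length quantity; return (value, position after it)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)  # append the next 7 data bits
        if not byte & 0x80:                   # top bit clear marks the last byte
            return value, pos
```

For example, the two bytes 81h 48h decode to a delta time of 200 ticks, and the MIDI event starts at the returned position.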

The CMF file format defines the following MIDI Control Change events:

Control No
Control Data

66h
1-127, used as markers in the music

67h
0 = melody mode, 1 = rhythm mode

68h
0-127, changes the pitch of all following notes upward by the given number of 1/128 semitones

69h
0-127, changes the pitch of all following notes downward by the given number of 1/128 semitones

In rhythm mode, the last five of the 16 channels are allocated for the percussion instruments:

Channel
Instrument

12
Bass Drum

13
Snare Drum

14
Tom-Tom

15
Top Cymbal

16
High-hat Cymbal

17.5 Sound Blaster Instrument Bank File Format (IBK)

A bank file is a group of up to 128 instruments.

Offset
Description

00h-03h
Contains id characters "IBK" followed by byte 1Ah

04h-803h
Parameters for 128 instruments, 16 bytes for each instrument in the same format
as bytes 24h-33h in the SBI format

804h-C83h
Instrument names for 128 instruments, 9 bytes for each instrument, each name
must be null terminated
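
Because both tables have fixed-size entries, the data for instrument number n (0-127) can be located by simple arithmetic: parameters at 04h + 16*n and the name at 804h + 9*n. A minimal Python sketch, with the hypothetical function name read_ibk_instrument:

```python
def read_ibk_instrument(data, index):
    """Return (name, 16 parameter bytes) for instrument `index` of an IBK bank."""
    if data[:4] != b"IBK\x1a":
        raise ValueError("not an IBK file")
    if not 0 <= index < 128:
        raise IndexError("a bank holds instruments 0-127")
    param_pos = 0x004 + 16 * index
    name_pos = 0x804 + 9 * index
    params = data[param_pos : param_pos + 16]
    raw_name = data[name_pos : name_pos + 9]
    name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
    return name, params
```

The 16 parameter bytes have the same layout as bytes 24h-33h of an SBI file, so they can be handed to the same decoding code.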

18 Creative Voice (VOC) file format

HEADER (bytes 00-19)
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

byte #
Description

00-12
"Creative Voice File"

13
1A (eof to abort printing of file)

14-15
Offset of first datablock in .voc file (std 1A 00 in Intel Notation)

16-17
Version number (minor,major) (VOC-HDR puts 0A 01)

18-19
1's complement (bitwise NOT) of Ver. # + 1234h (VOC-HDR puts 29 11)
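
The header can be checked with the Python sketch below. It reads the three words in Intel notation and uses the check value that reproduces the example above: for version bytes 0A 01 (010Ah), the bitwise complement plus 1234h, kept to 16 bits, gives 1129h, which is stored as 29 11. The function name parse_voc_header is made up for this example.

```python
import struct

def parse_voc_header(data):
    """Validate a VOC header; return (first block offset, major, minor)."""
    if data[:0x14] != b"Creative Voice File\x1a":
        raise ValueError("not a VOC file")
    first_block, version, check = struct.unpack_from("<HHH", data, 0x14)
    expected = (~version + 0x1234) & 0xFFFF  # complement of version + 1234h
    if check != expected:
        raise ValueError("version check word does not match")
    return first_block, version >> 8, version & 0xFF
```

The first data block is then found at the returned offset, which is 1Ah in standard files.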

Data Block: TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
NOTE: Terminator Block is an exception -- it has only the TYPE byte.
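
Walking the data blocks can be sketched as follows. This Python example assumes the 3-byte SIZE is stored in Intel notation (low byte first) and counts only the INFO bytes; the function name iter_voc_blocks is hypothetical.

```python
def iter_voc_blocks(data, pos=0x1A):
    """Yield (type, info bytes) for each data block, starting at offset `pos`."""
    while pos < len(data):
        block_type = data[pos]
        if block_type == 0:
            # The Terminator Block has no SIZE field; it ends the file.
            yield 0, b""
            return
        size = int.from_bytes(data[pos + 1 : pos + 4], "little")
        yield block_type, data[pos + 4 : pos + 4 + size]
        pos += 4 + size
```

Pass the offset of the first data block from the header as pos if it differs from the standard 1Ah.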

TYPE
Description
Size (3-byte int)
Info

00
Terminator
(NONE)
(NONE)

01
Sound data
2+length of data
Sound Info (see Sound Info Format)

02
Sound continue
length of data
Voice Data

03
Silence
3
Silence Info (see Silence Info Format)

04
Marker
2
Marker# (2 bytes)

05
ASCII
length of string
null terminated string

06
Repeat
2
Count# (2 bytes)

07
End repeat
0
(NONE)

08
Extended
4
Extended Info (see Extended Info Format)

Sound Info Format:

00 Sample Rate
01 Compression Type
02+ Voice Data

*Silence Info Format:

00-01 Length of silence - 1
02 Sample Rate

**Extended Info Format:

00-01
Time Constant:
Mono: 65536 - (256000000/sample_rate)
Stereo: 65536 - (256000000/(2*sample_rate))

02
Pack

03
Mode:
0 = mono
1 = stereo

Marker#
Driver keeps the most recent marker in a status byte

Count#
Number of repetitions + 1. Count# may be 1 to FFFE for 0 - FFFD repetitions, or FFFF for endless repetitions.

Sample Rate
SR byte = 256-(1000000/sample_rate)

Length of silence
in units of sampling cycle

Compression Type
of voice data
8-bits = 0
4-bits = 1
2.6-bits = 2
2-bits = 3
Multi DAC = 3+(# of channels)
[interesting this isn't in the developer's manual]
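
The sample rate formulas are easy to get wrong, so here is a small worked sketch in Python. It assumes integer division (the results have to fit in a byte or a word) and the same 256000000 numerator for mono and stereo in the Extended block; the function names are invented for this example.

```python
def sr_byte(sample_rate):
    """SR byte for Sound data and Silence blocks: 256 - (1000000 / sample_rate)."""
    return 256 - 1000000 // sample_rate

def rate_from_sr_byte(value):
    """Invert sr_byte; the result is approximate because of the integer division."""
    return 1000000 // (256 - value)

def time_constant(sample_rate, channels=1):
    """Time Constant word of the Extended block (channels: 1 = mono, 2 = stereo)."""
    return 65536 - 256000000 // (channels * sample_rate)
```

For 8000 Hz this gives an SR byte of 131 (83h), and a Time Constant of 33536 for mono or 49536 for stereo.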

19 Revision History

Version 1.0 - First document containing 15 formats
Version 1.1 - 2 More formats added