The VCDIFF Generic Differencing and Compression Data Format

Created: 2023-06-13T06:37:50-05:00

This card pertains to a resource available on the internet.

1. Glossary

1a. Target File

1a1. The file you want to have.

1b. Source File

1b1. The file you have on hand already.

1c. Deltas

1c1. The changes you need to make to turn the source file into the
target file.

2. Stated goals

2a. Output compactness

2a1. The basic encoding format represents patch data compactly.

2a2. Applications can layer secondary compression on top if they need
better compression.

2b. Data portability

2b1. The format is independent of machine byte order and word size.

2b2. Base unit of measure is the 8-bit byte.

2c. Algorithm genericity

2c1. VCDIFF only specifies the format for expressing patch data; how an
encoder arrives at those changes is deliberately left undefined.

2d. Decoding efficiency

2d1. Uses only byte-aligned operations to avoid the need for bit
operations.

3. Integer encoding

3a. Variable length; each chunk is an 8-bit byte. The most significant
bit of each byte signals whether another byte must be read to complete
the integer. The value is stored in the least significant 7 bits of each
byte, with the most significant 7-bit group first.
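
The encoding above can be sketched in a few lines of Python. This is a
minimal sketch, not a reference implementation: the function names
`encode_varint` and `decode_varint` are my own, and it assumes the 7-bit
groups are written most significant group first.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, most significant first."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [value & 0x7F]  # last byte: continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        value >>= 7
    return bytes(reversed(groups))


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_pos) for the integer that starts at data[pos]."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return value, pos


encoded = encode_varint(123456789)
print(encoded.hex())           # ba ef 9a 15 as hex digits: "baef9a15"
print(decode_varint(encoded))  # (123456789, 4)
```

Only shifts and masks on whole bytes are needed, which fits the decoding
efficiency goal in 2d.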

4. Windows

4a. There is a "source" window and a "target" window.

4b. These windows are put together in a "superstring" called U.

4b1. The superstring is the equivalent of concatenating all bytes of the
source window, followed by all bytes of the target window.

4c. The target window is initially empty when reconstructing a file; it
is appended to as delta instructions are followed.
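
To make the addressing concrete, here is a small sketch of how an
address in U could be resolved without ever building the concatenated
string. The helper name `superstring_byte` is hypothetical, and it
assumes U is the source window followed by the target window.

```python
def superstring_byte(source: bytes, target: bytes, address: int) -> int:
    """Return the byte at `address` in U, where U = source window + target window."""
    if address < 0:
        raise IndexError("address must be non-negative")
    if address < len(source):
        return source[address]  # address falls inside the source window
    # Otherwise the address names a byte already written to the target
    # window; an IndexError here means that byte has not been produced yet.
    return target[address - len(source)]


source = b"abc"
target = b"xy"  # the part of the target window written so far
print(chr(superstring_byte(source, target, 1)))  # b
print(chr(superstring_byte(source, target, 3)))  # x
```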

5. Instructions

5a. Instructions apply within the context of a Window.

5b. Instructions may reference addresses beyond the end of the source
window. Such an address refers to data already emitted to the target
window, so it is valid only if the bytes it names have already been
added or copied there.

5c. ADD

5c1. Holds the number of bytes to be added, and the payload to be
injected directly.

5d. COPY

5d1. Holds the number of bytes to be copied, and the address in the
superstring U to start copying from. The address may fall in the source
window or in the already-written part of the target window.

5e. RUN

5e1. As in, "run length encoding."

5e2. Holds a count and a byte. The byte is repeated `count` number of
times.
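
The three instructions can be sketched as a tiny interpreter. This is an
illustration under assumptions, not the real decoder: instructions are
represented as plain Python tuples of my own invention, `("ADD", payload)`,
`("COPY", size, address)` and `("RUN", count, byte)`, rather than the
byte-level encoding, and `apply_instructions` is a hypothetical name.

```python
from typing import Iterable


def apply_instructions(source: bytes, instructions: Iterable[tuple]) -> bytes:
    """Rebuild a target window from a source window and a list of instructions."""
    target = bytearray()  # the target window starts out empty
    for op, *args in instructions:
        if op == "ADD":
            (payload,) = args
            target += payload  # inject the payload directly
        elif op == "COPY":
            size, address = args
            if address < 0 or address >= len(source) + len(target):
                raise ValueError("COPY address is outside the data available so far")
            # Copy one byte at a time so a COPY may read bytes that the
            # same COPY has just written to the target window.
            for i in range(address, address + size):
                if i < len(source):
                    target.append(source[i])
                else:
                    target.append(target[i - len(source)])
        elif op == "RUN":
            count, byte = args
            target += bytes([byte]) * count  # repeat one byte `count` times
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return bytes(target)


source = b"abcdefghijklmnop"
delta = [
    ("COPY", 4, 0),     # "abcd" from the source window
    ("ADD", b"wxyz"),   # literal payload
    ("COPY", 4, 4),     # "efgh" from the source window
    ("COPY", 12, 24),   # address 24 is past the 16 source bytes: reads the target
    ("RUN", 4, ord("z")),
]
print(apply_instructions(source, delta))  # b'abcdwxyzefghefghefghefghzzzz'
```

The fourth instruction shows the case described in 5b: it starts at the
"efgh" that was just copied to the target window and keeps reading its
own output, which repeats those four bytes three more times.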

6. File layout

6a. There are exact byte specifications for how instructions should be
encoded into the file. I am not providing those here.

6b. Header

6c. Windows

6c1. Specifies the size and offset of the segment of the source file it
draws from.

6c2. Contains the instructions to run to perform the transformation.