Intuitive review of Kolmogorov Complexity

In the mid-1960s, Ray Solomonoff and Andrey Kolmogorov independently researched and published work that ultimately led to the foundation of Algorithmic Information Theory. Though closely related, their research had different purposes, giving rise to two branches: Algorithmic Probability and Algorithmic Complexity. Kolmogorov was concerned primarily with complexity.

Kolmogorov's paper, "Three Approaches to the Quantitative Definition of Information", proposed an alternative to the existing Combinatorial and Probabilistic approaches: the Algorithmic one. Kolmogorov theorized that the complexity (quantity of information) of a data object is the length of the smallest program that can generate it (in a fixed language):

K(x) of a string x is the smallest k such that there exists a description y of x in L with |y| ≤ k, where:
L is a fixed description language
y is a description (program) that produces x when interpreted in L

Typically, the analysis is performed with a universal Turing machine as the fixed language, but for intuition, given two strings of length 100:

1. A periodic sequence of digits,

02460246...0246

2. A statistically random sequence of digits,

82095163...2472

a 43-byte program can be written in Python to generate the first string:

''.join([str(2*(i%4)) for i in range(100)])

However, no program that generates the second string is shorter than one that simply embeds the sequence as a string literal.
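
As a quick sanity check (a sketch, not part of the original post; the exact byte counts depend on how the descriptions are written down), the short description above can be compared with the verbatim fallback that every string admits:

first_string = '0246' * 25                                 # the repeating string of length 100
assert first_string == ''.join([str(2*(i%4)) for i in range(100)])

program = "''.join([str(2*(i%4)) for i in range(100)])"    # the 43-byte description, quoted as text
literal = repr(first_string)                               # the fallback: quote the string verbatim

print(len(program))   # 43: a short description exists, so the string is compressible
print(len(literal))   # 102: roughly the string's own length, available for any string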

The second string is said to be incompressible, unlike the first, for which the short generating program serves as a compressed representation.

Equivalently, an incompressible string is one whose K(x) is at least the length of the string itself.

For any given length there exist incompressible strings: there are more strings of a given length than there are descriptions shorter than that length, so some string of every length must lack a shorter description. It follows that universal lossless compression, a scheme that shortens every input, is impossible.

In fact, most strings are incompressible or nearly so: the fraction of binary strings of a given length that can be compressed by more than c bits is smaller than 2^-c.
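
A small counting sketch makes both claims concrete (binary strings and binary descriptions are assumed here, which the post does not state; the helper names are invented for this sketch):

def short_description_count(n):
    # Number of binary descriptions strictly shorter than n bits: 2**n - 1.
    return sum(2**i for i in range(n))

def compressible_fraction_bound(n, c):
    # Upper bound on the fraction of n-bit strings with a description of length <= n - c.
    return sum(2**i for i in range(n - c + 1)) / 2**n

n = 100
print(short_description_count(n) < 2**n)   # True: too few short descriptions to cover every string
print(compressible_fraction_bound(n, 10))  # ~0.002: under 0.2% of strings compress by 10 or more bits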

It is natural to wonder what influence the choice of language has on K(x). The Invariance Theorem shows that it has no meaningful impact for theoretical analysis: for any two universal languages, the optimal descriptions of a data object differ in length by at most a constant that depends only on the languages, not on the object. Unfortunately, K(x) is uncomputable, so in general it is impossible to know the optimal compression of a given data object.
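
Stated symbolically (a standard formulation of the theorem, not spelled out in the post): for any two universal languages U and V there is a constant c_{U,V}, depending only on the pair of languages and not on x, such that

\[
\lvert K_U(x) - K_V(x) \rvert \le c_{U,V} \quad \text{for all strings } x
\]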

The theory of Kolmogorov complexity has interesting philosophical implications for "randomness". It is commonly understood that programs (without an entropy source exhibiting quantum randomness, e.g., radioactive decay) can generate only pseudorandom data. This can be derived intuitively by considering the Kolmogorov complexity of such a program's output: a program is known to exist that generates the output, so once the output is long enough, it is not Kolmogorov-random. It might still be statistically random, though, depending on the implementation.
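
A minimal illustration of the argument (using Python's standard random module purely as an example of a deterministic generator):

import random

# A fixed-size program plus a fixed seed deterministically reproduces an arbitrarily long,
# statistically random-looking string, so the output's Kolmogorov complexity is bounded by
# roughly the size of this program and cannot keep up with the output's length.
random.seed(42)
output = ''.join(str(random.randrange(10)) for _ in range(10_000))

print(len(output))   # 10000 digits of output...
print(output[:16])   # ...all of them determined by a few short lines of code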

Postscript

The incompressibility insight can be applied to simplify a variety of proofs, using a technique known as the Incompressibility Method.

The book "An Introduction to Kolmogorov Complexity and Its Applications" by Ming Li and Paul Vitaniy is a seminal text which delves into applying this proof method.

Date: 3rd April 2021