Researchers Defeat Randomness to Create Ideal Code

Author: digital55

Score: 54

Comments: 28

Date: 2021-11-26 01:46:03

Web Link

________________________________________________________________________________

pizza wrote at 2021-11-26 18:45:26:

Somehow this reminds me of polar codes (used in the 5G control channels), the first codes with an explicit construction provably achieving the Shannon limit (iiuc). They work by turning a physical channel into N virtual channels, and the polar coding algorithm _increases_ the noisiness of some virtual channels while _decreasing_ the noisiness of others. So you need to pick which virtual channels to listen to, depending on the noise characteristics of the environment. And it uses a simple recursive strategy to generate these codes as mixtures of inputs; a sketch of the kernel recursion follows below.

If I understand correctly, this approach uses a recursive checksum-like system so that local bits can verify the integrity of the codeword globally?
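To make the channel-combining idea concrete, here is a minimal Python sketch of the recursive 2x2 polar kernel (an illustration under my own assumptions, not the 5G implementation; it omits the bit-reversal permutation, and the channel-selection/freezing step is only noted in a comment):

```python
def polar_transform(u):
    """Recursively apply the 2x2 polar kernel [[1,0],[1,1]] (mod 2).

    At each level, one output of a pair sees the XOR of two inputs (a
    noisier virtual channel) while the other passes a bit through (a
    cleaner one). len(u) must be a power of two.
    """
    n = len(u)
    if n == 1:
        return u[:]
    half = n // 2
    # Mix the two halves, then transform each half again.
    top = [u[i] ^ u[i + half] for i in range(half)]
    return polar_transform(top) + polar_transform(u[half:])

# A real polar code puts data only on the reliable virtual channels and
# "freezes" the unreliable ones to 0 before applying the transform.
print(polar_transform([1, 0, 1, 1]))  # [1, 1, 0, 1]
```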

dataflow wrote at 2021-11-26 20:23:04:

I don't understand the definition of local testability. Could someone explain? Say I send you 1 million bits of information, and the 5th one is corrupted. The proposition is that you can somehow figure this out by examining only a small number of bits? How in the world is that possible? You have no idea a priori which bit might be corrupted, so wouldn't you have to at least _read_ all of them once?

TrueDuality wrote at 2021-11-27 17:51:32:

Imagine taking a book off a shelf, flipping to a random page, then pointing at a random word on that page. Without any knowledge of the rest of the page or book, you can tell whether that word is spelled correctly and exists in the dictionary. That is local testability.

With code words, you need to be able to tell where a code word starts and stops and how long it is. For more complex code words, not all bit combinations are valid, just as not all combinations of letters are valid words.

The example given at the beginning of the article doesn't really go far enough. Yes, if you repeat a bit 3 times, you can tell it's corrupted if any of the three don't match (ignore the correction part, because that is not part of local testability).

What the article's example doesn't cover is what happens when the message is a full gigabyte. You can skip to any part of that message (as long as you know the index you're at), find the beginning of the code word (it will have index % 3 == 0), read the next three bits, and verify whether those have been corrupted or not.
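A minimal Python sketch of that skip-and-check idea for the repetition-3 code (names and setup are my own illustration, not from the article):

```python
import random

def encode_rep3(bits):
    """Repetition-3 encoding: each source bit appears three times."""
    return [b for b in bits for _ in range(3)]

def local_test(codeword, i):
    """Check one 3-bit block without reading the rest of the codeword.
    `i` is any bit position; we snap back to the block start (index % 3 == 0)."""
    start = i - (i % 3)
    a, b, c = codeword[start:start + 3]
    return a == b == c

msg = [random.randint(0, 1) for _ in range(10)]
cw = encode_rep3(msg)
cw[7] ^= 1                   # flip one bit in block 2 (positions 6-8)
print(local_test(cw, 2))     # True: block 0 is intact
print(local_test(cw, 7))     # False: the probed block contains the flip
```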

ajb wrote at 2021-11-27 00:37:31:

It's the original you care about, not the encoding. Suppose you have a code such that at least 50% of the encoded bits must be corrupted to prevent a good decoding of the source. If the encoding is that corrupted and you choose 20 bits at random to examine, there's less than a 1/1000000 chance that you chose one of the corrupted ones. So this code might be one in which, for any 20 bits, you can tell if they form part of a valid codeword.

dataflow wrote at 2021-11-27 00:57:47:

> If the encoding is that corrupted and you choose 20 bits at random to examine, there's less than a 1/1000000 chance that you chose one of the corrupted ones.

Is there a typo somewhere here? I'm confused what you're saying. So if my message is 40 bits long and you randomly pick 20 bits to examine, there's a less than one-in-a-million chance that you choose a corrupted bit?

Why? Wouldn't that be an almost 100% chance (= 1 - (20/40 * 19/39 * 18/38 * ...))? Or in terms of whether you recovered the correct value, wouldn't that be a 50% chance (20/40)? Sorry this probably sounds really dumb but I feel like I'm not understanding what you intended to write.

ajb wrote at 2021-11-27 07:50:43:

Yes, if the message is only 40 bits long, there is essentially a 100% chance. But the point is that you still only need 20 bits even if the message is 100,000 bits long.

Because if at least 50% of the bits are corrupted, each bit you sample has at least a 50% chance of being corrupted, so with 20 bits you have only a 1 in 2^20 chance of missing all the corrupted bits.

The 20 bits don't allow you to correct anything, BTW. To do that you need a locally _decodable_ code. What we are talking about here is the ability to detect corruption _of the original data_.
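The arithmetic behind that 1-in-2^20 figure, as a quick Python check (hypergeometric, since the 20 probes are drawn without replacement; the message size here is an arbitrary illustration):

```python
from math import comb

def miss_probability(n, corrupted, k):
    """Probability that k uniformly random positions (without replacement)
    all land on uncorrupted bits of an n-bit encoding."""
    return comb(n - corrupted, k) / comb(n, k)

# 100,000-bit encoding, half the bits corrupted, 20 random probes:
print(miss_probability(100_000, 50_000, 20))  # ~9.5e-7, roughly 2**-20
```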

mirekrusin wrote at 2021-11-26 21:40:44:

A good naive example is provided at the beginning of the article: repeat each bit 3 times, and you're protected against a single bit flip, because a majority vote (the values of the other 2 bits) recovers it.
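Continuing the repetition-3 sketch from earlier in the thread, the majority vote is one line per block (again just an illustration):

```python
def decode_rep3(codeword):
    """Majority vote per 3-bit block: corrects any single flip in a block."""
    return [1 if sum(codeword[i:i + 3]) >= 2 else 0
            for i in range(0, len(codeword), 3)]

# One flipped bit per block still decodes to the original message.
assert decode_rep3([1, 1, 0,  0, 1, 0]) == [1, 0]
```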

dataflow wrote at 2021-11-26 21:41:46:

Don't you have to read all the bits first though?

tromp wrote at 2021-11-26 22:14:46:

From:

https://en.wikipedia.org/wiki/Locally_testable_code

"Clearly, any valid codeword should be accepted as a codeword, but strings that are not codewords could be only one bit off, which would require many (certainly more than a constant number) probes. To account for this, testing failure is only defined if the string is off by at least a set fraction of its bits. This implies words of the code must be longer than the input strings by adding some redundancy."

dataflow wrote at 2021-11-26 22:21:06:

I'm not sure I'm really following how that answers my question unfortunately. Are they assuming O(n) corruptions? Distributed randomly in the string?

dmurray wrote at 2021-11-26 21:49:51:

No, you just have to read the first 3 bits, if the corruption is in those. But yeah, you can't detect a single-bit corruption if you haven't actually read that bit.

ajb wrote at 2021-11-27 00:29:10:

You can't detect single bit corruption of _the encoding_ without reading that bit. But what you care about is whether the encoding is sufficiently corrupted that you can't decode the original.

dataflow wrote at 2021-11-26 21:52:55:

What if the corruption is in the last bit? Do you somehow magically know it's not in the first 3 without reading them?

cdtwigg wrote at 2021-11-26 22:19:32:

In this formulation I believe the last bit would also be repeated 3 times, and you’d have to read those three bits to detect corruption.

dataflow wrote at 2021-11-26 22:28:40:

Yes but how would you know that without reading all 6 bits first?

netizen-936824 wrote at 2021-11-26 23:46:18:

You wouldn't, as long as the bits are received sequentially. I think the idea is that, depending on where the corruption is, you only need to read up to two bits past the corrupted one.

mirekrusin wrote at 2021-11-27 07:33:51:

No you don’t, you need to read at most 2 bits past corrupted one to detect it.

dataflow wrote at 2021-11-27 07:35:54:

The question is how do you know which one is corrupted before reading all of them??

mirekrusin wrote at 2021-11-27 08:28:50:

You don't need to read 1 million bits to detect that the 5th bit is the corrupted one; you will know after reading 3 * 5 = 15 bits.

dataflow wrote at 2021-11-27 08:31:59:

Yeah, and what if instead of being the 5th bit it was the 1 millionth bit that was corrupted?

mirekrusin wrote at 2021-11-27 10:45:40:

For a 1-million-bit message where the millionth bit is corrupted, the last bit doesn't have to be read. In general, with 3-replica coding, reading the first 2 bits of a block is enough if they are the same; reading the last one will have no effect.

dataflow wrote at 2021-11-27 10:48:18:

For heaven's sake... how in the world would you know it's the millionth bit that's corrupted without scanning all 999,999 other bits at least once?!?!

Do your data streams always come magically tagged with sentences like _"Hey, watch out for bits 999,998-1,000,000, but all my other bits are intact so you can skip them!"_ or something?! You somehow "know" the error is in the final 3 bits and not in the earlier 999,997 bits because... magic?

mirekrusin wrote at 2021-11-27 17:24:37:

No, it doesn't work that way.
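What this sub-thread is circling around: with a deterministic sequential scan of a repetition code, an early error is cheap to find, but a single error in the last block forces reading almost everything, which is dataflow's point. Local testers avoid this only by probing random positions and only promising to reject words that are far from the code (per the Wikipedia quote above). A sketch of the sequential scan being debated (my illustration, not any commenter's code):

```python
def first_bad_block(codeword):
    """Scan 3-bit blocks left to right; return the index of the first
    inconsistent block, or None if every block agrees. An error in block k
    costs 3 * (k + 1) bit reads; an error in the last block costs them all."""
    for start in range(0, len(codeword), 3):
        a, b, c = codeword[start:start + 3]
        if not (a == b == c):
            return start // 3
    return None
```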

vcdimension wrote at 2021-11-29 14:12:33:

Here is one project this research could be applied to:

https://github.com/Parchive/par2cmdline

https://en.wikipedia.org/wiki/Parchive

floatingatoll wrote at 2021-11-27 04:40:29:

See also: _Locally Testable Codes with constant rate, distance, and locality_

https://arxiv.org/abs/2111.04808

For example, from heading “High Dimensional Expansion”:

The current paper is mainly elementary and almost self-contained (with the exception of Section 6 which uses the existence of some Ramanujan Cayley graphs with specific properties and can be taken as a black box). But it came up as a result of a much longer and intensive journey. Some interesting open problems were left aside along the way. It is, therefore, worthwhile to give the story here.

ajb wrote at 2021-11-26 20:49:16:

Hmm I think Leonid Levin did something similar:

https://www.cs.bu.edu/fac/lnd/expo/holo.htm

Not sure yet what this one is doing that that one doesn't.

(Edited to add) Having read further, it seems that this code is close to optimal in rate and distance too. The link above is actually about probabilistically checkable proofs, which are similar, and the article thinks this result is likely to lead to advances there too.

unbanned wrote at 2021-11-26 18:25:03:

Dupe of

https://news.ycombinator.com/item?id=29350983

– this current submission has a lower ID and was submitted earlier.

Neither has comments so far, but both are on the front page. @dang, if you read this, can this one be marked as a dupe?

tlarkworthy wrote at 2021-11-26 18:32:03:

Flag the other one:

https://news.ycombinator.com/item?id=29350983