💾 Archived View for dioskouroi.xyz › thread › 24917679 captured on 2020-10-31 at 00:59:31. Gemini links have been rewritten to link to archived content
If curious see also
2014
https://news.ycombinator.com/item?id=8728532
2013
https://news.ycombinator.com/item?id=6231992
2009
https://news.ycombinator.com/item?id=506986
And I submitted it in 2017 but it didn't get any traction. Glad to see it again!
This is actually very helpful for understanding why people who don't think in terms of the logic that software runs on don't see eye to eye with us on these issues. I would never have imagined that something like nuance on copyright would somehow weave back around into opinions about obscenity law, and yet this is about as elegant an explanation of it as any.
This is my first time reading this, but I suspect it has become a new part of my mental modeling of the world for years to come.
I think it would be better to think in terms of information theory, rather than the hypothetical color of bits. If I have a legally obtained copy of a copyrighted song (that is not permissively licensed), then I am restricted in my right to share copies. Certainly I can create a one-time pad and communicate that to someone, and I can XOR the bits of the song recording with that one-time pad, and the author would argue that I now have colorless bits. But if I send it to you along with instructions for how to decrypt it, I'm using a mechanism to communicate a perfect copy. It isn't that the ciphertext is now "colored", it's that the net effect of my mechanism is that I've communicated the original song, with perfect fidelity. That's what matters: the net effect of the system as a whole. I will have created a communication channel (perfectly legal), and used that channel to share a copyrighted work without the creator's permission (not legal, depending on the details).
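For anyone who wants to see the "net effect" concretely, here is a minimal sketch of the one-time-pad round trip described above. The `song` bytes are a stand-in placeholder and `xor_bytes` is a helper name made up for illustration; nothing here comes from the essay itself.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Placeholder standing in for the bits of a copyrighted recording.
song = b"stand-in bytes for a copyrighted recording"

# A one-time pad is random data as long as the message.
pad = secrets.token_bytes(len(song))

# The ciphertext on its own looks like random noise...
ciphertext = xor_bytes(song, pad)

# ...but ciphertext plus pad is a perfect-fidelity channel for the song.
recovered = xor_bytes(ciphertext, pad)
assert recovered == song
```

Neither `pad` nor `ciphertext` reveals anything alone; it is only the pair, plus the instruction "XOR them", that delivers the copy.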
> I can XOR the bits of the song recording with that one-time pad, and the author would argue that I now have colorless bits.
No, the author would say the opposite. That’s the whole point - the process by which you get to the bits matters.
OK, you are right, he is using "color" in a funny way to reflect the idea that the violation is somehow in those bits. But in my opinion, the violation is in the channel as a whole (the ciphertext, the one-time pad, and the instructions on how to decrypt, shared for the purpose of distributing copies), not because of "colored bits".
Your opinion is an engineer's or computer scientist's opinion. I am not criticizing when I say that; I share it with you. But the opinion of most of the rest of the world is not with us, and this is a good essay explaining that opinion. It is important to understand it if you want to understand the world, make correct predictions about how most people will operate in these matters, or figure out how to best change people's minds. (I can definitely speak from experience that the direct approach is not very effective. Can't tell you what is, unfortunately...)
On the one hand, yeah, it's funny and arbitrary. On the other hand, it's how at least the US legal system understands, litigates, and enforces IP laws.
The moment you xor your one time pad (OTP) with the song to generate a ciphertext C, both OTP and C become, you could say, _conditionally-colored_.
You're not restricted in redistributing one, so long as you don't redistribute the other. If you redistribute both, even in different channels, logically that's a copyright violation unless you have sufficient controls or restrictions to prevent someone downstream from ending up with both and being able to recombine them.
Realistically, the only reason you'd generate C and distribute both OTP and C is so that someone eventually gets them both and can reconstruct the song. Trying to claim you didn't have that intent wouldn't work in a civil context, and might not even work in a criminal context.
Could we add [2004] please? This is a great article, always glad to see it pop up again!
Colour _does_ exist. It's the Second Law of Thermodynamics: information cannot be created, only lost, and creating a faithful copy of document A requires you to discard an equivalent amount of information (usually in the form of “there's usable energy over here”, in a very lossy process, but physics doesn't _require_ this) – which you can then recover by comparing your copy A with the original document A, and discarding one.
You're not going to get the Complete Works of Shakespeare by any means other than by copying an existing copy of the Complete Works of Shakespeare; the odds are astronomical. You _might_ get a description of calculus by other means (expending the effort _yourself_ to produce it), but it's not very likely you'd come up with it before somebody else made such a discovery public, unless:
• it's recently become obvious that such a thing would be useful; and
• it's easy enough to discover that you could figure it out in a year or two; and
• enough people were looking in that direction at the time (or one of them was a secretive sort of person).
That's not to say that authorial monopolies are necessarily a _good_ thing. But they _are_ a meaningful concept.
The blog post writes:
> Suppose you publish an article that happens to contain a sentence identical to one from this article, like "The law sees Colour." That's just four words, all of them common, and it might well occur by random chance. Maybe you were thinking about similar ideas to mine and happened to put the words together in a similar way. If so, fine. But maybe you wrote "your" article by cutting and pasting from "mine" - in that case, the words have the Colour that obligates you to follow quotation procedures and worry about "derivative work" status under copyright law and so on.
There was a real court case in 2012 which I think is interesting because it's very similar to this example. A photographer was accused of "copying" the concept of taking a photo of a red bus in front of a grey Houses of Parliament. He defended himself by saying that those ideas are very common and should not be copyrightable, but failed:
https://youzicha.tumblr.com/post/162846191544/what-colour-ar...
If I understand the verdict correctly it says that it doesn't matter that the visual idea was trivial and that many other people have come up with the same idea independently. What matters is that this particular photographer deliberately wanted to copy a known image.
A very good example of "color", since the exact same photograph (same bits) would be non-infringing if the photographer had got the idea independently.
I just read this about 5 years after it was published.
Having worked with computers my whole life and graduated in Law, I always had trouble understanding how my colleagues could think that the bits from one email were different from the bits of anything else when you were doing a dd from one disk, or how reading one disk was "breaking" some secrecy of correspondence...
There is the thing itself and there is the history and context of the thing, and they are both important. We try to keep them together, but the connection is often fragile.
One example: a product and its price. The price roughly summarizes information about how it was made and how valuable people think it is, which are not attributes of the thing itself and often can’t be deduced from it. In some cases we physically attach a label, the price tag, to keep track. In other cases we attach a lot more info.
Another example: a photo and information about when and where it was taken. For a photo to serve as _evidence_ we need a reliable history. In court, this is the chain of custody. Too often on the Internet, we pretend that having the photo is proof enough, but without knowing its history, it could be a fake.
This is a great metaphor because you will often see people with a technical background arguing about copyright law as if it were purely about the bits. Because for us, the bits are the only real thing. But the law does not see it like that.
This is an elaboration of the idea that where data comes from is just as important as what the data happens to be. In expert systems this is related to the logical justification of a truth statement, as the statement is still true or false, but there is a Pedigree or Provenance for the data provided as metadata about the data stored.
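A rough sketch of what "provenance as metadata about the data" could look like in code. The `Tracked` class and its field names are hypothetical, invented only to illustrate the point; they are not taken from any particular expert system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value bundled with a note about where it came from (hypothetical)."""
    payload: bytes
    provenance: str

    def same_bits(self, other: "Tracked") -> bool:
        # Compares only the data, ignoring its history.
        return self.payload == other.payload


independent = Tracked(b"The law sees Colour.", "written independently")
pasted = Tracked(b"The law sees Colour.", "cut and pasted from the essay")

# Identical bits...
assert independent.same_bits(pasted)
# ...but not the same object once provenance is part of the record.
assert independent != pasted
```

The payload never changes; the only thing that distinguishes the two records is the metadata that travels alongside it, and that link is exactly as fragile as the code that maintains it.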
I've wondered how long it will be before input events are traceable to specific devices (e.g., you type 'A' on your keyboard, registered to you, and get a scancode . . . plus a few hundred bits timestamped and signed by the device before anyone can actually treat it as an 'A' keystroke). "Secure Unicode," anyone?
I figure this kind of dystopian mechanism for input provenance is at least 100 years out. Please don't anyone prove me wrong.
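A toy sketch of what a signed keystroke might look like, purely to make the idea concrete. Everything here is an assumption: the key, the message layout, and the function names are made up, and HMAC with a shared secret stands in for whatever asymmetric signature scheme a real device would presumably use.

```python
import hashlib
import hmac
import struct
import time

# Hypothetical per-device secret; a real scheme would likely use a
# device-held private key rather than a shared secret.
DEVICE_KEY = b"hypothetical-per-keyboard-secret"


def sign_keystroke(scancode: int, timestamp_ns: int, key: bytes = DEVICE_KEY) -> bytes:
    """Return a 256-bit tag binding a scancode to a timestamp."""
    # Made-up layout: 2-byte scancode followed by 8-byte timestamp, big-endian.
    message = struct.pack(">HQ", scancode, timestamp_ns)
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_keystroke(scancode: int, timestamp_ns: int, tag: bytes,
                     key: bytes = DEVICE_KEY) -> bool:
    """Check that the tag matches this scancode and timestamp."""
    expected = sign_keystroke(scancode, timestamp_ns, key)
    return hmac.compare_digest(expected, tag)


ts = time.time_ns()
tag = sign_keystroke(0x1E, ts)          # 0x1E used as an example scancode
assert verify_keystroke(0x1E, ts, tag)  # the genuine event verifies
assert not verify_keystroke(0x30, ts, tag)  # a different key press does not
```

The scancode alone is colourless; the tag is the "few hundred bits" that would tie it to a specific device and moment.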
This is reminiscent of Vinge's _Rainbows End_ where 90% of every chip is DRM and 10% is actual functionality.
Your keystrokes are safely stored on Microsoft servers to improve Cortana and the advertisements you'll see
It'll be time to update to 2020!
Medium grey and opaque, thanks to cryptography.
this is almost a description of a type system
This essay is written to create the impression that it's imparting something profound, but it's really just identifying the existence of side channels.
We encode bits in all kinds of things. You can store some bits on a flash drive. Then you can write some words on the outside of the flash drive. They're both just bits.
Whether you own a flash drive with some bits on it which is in a safety deposit box at a bank may depend on the bits in the bank's computer and not the bits on the flash drive, but it's still bits that it depends on.
The examples it uses aren't accurate:
> Maybe you were thinking about similar ideas to mine and happened to put the words together in a similar way. If so, fine. But maybe you wrote "your" article by cutting and pasting from "mine" - in that case, the words have the Colour that obligates you to follow quotation procedures and worry about "derivative work" status under copyright law and so on.
As it turns out copyright law _doesn't_ really care about this, even if people might expect it to, because the law is too pragmatic for that. Proving that you came up with some particular phrasing is hard. Disproving it is hard too. So in practice the courts don't look at whether you actually copied the bits. They look at whether you could have copied them ("access"), which they then go on to assume is the case for anything widely disseminated (since establishing that would be hard too), and then whether the bits are similar ("substantial similarity"):
https://www.theiplawblog.com/2007/02/articles/copyright-law/...
Whether you _actually_ copied the bits doesn't come into consideration, apparently.
Because the courts can only take into consideration the information they have available to them. Which is all bits, because all information is bits.
> You take a file to which someone claims copyright, mix it up with a public file, and then the result, which is mixed-up garbage supposedly containing no information, is supposedly free of copyright claims even though someone else can later undo the mixing operation and produce a copy of the copyright-encumbered file you started with.
There is still no Colour here, and the essay is missing a rather decent practical attack on the "Colour theory" version of the copyright system.
Suppose Alice publishes R1 and Bob publishes R2. R1 is Alice's message xor Alice's one time pad. R2 is Bob's message xor Bob's one time pad. The one time pads are "random".
Then it's discovered that R1 xor R2 generates a third party copyrighted work. Which is statistically impossible unless either Alice or Bob (_but not necessarily both_) chose their one-time pad specifically in order to cause this.
According to "Colour theory" the one who chose their one-time pad specifically in order to cause this has given their ciphertext the "Colour" of the copyrighted work. But that's a real problem _in practice_ when there is no way to tell which one it was. The other person may not even be in on it. So which one do you haul into court when you don't know that? Which one do you take down?
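A minimal sketch of the attack described above, under made-up assumptions: all messages are placeholder strings padded to the same length, and `xor_bytes` is a helper written for this example. Here Bob is the one who rigs his pad, but nothing in the published files says so.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Placeholder for the third party's copyrighted work.
work = b"third-party copyrighted text"
n = len(work)

# Alice is innocent: ordinary message, genuinely random pad.
alice_msg = b"lunch at noon?".ljust(n)
alice_pad = secrets.token_bytes(n)
r1 = xor_bytes(alice_msg, alice_pad)

# Bob picks R2 so that R1 xor R2 equals the work, then back-derives a
# "pad" that makes R2 decrypt to a harmless cover message.
r2 = xor_bytes(r1, work)
bob_msg = b"see you friday".ljust(n)
bob_pad = xor_bytes(r2, bob_msg)

assert xor_bytes(r1, r2) == work         # the pair reproduces the work
assert xor_bytes(r2, bob_pad) == bob_msg  # Bob's file has a valid cover story
assert r1 == xor_bytes(r2, work)          # the relation is symmetric
```

The last assertion is the evidentiary problem: R1 relates to R2 exactly as R2 relates to R1, so the bits alone cannot show which of the two was constructed from the other.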
In real life if something like that becomes popular what happens is not that they figure out who it really was that created the derivative work, it's that they come up with some kind of disgusting hack like the DMCA takedown process which imposes no practical consequences on fraudulent takedowns, and hope that the innocent victims of the collateral damage don't have enough political clout to do anything about it.
It seems like the same fallacy as the model of the law that we teach to high school students. Computer scientists understand that it's wrong. Lawyers understand that it's wrong. But certain people benefit from pretending that it isn't in front of the general population because the misleading abstraction is prettier than what actually happens under the hood, and a better understanding of the latter would make people upset.
> Suppose Alice publishes R1 and Bob publishes R2. R1 is Alice's message xor Alice's one time pad. R2 is Bob's message xor Bob's one time pad. The one time pads are "random". Then it's discovered that R1 xor R2 generates a third party copyrighted work.
It is common among computer people to think the law can be hacked like an algorithm. It does not work like that. If you xor two apparently random files and they surprisingly produce the full text of the Harry Potter series, you do _not_ have plausible deniability if you start distributing it.
You're missing the attack in exactly the same way as the author does.
The same person doesn't distribute both of the files. Two different people distribute two different files. One of them is totally innocent and the party distributing that file doesn't even have to be in on it or have any relationship with the other person, but there is no way to tell which one it is.
The legal system is forced into either punishing and taking down the innocent file or not doing so for the infringing one. There is no other option when you can't distinguish between them.
But it isn't supposed to do that to the one which is just an ordinary use of a one-time pad by an innocent independent third party who has e.g. posted it in a public place for the intended recipient of the non-infringing message to receive it without there being a direct one-to-one communication between sender and the recipient. Or because there are multiple intended recipients and only those with the correct pad can read the original message so it's safe to publish widely.
The fact that some totally different person has come along and used your published file to encode an infringing one is not supposed to affect your legal status. But if nobody can tell which one is the original, the legal system has to choose between punishing the innocent and not punishing the guilty.
It isn't an algorithmic problem, it's an evidentiary problem. There are two different sets of bits and one is supposed to have a different "Colour" but the legal system has no information as to which one it is.
It's like someone discovering that the flashlight on certain phones is bright enough to blind surveillance cameras, and when someone points out that criminals could use this to prevent surveillance cameras from capturing their faces while they're committing their crimes, you respond that the legal system doesn't work like that because having an effective way to avoid being identified doesn't make your conduct legal. But that wasn't the original claim.
I see it all the time and it's quite frustrating to see people being so naive. The law is not purely mathematical and algorithmic. I think a good example that moves outside of IP law is murder vs manslaughter. Two identical killings could fall under different charges simply due to what the killer was thinking at the time. And we want it that way. It would be unfair and not accomplish anything good to treat an accidental and an intentional killer the same way.