💾 Archived View for nicholasjohnson.ch › 2021 › 11 › 13 › future-proof-digital-timestamping captured on 2024-08-18 at 18:51:23. Gemini links have been rewritten to link to archived content
Nicholas Johnson
📆 November 13, 2021 | ⏱️ 7 minute read | 🏷️ computing
The internet will soon face a huge problem. AI-generated media, aka synthetic media, is becoming harder to distinguish from human-generated media. Synthetic articles shared on social media have topped the charts with readers entirely unaware what they're reading is synthetic. The amount of synthetic media posted online keeps increasing every year. Even big tech platforms and DARPA have ramped up their deepfake detection efforts.
What will be the outcome of this? Well, it will definitely be a cat-and-mouse game for a while. Big tech and governments will spend millions to detect fakes. But they're only delaying the inevitable. Any deepfake detection scheme can be used to train a better AI to fool it. It will eventually be practically impossible to detect fakes. It's only a matter of time.
For online content that isn't widely witnessed, like my online journal, it'll be impossible for future viewers to be sure that a human being created it. Since AI will be able to generate unlimited convincing fake media, that will diminish the value of any media that can't be verified as human in origin.
So I decided I wanted to give future readers and internet historians a way to verify definitively that this journal was written by a human. That way it doesn't blend into the background of all the convincing synthetic media that will surely populate the internet soon enough.
It occurred to me that because today's AI couldn't possibly generate my journal articles, if I timestamped my journal, that would prove to future readers that it's human-made. So I started looking for software that could do that.
I didn't want to use some centralized service to perform the timestamping, for 2 reasons:
Then I found OpenTimestamps¹. It's based on Bitcoin, which I don't like. I've encouraged people to avoid using proof-of-waste cryptocurrencies² before. I don't feel great about using software that relies on a planet-roasting cryptocurrency, but there's just no other way I know of to create trustless, decentralized, verifiable timestamps.
Also, OpenTimestamps has an extremely efficient design compared to other Bitcoin timestamping schemes. Thanks to OpenTimestamps' clever use of Merkle trees³, it can timestamp unlimited data using only 1 transaction. Other Bitcoin timestamping software uses 1 transaction per timestamp, an extremely wasteful, inefficient design. At least OpenTimestamps isn't that bad.
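To make the "unlimited data, 1 transaction" idea concrete, here's a minimal sketch of folding many document hashes into a single Merkle root. It illustrates the general technique only. It is not OpenTimestamps' actual proof format, and the function names, the sample documents, and the rule for handling an odd node are all assumptions made for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of leaf digests into a single root digest."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Hash each adjacent pair into one parent digest.
                next_level.append(sha256(level[i] + level[i + 1]))
            else:
                # Odd node out: carry it up unchanged (an assumed convention).
                next_level.append(level[i])
        level = next_level
    return level[0]


# Hypothetical documents submitted for timestamping.
documents = [b"journal entry", b"photo", b"contract", b"source tarball", b"essay"]
root = merkle_root([sha256(d) for d in documents])

# The root is always 32 bytes, however many documents went in.
print(len(root), root.hex())
```

Only the root needs to land in a transaction. Changing any one document changes the root, so the single published value commits to every document at once.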
So anyway, I created a timestamped Git commit and tagged it timestamp-1⁴. I wrote the concatenated commit data of the timestamped commit to a file⁵ in case you're interested in seeing what it looks like. The software works in a very elegant fashion. It even maintains compatibility with non-OpenTimestamps Git clients, so GnuPG can still verify the commit signature.
The base64-encoded timestamp appended to the commit data includes all the hashes needed to build the Merkle path from the tagged commit to the Merkle root included in the Bitcoin transaction. Using './ots --git-extract <filename>' on any file in the nicksphere-gmi repo present at the timestamped commit, you can extract an ots proof file, which you can then verify with './ots --verify <filename>'.
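Checking a Merkle path boils down to rehashing your way up the tree. The sketch below shows that idea on a hand-built 4-leaf tree. The step encoding ("left"/"right" plus a sibling digest), the function name, and the sample leaves are assumptions for illustration, not the ots file format.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def apply_path(leaf: bytes, path: list[tuple[str, bytes]]) -> bytes:
    """Walk from a leaf digest up to the root.

    Each step is ("left", sibling) or ("right", sibling), naming the side
    the sibling digest sits on when the pair is hashed.
    """
    digest = leaf
    for side, sibling in path:
        if side == "left":
            digest = sha256(sibling + digest)
        elif side == "right":
            digest = sha256(digest + sibling)
        else:
            raise ValueError(f"unknown side: {side!r}")
    return digest


# Build a tiny 4-leaf tree by hand so the example has a known root.
a, b, c, d = (sha256(x) for x in (b"commit A", b"commit B", b"commit C", b"commit D"))
ab, cd = sha256(a + b), sha256(c + d)
published_root = sha256(ab + cd)  # stands in for the value stored in the ledger

# Proof for leaf c: sibling d sits on the right, then ab sits on the left.
proof_for_c = [("right", d), ("left", ab)]
print(apply_path(c, proof_for_c) == published_root)  # True
```

The verifier needs only the leaf, the handful of sibling digests, and the published root. It never sees the other leaves, which is why the proof stays small even when the tree is huge.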
Thus future readers of my journal and historians will be able to verify that each entry was written by a human with no major external dependency other than the widely witnessed Bitcoin ledger. There are caveats to that, but luckily I thought up ways around all of them.
If you're familiar with Git's crypto, you know it still uses SHA-1, which is SHAttered⁶. Since OpenTimestamps uses the Git commit data for timestamping commits, it also relies on SHA-1. Unless you've enabled Git's experimental SHA-2 support, which no code hosting platform supports, SHA-1 is the best OpenTimestamps can do for Git repos.
As it turns out, SHA-1 is still good enough for OpenTimestamps⁷. SHAttered is a collision attack, not a preimage attack. Since there's no known practical preimage attack against SHA-1, OpenTimestamps is unaffected, meaning the timestamp I created for this journal still has meaning. Nonetheless, I'll eventually redo the timestamp when Git supports SHA-2, just to future-proof it.
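To see what SHA-1 actually covers in a Git repo, here's a small sketch of how Git derives an object ID: it hashes a short header (object type, a space, the body's length in bytes, and a NUL byte) followed by the body. The helper name and the sample commit body are made up for the example. Only the header layout and the two well-known empty-object IDs come from Git itself.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID as Git computes it: sha1(b'<kind> <size>\\0' + body)."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Sanity checks against Git's well-known IDs for the empty blob and empty tree.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# A hypothetical commit body (author, dates, and message are invented).
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Example Author <author@example.com> 1636761600 +0000\n"
    b"committer Example Author <author@example.com> 1636761600 +0000\n"
    b"\n"
    b"Example commit message\n"
)
commit_id = git_object_id("commit", commit_body)
print(commit_id)
```

Because the commit body names its tree by ID, and trees name their blobs by ID, everything in the repo at that commit hangs off this one SHA-1 chain. That's why the strength of the timestamp for a Git repo is tied to SHA-1's preimage resistance.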
There's actually another problem with OpenTimestamps: It depends on Bitcoin. Bitcoin was the first cryptocurrency. Don't get me wrong, it was great for its time. But by today's standards, it has several severe design flaws:
With all these design flaws, Bitcoin should've fallen out of favor years ago. Supposing people come to their senses and it does fall out of favor, it will lose its value. Then miners will quit mining. There will be nothing to secure the blockchain and it will be possible to rewrite blockchain history. Thus the timestamps won't be secure.
Luckily, there's a clever way to preserve the timestamps, even after Bitcoin is no longer secured by miners. It's a technique I call 'timestamp chaining'. The idea is simple. Before the blockchain becomes insecure due to lack of mining, digitally timestamp the whole ledger. Then embed that timestamp inside its successor ledger.
Just as timestamping my journal before AI could've generated it proves it was written by a human, timestamping the Bitcoin blockchain before it becomes insecure proves which blocks were really included. If Bitcoin's successor falls out of favor, the process can simply be repeated. This creates a secure chain of timestamps from the most recent distributed ledger all the way back to the timestamp embedded in today's Bitcoin ledger.
All this assumes distributed ledgers stick around. If there's any gap in the timestamp chain where there's no distributed ledger to put the latest timestamp in, then the entire chain is invalidated. This would be bad because Bitcoin timestamps are used to carbon date much of the internet⁸ (archive.org). The timestamps will be extremely useful to future internet historians.
In order to verify the timestamp chain, you need to know roughly around what time each ledger in the chain stopped being secure. That way you can check that it was timestamped before that date. As long as you stick to widely witnessed ledgers, this shouldn't pose a problem. This whole process can be automated. But it's not yet necessary as Bitcoin still hasn't fallen from grace.
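Here's a rough sketch of what automating that check might look like. Everything in it is hypothetical: the Link record, the ledger names "LedgerB" and "LedgerC", and every date (including the one for Bitcoin) are invented purely to show the logic. It only checks the ordering and continuity rules described above and assumes each individual timestamp proof has already been verified cryptographically.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Link:
    ledger: str           # ledger whose history is being preserved
    insecure_after: date  # rough date it stopped being secure
    stamped_in: str       # successor ledger holding the timestamp
    stamped_on: date      # date the successor's timestamp attests to


def chain_is_valid(chain: list[Link], newest_ledger: str) -> bool:
    """Check an oldest-first timestamp chain ending at newest_ledger.

    An empty chain means the original ledger is still the trusted one,
    so there is nothing to check.
    """
    for i, link in enumerate(chain):
        # Stamped too late: the old ledger may already have been rewritten.
        if link.stamped_on >= link.insecure_after:
            return False
        # No gaps: each timestamp must live in the next ledger of the chain.
        expected = chain[i + 1].ledger if i + 1 < len(chain) else newest_ledger
        if link.stamped_in != expected:
            return False
    return True


# Entirely hypothetical ledgers and dates.
chain = [
    Link("Bitcoin", date(2040, 1, 1), "LedgerB", date(2038, 6, 1)),
    Link("LedgerB", date(2060, 1, 1), "LedgerC", date(2055, 3, 1)),
]
print(chain_is_valid(chain, newest_ledger="LedgerC"))  # True

# Stamping after the old ledger became insecure breaks the chain.
late = [Link("Bitcoin", date(2040, 1, 1), "LedgerB", date(2041, 1, 1))]
print(chain_is_valid(late, newest_ledger="LedgerB"))  # False
```

The hard part in practice is the insecure_after estimate, which is a judgment call about history rather than something you can compute. Sticking to widely witnessed ledgers is what makes that estimate defensible.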
But what about quantum computers? Won't they invalidate the timestamps? No. Timestamp chaining is also quantum-secure, provided quantum-resistant ledgers are in use before quantum computing becomes practical. Research on quantum-resistant distributed ledgers⁹ has been underway for years, so I estimate a very high probability they will be ready in time.
It doesn't even matter if all the underlying cryptographic primitives of the ledgers in the timestamp chain are broken by quantum computers. As long as the most recent ledger used in the timestamp chain is quantum-secure and there are no gaps in the timestamp chain, timestamps going all the way back to Bitcoin will be verifiable. SHA-256 is the only primitive relied upon for timestamping and it's thought to be quantum-secure already.
This journal's timestamp is not yet future-proof because it still uses SHA-1. When Git supports SHA-2, I plan on creating a new timestamp. I don't think SHA-2 preimage resistance will be broken any time soon and I think distributed ledgers will still be popular for years to come. So if you want to create a trustless, future-proof, unforgeable digital timestamp, timestamp chaining seems like the way to go.
Future internet historians will have many methods of verifying when some digital media was created. They probably won't be limited to verifying timestamp chains. While timestamps offer the strongest assurance that media isn't synthetic, it's not like your digital work will necessarily be indistinguishable from synthetic media just because you didn't timestamp it.
I just decided to timestamp my journal to create that extra assurance that it's not synthetic. That was the primary reason. The synthetic internet might arrive in 10 years or 50 years. Since I have no way to know, it seemed best to create a verifiable timestamp now, before GPT-4 gets released.
🔗 [2]: proof-of-waste cryptocurrencies
🔗 [7]: SHA-1 is still good enough for OpenTimestamps
🔗 [8]: Bitcoin timestamps are used to carbon date much of the internet
🔗 [9]: quantum-resistant distributed ledgers
Copyright © 2020-2024 Nicholas Johnson. CC BY-SA 4.0.