---
title: "Is proof of work a way to ethical telemetry?"
date: 2022-08-13
draft: false
---
Free and open source projects pride themselves on privacy. It is also one of
the reasons users choose these projects. This model of development has other
advantages over the closed one: faster development, clean code, outside
contributions and many more.
<!--more-->
When it comes to privacy alone, one segment of FOSS software stands out as the
ultimate champion: offline programs. If you obtained the installation files
in a private way then you're covered. It doesn't matter what your ISP, cloud
provider or company's administrator are up to. If there is no traffic leaving
your device there can hardly be any tracking at all. Users like this and have
come to expect it.
Most GUI FOSS software is written by a single person. That person is not
only a benevolent dictator but also rarely has occasion to discuss a feature
with someone before implementing it.
This person obviously has to know how to code, so can be called a programmer.
Programming is the only skill needed to write a program. Now, what is the
chance that this person also knows at least the basics of good UI design?
Let's reverse the question. If a person is a UX expert, what is the chance that
this person will also become a maintainer of a FOSS app? Slim, I would say.
There is also another thing: users don't report usability problems. They are
not bugs, so why bother? Bug trackers are filled with actual bugs. Would you
open an issue titled "Post button is too small"? First, it sounds like your
problem and not a software issue. Second, it is really minor. The maintainer
surely has more burning issues to resolve.[^1]
If a corporation commits to building a new program there is always more than
one person assigned to that task. And more often than not, if the project is
big enough, the employer will make sure at least one person knows something
about UX. That fixes the first issue.
Feedback is available too. We rarely see a closed-source program that doesn't
include telemetry. It provides usability data about interfaces that have
already been internally tested and developed in a strict pipeline.
No wonder closed-source UIs win.
We (the community) cannot provide additional developers for many of these
projects. We cannot emphasize the importance of user experience any more than
we already do (and in the end the maintainer will do what they want). But we
can find ways to give developers feedback.
Incorporating telemetry into a free and open codebase has already been done a
few times. Nevertheless it always:
- was met with a lack of applause, sometimes even anger, from the community;
- was not done in a user-respecting way and could be treated as a regression.
Reactions of the community are very understandable. It is proprietary
software's fault that "telemetry" is considered almost a swear word. After all,
telemetry is generally used to provide business intelligence and tracking of
the user (to later serve ads or sell this data). It's obvious that this is
something unfit for privacy-focused projects.
Take Audacity as an example. This project (after its acquisition by Muse
Group) tried to include telemetry twice. As the developers failed to
understand the concerns of the community, it projected a negative image of
how the project was managed. As a result a fork was created. All the
controversies are linked in Tenacity's `README.md` file.
[Tenacity - repository](https://git.sr.ht/~tenacity/tenacity#motivation)
KDE is another example. Here the software does not beg you to opt into
telemetry. You can change it in the settings. Nevertheless people are
still concerned with where the data goes (as you cannot be sure what runs on
remote servers).
[KDE telemetry policy](https://community.kde.org/Policies/Telemetry_Policy)
As you can see, telemetry is a very touchy topic for this particular group
of users (that is not a bad thing). If we were to include it in some
hypothetical project, a lot of care should be put into how privacy-friendly
the suggested solution is.
Current market solutions are not really up to the task, as they all rely on
trusting the remote. This should not be the case, as a remote server can log
your IP and then tie your data together. Once a few months of your history is
available, you can be deanonymised, as this study shows:
[Credit card study blows holes in anonymity](https://www.science.org/doi/full/10.1126/science.347.6221.468)
You could say:
But surely there is a way to not include any IP. Every user just needs a proxy.
Let's say that we do just that, either with the Tor network or any other
peer-to-peer solution. The data then loses its trustworthiness. As a developer
I cannot be sure that this telemetry wasn't swayed by some script kiddie who
*really* wants a bigger "Post" button. They could send a few months' worth of
data in an instant and I am helpless if that happens.
To improve data integrity we can sign every packet with a proof-of-work
key. This is very similar to how mCaptcha works. Here is a basic overview:
[mCaptcha](https://mcaptcha.org/)
1. The client wants to send telemetry data to the server.
2. The packet is properly formatted and prepared to be sent (for example as a
`.json` document). One integer field, "nonce", is left to be filled in later.
3. The hash value of the entire packet is computed. This can be done using MD5
or SHA2 or any other algorithm of the developer's choice (although you should
probably lean towards something modern like SHA3[^2]).
4. If the hash does not meet the server's requirements, the nonce field is
changed (incremented by one) and the client goes back to step 3. The server's
requirements are hard-coded into the client and can be something like this:
the hash value must begin with "12345"[^3]. The requirement serves no purpose
other than making it hard to find a nonce value that yields a matching hash.
5. The packet with the correct nonce is sent to the server.
6. The server computes the hash only once to check if it really begins with
"12345".
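The steps above can be sketched in a few lines of Python. This is only a
minimal illustration, not how mCaptcha or any real telemetry library does it:
the function names (`packet_hash`, `mine`, `verify`), the canonical JSON
encoding and the choice of SHA3-256 are all my assumptions, and the payload
fields are made up.

```python
import hashlib
import json

# Assumed difficulty rule, mirroring the example in the list above:
# the hex digest of the packet must start with this prefix.
REQUIRED_PREFIX = "12345"


def packet_hash(packet: dict) -> str:
    """Hash a canonical JSON encoding of the whole packet, nonce included."""
    encoded = json.dumps(packet, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(encoded.encode("utf-8")).hexdigest()


def mine(payload: dict, prefix: str = REQUIRED_PREFIX) -> dict:
    """Client side (steps 2-4): increment the nonce until the hash fits."""
    packet = dict(payload, nonce=0)
    while not packet_hash(packet).startswith(prefix):
        packet["nonce"] += 1
    return packet


def verify(packet: dict, prefix: str = REQUIRED_PREFIX) -> bool:
    """Server side (step 6): one hash computation checks the work."""
    if not isinstance(packet.get("nonce"), int):
        return False
    return packet_hash(packet).startswith(prefix)


if __name__ == "__main__":
    # Hypothetical telemetry payload; a shorter prefix keeps the demo quick.
    packet = mine({"event": "post_button_clicked", "count": 3}, prefix="123")
    print(packet["nonce"], verify(packet, prefix="123"))
```

Client and server have to agree on the exact byte encoding of the packet,
which is why the sketch sorts keys and fixes the separators before hashing.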
This scheme ensures the client must do a lot more work than the server, which
makes it a lot harder to send a lot of information at once. Normal users can
still send data at the rate it is gathered, while malicious actors are limited
by their machines' computational power.
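To put a rough number on that asymmetry, here is a back-of-the-envelope
estimate. It assumes the hash behaves like a uniform random function and that
the requirement fixes $k$ leading hexadecimal digits of the digest (as in the
"12345" example, where $k = 5$). Each attempt then succeeds with probability
$p$, and the number of attempts follows a geometric distribution:

$$
p = 16^{-k}, \qquad
\mathbb{E}[\text{attempts}] = \frac{1}{p} = 16^{k}, \qquad
k = 5 \;\Rightarrow\; 16^{5} = 1\,048\,576
$$

So under these assumptions the client computes about a million hashes on
average per packet, while the server computes exactly one.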
Please don't confuse this with DoS protection. The mechanism above is there to
protect integrity and not availability. There are different techniques to
prevent server overload, such as discarding unusual traffic before processing
it.
[^1]: If you are a person who would report something like this then I'm glad.
You're making a valuable contribution to free software. However, surely you
can understand the problem.
[^2]: Modern hash algorithms are not only safer from a cryptographic
standpoint but also often faster to compute.
[^3]: Hash requirements are chosen arbitrarily. The stricter they are, the
more time the client will have to spend on computing a valid hash. E.g. "ends
with 6 ones" is a stricter requirement than "ends with 5 ones". Computing time
increases exponentially with the number of required characters.