---

title: "Is proof of work a way to ethical telemetry?"

date: 2022-08-13

draft: false

---

Free and open source projects pride themselves on privacy. It is also one of the reasons users choose them. This development model has other advantages over the closed one too: faster development, clean code, outside contributions and many more.

When it comes to privacy alone, one segment of FOSS stands out as the ultimate champion: offline programs. If you obtained the installation files privately, you're covered. It doesn't matter what your ISP, cloud provider or company administrator are up to. If no traffic leaves your device, there can hardly be any tracking at all. Users like this and have come to expect it.

FOSS has a problem

Most GUI FOSS software is written by a single person. That person is not only the benevolent dictator but also rarely has an occasion to discuss a feature with anyone before implementing it.

This person obviously has to know how to code, so they can be called a programmer. Programming is the only skill strictly needed to write a program. Now, what is the chance that this person also knows at least the basics of good UI design?

Let's reverse the question. If a person is a UX expert, what is the chance that this person will also become the maintainer of a FOSS app? Slim, I would say.

There is another thing: users don't report usability problems. They are not bugs, so why bother? Bug trackers are filled with actual bugs. Would you open an issue titled "Post button is too small"? First, it sounds like your problem and not a software issue. Second, it is really minor. The maintainer surely has more burning issues to resolve.[^1]

Corporate is immune

If a corporation commits to building a new program, there is always more than one person assigned to the task. And more often than not, if the project is big enough, the employer will make sure at least one person knows something about UX. That fixes the first issue.

Feedback is also available. We rarely see a closed-source program that doesn't include telemetry. It provides usability data about interfaces that have already been tested internally and developed in a strict pipeline.

No wonder closed-source UIs win.

We (the community) cannot provide additional developers for many of these projects. Nor can we stress the importance of user experience any more than we already do (and in the end, the maintainer will do what they want). But we can find ways to give developers feedback.

Never done correctly

Telemetry has already been incorporated into free and open codebases a few times. Nevertheless, it always:

- was met with a lack of applause, sometimes even anger, from the community;
- was not done in a user-respecting way and could be treated as a regression.

The community's reactions are very understandable. It is proprietary software's fault that "telemetry" is considered almost a swear word. After all, telemetry is generally used for business intelligence and for tracking the user (to later serve ads or sell the data). Obviously, this is unfit for privacy-focused projects.

Take Audacity as an example. After its acquisition by Muse Group, the project tried to include telemetry twice. As the developers failed to understand the community's concerns, it projected a negative image of how the project was managed. As a result, a fork was created. All the controversies are linked in Tenacity's `README.md` file.

[Tenacity - repository](https://git.sr.ht/~tenacity/tenacity#motivation)

Another example is KDE. Here the software does not beg you to opt into telemetry; you can change it in the settings. Nevertheless, people are still concerned about where the data goes (as you cannot be sure what runs on remote servers).

[KDE telemetry policy](https://community.kde.org/Policies/Telemetry_Policy)

As you can see, telemetry is a very touchy topic for a particular group of users (which is not a bad thing). If we were to include it in some hypothetical project, a lot of care would have to go into making it privacy-friendly.

Zero-trust

Current market solutions are not really up to the task, as they all rely on trusting the remote server. That should not be necessary, because the server can log your IP address and use it to tie your data together. Once a few months of your history are linked this way, you can be deanonymised, as this study shows:

[Credit card study blows holes in anonymity](https://www.science.org/doi/full/10.1126/science.347.6221.468)

You could say:

But surely there is a way to keep the IP address out of it. Every user just needs a proxy.

Let's say we do just that, either with the Tor network or some peer-to-peer solution. This way, however, the data loses its trustworthiness: without IP addresses, the server cannot tell whether a thousand reports come from a thousand users or from a single one. As a developer, I cannot be sure the telemetry wasn't swayed by some script kiddie who *really* wants a bigger "Post" button. They could send a few months' worth of data in an instant, and I would be helpless if that happened.

Protect the data!

To improve data integrity, we can attach a proof of work to every packet. This is very similar to how mCaptcha works:

[mCaptcha](https://mcaptcha.org/)

Here is a basic overview:

1. The client wants to send telemetry data to the server.
2. The packet is formatted and prepared to be sent (for example as a `.json` document). One integer field, "nonce", is left to be filled in later.
3. The hash of the entire packet is computed. This can be done with MD5, SHA-2 or any other algorithm of the developer's choice (although you should probably lean towards something modern like SHA-3[^2]).
4. If the hash does not meet the server's requirements, the nonce field is changed (incremented by one) and we go back to step 3. The server's requirements are hard-coded into the client and can be something like: the hash value must begin with "12345"[^3]. This serves no purpose other than making it hard to find a nonce that yields such a hash.
5. The packet with a valid nonce is sent to the server.
6. The server computes the hash only once to check that it really begins with "12345".
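The steps above can be sketched in Python roughly like this. It is a minimal illustration under assumptions, not a reference implementation: the packet layout (a `data` object plus an integer `nonce`), hashing canonical JSON with SHA3-256, and the hex-prefix requirement are illustrative choices, and the names `packet_hash`, `mine` and `verify` are hypothetical.

```python
import hashlib
import json

# Assumed requirement: the hex digest must start with this prefix.
# "12345" is the example from the text; 5 hex characters means roughly
# 16**5 (about a million) attempts on average.
REQUIRED_PREFIX = "12345"


def packet_hash(packet: dict) -> str:
    """Hash the whole packet, nonce included, with SHA3-256.

    sort_keys and compact separators make the serialisation deterministic,
    so the client and the server hash exactly the same bytes.
    """
    encoded = json.dumps(packet, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha3_256(encoded).hexdigest()


def mine(data: dict, prefix: str = REQUIRED_PREFIX) -> dict:
    """Client side (steps 2-4): increment the nonce until the hash meets the requirement."""
    packet = {"data": data, "nonce": 0}  # hypothetical packet layout
    while not packet_hash(packet).startswith(prefix):
        packet["nonce"] += 1
    return packet


def verify(packet: dict, prefix: str = REQUIRED_PREFIX) -> bool:
    """Server side (step 6): a single hash computation is enough to check the work."""
    return packet_hash(packet).startswith(prefix)


if __name__ == "__main__":
    # A short prefix keeps the demo fast; a real deployment would tune the difficulty.
    report = {"event": "button_click", "button": "post"}
    packet = mine(report, prefix="000")
    print(packet["nonce"], packet_hash(packet))
    print(verify(packet, prefix="000"))
```

Serialising with sorted keys matters here: if the client and the server encoded the same packet differently, the server would compute a different hash and reject valid work.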

This ensures that the client must do much more work than the server, which makes it a lot harder to send a large amount of data at once. This way, normal users can send data at the rate it is gathered, while malicious actors are limited by their machines' computational power.
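To put rough numbers on this asymmetry, assume the hash is written in hexadecimal, the requirement fixes its first n characters (as with "12345"), and hash outputs behave like uniformly random strings. Each attempt then succeeds with probability p, the expected number of attempts is 1/p, and a machine computing H hashes per second (H is a hypothetical figure) can produce at most about H / 16^n valid packets per second:

```latex
p = 16^{-n}, \qquad
\mathbb{E}[\text{client hashes per packet}] = \frac{1}{p} = 16^{n}, \qquad
\text{server hashes per packet} = 1, \qquad
\text{valid packet rate} \approx \frac{H}{16^{n}}
```

For n = 5 that is 16^5 = 1,048,576 hashes on the client against a single hash on the server. A hypothetical machine doing 10 million hashes per second would manage roughly 9.5 valid packets per second, so the difficulty has to be tuned against the expected legitimate reporting rate.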

Not a DoS protection

Please don't confuse this with DoS protection. The mechanism above is there to protect integrity, not availability. There are different techniques for preventing server overload, such as discarding unusual traffic before processing it.

[^1]: If you are the kind of person who would report something like this, I'm glad. You're making a valuable contribution to free software. However, surely you can understand the problem.

[^2]: Modern hash algorithms are not only safer from a cryptographic standpoint but often also faster to compute.

[^3]: Hash requirements are chosen arbitrarily. The stricter they are, the more time the client has to spend computing a valid hash. For example, "ends with 6 ones" is a stricter requirement than "ends with 5 ones". Computing time increases exponentially: with a hexadecimal hash, each additional required character multiplies the expected number of attempts by 16.