📅 Published 2021-08-28
I just read the Vice article that's been going around, "How Data Brokers Sell Access to the Backbone of the Internet." Needless to say, I am not happy about this.
This quote pretty much sums it up:
"I'm concerned that netflow data being offered for commercial purposes is a path to a dark fucking place," one source familiar with the data told Motherboard.
TL;DR? Here's a summary.
In some ways, it's not as bad as it sounds--it's actually very hard to fish out specific information from this ocean of traffic--but in other ways it's worse, because this market is still relatively young. This practice is completely unregulated, at least in the US, and as far as I know there is nothing legally preventing these companies (or other as-yet-unknown actors) from using this data for other purposes. Individuals are not even given the palliative of a meaningless "opt out" or "do not track" option. If you use the internet, you are *opted in*.
Once it is commoditized, this kind of technology only improves over time, becoming cheaper and more pervasive. Mature markets develop around the technology, vendors multiply, and buyers find more and more "interesting" uses for the data. Some examples include facial recognition in retail stores, pervasive use of ALPR systems to track drivers, and mobile carriers selling location data. Every week it seems we hear about some new abuse. Old news, right?
Almost everyone is aware that our privacy is vanishing. You of course have no day-to-day financial privacy, unless you use cash. You have little to no location privacy any longer, unless you live far out in the country, ride a bike, and refuse to carry a cell phone or shop at big stores. But I've always assumed that careful use of Tor or VPNs would allow me some degree of anonymity for browsing the web, at least when visiting websites that respect me by avoiding the use of trackers.
Unfortunately it appears that the "global adversary" threat model is no longer limited to nation-states. If BigCorp, Inc. wants to know who is behind a certain whistleblower leak, they can possibly find out, for the right price. (And yes, in theory this kind of data can be used to identify specific Tor users. See the paper "On the Effectiveness of Traffic Analysis Against Anonymity Networks Using Flow Records.")
Simply banning this kind of collection altogether may not be a tenable option, as the companies mentioned in the article are trying to solve some legitimate problems. DDoS attacks, botnets, coordinated hacking activity, and cyberwarfare are a recurring nightmare on the modern internet. Security teams need some sort of targeted insight into this activity so they can coordinate a response, but this must be done in a way that preserves individual privacy. For one thing, law enforcement should be involved if specific individuals or IPs are sought, and anyone who has been deanonymized through such an investigation should be informed once the investigation is complete.
There are also uses for netflow data in academic research, but access must of course be tightly controlled and regulated. Institutional review boards should be brought up to speed on the potential abuses of this data so they can ensure proper oversight.
As for general commercial use? Data mining and advertising? In my opinion, this type of use should be strictly forbidden. There is huge potential for abuse here.
In the absence of government action, is there any way to protect your privacy against this? I have no idea. I've been thinking about this problem, and it's hard to come up with a good answer. There are a few important implications to consider.
If your ISP sells or trades this data, and you are accessing a site whose ISP (or DDoS protection service) also sells or trades this data, then you can likely be deanonymized by interested actors *even if you use a VPN*. Your risk goes up as you use a particular site more often, as the search becomes less like finding a needle in a haystack and more like looking for a regular pattern in the data. Your risk also goes up if you use less popular sites, as there is less unrelated traffic for yours to blend into.
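To make the "regular pattern" point concrete, here is a toy sketch of the general idea, sometimes called an intersection attack. It is not the method from the article or the paper; the subscriber names, the `vpn-exit` and `small-site` labels, the timestamps, and the two-second window are all invented for illustration. Each time the site sees a connection from the VPN, the observer asks which subscribers had a flow to that VPN at about the same moment, and keeps only the people who show up every time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str      # hypothetical label for the sender (subscriber or VPN exit)
    dst: str      # hypothetical label for the receiver
    start: float  # flow start time in seconds


def candidates(site_flow, isp_flows, vpn, window=2.0):
    """Subscribers with a flow to the VPN close in time to one site-side flow."""
    return {
        f.src
        for f in isp_flows
        if f.dst == vpn and abs(f.start - site_flow.start) <= window
    }


def intersect_over_visits(site_flows, isp_flows, vpn, window=2.0):
    """Keep only the subscribers who match on every observed visit."""
    remaining = None
    history = []  # size of the candidate set after each visit
    for sf in site_flows:
        matched = candidates(sf, isp_flows, vpn, window)
        remaining = matched if remaining is None else remaining & matched
        history.append(len(remaining))
    return remaining, history


# Synthetic ISP-side records: subscriber -> VPN entry point (all values made up).
isp_flows = [
    Flow("alice", "vpn-exit", 99.5), Flow("bob", "vpn-exit", 100.5),
    Flow("carol", "vpn-exit", 101.0), Flow("dave", "vpn-exit", 300.0),
    Flow("alice", "vpn-exit", 499.8), Flow("carol", "vpn-exit", 500.9),
    Flow("dave", "vpn-exit", 499.0),
    Flow("alice", "vpn-exit", 900.2), Flow("bob", "vpn-exit", 899.0),
]

# Synthetic site-side records: VPN exit -> the site, seen on three visits.
site_flows = [
    Flow("vpn-exit", "small-site", 100.0),
    Flow("vpn-exit", "small-site", 500.0),
    Flow("vpn-exit", "small-site", 900.0),
]

remaining, history = intersect_over_visits(site_flows, isp_flows, "vpn-exit")
print(history)    # candidate-set size after each visit
print(remaining)  # who is left after all visits
```

In this made-up data the candidate set shrinks from three people to one after three visits. Real traffic is far noisier, and real analysis would lean on byte counts and statistics rather than a fixed time window, but the direction of the effect is the same: every repeat visit is another chance to rule people out.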
As the technology improves, expect this to be used for marketing analytics and related uses. Observers may not know what you are viewing on "ilikemarijuana.com" or "sociallyunpopularopinion.com," but they will be able to discover that you spend an inordinate amount of time there.
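For a sense of how little is needed for this, here is a rough sketch of a flow record and a dwell-time tally. The field names are my own simplification of the kind of metadata netflow-style records carry (addresses, ports, timestamps, byte counts, no page content), and the IP addresses come from the ranges reserved for documentation. Mapping a destination IP back to a site name is an extra step that I am assuming the observer can do; shared hosting and CDNs make it less exact.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    # Simplified, hypothetical field set; real exporters carry more fields.
    src_ip: str
    dst_ip: str
    dst_port: int
    start: int    # epoch seconds
    end: int      # epoch seconds
    n_bytes: int


def time_per_destination(records, src_ip):
    """Total seconds of flow activity from one source to each destination."""
    totals = defaultdict(int)
    for r in records:
        if r.src_ip == src_ip:
            totals[r.dst_ip] += r.end - r.start
    return dict(totals)


# Synthetic records; no payload is present, only metadata.
records = [
    FlowRecord("192.0.2.10", "203.0.113.5", 443, 0, 600, 50_000),
    FlowRecord("192.0.2.10", "203.0.113.5", 443, 1000, 2800, 900_000),
    FlowRecord("192.0.2.10", "198.51.100.7", 443, 3000, 3060, 12_000),
    FlowRecord("192.0.2.99", "203.0.113.5", 443, 0, 30, 4_000),
]

totals = time_per_destination(records, "192.0.2.10")
print(totals)  # seconds of activity per destination for this one source
```

Nothing here reveals what was viewed, only where the time went, which is exactly the kind of signal a marketing analyst would want.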
I doubt that many governments are regulating this activity closely. Regulations in Switzerland and the EU should theoretically prevent misuse of this data, but because this market is less visible, the problem is likely not even on the radar for investigation or enforcement. As can be seen in this Reddit thread, EU companies are collecting netflow data, and many of the posters in the thread are not overly concerned about the related privacy implications. Depending on the practices of their "threat intelligence" vendors, they may be unwittingly passing on sensitive data to companies that use that data unethically, resell it to other vendors, or present it to others for analysis in ways that compromise privacy.
Additionally, there has been plenty of criticism of the effectiveness and coverage of current laws. Swiss and EU privacy protections could be stronger (Gemini article available), and their legal requirements regarding data retention ensure that this data must be retained one way or another. If you have to hold the data anyway, and nobody is watching, why not get some use out of it? Even better, why not get a financial return to offset the cost of retaining the data? The incentives are all wrong.
Furthermore, even when vendors provide security controls to limit access to sensitive material, those controls are often inadequate and proper setup is prone to human error. For a recent example, multiple FBI employees accessed private data through Palantir because the data was entered with incorrect permissions. If netflow data is being collected and presented for analysis, it is likely that some novel abuse of the data will be possible, even if the vendor tries to protect it.
Then of course there is the inevitable Amazon S3 bucket leak to contend with. This kind of data never remains secure at scale. If companies can't even guard social security numbers, why on earth would we expect them to succeed at protecting this data?
So the situation looks pretty bleak. Our data is being collected, there's not really a good solution for it, and it's unlikely that regulation will prevent it. Technical capabilities continue to improve. Future abuse of this information seems inevitable.
The silver lining here is that maybe this will finally spur innovation in new kinds of networks. From an anonymity perspective, the old centralized client-server model is dead. Many of us have long been aware that certain nation-states have this sort of capacity, but now that I see this power creeping into the corporate sector, I am done trying to monkey patch privacy onto the old internet. It's just not a medium that is privacy-capable.
Offline-capable networks like Secure Scuttlebutt, mesh networks, and other unconventional setups are the only possible way to achieve truly private and anonymous communications. I also see fully decentralized networks such as I2P or ZeroNet as providing a level of privacy above the clearnet, and maybe this is enough to make data mining too expensive to be worthwhile--but I wouldn't stake your freedom on it.
Another option that is becoming more and more common is the use of small, private, encrypted group conversations. These are still "private" in that nobody outside the group can tell exactly what is being said, but they should not be considered anonymous, as the participants can be identified given sufficient time. Motivated data miners could correlate the known interests of each participant to get a good idea of what they might be discussing. So this shouldn't be considered a true solution, although it is certainly more private than unsecured plaintext.
A huge advantage of the internet is that information can travel quickly across great distances, routing around damage. This is still an important capability, so it's unlikely that most people can just abandon the internet entirely. There are many tasks in life that require rapid communications, even if privacy may be sacrificed in the process.
On the other hand, there is a great deal of information that can move slowly, within more limited circles. Correspondence about personal matters, high-dollar contracts and transactions, and long-term business strategies are often more privacy-sensitive than they are time-critical. For journalists, activists, and lawyers, critical documents often just need to arrive safely without being noticed, and privacy here can be a matter of life and death. For these use cases, and more, perhaps the internet has outlived its usefulness.
---
Comments? Email the author: mntn at mntn.xyz