💾 Archived View for tilde.club › ~winter › gemlog › 2024 › 8-02.gmi captured on 2024-08-31 at 12:35:40. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
https://www.theregister.com/2024/06/28/microsoft_ceo_ai/
Microsoft CEO of AI: Your online content is 'freeware' fodder for training models
Over at The Register, Thomas Claburn writes that Mustafa Suleyman, CEO of Microsoft AI, claims that ML companies can basically scrape the web with impunity because anything put online that doesn't have a notice stating otherwise is essentially "freeware".
This is horseshit, and emphatically not how copyright works. The creator of the work can decide how it's used, or assign these rights, but that's it: there's nothing in copyright laws that says, "I saw it, so I can use it."
All 3 Major Labels Are Suing AI Start-ups for Copyright Infringement
Master List of lawsuits v. AI, ChatGPT, OpenAI, Microsoft, Meta, Midjourney & other AI cos.
The Center for Investigative Reporting (producer of Mother Jones and other publications) sees it that way too, and has sued Microsoft and OpenAI for copyright violation, amongst other things. They join a long list of companies doing the same.
GitHub: Licensing a Repository
It's funny, because GitHub (owned by Microsoft) is clear: if you want your code to be truly open source and reusable, it needs a license:
You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.
Someone tell their CEO for AI.
Of course, this is all a game. Microsoft isn't going to be swayed by these arguments. And of course, if the shoe were on the other foot, you'd be met with the most expensive C&Ds Seattle has to offer. At this point, the only thing that will stop them (maybe?) is a court judgement. Even then, will it? The last fifteen years have seen tech companies operating basically with impunity: settlements are cheap, and when you pull in billions every quarter, you can afford a lot of settlements.
It's an accelerating race: since the introduction of sludge machines like ChatGPT, there's been less and less human-created content on the open web. People don't make websites anymore, after all. They post on Facebook, on Instagram, on Reddit. This data is either walled off or, in the case of Reddit, explicitly declared to be most definitely not free for the scraping.
Lovingly curating a website or similar presence, over a period of years, the way we used to do? That's dead and gone.
With those sites vanishing, or vanished, and with future iterations of these ML models needing an order of magnitude more data than was used for the current versions - well, you can see why companies are getting desperate. Anything that can feed the machine will be used as such, and Microsoft's made their position clear: they don't care if it's illegal. If you think it is, you're going to have to take them to court. Thankfully, a lot of organizations that can afford a well-paid legal team have decided to take them up on the offer.
In a just world, the kinds of tactics used by Microsoft and OpenAI would be met with appropriate sanctions. I remember in the 90s when the DOJ tried to break up Microsoft, ultimately abandoning the efforts. That's the kind of reaction warranted by Suleyman's proclamations. Deciding you're going to steal everything on the web, because you're able to write a crawler that can do this? That should bankrupt your company.
But I don't think it matters. The world wide web, at least how we saw it decades ago, is essentially dead, and the ML companies are carrion birds swooping in for the carcass. That's not to say that people aren't online anymore, or aren't getting any value out of the web - clearly, they are. But people's usage patterns are no longer those of contributors to an open corpus. Instead, we're retreating to our walled gardens. Social media (gated), WhatsApp chats, group texts. Off-web, unindexed. So we're making less. And the electronic commons is worse for it, of course. But that's where we are. And from the social media platforms, to the SEO chucklefucks, to the AI grifters, there's no shortage of blame. The web was created by scientists, nerds, and idealists. It's being destroyed by greed.