💾 Archived View for rawtext.club › ~winter › gemlog › 2023 › 8-10.gmi captured on 2023-09-08 at 16:20:28. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Effluent and Pennies: The Amazon AI Flood

Since the explosion of ChatGPT, Midjourney, and other forms of generative AI, the word "hallucination" has experienced a spike in popularity. People have used it to describe when a generative system produces an unexpected result, something that we, with our expectations of truth, would call false, or a lie (or whatever).

It's been strange seeing the word used in this context for a number of reasons:

Generative AI doesn't see anything; how can it hallucinate?
Generative AI has no conception of truth; how could it emit anything false?

What seems to have been lost over the past months, both in the popular consciousness and in tech journalism, is the understanding of just how dumb these models are: they produce text based on a prompt. The models are trained (typically) via a (massive) neural network with some number of transformers, which allow for a sort of short-term memory - without these, the text prediction would be too local, and the model would "lose track" of things it said before.

And that's it! There's no ontological reasoning, no understanding or representation of the world beyond tokens in a corpus, or training set, none of which have any inherent meaning to the model. There are many, many forms of AI, my own background being in Bayesian networks. Neural networks, even with the flurry of recent research, remain just one of many approaches. The way they've dominated the popular imagination would make my undergraduate thesis advisor weep.

So given this: how does a model "hallucinate"? If I ask it to describe me, and it says I'm a noted North American poet from Calgary (I'm not), it's not hallucinating, telling half-truths, or being tricksy; these are just the most likely statistical outcomes given the training based on whatever prompt I gave it.
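To make "most likely statistical outcome" concrete, here's a toy sketch. It is not how any real model is built: it's a bigram counter that greedily picks the most frequent next word, where real systems use neural networks over tokens. The corpus, the names in it, and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_words=5):
    """Greedily append the most frequent next word. Truth never enters into it."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical toy corpus: three poets from Calgary and one programmer.
corpus = [
    "alice is a poet from calgary",
    "bob is a poet from calgary",
    "carol is a poet from calgary",
    "dave is a programmer",
]
follows = train_bigrams(corpus)
print(complete(follows, "dave is"))
```

Under these assumptions the sketch prints "dave is a poet from calgary", even though the corpus says Dave is a programmer. Nothing lied or hallucinated; "poet" simply follows "a" more often than "programmer" does.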

And yet we still talk about these models lying, or making things up, and I suppose the latter is sort of true, because from the perspective of such a system, everything it provides in response to a prompt is made up according to its billions of internal weights.

Amazon is being flooded with books written entirely by AI

We're losing track of truth, not understanding that these strange and opaque processes have no notion of such themselves. But we're also losing track of truth for a different reason. For centuries, we've trusted that if something has been published, it's passed through a number of well-known steps: an author has written the work, or an editor compiled it; another editor has read through it for style and consistency; a copy editor has taken a pass to standardize spelling, grammar, tone. And while this may not always be the case, particularly around self-publishing, the basic set of steps needed to get information into the world was largely unchanged for hundreds of years.

The internet, of course, torpedoed that. The great democratizer. Anyone could write anything, put it into a website, make it available online. And a lot of people did, and do. Websites aren't dead, or a dying art, but are certainly diminished from twenty years ago. And in their heyday, let's say 1995-2003, back when people were still concerned with creating websites, rather than managing content, there was a veritable flood of information. Pages about TV shows, beloved family cats, Sailor Moon dubs, weather cams tracking south-east Ohio. But the crucial part of this is that even though the time required to publish dropped off a cliff, there was still an undeniable (and charming) human element. It was amateur, sure. But on the whole, it was all written by people who cared, and wow if that hasn't been lacking online for the last fifteen years.

I Would Rather See My Books Get Pirated Than This (Or: Why Goodreads and Amazon Are Becoming Dumpster Fires)

On my Mastodon feed (and also when checking Hacker News), I came across Jane Friedman's article about how people were creating all kinds of AI-generated garbage books and uploading them to Amazon's print-on-demand service using her own name. Friedman is a professor and writer, and has written extensively on the nuts and bolts of how to get published. It's maybe no surprise that the scammers have come for this particular niche: I can tell you from my own searches and hair-pulling decades ago, it's popular, and enduring. Novice writers love this well-worn genre, hopeful for the smallest hint that might tip the scales in their favour (the hard truth: read more; a lot more). But even worse than the appropriation of one's own name for garbage text is what came next: all these books started getting linked to Friedman's Goodreads profile. And she had a hell of a time trying to convince people to, you know, actually take them off.

This is the world we live in now, a world where those without morals can and will flood the world with a million varieties of effluent, hoping to make pennies off each.

A New Frontier for Travel Scammers: A.I.-Generated Guidebooks

A few days ago I read the above article from the New York Times about how the scammers have come for paper guidebooks, too. I admit this isn't something that affects me much - in the past, we've always made itineraries on paper, sent them to family, and recently worked TripAdvisor into the mix as well - but it's still a booming business, and I guess you only have to scam a fraction of a fraction of a percentage of the people buying them to make it worthwhile. You just need to flood the market with every conceivable country, city, region. Effluent and pennies. AI generators make this easy.

I should be surprised, but I'm not. I'm old enough to have been young when the first spam messages started coming in, and I remember the righteous indignation people felt when they realized that people were flooding their inboxes with trash because it was easy and because they could. Filters eventually caught up with the scammers, and vast amounts of spam are now automatically nixed before they even reach my inbox. Somewhere, a Nigerian prince sheds a tear.

I suspect there'll be a solution to this at some point soon: we are in the early days of a new wild west of the web. It's a strange feeling, knowing that you're standing on the edge of a major transformation. It felt that way when I was talking every night with people in New Mexico and Ireland and Montreal on the cusp of this current century. I was younger then, and didn't have the full appreciation for what was going on. But I knew that all of "this" - the mix of software and services I was incorporating into the fabric of my life - the web, ICQ, AIM, all in its relative infancy - was rapidly changing how I was living my life. And I have the same feeling, in a very different way, seeing the generated, human-free results in Google, in Amazon, ad infinitum.

This is a change. I can see how some of it will play out. Speaking from experience, much more will surprise me.

What I don't know is what the solution will look like. I suspect it'll be part technological, part legal, part social. I wouldn't be shocked if companies like Amazon start charging more for listing products, to make it not worthwhile to flood listings with crap. But maybe I'm an optimist. Maybe none of this will happen, and it's definitely downhill from here.

I suspect there will be requirements around labelling generative media (or whatever we end up calling it), and that full refunds must be available on request. I just don't know if these will have any teeth, or if the associated fines will simply become the cost of doing business, as with so much else.

But the last part of the solution, the social, will be the most critical. Trust, I suspect, will be an enormous part of purchasing decisions going forward. If I'm buying a travel guide, it'll be a Lonely Planet. If I'm going to London, maybe the classic A-Z.

Or I'll ask people I know, people who I know are actually people. Because if there isn't a solution to the flood of AI-generated everything, the internet is going to splinter and fragment. There'll be the infinite wastes, full of SEO copy, garbage books, counterfeit cat beds. And there'll be the places populated by people - places like Gemini, Discord, the Fediverse, VRChat, Cohost - which aren't perfect, but are at least good, good enough, full of friends and strangers that remind us of the way things used to be.
