💾 Archived View for oberdada.pollux.casa › gemlog › 2024-04-27_poem_generator.gmi captured on 2024-05-10 at 10:37:29. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Some observations have emerged from my ongoing exploration of automatically generated concrete poems that I try to read and typeset. At the core is a first-order Markov model of transition probabilities from one letter to the next. The analysed text corpus, which I have expanded since the first iteration, draws on various European languages, restricted to the ASCII character set. This restriction seriously distorts most languages, because accents and diacritical marks typically carry semantic meaning. An 'e' is different from an 'é', which should not be confused with 'è'; 'a' and 'ä' are different letters. The provisional solution was either to remove words with non-ASCII characters, or to drop the accents when it seemed permissible.
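As an illustration of what a first-order letter model amounts to, here is a minimal Python sketch. It is not the program described in this post (which is written in C); the function names, the boundary markers "^" and "$", the length cap, and the tiny corpus are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers, assumed never to occur inside corpus words.
START, END = "^", "$"

def train(words):
    """Count how often each letter is followed by each other letter."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        symbols = [START] + list(word.lower()) + [END]
        for current, following in zip(symbols, symbols[1:]):
            counts[current][following] += 1
    return counts

def generate(counts, rng, max_len=12):
    """Walk the chain from START until END is drawn or max_len is reached."""
    letters = []
    current = START
    while len(letters) < max_len:
        successors = list(counts[current].keys())
        weights = list(counts[current].values())
        # Draw the next letter in proportion to the observed transition counts.
        current = rng.choices(successors, weights=weights)[0]
        if current == END:
            break
        letters.append(current)
    return "".join(letters)

# Toy corpus, for illustration only.
corpus = ["maluma", "takete", "abeman", "upemon"]
model = train(corpus)
rng = random.Random(1)
print([generate(model, rng) for _ in range(5)])
```

Because only the immediately preceding letter is taken into account, nothing prevents awkward consonant clusters from piling up across several steps.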
Long struggles with the string libraries and character representations available in plain C have led nowhere. There are wide characters (wchar_t), but for whatever reason I have not been able to get them to work. Tackling the problem from another angle, using readymade strings of full words in various languages, provides a partial workaround. The ASCII letters of such strings can still be manipulated, but any non-ASCII letter, which takes up more than one byte in UTF-8, is a stumbling block if one tries to replace the letter or reverse the string. I know this sort of problem is trivial, in a sense, and can be solved. Python seems to have handy functions for the full UTF-8 character set. But I'm not a programmer; at best I perhaps qualify as what Solderpunk refers to as a "computer person" (*). Programming can be fun, even addictive, but, unless I'm misreading Solderpunk, I agree that computation does not solve fundamental problems any more than it introduces new ones of its own. Besides, I'm not naturally inclined to care for program code as an end in itself when I'm actually working on a practical problem in the realm of art.
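For what it is worth, the two operations that cause trouble here, replacing a letter and reversing a string, might look roughly like this in Python, where a string is a sequence of Unicode code points rather than bytes. This is only a sketch: the function names are invented for the example, and it assumes that normalising to the composed form (NFC) turns every accented letter into a single code point, which holds for the common accented letters of European languages but not for every conceivable combination.

```python
import unicodedata

def reverse_word(word):
    """Reverse a word letter by letter, keeping accents on their letters."""
    # NFC merges a base letter and a combining accent into one code point,
    # so the accent cannot end up on the wrong letter after reversal.
    composed = unicodedata.normalize("NFC", word)
    return composed[::-1]

def replace_letter(word, index, new_letter):
    """Return a copy of word with the letter at position index replaced."""
    composed = unicodedata.normalize("NFC", word)
    return composed[:index] + new_letter + composed[index + 1:]

def strip_accents(word):
    """Drop accents by decomposing letters and removing the combining marks.

    Letters without a decomposition, such as þ or ø, pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(reverse_word("hätte"))
print(replace_letter("père", 1, "a"))
print(strip_accents("élève"))
```

The last function corresponds to the provisional remedy of dropping accents where that seems permissible; whether it is permissible for a given word remains a judgement the code cannot make.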
A good part of the words generated by the first-order Markov model tend to be hard to pronounce, so I have introduced a spell-checker that blocks some utterly failed attempts such as words with five consonants in a row. But making rules for acceptable words is not as straightforward as one might think. Sometimes a word like "hroinetl" is defensible for its expressiveness even though it defies all phonological common sense.
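A rule of the "no five consonants in a row" kind could be sketched as follows in Python. The names and the vowel set are assumptions for the example (in particular, 'y' is counted as a vowel here), and the actual spell-checker may well apply other rules.

```python
# Assumption: these letters count as vowels; everything else alphabetic
# counts as a consonant. Accented vowels would need to be added or stripped.
VOWELS = set("aeiouy")

def longest_consonant_run(word):
    """Return the length of the longest unbroken run of consonants."""
    longest = current = 0
    for ch in word.lower():
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def is_acceptable(word, max_run=4):
    """Reject words with more than max_run consonants in a row."""
    return longest_consonant_run(word) <= max_run

for candidate in ["hroinetl", "upemon", "strngth", "angstschrei"]:
    print(candidate, is_acceptable(candidate))
```

With these assumptions "hroinetl" passes, since its longest consonant run is only two letters, while the genuine German word "Angstschrei" is rejected for its run of eight. A rule this simple evidently cannot capture what is pronounceable.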
Even if I were able to solve the practical problems, making a program that spouts elegant concrete poems in larger numbers than anyone could hope to read in a lifetime is not actually desirable. Some slight imperfections in the output give me as a writer an opportunity to scrutinise the expressions and subjectively determine if a word can be spelled differently or a phrase can be dropped. The crucial test is reading the poem aloud. This is what it must feel like to be dyslexic. Since the language is not given (it is not even a true conlang), there are no a priori pronunciation rules, no indication of where the stress falls or how to interpret a letter combination such as -age-, which is pronounced distinctly differently depending on whether it occurs in a German, English, or French word. The unusual sequence -tht- is hard to read but not too hard to pronounce as a voiceless th followed by t, and rewriting it as þt, using the Icelandic th character, makes it more readable.
The point of concrete poetry, of course, is to work with speech sounds or the graphical text image as such, without reference to semantic meaning. The Saussurean sign is separated into the signifier, the perception of the word as sound, and the signified, or the mental image of whatever the signifier stands for. With the constructed words, most of which are unfamiliar, it is as if they were pure signifiers without any signified attached to them. Inserting existing and familiar words into the context of a concrete poem disturbs its delicate balance in the same way as a major or minor triad sticks out as inappropriate in an atonal musical passage. However, it is very unlikely that familiar words longer than three letters show up. An occasional "the," "and," "it," "is," and the like can be tolerated. The context doesn't reinforce their interpretation as English words, nor does it deny it. Short words with different meanings in different languages tend to be common, such as the Dutch "en/of" (and/or in English).
Many words nevertheless have some loose connotation of being verb-like or noun-like, and the short ones behave very much like articles, prepositions, and nouns. Perhaps certain declension patterns from some language can be recognised in a suffix. Also, a general synaesthetic sense of softness and hardness is conveyed by the famous words maluma/takete, which are universally understood although no precise meaning exists.
So far there is one animation with a reading of a poem. The animation has been done "by hand," that is, by moving things around in GIMP and saving each frame. I usually tell myself "never again!" when I end up doing that, but this one is just over a minute. Three full days of work.
This poem is from an earlier iteration of the program. Later versions turn out epic poems with lots of words and less insistent repetition. Perhaps too little repetition or redundancy. Some memorable turns of phrase occur:
o upemon abeman
ieu fanemaiten
but much of it drowns in its own text mass, not unlike Finnegans Wake. Programming easily solves the (non)problem of generating huge quantities of material, but not the finicky aesthetic decision-making.
(*)
Solderpunk's smol earth update