💾 Archived View for gemini.bunburya.eu › newsgroups › gemini › messages › ssjhge$k74$1@dont-email.me… captured on 2022-03-01 at 15:42:09. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Simple conversions from HTML to simple markups are disappointing

Message headers

From: Luca Saiu <luca@ageinghacker.net>

Subject: Simple conversions from HTML to simple markups are disappointing

Date: Sun, 23 Jan 2022 13:25:29 +0100

Message-ID: <ssjhge$k74$1@dont-email.me>

Message content

--=-=-=

Content-Type: text/plain; charset=utf-8

Content-Transfer-Encoding: 8bit

Disgusted by the web, with its anti-features, its enormous gratuitous
complexity and its essentially proprietary nature (re-implementing a
significant component from scratch is unrealistic for a single
developer), I have recently opened Gemini and Gopher services on
ageinghacker.net.

I plan to publish new information on the simpler systems as well, rather
than only on my web site.

For my existing web pages I had thought about machine-converting HTML to
Gopher and Gemini markup, using lynx -dump along with some simple Unix
shell scripting. The existing HTML on my web site is hand-written and
very simple; I at least had the good taste not to use JavaScript.
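To make the idea concrete, here is a minimal Python sketch of that kind of post-processing. It is not the attached script, only an illustration. It assumes the text has the shape lynx -dump usually produces: body text with inline [N] markers, then a line reading "References" followed by numbered URLs. The function name and the sample text are made up for the example.

```python
import re

# Matches a lynx-style reference line such as "   3. https://example.org/".
REFERENCE_LINE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s*$")


def lynx_dump_to_gemtext(dump: str) -> str:
    """Turn text shaped like `lynx -dump` output into Gemini markup.

    Everything before the "References" line is copied as plain text;
    each numbered reference after it becomes a Gemini link line.
    """
    body, found, references = dump.partition("\nReferences\n")
    out = body.rstrip("\n").split("\n")
    if found:
        out += ["", "## References"]
        for line in references.split("\n"):
            match = REFERENCE_LINE.match(line)
            if match:
                number, url = match.groups()
                # Gemini link line: "=> URL label"
                out.append(f"=> {url} [{number}] {url}")
    return "\n".join(out) + "\n"


# Made-up sample in the assumed lynx -dump shape.
sample = (
    "Welcome to my page [1]home and [2]about.\n"
    "\n"
    "References\n"
    "\n"
    "   1. http://example.org/\n"
    "   2. http://example.org/about.html\n"
)
print(lynx_dump_to_gemtext(sample), end="")
```

The links stay out of line: the body keeps its [N] markers and the URLs only appear in the trailing list. The sketch also does nothing about body lines that happen to begin with "#" or "=>", which a Gemini client would misread as markup.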

After experimenting for a day or two I have to admit that the result
is disappointing. The conversion is unnatural, and I find that some
important information is lost (<tt> and <pre>) while some irrelevant
information is preserved (icons). Having out-of-line links does not
help readability when references are numerous.

You can find my experiment at gopher://ageinghacker.net and
gemini://ageinghacker.net; my crude script is attached to this message.

Now, it is possible to obtain a better conversion by spending more
effort: in particular lynx (which of course was never designed for this
task) is inadequate at preserving markup information. It is possible to
parse the HTML instead, and start from an AST. On the other hand some of
the fault lies in the HTML source document as well: the document could
have used, for example, CSS for icons instead of <img> elements when the
content was not significant enough to deserve translation. However, some
style information encoded only in CSS would be significant for
translation: had I used CSS in place of old-style <tt> elements,
recognising “code”-type elements would have been an issue.
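As a rough illustration of the parse-the-HTML route, the Python sketch below walks the tag stream with the standard html.parser module instead of building a full AST. It keeps <pre> blocks and headings, marks <tt>/<code> text with backticks (a plain-text convention chosen here, since Gemini markup has no inline code syntax), drops <img> elements, and emits links as separate lines after the paragraph they appear in. The class and function names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

FENCE = "`" * 3  # Gemini toggle line for preformatted text
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCKS = set(HEADINGS) | {"p"}


class GemtextConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []        # finished Gemini lines
        self.text = []       # pieces of the block being built
        self.links = []      # (href, label) pairs found in that block
        self.href = None     # href of the <a> currently open, if any
        self.label = []      # text pieces inside the open <a>
        self.in_pre = False

    def flush(self, prefix=""):
        """Emit the pending block, then its links as separate link lines."""
        line = " ".join("".join(self.text).split())
        if line:
            self.out.append(prefix + line)
        self.out += [f"=> {href} {label}" for href, label in self.links]
        self.text, self.links = [], []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            self.flush()
        elif tag == "pre":
            self.flush()
            self.in_pre = True
        elif tag in ("tt", "code") and not self.in_pre:
            self.text.append("`")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.label = []
        # <img> and every other tag are deliberately ignored.

    def handle_endtag(self, tag):
        if tag in BLOCKS:
            self.flush(HEADINGS.get(tag, ""))
        elif tag == "pre":
            body = "".join(self.text).strip("\n")
            self.out += [FENCE, *body.split("\n"), FENCE]
            self.text, self.in_pre = [], False
        elif tag in ("tt", "code") and not self.in_pre:
            self.text.append("`")
        elif tag == "a" and self.href:
            label = " ".join("".join(self.label).split())
            self.links.append((self.href, label))
            self.href = None

    def handle_data(self, data):
        self.text.append(data)
        if self.href:
            self.label.append(data)


def html_to_gemtext(html: str) -> str:
    converter = GemtextConverter()
    converter.feed(html)
    converter.close()
    converter.flush()
    return "\n".join(converter.out) + "\n"


sample = (
    '<h1>Notes</h1><p><img src="icon.png">Run <tt>make</tt> or see '
    '<a href="doc.html">the docs</a>.</p><pre>$ make\n$ make install\n</pre>'
)
print(html_to_gemtext(sample), end="")
```

Even this small version needs per-element rules. Handling lists, tables, nested markup or CSS-based styling would add considerably more.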

My html-to-gemini or html-to-gopher conversion would need a lot of the
complexity I want to avoid.

I have come to believe that the only really practical solution is
translating in the opposite direction: starting from a simple and clean
markup (I would say Gemini) and from that generating other simple
markups (Gopher) and the legacy system (HTML). This can and should
handle relative, intra-server links.
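The opposite direction is much easier to sketch, because Gemini markup is line-oriented. The Python example below is one possible shape for the HTML back end, not an existing tool. It handles headings, link lines, preformatted blocks and plain text lines; list items and quote lines are simply treated as paragraphs. For relative links it assumes, purely for illustration, that every page.gmi is published next to a generated page.html. The function names are hypothetical.

```python
import html
from urllib.parse import urlsplit

FENCE = "`" * 3  # Gemini toggle line for preformatted text


def rewrite_link(url: str) -> str:
    """Point relative links to .gmi files at their .html counterparts.

    Assumption: each page.gmi has a generated page.html beside it.
    Absolute URLs (any scheme or host) are left untouched.
    """
    parts = urlsplit(url)
    if not parts.scheme and not parts.netloc and parts.path.endswith(".gmi"):
        return parts._replace(path=parts.path[:-4] + ".html").geturl()
    return url


def gemtext_to_html(gemtext: str) -> str:
    out, in_pre = [], False
    for line in gemtext.splitlines():
        if line.startswith(FENCE):
            # A fence line toggles preformatted mode.
            out.append("</pre>" if in_pre else "<pre>")
            in_pre = not in_pre
        elif in_pre:
            out.append(html.escape(line))
        elif line.startswith("=>"):
            fields = line[2:].split(maxsplit=1)
            if fields:
                url = rewrite_link(fields[0])
                label = fields[1] if len(fields) > 1 else fields[0]
                out.append(
                    f'<p><a href="{html.escape(url)}">{html.escape(label)}</a></p>'
                )
        elif line.startswith("#"):
            # Gemini markup defines three heading levels.
            level = min(len(line) - len(line.lstrip("#")), 3)
            title = html.escape(line[level:].strip())
            out.append(f"<h{level}>{title}</h{level}>")
        elif line.strip():
            out.append(f"<p>{html.escape(line)}</p>")
    if in_pre:
        out.append("</pre>")  # close an unterminated preformatted block
    return "\n".join(out) + "\n"


sample = "\n".join([
    "# Notes",
    "Run make & wait.",
    "=> setup.gmi Setup guide",
    "=> gemini://example.org/ Elsewhere",
    FENCE,
    "$ make <target>",
    FENCE,
])
print(gemtext_to_html(sample), end="")
```

A Gopher back end could iterate over the same lines and emit menu entries instead of HTML tags, rewriting relative links to selectors on the same host.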

--

Luca Saiu -- http://ageinghacker.net

I support everyone's freedom of mocking any opinion or belief, no

matter how deeply held, with open disrespect and the same unrelented

enthusiasm of a toddler who has just learned the word "poo".

--=-=-=

Content-Type: application/octet-stream

Content-Disposition: attachment; filename=html2simple

Content-Transfer-Encoding: base64

IyEvYmluL2Jhc2gKIyBXcml0dGVuIGJ5IEx1Y2EgU2FpdSAgPGh0dHA6Ly9hZ2VpbmdoYWNrZXIu

bmV0PiBpbiAyMDIyLgojIFRoZSBhdXRob3IgcmVsZWFzZXMgdGhpcyBjb2RlIGludG8gdGhlIHB1

YmxpYyBkb21haW4sIHVwIHRvIHRoZSBleHRlbnQgb2YKIyB0aGUgYXBwbGljYWJsZSBsYXcuCgpz

Y3JpcHRuYW1lPSQwCgp0YXJnZXRtYXJrdXA9J2dlbWluaScKIyB0YXJnZXRtYXJrdXA9J2dvcGhl

cicKCiNpbnZpc2libGVjaGFyYWN0ZXI9J+KBpCcgIyBVKzIwNjQg4oCcaW52aXNpYmxlIHBsdXPi

gJ0KaW52aXNpYmxlY2hhcmFjdGVyPSfigI0nICAjIFUrMjAwRCDigJx6ZXJvLXdpZHRoIGpvaW5l

cuKAnQojaW52aXNpYmxlY2hhcmFjdGVyPSfigaRGT09GT09GT08nCgpjbGVhbnVwICgpCnsKICAg

IGlmIHRlc3QgIiR7dGVtcG9yYXJ5ZGlyZWN0b3J5fSIgPSAnJzsgdGhlbgogICAgICAgIHRydWUg

IyBEbyBub3RoaW5nLgogICAgZWxpZiAhIHRlc3QgLWQgIiR7dGVtcG9yYXJ5ZGlyZWN0b3J5fSI7

IHRoZW4KICAgICAgICBtZXNzYWdlPSJub3QgYSBkaXJlY3Rvcnk6ICR7dGVtcG9yYXJ5ZGlyZWN0

b3J5fSIKICAgICAgICB0ZW1wb3JhcnlkaXJlY3Rvcnk9JycKICAgICAgICBmYXRhbCAiJG1lc3Nh

Z2UiCiAgICBlbHNlCiAgICAgICAgcm0gLXJmICIke3RlbXBvcmFyeWRpcmVjdG9yeX0iCiAgICBm

aQp9CgpzdWNjZXNzICgpCnsKICAgIGNsZWFudXAKIyAgICBlY2hvICJTVUNDRVNTIgp9CgpmYXRh

bCAoKQp7CiAgICBlY2hvICJGQVRBTDogJEAiCiAgICBjbGVhbnVwCiAgICBleGl0IC0xCn0KCmlt

cG9zc2libGUgKCkKewogICAgZmF0YWwgJ3RoaXMgc2hvdWxkIG5ldmVyIGhhcHBlbicKfQoKc3lu

b3BzaXMgKCkKewogICAgZWNobyAiU1lOT1BTSVM6ICQwIGh0bWxmaWxlIgogICAgZmF0YWwgImlu

dmFsaWQgY29tbWFuZC1saW5lIGFyZ3VtZW50cyIKfQoKaWYgdGVzdCAiJCMiICE9IDE7IHRoZW4K

ICAgIHN5bm9wc2lzCmZpCgpodG1sZmlsZT0iJDEiCmh0bWxmaWxlPSIkKHJlYWRsaW5rIC0tY2Fu

b25pY2FsaXplICR7aHRtbGZpbGV9KSIKCmlmICEgdGVzdCAtZSAiJGh0bWxmaWxlIjsgdGhlbgog

ICAgZmF0YWwgImNvdWxkIG5vdCByZWFkIGZpbGUgJGh0bWxmaWxlIgpmaQoKdGVtcG9yYXJ5ZGly

ZWN0b3J5PSQobWt0ZW1wIC1kKQpjZCAiJHt0ZW1wb3JhcnlkaXJlY3Rvcnl9IgoKcGF0Y2hlZGh0

bWxmaWxlPSIke3RlbXBvcmFyeWRpcmVjdG9yeX0vcGF0Y2hlZC5odG1sIgpjYXQgIiR7aHRtbGZp

bGV9IiBcCiAgICB8IHNlZCAicy9eIy8ke2ludmlzaWJsZWNoYXJhY3Rlcn0jL2ciIFwKICAgIHwg

c2VkICdzLzxbIFx0XG5ccl0qaFwoMVx8Mlx8M1wpXChbXj5dKlwpPi88aFwxXDI+XGQwX1wxL2cn

IFwKICAgIHwgc2VkICdzL1xkMF8xLyMvZycgXAogICAgfCBzZWQgJ3MvXGQwXzIvIyMvZycgXAog

ICAgfCBzZWQgJ3MvXGQwXzMvIyMjL2cnIFwKICAgID4gIiR7cGF0Y2hlZGh0bWxmaWxlfSIKCnRl

eHRmaWxlPSIke3RlbXBvcmFyeWRpcmVjdG9yeX0vdGV4dCIKTENfQUxMPSAgbHlueCBcCiAgICAg

IC1kdW1wIFwKICAgICAgLWZvcmNlX2h0bWwgXAogICAgICAtdW5pcXVlX3VybHMgLWhpZGRlbmxp

bmtzPW1lcmdlIFwKICAgICAgLWltYWdlX2xpbmtzIFwKICAgICAgLW5vbWFyZ2lucyBcCiAgICAg

IC1kb250X3dyYXBfcHJlIFwKICAgICAgLWluZGV4PSdodHRwOi8vZm9vYmFycXV1eCcgXAogICAg

ICAiJHtwYXRjaGVkaHRtbGZpbGV9IiBcCiAgICAgID4gIiR7dGV4dGZpbGV9IgoKbnVtYmVyZm9y

bWF0PSclMDEwZCcKZm9ybWF0bnVtYmVyICgpCnsKICAgIHByaW50ZiAiJG51bWJlcmZvcm1hdCIg

IiRAIgp9Cnplcm9lcz0kKGZvcm1hdG51bWJlciAwKQoKY3NwbGl0IC0tc3VmZml4LWZvcm1hdD0i

JHtudW1iZXJmb3JtYXR9IiBcCiAgICAgICAiJHt0ZXh0ZmlsZX0iIFwKICAgICAgICcvXlJlZmVy

ZW5jZXMkLycgJ3sqfScgXAogICAgICAgPCAiJGh0bWxmaWxlIiBcCiAgICAgICA+IC9kZXYvbnVs

bAoKZnJhZ21lbnRubz0iJChscyB4eCogfCB3YyAtbCkiCmxhc3RmcmFnbWVudD0kKHByaW50ZiAi

eHgke251bWJlcmZvcm1hdH0iICQoKCAkKGVjaG8gJHtmcmFnbWVudG5vfSB8IHNlZCAncy9eMCov

LztzL14kLzAvJykgLSAxKSkpCgpwcmludGZyYWdtZW50ICgpCnsKICAgIGZyYWdtZW50PSJ4eCQo

Zm9ybWF0bnVtYmVyICRAKSIKICAgICNlY2hvIFByaW50aW5nIGZyYWdtZW50ICRmcmFnbWVudAog

ICAgY2F0ICIke2ZyYWdtZW50fSIKfQoKIyBQcmludCBldmVyeSBmcmFnbWVudCBidXQgdGhlIGxh

c3QuCmZpbHRlcl9ub25fcmVmZXJlbmNlcyAoKQp7CiAgICBjYXNlICR0YXJnZXRtYXJrdXAgaW4K

ICAgICAgICBnb3BoZXIpCiAgICAgICAgICAgIGNhdCBcCiAgICAgICAgICAgICAgICB8IHNlZCAi

cy9eXFsvJHtpbnZpc2libGVjaGFyYWN0ZXJ9Wy9nIiBcCiAgICAgICAgICAgICAgICB8IGZtdCAt

LXNwbGl0LW9ubHkgLS13aWR0aD0xNjA7OwogICAgICAgIGdlbWluaSkKICAgICAgICAgICAgIyBX

ZSBoYXZlIGFscmVhZHkgaGFuZGxlZCBeIyBiZWZvcmUgdGhpcyBwb2ludC4KICAgICAgICAgICAg

Y2F0IFwKICAgICAgICAgICAgICAgIHwgc2VkICJzL149Pi89JHtpbnZpc2libGVjaGFyYWN0ZXJ9

Pi9nIiBcCiAgICAgICAgICAgICAgICB8IHNlZCAicy9cYFxgXGAvXGAke2ludmlzaWJsZWNoYXJh

Y3Rlcn1cYFxgPT4vZyIKICAgICAgICAgICAgOzsKICAgICAgICAqKQogICAgICAgICAgICBpbXBv

c3NpYmxlOzsKICAgIGVzYWMKfQpmb3IgaSBpbiBgc2VxIDAgJCgoJHtmcmFnbWVudG5vfSAtIDIp

KWA7IGRvCiAgICBwcmludGZyYWdtZW50ICIkaSIgfCBmaWx0ZXJfbm9uX3JlZmVyZW5jZXMKZG9u

ZQoKIyBQcmludCB0aGUgbGFzdCBmcmFnbWVudCwgd2l0aCB0aGUgYXBwcm9wcmlhdGUgcmVwbGFj

ZW1lbnRzLgpmaWx0ZXJfcmVmZXJlbmNlcyAoKQp7CiAgICBlY2hvICIjI1JlZmVyZW5jZXMiCiAg

ICAjIGNhc2UgJHRhcmdldG1hcmt1cCBpbgogICAgIyAgICAgZ29waGVyKQogICAgIyAgICAgICAg

IDs7CiAgICAjICAgICBnZW1pbmkpCiAgICAjICAgICAgICAgZWNobyAiIyNSZWZlcmVuY2VzIjs7

CiAgICAjICAgICAqKQogICAgIyAgICAgICAgIGltcG9zc2libGU7OwogICAgIyAgICAgZXNhYwog

ICAgd2hpbGUgcmVhZCBudW1iZXIgdXJsOyBkbwogICAgICAgIG51bWJlcj0kKGVjaG8gJG51bWJl

ciB8IHNlZCAncy9cLiQvLycpCiAgICAgICAgY2FzZSAkdGFyZ2V0bWFya3VwIGluCiAgICAgICAg

ICAgIGdvcGhlcikKICAgICAgICAgICAgICAgIHRhcmdldHByZWZpeD0nJwogICAgICAgICAgICAg

ICAgdGFyZ2V0PSIkdXJsIgogICAgICAgICAgICAgICAgaWYgZWNobyAiJHVybCIgfCBncmVwIC1x

ICdeaHR0cHNcPzovLycgXAogICAgICAgICAgICAgICAgICAgJiYgISBlY2hvICIkdXJsIiB8IGdy

ZXAgLXEgJ15odHRwc1w/Oi8vYWdlXD9pbmdoYWNrZXJcLm5ldCc7IHRoZW4KICAgICAgICAgICAg

ICAgICAgICB0eXBlPSdoJwogICAgICAgICAgICAgICAgICAgIHRhcmdldHByZWZpeD0nVVJMOicK

ICAgICAgICAgICAgICAgIGVsaWYgZWNobyAiJHVybCIgfCBncmVwIC1xICdcLlwoaHRtbFw/XHx4

bWxcfHhodG1sXHxjc3NcfGpzXHwvXCkkJzsgdGhlbgogICAgICAgICAgICAgICAgICAgIHR5cGU9

JzEnCiAgICAgICAgICAgICAgICBlbGlmIGVjaG8gIiR1cmwiIHwgZ3JlcCAtcSAnXC5cKHRlXD94

dFwpJCc7IHRoZW4KICAgICAgICAgICAgICAgICAgICB0eXBlPScwJwogICAgICAgICAgICAgICAg

ZWxpZiBlY2hvICIkdXJsIiB8IGdyZXAgLXEgJ1wuXChqcGVcP2dcfGdpZlx8cG5nXCkkJzsgdGhl

bgogICAgICAgICAgICAgICAgICAgIHR5cGU9J0knCiAgICAgICAgICAgICAgICBlbGlmIGVjaG8g

IiR1cmwiIHwgZ3JlcCAtcSAnXC5cKHBzXHxwZGZcKSQnOyB0aGVuCiAgICAgICAgICAgICAgICAg

ICAgdHlwZT0nZCcKICAgICAgICAgICAgICAgIGVsaWYgZWNobyAiJHVybCIgfCBncmVwIC1xICdc

Llwob2d2XHx3ZWJtXHxhdmlcfG1wZ1x8bXA0XHxta3ZcKSQnOyB0aGVuCiAgICAgICAgICAgICAg

ICAgICAgdHlwZT0nOycgIyBNb3ZpZSBmaWxlCiAgICAgICAgICAgICAgICBlbGlmIGVjaG8gIiR1

cmwiIHwgZ3JlcCAtcSAnXC5cKGF1XHx3YXZcfG9nZ1x8bXAzXCkkJzsgdGhlbgogICAgICAgICAg

ICAgICAgICAgIHR5cGU9JzwnICMgU291bmQgZmlsZQogICAgICAgICAgICAgICAgZWxzZQogICAg

ICAgICAgICAgICAgICAgIHR5cGU9JzknICMgQmluYXJ5IGZpbGUKICAgICAgICAgICAgICAgIGZp

CiAgICAgICAgICAgICAgICBlY2hvICJbJHt0eXBlfXxbJG51bWJlcl0gJHVybHwkdGFyZ2V0cHJl

Zml4JHVybHxhZ2VpbmdoYWNrZXIubmV0fDcwXSI7OwogICAgICAgICAgICBnZW1pbmkpCiAgICAg

ICAgICAgICAgICBlY2hvICI9PiAkdXJsIFskbnVtYmVyXSAkdXJsIjs7CiAgICAgICAgICAgICop

CiAgICAgICAgICAgICAgICBpbXBvc3NpYmxlOzsKICAgICAgICBlc2FjCiAgICBkb25lCn0KcHJp

bnRmcmFnbWVudCAkKCgke2ZyYWdtZW50bm99IC0gMSkpIFwKICAgIHwgc2VkICdzL15SZWZlcmVu

Y2VzLy8nIFwKICAgIHwgZ3JlcCAtdiAnXiQnIFwKICAgIHwgZmlsdGVyX3JlZmVyZW5jZXMKCgoj

IGVjaG8gIlRoZXJlIGFyZSAke2ZyYWdtZW50bm99IGZyYWdtZW50cyIKIyBlY2hvICJsYXN0ZnJh

Z21lbnQgaXMgJHtsYXN0ZnJhZ21lbnR9IgojIGNhdCAke2xhc3RmcmFnbWVudH0KIyBscyAtbCBg

cmVhZGxpbmsgLS1jYW5vbmljYWxpemUgLmAKIyBjYXQgeHgkemVyb2VzCiNjYXQgJHRleHRmaWxl

CiMgY2F0ICR7cGF0Y2hlZGh0bWxmaWxlfQoKc3VjY2Vzcwo=

--=-=-=--

Related

Children:

Re: Simple conversions from HTML to simple markups are disappointing (by rtr <rtr@nospam.invalid> on Sun, 23 Jan 2022 20:37:50 +0800)

Re: Simple conversions from HTML to simple markups are disappointing (by bunburya <bunburya@tilde.club> on Sun, 23 Jan 2022 14:03:27 +0000)

Re: Simple conversions from HTML to simple markups are disappointing (by meff <email@example.com> on Sun, 23 Jan 2022 20:02:52 -0000 (UTC))