💾 Archived View for gemini.bunburya.eu › newsgroups › gemini › messages › ssjhge$k74$1@dont-email.me… captured on 2022-03-01 at 15:42:09. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
From: Luca Saiu <luca@ageinghacker.net>
Subject: Simple conversions from HTML to simple markups are disappointing
Date: Sun, 23 Jan 2022 13:25:29 +0100
Message-ID: <ssjhge$k74$1@dont-email.me>
--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Disgusted by the web with its anti-features, its enormous gratuitous
complexity and its essentially proprietary nature (the effort of
re-implementing a significant component from scratch is unrealistic for
single developers), I have recently opened Gemini and Gopher services
on ageinghacker.net.
I plan to publish new information on the simpler systems as well, rather
than only on my web site.
As for existing web pages, I had thought about machine-converting HTML to
Gopher and Gemini markup, using lynx -dump along with some simple Unix
shell scripting. The existing HTML on my web site is hand-written and
very simple; I had at least the good taste of not using JavaScript.
After experimenting for a day or two I have to admit that the result
is disappointing. The conversion is unnatural: some important
information is lost (<tt> and <pre> markup) while some irrelevant
information is preserved (icons). Having out-of-line links does
not help readability when references are numerous.
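To make the out-of-line link problem concrete, here is a minimal Python
sketch (not the attached script) of the post-processing step. It assumes
the dump ends with a "References" heading followed by numbered URLs, as
in the link list that lynx -dump prints; SAMPLE_DUMP, dump_to_gemtext
and the example.org URLs are made up for illustration.

```python
import re

# Assumed shape of a lynx-style dump: body text with [N] markers, then a
# "References" heading followed by numbered URLs. The content is invented.
SAMPLE_DUMP = """\
Welcome to my [1]home page. See also the [2]manual.

References

   1. http://example.org/index.html
   2. http://example.org/manual.html
"""

def dump_to_gemtext(dump):
    """Split a dump at 'References' and emit one gemtext link line per URL."""
    body, _, refs = dump.partition("\nReferences\n")
    lines = [body.rstrip(), ""]
    # Each reference line looks like "   1. http://...".
    for match in re.finditer(r"^[ \t]*(\d+)\.[ \t]+(\S+)[ \t]*$", refs, re.MULTILINE):
        number, url = match.groups()
        lines.append(f"=> {url} [{number}] {url}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(dump_to_gemtext(SAMPLE_DUMP))
```

The [1] and [2] markers stay in the body while the links themselves end
up at the bottom, which is exactly what becomes hard to follow once a
page has dozens of references.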
You can find my experiment at gopher://ageinghacker.net and
gemini://ageinghacker.net, and my crude script is attached to this
message.
Now, it is possible to obtain a better conversion by spending more
effort: in particular lynx (which of course was never designed for this
task) is inadequate in preserving markup information. It is possible to
parse HTML instead, and start from an AST. On the other hand some fault
lies in the HTML source document as well: the document could have used,
for example, CSS for icons instead of <img> elements when the content
was not significant enough to deserve translation. However, some style
information encoded only in CSS would be significant for translation:
had I used CSS in place of old-style <tt> elements, recognising
“code”-type elements would have been an issue.
A better html-to-gemini or html-to-gopher converter would need a lot of
the complexity I want to avoid.
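As a rough illustration of the parser-based route, here is a minimal
Python sketch using the standard library's html.parser. It is not the
attached script and handles only a tiny subset of hand-written HTML.
The class and function names are hypothetical, and rendering <tt> as
backquoted text is an assumed convention, since Gemini markup has no
inline code syntax.

```python
from html.parser import HTMLParser

class GemtextConverter(HTMLParser):
    """Convert a small, hand-written subset of HTML to gemtext."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = {"p", "h1", "h2", "h3"}
    INLINE_CODE = {"tt", "code"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # finished gemtext lines
        self.text = []     # text of the block being collected
        self.links = []    # (href, label) pairs seen in the current block
        self.href = None   # href of the <a> currently open, if any
        self.label = []    # text collected inside the open <a>

    def flush(self, prefix=""):
        line = " ".join("".join(self.text).split())
        if line:
            self.out.append(prefix + line)
        # Links follow their own block instead of one list at the very end.
        self.out.extend(f"=> {href} {label}" for href, label in self.links)
        self.text, self.links = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.flush()
            self.out.append("```")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.label = []
        elif tag in self.INLINE_CODE:
            self.text.append("`")   # assumed convention, not part of gemtext
        elif tag in self.BLOCKS:
            self.flush()
        # <img> and every other tag are deliberately ignored.

    def handle_endtag(self, tag):
        if tag == "pre":
            # Keep preformatted text verbatim, line breaks included.
            self.out.append("".join(self.text).strip("\n"))
            self.out.append("```")
            self.text = []
        elif tag == "a" and self.href:
            label = " ".join("".join(self.label).split())
            self.links.append((self.href, label))
            self.href = None
        elif tag in self.INLINE_CODE:
            self.text.append("`")
        elif tag in self.BLOCKS:
            self.flush(self.HEADINGS.get(tag, ""))

    def handle_data(self, data):
        self.text.append(data)
        if self.href is not None:
            self.label.append(data)

def html_to_gemtext(html):
    parser = GemtextConverter()
    parser.feed(html)
    parser.close()
    parser.flush()   # emit any trailing text outside a block element
    return "\n".join(parser.out)
```

Even this toy version already needs per-tag state, and it still ignores
lists, tables, nested markup and anything expressed only in CSS.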
I have come to believe that the only really practical solution is
translating in the opposite direction: starting from a simple and clean
markup (I would say Gemini) and from that generating other simple
markups (Gopher) and the legacy system (HTML). This can and should
handle relative, intra-server links as well.
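A minimal sketch of the Gemini-to-HTML direction, in Python and under
stated assumptions: every function name is hypothetical, and the rule
that a relative link ending in .gmi should point to a generated .html
file on the same server is an assumption made for illustration, not
something specified above.

```python
import html
from urllib.parse import urlsplit

def rewrite_link(url):
    """Map a relative .gmi link to its assumed .html counterpart."""
    parts = urlsplit(url)
    if not parts.scheme and not parts.netloc and parts.path.endswith(".gmi"):
        return parts._replace(path=parts.path[:-4] + ".html").geturl()
    return url   # absolute links are left untouched

def gemtext_to_html(gemtext):
    """Translate gemtext, line by line, to a plain HTML fragment."""
    out, in_pre = [], False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            # A ``` line toggles preformatted mode.
            out.append("</pre>" if in_pre else "<pre>")
            in_pre = not in_pre
        elif in_pre:
            out.append(html.escape(line))
        elif line.startswith("=>"):
            fields = line[2:].split(maxsplit=1)
            if not fields:
                continue   # a bare "=>" carries no URL
            url = rewrite_link(fields[0])
            label = fields[1] if len(fields) > 1 else fields[0]
            out.append(f'<p><a href="{html.escape(url)}">{html.escape(label)}</a></p>')
        elif line.startswith("#"):
            # Gemtext has three heading levels.
            level = min(len(line) - len(line.lstrip("#")), 3)
            out.append(f"<h{level}>{html.escape(line[level:].strip())}</h{level}>")
        elif line.strip():
            out.append(f"<p>{html.escape(line)}</p>")
    if in_pre:
        out.append("</pre>")   # close an unterminated preformatted block
    return "\n".join(out)
```

Because gemtext is line-oriented, the whole translation is a single
pass with one boolean of state, in contrast with the HTML-to-Gemini
direction.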
--
Luca Saiu -- http://ageinghacker.net
I support everyone's freedom of mocking any opinion or belief, no
matter how deeply held, with open disrespect and the same unrelented
enthusiasm of a toddler who has just learned the word "poo".
--=-=-=
Content-Type: application/octet-stream
Content-Disposition: attachment; filename=html2simple
Content-Transfer-Encoding: base64
IyEvYmluL2Jhc2gKIyBXcml0dGVuIGJ5IEx1Y2EgU2FpdSAgPGh0dHA6Ly9hZ2VpbmdoYWNrZXIu
bmV0PiBpbiAyMDIyLgojIFRoZSBhdXRob3IgcmVsZWFzZXMgdGhpcyBjb2RlIGludG8gdGhlIHB1
YmxpYyBkb21haW4sIHVwIHRvIHRoZSBleHRlbnQgb2YKIyB0aGUgYXBwbGljYWJsZSBsYXcuCgpz
Y3JpcHRuYW1lPSQwCgp0YXJnZXRtYXJrdXA9J2dlbWluaScKIyB0YXJnZXRtYXJrdXA9J2dvcGhl
cicKCiNpbnZpc2libGVjaGFyYWN0ZXI9J+KBpCcgIyBVKzIwNjQg4oCcaW52aXNpYmxlIHBsdXPi
gJ0KaW52aXNpYmxlY2hhcmFjdGVyPSfigI0nICAjIFUrMjAwRCDigJx6ZXJvLXdpZHRoIGpvaW5l
cuKAnQojaW52aXNpYmxlY2hhcmFjdGVyPSfigaRGT09GT09GT08nCgpjbGVhbnVwICgpCnsKICAg
IGlmIHRlc3QgIiR7dGVtcG9yYXJ5ZGlyZWN0b3J5fSIgPSAnJzsgdGhlbgogICAgICAgIHRydWUg
IyBEbyBub3RoaW5nLgogICAgZWxpZiAhIHRlc3QgLWQgIiR7dGVtcG9yYXJ5ZGlyZWN0b3J5fSI7
IHRoZW4KICAgICAgICBtZXNzYWdlPSJub3QgYSBkaXJlY3Rvcnk6ICR7dGVtcG9yYXJ5ZGlyZWN0
b3J5fSIKICAgICAgICB0ZW1wb3JhcnlkaXJlY3Rvcnk9JycKICAgICAgICBmYXRhbCAiJG1lc3Nh
Z2UiCiAgICBlbHNlCiAgICAgICAgcm0gLXJmICIke3RlbXBvcmFyeWRpcmVjdG9yeX0iCiAgICBm
aQp9CgpzdWNjZXNzICgpCnsKICAgIGNsZWFudXAKIyAgICBlY2hvICJTVUNDRVNTIgp9CgpmYXRh
bCAoKQp7CiAgICBlY2hvICJGQVRBTDogJEAiCiAgICBjbGVhbnVwCiAgICBleGl0IC0xCn0KCmlt
cG9zc2libGUgKCkKewogICAgZmF0YWwgJ3RoaXMgc2hvdWxkIG5ldmVyIGhhcHBlbicKfQoKc3lu
b3BzaXMgKCkKewogICAgZWNobyAiU1lOT1BTSVM6ICQwIGh0bWxmaWxlIgogICAgZmF0YWwgImlu
dmFsaWQgY29tbWFuZC1saW5lIGFyZ3VtZW50cyIKfQoKaWYgdGVzdCAiJCMiICE9IDE7IHRoZW4K
ICAgIHN5bm9wc2lzCmZpCgpodG1sZmlsZT0iJDEiCmh0bWxmaWxlPSIkKHJlYWRsaW5rIC0tY2Fu
b25pY2FsaXplICR7aHRtbGZpbGV9KSIKCmlmICEgdGVzdCAtZSAiJGh0bWxmaWxlIjsgdGhlbgog
ICAgZmF0YWwgImNvdWxkIG5vdCByZWFkIGZpbGUgJGh0bWxmaWxlIgpmaQoKdGVtcG9yYXJ5ZGly
ZWN0b3J5PSQobWt0ZW1wIC1kKQpjZCAiJHt0ZW1wb3JhcnlkaXJlY3Rvcnl9IgoKcGF0Y2hlZGh0
bWxmaWxlPSIke3RlbXBvcmFyeWRpcmVjdG9yeX0vcGF0Y2hlZC5odG1sIgpjYXQgIiR7aHRtbGZp
bGV9IiBcCiAgICB8IHNlZCAicy9eIy8ke2ludmlzaWJsZWNoYXJhY3Rlcn0jL2ciIFwKICAgIHwg
c2VkICdzLzxbIFx0XG5ccl0qaFwoMVx8Mlx8M1wpXChbXj5dKlwpPi88aFwxXDI+XGQwX1wxL2cn
IFwKICAgIHwgc2VkICdzL1xkMF8xLyMvZycgXAogICAgfCBzZWQgJ3MvXGQwXzIvIyMvZycgXAog
ICAgfCBzZWQgJ3MvXGQwXzMvIyMjL2cnIFwKICAgID4gIiR7cGF0Y2hlZGh0bWxmaWxlfSIKCnRl
eHRmaWxlPSIke3RlbXBvcmFyeWRpcmVjdG9yeX0vdGV4dCIKTENfQUxMPSAgbHlueCBcCiAgICAg
IC1kdW1wIFwKICAgICAgLWZvcmNlX2h0bWwgXAogICAgICAtdW5pcXVlX3VybHMgLWhpZGRlbmxp
bmtzPW1lcmdlIFwKICAgICAgLWltYWdlX2xpbmtzIFwKICAgICAgLW5vbWFyZ2lucyBcCiAgICAg
IC1kb250X3dyYXBfcHJlIFwKICAgICAgLWluZGV4PSdodHRwOi8vZm9vYmFycXV1eCcgXAogICAg
ICAiJHtwYXRjaGVkaHRtbGZpbGV9IiBcCiAgICAgID4gIiR7dGV4dGZpbGV9IgoKbnVtYmVyZm9y
bWF0PSclMDEwZCcKZm9ybWF0bnVtYmVyICgpCnsKICAgIHByaW50ZiAiJG51bWJlcmZvcm1hdCIg
IiRAIgp9Cnplcm9lcz0kKGZvcm1hdG51bWJlciAwKQoKY3NwbGl0IC0tc3VmZml4LWZvcm1hdD0i
JHtudW1iZXJmb3JtYXR9IiBcCiAgICAgICAiJHt0ZXh0ZmlsZX0iIFwKICAgICAgICcvXlJlZmVy
ZW5jZXMkLycgJ3sqfScgXAogICAgICAgPCAiJGh0bWxmaWxlIiBcCiAgICAgICA+IC9kZXYvbnVs
bAoKZnJhZ21lbnRubz0iJChscyB4eCogfCB3YyAtbCkiCmxhc3RmcmFnbWVudD0kKHByaW50ZiAi
eHgke251bWJlcmZvcm1hdH0iICQoKCAkKGVjaG8gJHtmcmFnbWVudG5vfSB8IHNlZCAncy9eMCov
LztzL14kLzAvJykgLSAxKSkpCgpwcmludGZyYWdtZW50ICgpCnsKICAgIGZyYWdtZW50PSJ4eCQo
Zm9ybWF0bnVtYmVyICRAKSIKICAgICNlY2hvIFByaW50aW5nIGZyYWdtZW50ICRmcmFnbWVudAog
ICAgY2F0ICIke2ZyYWdtZW50fSIKfQoKIyBQcmludCBldmVyeSBmcmFnbWVudCBidXQgdGhlIGxh
c3QuCmZpbHRlcl9ub25fcmVmZXJlbmNlcyAoKQp7CiAgICBjYXNlICR0YXJnZXRtYXJrdXAgaW4K
ICAgICAgICBnb3BoZXIpCiAgICAgICAgICAgIGNhdCBcCiAgICAgICAgICAgICAgICB8IHNlZCAi
cy9eXFsvJHtpbnZpc2libGVjaGFyYWN0ZXJ9Wy9nIiBcCiAgICAgICAgICAgICAgICB8IGZtdCAt
LXNwbGl0LW9ubHkgLS13aWR0aD0xNjA7OwogICAgICAgIGdlbWluaSkKICAgICAgICAgICAgIyBX
ZSBoYXZlIGFscmVhZHkgaGFuZGxlZCBeIyBiZWZvcmUgdGhpcyBwb2ludC4KICAgICAgICAgICAg
Y2F0IFwKICAgICAgICAgICAgICAgIHwgc2VkICJzL149Pi89JHtpbnZpc2libGVjaGFyYWN0ZXJ9
Pi9nIiBcCiAgICAgICAgICAgICAgICB8IHNlZCAicy9cYFxgXGAvXGAke2ludmlzaWJsZWNoYXJh
Y3Rlcn1cYFxgPT4vZyIKICAgICAgICAgICAgOzsKICAgICAgICAqKQogICAgICAgICAgICBpbXBv
c3NpYmxlOzsKICAgIGVzYWMKfQpmb3IgaSBpbiBgc2VxIDAgJCgoJHtmcmFnbWVudG5vfSAtIDIp
KWA7IGRvCiAgICBwcmludGZyYWdtZW50ICIkaSIgfCBmaWx0ZXJfbm9uX3JlZmVyZW5jZXMKZG9u
ZQoKIyBQcmludCB0aGUgbGFzdCBmcmFnbWVudCwgd2l0aCB0aGUgYXBwcm9wcmlhdGUgcmVwbGFj
ZW1lbnRzLgpmaWx0ZXJfcmVmZXJlbmNlcyAoKQp7CiAgICBlY2hvICIjI1JlZmVyZW5jZXMiCiAg
ICAjIGNhc2UgJHRhcmdldG1hcmt1cCBpbgogICAgIyAgICAgZ29waGVyKQogICAgIyAgICAgICAg
IDs7CiAgICAjICAgICBnZW1pbmkpCiAgICAjICAgICAgICAgZWNobyAiIyNSZWZlcmVuY2VzIjs7
CiAgICAjICAgICAqKQogICAgIyAgICAgICAgIGltcG9zc2libGU7OwogICAgIyAgICAgZXNhYwog
ICAgd2hpbGUgcmVhZCBudW1iZXIgdXJsOyBkbwogICAgICAgIG51bWJlcj0kKGVjaG8gJG51bWJl
ciB8IHNlZCAncy9cLiQvLycpCiAgICAgICAgY2FzZSAkdGFyZ2V0bWFya3VwIGluCiAgICAgICAg
ICAgIGdvcGhlcikKICAgICAgICAgICAgICAgIHRhcmdldHByZWZpeD0nJwogICAgICAgICAgICAg
ICAgdGFyZ2V0PSIkdXJsIgogICAgICAgICAgICAgICAgaWYgZWNobyAiJHVybCIgfCBncmVwIC1x
ICdeaHR0cHNcPzovLycgXAogICAgICAgICAgICAgICAgICAgJiYgISBlY2hvICIkdXJsIiB8IGdy
ZXAgLXEgJ15odHRwc1w/Oi8vYWdlXD9pbmdoYWNrZXJcLm5ldCc7IHRoZW4KICAgICAgICAgICAg
ICAgICAgICB0eXBlPSdoJwogICAgICAgICAgICAgICAgICAgIHRhcmdldHByZWZpeD0nVVJMOicK
ICAgICAgICAgICAgICAgIGVsaWYgZWNobyAiJHVybCIgfCBncmVwIC1xICdcLlwoaHRtbFw/XHx4
bWxcfHhodG1sXHxjc3NcfGpzXHwvXCkkJzsgdGhlbgogICAgICAgICAgICAgICAgICAgIHR5cGU9
JzEnCiAgICAgICAgICAgICAgICBlbGlmIGVjaG8gIiR1cmwiIHwgZ3JlcCAtcSAnXC5cKHRlXD94
dFwpJCc7IHRoZW4KICAgICAgICAgICAgICAgICAgICB0eXBlPScwJwogICAgICAgICAgICAgICAg
ZWxpZiBlY2hvICIkdXJsIiB8IGdyZXAgLXEgJ1wuXChqcGVcP2dcfGdpZlx8cG5nXCkkJzsgdGhl
bgogICAgICAgICAgICAgICAgICAgIHR5cGU9J0knCiAgICAgICAgICAgICAgICBlbGlmIGVjaG8g
IiR1cmwiIHwgZ3JlcCAtcSAnXC5cKHBzXHxwZGZcKSQnOyB0aGVuCiAgICAgICAgICAgICAgICAg
ICAgdHlwZT0nZCcKICAgICAgICAgICAgICAgIGVsaWYgZWNobyAiJHVybCIgfCBncmVwIC1xICdc
Llwob2d2XHx3ZWJtXHxhdmlcfG1wZ1x8bXA0XHxta3ZcKSQnOyB0aGVuCiAgICAgICAgICAgICAg
ICAgICAgdHlwZT0nOycgIyBNb3ZpZSBmaWxlCiAgICAgICAgICAgICAgICBlbGlmIGVjaG8gIiR1
cmwiIHwgZ3JlcCAtcSAnXC5cKGF1XHx3YXZcfG9nZ1x8bXAzXCkkJzsgdGhlbgogICAgICAgICAg
ICAgICAgICAgIHR5cGU9JzwnICMgU291bmQgZmlsZQogICAgICAgICAgICAgICAgZWxzZQogICAg
ICAgICAgICAgICAgICAgIHR5cGU9JzknICMgQmluYXJ5IGZpbGUKICAgICAgICAgICAgICAgIGZp
CiAgICAgICAgICAgICAgICBlY2hvICJbJHt0eXBlfXxbJG51bWJlcl0gJHVybHwkdGFyZ2V0cHJl
Zml4JHVybHxhZ2VpbmdoYWNrZXIubmV0fDcwXSI7OwogICAgICAgICAgICBnZW1pbmkpCiAgICAg
ICAgICAgICAgICBlY2hvICI9PiAkdXJsIFskbnVtYmVyXSAkdXJsIjs7CiAgICAgICAgICAgICop
CiAgICAgICAgICAgICAgICBpbXBvc3NpYmxlOzsKICAgICAgICBlc2FjCiAgICBkb25lCn0KcHJp
bnRmcmFnbWVudCAkKCgke2ZyYWdtZW50bm99IC0gMSkpIFwKICAgIHwgc2VkICdzL15SZWZlcmVu
Y2VzLy8nIFwKICAgIHwgZ3JlcCAtdiAnXiQnIFwKICAgIHwgZmlsdGVyX3JlZmVyZW5jZXMKCgoj
IGVjaG8gIlRoZXJlIGFyZSAke2ZyYWdtZW50bm99IGZyYWdtZW50cyIKIyBlY2hvICJsYXN0ZnJh
Z21lbnQgaXMgJHtsYXN0ZnJhZ21lbnR9IgojIGNhdCAke2xhc3RmcmFnbWVudH0KIyBscyAtbCBg
cmVhZGxpbmsgLS1jYW5vbmljYWxpemUgLmAKIyBjYXQgeHgkemVyb2VzCiNjYXQgJHRleHRmaWxl
CiMgY2F0ICR7cGF0Y2hlZGh0bWxmaWxlfQoKc3VjY2Vzcwo=
--=-=-=--