💾 Archived View for station.martinrue.com › marginalia › tinylog captured on 2023-12-28 at 17:42:25. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
author: marginalia
I'm on german radio xD https://www.deutschlandfunkkultur.de/google-suche-100.html
Weird day. Had an interview with Deutschlandfunk today about alt-search and the small web. Hopefully I didn't ramble too much. We'll see if and when the segment airs I guess.
I have so much I should be doing I'm opting to do nothing in front of a computer instead.
Working on open sourcing marginalia.nu with associated services. Bit of a project on its own, given it's 2 years of intertwined hobby projects in one big repo. But I think I'll get there, eventually, somehow.
It's looking like I might join the legion of unemployed geminauts soon. Kinda got mixed feelings toward this. Upside is I'm getting more time to work on my search engine. Maybe I'll draw a sad jimmy wales to plead for donations in the corner.
It is with extreme hesitation I share this game: https://explore.marginalia.nu/
I'm full of energy for the first time in what feels like forever. Dunno if it's winter finally passing or what, but I'm certainly not complaining.
https://filosofia.dickinson.edu/encyclopedia/ambiutopia/
I'm in the new yorker, like only a couple of paragraphs but still. Weird goings on keep on going on. https://www.newyorker.com/culture/infinite-scroll/what-google-search-isnt-showing-you
Last week: What? Russia is invading Ukraine?! Cut off Russia from the Internet! This week: What, Russia is cutting itself off from the Internet?! Make sure Russia isn't cut off from the Internet!
So my 'i have no capslock' post sort of blew up. First on HN. Now Elon Musk tweets it, and he discovers the site went down from 10,000 people clicking the link the same moment it's posted, so he deletes the tweet, and now there's like a weird brewing conspiracy theory about what this meant. The fuck.
Started watching "Godzilla: Singular Point" on netflix the other day. Not sure what I was expecting, but it was surprisingly hard sci-fi. Not to spoil too much, but semi plausible "show, don't tell" genetic algorithms are a central plot point. Neat.
https://interfacecritique.net/book/olia-lialina-from-my-to-me/
Anyone want a fun dataset to play with? I've published a link database from my search engine here: https://downloads.marginalia.nu/
I got an email from someone wanting to publish stuff on Gemini, don't know what's the best advice to give them. Wasn't there a Gemini quick-start guide floating around a while back?
I built shuffle mode for the internet: https://search.marginalia.nu/explore/random (hint: use the explore buttons to guide the perusing)
www.flutopedia.com
This log4j/jndi shitstorm is entertaining to no end. At work we got a mass BCC email urging us to update the default jdk due to nebulous "licensing issues". Right, that seems totally legit. No CYA at all.
My URL database is such a chonky boi, it takes 50 minutes to drop a column of ints.
http://www.panicresearch.com/
There should be a name for this aesthetic. TimeCube-punk, TempleOS-wave? http://www.dowsers.info/toronto/nov2008.htm
http://theboojum.com/Tales/Dumptruk/Dating/whos_datable_in_tristram.htm
New experiment: Search for pages that link to a domain (only available for top-domain), informed by standard ranking algos. https://search.marginalia.nu/search?query=links:circumlunar.space&profile=corpo&js=default
November Update of my search engine is in progress. It's gonna be a good one. Ought to be back to full speed in maybe a week? Still usable even at 0.5% index size, only a bit limited.
Is there something like a regexp-language, except generalized to sequences of objects? I want to be able to express patterns of properties in a list, and find matches in a way that isn't if((foo(i+j) && bar(i+j+1) && baz(i+j+2)) || foo(i+j) && ...
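A minimal sketch of what such a matcher could look like, assuming a pattern is just a list of predicates applied to consecutive items. The names `find_matches`, `is_num` and `is_str` are made up for illustration; this covers only fixed-length patterns, not repetition or alternation.

```python
from typing import Any, Callable, Iterator, Sequence

# Hypothetical type alias: a predicate tests one item of the sequence.
Predicate = Callable[[Any], bool]


def find_matches(items: Sequence[Any], pattern: Sequence[Predicate]) -> Iterator[int]:
    """Yield each start index where pattern[k] accepts items[start + k] for all k."""
    width = len(pattern)
    for start in range(len(items) - width + 1):
        if all(pred(items[start + k]) for k, pred in enumerate(pattern)):
            yield start


def is_num(x: Any) -> bool:
    return isinstance(x, int)


def is_str(x: Any) -> bool:
    return isinstance(x, str)


data = [1, "a", 2, 3, "b", 4]
# Pattern "number, string, number" matches at positions 0 and 3.
print(list(find_matches(data, [is_num, is_str, is_num])))  # [0, 3]
```

Alternatives can be handled by running several patterns over the same list and merging the results, which keeps the index arithmetic out of the calling code.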
https://www.atarimagazines.com/
https://simplifier.neocities.org/
Got around to doing some long overdue refactoring. Had a bunch of that sort of code that fills you with dread when you think about touching it. Valiantly I slew the gorgon. The search index now has a quarter the disk footprint, and converts from forward index to reverse index in a third the time.
I didn't want to have to CDN up, but the botnet is really not giving me many options :-( I guess the upside is that it seems pretty effective at weeding them out.
I built... a thing. The design is super-unfinished, but it's pretty cool. Press the browse button to get links adjacent to the domain. https://search.marginalia.nu/search?query=browse:memex.marginalia.nu&profile=yolo&js=default
Currently have a botnet spamming my search engine. I've blocked a couple of thousand and things seem to be holding up, but if it goes down you know what happened. Really don't want to have to hide behind cloudflare or something like that. They seem pretty sketchy from a privacy standpoint.
I think this is reasonable: gemini://marginalia.nu/projects/edge/privacy.gmi
Recurring events in my search engine work: Finding easy optimizations that reduce the requirements by 90%, and finding bugs that drastically improve result qualities based on some like easy list-ordering tweak. I don't know how many times this has happened. They just seem to keep cropping up.
http://nausicaa.net/miyazaki/interviews/miyazaki_kurosawa_p1.html
http://www.lileks.com/misc/scifi/index.html
It turns out you can skew PageRank to heavily bias toward a certain subset of pages. It's even suggested in the original PR article. So I set it to skew toward personal blogs. The result is kinda amazing.
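A rough sketch of the idea, assuming the skew is done by biasing the teleport (random jump) vector toward a chosen set of pages, which is the usual "personalized PageRank" construction. The function name `pagerank`, the toy graph and the `bias` weights are all made up for illustration; the post does not say how the search engine actually implements it.

```python
def pagerank(links, bias=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [linked pages]}.

    bias maps page -> weight for the teleport vector; None means uniform.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    if bias is None:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    else:
        total = sum(bias.get(n, 0.0) for n in nodes)
        if total <= 0:
            raise ValueError("bias must give positive weight to at least one page")
        teleport = {n: bias.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # Random jumps land according to the (possibly skewed) teleport vector.
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = links.get(n, [])
            if out:
                share = damping * rank[n] / len(out)
                for t in out:
                    nxt[t] += share
            else:
                # Dangling page: hand its rank back via the teleport vector.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank


# Toy graph: one blog, two corporate pages linking to each other, one hub.
graph = {
    "blog": ["corp"],
    "corp": ["corp2"],
    "corp2": ["corp"],
    "other": ["blog", "corp"],
}
uniform = pagerank(graph)
skewed = pagerank(graph, bias={"blog": 1.0})
print(uniform["blog"], skewed["blog"])
```

With all teleport weight on the blog, its rank goes up and the rank it passes along its own links goes up with it, which is what pulls a whole neighbourhood of similar pages upward.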
https://meatfighter.com/castlevania3-password/
Today's search engine gem: https://www.tim-mann.org/trs80/doc/Guide.txt
I wonder how many E-presses "O_CREAT" has saved since it was introduced in the posix standard.
This was a strange and deep rabbit hole. While testing my search engine, I found this. http://www.wild-seven.org/ It linked to this: http://www.zeruda.org/, and this http://ohmydarling.org/, and there's this https://psyche.nu/ ... there's even more if you poke around. It's the first time in a while I've felt like the Internet is gonna be ok.
http://www.winestockwebdesign.com/Essays/Eternal_Mainframe.html
My landlord has sent me several emails and text messages reminding me to fill their anonymous tenant survey. Just... let that scenario marinate for a while and you'll get it.
You would think my search engine would at least struggle a bit when faced with a HackerNews front-page. You would think.
Another find. Sometimes it's hard to draw a line between shitposting and art: https://www.floppyswop.co.uk
gemini://marginalia.nu/projects/edge/top-20.gmi
Another interesting article: https://nullprogram.com/blog/2019/03/22/
This was amusing: https://worthdoingbadly.com/nn-adversarial/
Building a search engine is nothing for an instant gratification junkie. I think I've made huge improvements, but I won't know for certain until the dust settles in about a week.
You know, when I say link farms are a big industry, I don't think most people quite get the scope of just how big it is. I blacklisted over 20,000 domains today, from what looks like a single operation. Most of them expensive .com-tlds. That's a quarter million dollars a year in registration fees alone.
https://search.marginalia.nu/ will be (somewhat) useless the next 12-24 hours. I'm rebuilding the index. Sorry for any inconvenience. It will actually (probably) improve search quality though, so it's for the greater good (tm).
This was an interesting analysis: https://www.youtube.com/watch?v=1f5Xt5pZZZM
What would it take to make a text-focused mobile web browser, one that renders the most minimal of styling and disregards css and js? Like a w3m for android.
Removed 2 characters of code and saved myself 600 Gb of disk-writes per day ¯\_(ツ)_/¯
Just looked at the reddit front page for the first time in a long while. Not signed in. Christ on an actual bike. Every other post is an ad, and what isn't an ad is hot garbage. What has happened to reddit, and when did this happen? How does it still have users?
Oops, my capsule is a bit of a hard-to-navigate mess right now. I'm attempting to bridge https://memex.marginalia.nu/ and gemini://marginalia.nu/ in a way where both make sense. Right now (I think) the HTTP version is better. But I'm working on bringing the gemini version up to speed.
Hello from my laptop! I installed Debian Bullseye on my HP Spectre x360. After some coaxing with the installer, it works. Like, surprisingly well. I was expecting a lot more hardware jank than I'm seeing. KDE5 deals with HiDPI very well. I honestly even prefer the touchpad behavior over what Windows 10 gave me.
Hot take: How much do you need to type before the time lost learning DVORAK at 7 WPM is made up for by mastering DVORAK and typing maybe somewhat faster than with QWERTY?
It's fascinating how some designs follow as a logical conclusion from basic principles. LISP is a great example of this; EMACS is its logical conclusion. Hypertext is another one of those simple designs that have the ability to grow into something incredibly powerful if you let it.
Been playing around with Floyd-Steinberg dithering using a weird color palette all day for an upcoming project (also because I like the aesthetic). Here's a car I rasterized: gemini://marginalia.nu/pics/volvo-raster.png
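For reference, a bare-bones sketch of Floyd-Steinberg error diffusion. It assumes a grayscale image stored as a list of rows and a palette of gray levels, which is a simplification: the post used a custom color palette, and the function name `dither` is made up here.

```python
def dither(pixels, palette):
    """Floyd-Steinberg dithering of a 2D grid of gray values onto palette."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Work on a float copy so diffused error can accumulate.
    work = [[float(v) for v in row] for row in pixels]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            # Snap to the nearest palette entry.
            new = min(palette, key=lambda c: abs(c - old))
            out[y][x] = new
            err = old - new
            # Push the quantization error onto pixels not yet visited.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


# A flat mid-gray square comes out as a mix of black and white pixels.
flat = [[128] * 4 for _ in range(4)]
for row in dither(flat, [0, 255]):
    print(row)
```

For a color palette the same loop applies per pixel, with the nearest-entry search and the error computed per channel.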
Honestly, I'm pretty impressed with the traffic I'm getting on my gemini server: about 50 unique visitors every day. I get that on HTTPS too, but they're almost all bots and scripts.
I made a telnet ingress to my gemini server. Just log into marginalia.nu:9999 with putty or telnet or whatever, and enjoy. I guess I wanted to show the silliness of all these layers of abstractions we keep piling on. Fun project.
It turns out that if you put 50 million small files in an ext4 filesystem tuned for large files it fills your kernel up with inode information. Neat.
I spent the better part of the day tinkering with a wikipedia cleaner that generates stripped down HTML that's so clean you can read the articles with netcat. It's supposed to be a part of my search engine, but it's pretty cool on its own. Check it out: https://search.marginalia.nu/wiki/Memex
I devised a fast compression scheme for my search engine dictionary which reduces its size to a third while still allowing O(1) lookups. I also had to implement my own hashmap because anything available was too generalized (and therefore wasting too much memory). A byte is a gigabyte when your dictionary has a billion entries. A java object header is 8 bytes.
My only wish is that someone makes a browser plugin that plays a loud humming fan and floppydisk seeking noises whenever a page has been javascripting for longer than a few seconds. Given the load time is straight out of Windows 3.1, the soundscape should be as well.
Anyone who feels that there needs to be more emoji should check out U+13000..U+1342F for the OG stuff
Antenna is down. I was thinking about this when I brought marginalia.nu down the other day because of a storm, does gemini need a downdetector, or some way of communicating outages? (obviously if your server is down you can't host it yourself)
Got a mean mother of a thunder storm rolling in and I'm running my server with no UPS/surge protection at all, so my capsule and server are going down for 12-24 hours :-( See you on the flip side! (I think I'll need to get a raspberry pi I can hook up as a replacement for future events like this.)
@martin I'd like to lodge a bug report. If you type plus (the character) in a comment, it gets turned into a space. Also, if this comment ends in a language developed by Thompson and Ritchie and not Bjarne Stroustrup, regular posts are affected as well: C
Spent the day adding support for word-pairs in the index of my search engine. It's not live yet, but it seems to work pretty well (at the expense of quadrupling the index size). Hopefully end of next week you might be able to find such things as Plan 9, or Windows XP, or D Day; as well as being able to exclusively show pages that contain a sequence of two words like "gemini client" or "midnight pub" as a surprisingly convincing fake free text search. I keep being surprised by how well this thing actually works.
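A toy sketch of the word-pair idea, assuming a plain in-memory inverted index where each adjacent pair of words is stored as an extra term. `tokenize`, `build_index` and `search_phrase` are illustrative names, not the search engine's actual code.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tokenization would do more."""
    return text.lower().split()


def build_index(docs):
    """Map every word and every adjacent word pair to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for word in words:
            index[word].add(doc_id)
        for first, second in zip(words, words[1:]):
            index[first + " " + second].add(doc_id)
    return index


def search_phrase(index, phrase):
    """Return docs containing every adjacent word pair of the phrase."""
    words = tokenize(phrase)
    if not words:
        return set()
    if len(words) == 1:
        return set(index.get(words[0], set()))
    pairs = [a + " " + b for a, b in zip(words, words[1:])]
    result = set(index.get(pairs[0], set()))
    for pair in pairs[1:]:
        result &= index.get(pair, set())
    return result


docs = {
    1: "A gemini client for Plan 9",
    2: "client of gemini",
}
index = build_index(docs)
print(search_phrase(index, "gemini client"))  # {1}
print(search_phrase(index, "Plan 9"))         # {1}
```

For phrases longer than two words this only checks that each pair occurs somewhere in the document, not that the pairs line up, so it is an approximation of phrase search rather than the real thing.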
Added a gemini ingress to my search engine for websites, results are ordered by how little javascript and markup they use. Like... a reverse SEO search engine: gemini://marginalia.nu/search?gemini -- will make it crawl gemini-space as well in the foreseeable future, but until then, enjoy exploring the more obscure corners of the big web.
I'm interested in adapting my search engine to crawl geminispace as well, but I know a lot of people are hosting their stuff on low power hardware like raspberry pis and whatnot, and I don't think robots.txt seems to be a thing. What's a good, polite and non-disruptive page-fetch interval do you guys reckon? I was thinking 1 sec per fetch, but that may even be a bit too high. 5s interval?
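Whatever interval ends up being polite, enforcing it per host is simple. A small sketch, assuming a fixed delay per host and a caller that supplies the current time; the class name `PoliteScheduler` and the 5 second default are placeholders, not a recommendation from the Gemini community.

```python
class PoliteScheduler:
    """Tracks, per host, the earliest time the next fetch is allowed."""

    def __init__(self, delay_seconds=5.0):
        self.delay = delay_seconds
        self.next_allowed = {}

    def wait_time(self, host, now):
        """Reserve the next fetch slot for host and return how long to wait for it."""
        earliest = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = earliest + self.delay
        return earliest - now


scheduler = PoliteScheduler(delay_seconds=5.0)
print(scheduler.wait_time("example.org", now=0.0))  # 0.0, first fetch goes right away
print(scheduler.wait_time("example.org", now=1.0))  # 4.0, same host must wait
print(scheduler.wait_time("other.example", now=1.0))  # 0.0, different host is unaffected
```

In a real crawler `now` would come from `time.monotonic()` and the returned value would be passed to `time.sleep()` before the fetch.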
I am very much enjoying the DIY aspect of gemini so far. Yesterday I wanted to set up a server. Didn't like the software available, ended up building the server myself. It just served static files. Today I added a guestbook. Oh that's like 40 more lines. It's all just code. Almost nothing is configurable. If I want something, I add it. And no XML or YAML anywhere. Very pleasant.
Dear mr tech start-up: You've got 7 layers of docker containers that got snatched from some repository, thousands of NPM packages fetching themselves from repositories sketchier than warez sites outta the mid 00s, latest greatest kubernetes, virtualization and paravirtualization, compilation, obfuscation and transpilation, everything is run on someone else's computer running software you can't inspect, and all your traffic is encrypted by default so you can't inspect it, and most of it goes through CDNs so you can't tell where it's going, and you do HTTP2 with all its multiplexing capabilities. So how would you know if some of that code was maybe doing something more than it says on the box?
Only got advertisements, stopped watching TV. Only got unsolicited mail. Only use the postal service for receiving bills. Only got spam calls, stopped answering my phone. Only got spam mail, only use my email for signing up to stuff. Only got spam text messages. Stopped using text messages. Only got blogspam. Stopped checking the blogs. Only got promoted content. Stopped using facebook. I don't know if this merits yakety sax or the jaws music.