💾 Archived View for tilde.team › ~zerica › posts › thoughts-on-xmpp › index.gmi captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
October 11, 2021
Over the past few weeks, I've been silently working on one project I was looking forward to tinkering with: a graphical XMPP client.
My plan was to design the architecture in a nicely modular way: there would be a display-agnostic backend in charge of actually interacting with the XMPP server, which would expose an interface that the frontend could painlessly display in an interactive way.
I decided I'd build the latter through "web" technologies (HTML, JS, CSS, you know the deal), which was more than anything a pragmatic decision; I figured the ever-present issues of the modern web would be mostly irrelevant in a WebKit desktop application, outside their native environment (absolutely no Electron, wouldn't want to subject any potential user to THAT). A secondary goal for me was to build an interface similar enough to Discord (which I'm trying to move away from) that the transition would be painless, and this seemed like an obvious equivalent platform. Also, I would be able to leverage all the existing solutions for UI development! A UI framework would let me focus on the parts that do matter while letting the client look "nice enough" without my inner perfectionist screeching in frustration.
Or, well, so I thought. The ecosystem is an ABSOLUTE MESS and it ended up being more practical for me to do a lot of the busywork myself. That's not what I'm here to talk about, however, so I won't go into too much detail. Just know that this alone put me through way more than enough.
What I want to talk about is the XMPP protocol itself. It's envisioned as a bidirectional stream of XML data, where information is exchanged through specific self-contained elements, or, as the spec calls them, stanzas. On paper, this sounded perfectly fine to me, but as I was developing a proper way to decode them, I realized just how plain PAINFUL XML is to deal with.
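To make the "stream of stanzas" idea concrete, here is a minimal sketch in Python (not necessarily the language of the client described here) that feeds arbitrary chunks of a stream into the standard library's `xml.etree.ElementTree.XMLPullParser` and yields each complete direct child of the stream root. The function name `iter_stanzas` and the sample addresses are made up for illustration, and a real client would also need to handle stream errors, restarts, and memory cleanup.

```python
import xml.etree.ElementTree as ET


def iter_stanzas(chunks):
    """Yield every complete direct child of the stream root as an Element.

    `chunks` is any iterable of str/bytes pieces, split at arbitrary points,
    the way data would arrive from a socket.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0  # 0 = outside the stream, 1 = inside the <stream:stream> root
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                # An element that closes back to depth 1 is a whole stanza.
                if depth == 1:
                    yield elem


# Hypothetical input: one message split across two reads, then a presence.
chunks = [
    "<stream:stream xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams'>",
    "<message from='a@example.org' to='b@exam",
    "ple.org'><body>hi</body></message><presence/>",
]
stanzas = list(iter_stanzas(chunks))
```

Note that ElementTree reports tags with the namespace attached, so the first stanza's tag comes out as `{jabber:client}message` rather than plain `message`.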
I don't think XML is good at all as a way of exchanging information; it's generic to the point where the places something can go wrong are ENDLESS, and are you supposed to keep track of every little thing that can go wrong? What do you do with superfluous information? The specifications are riddled with MUSTs and SHOULDs and MAYs that ended up going over my head when implementing what should be very basic concepts (the exchange of messages, for one), and it's not always clear how the concepts map to an actual interface. I could be horridly misimplementing something and it would be very, very hard for me to know when my code happily parses any incoming data. Or there could be some edge case I failed to consider that apocalyptically brings the whole application down, because there wasn't a specific handler for it. You really can encode ANYTHING AT ALL with XML!!
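One possible answer to the "superfluous information" question is to be deliberately lenient: read only the fields you understand, and treat anything unrecognized as "not for me" instead of as an error. The sketch below, in Python, is just one such policy under that assumption, not what the specifications mandate; `Message` and `decode_stanza` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

NS = "{jabber:client}"  # ElementTree's spelling of the jabber:client namespace


@dataclass
class Message:
    sender: Optional[str]
    body: Optional[str]


def decode_stanza(elem):
    """Return a Message for <message/>, or None for anything unrecognized."""
    if elem.tag != NS + "message":
        return None  # unknown stanza: skipped, not fatal
    # Only the known fields are read; extra attributes and children are ignored.
    return Message(sender=elem.get("from"), body=elem.findtext(NS + "body"))


known = ET.fromstring(
    "<message xmlns='jabber:client' from='a@example.org'>"
    "<body>hi</body><extra xmlns='urn:example:unknown'/></message>"
)
unknown = ET.fromstring("<mystery xmlns='urn:example:unknown'/>")
decoded = decode_stanza(known)
skipped = decode_stanza(unknown)
```

The trade-off is exactly the worry above: a decoder this forgiving will never crash on odd input, but it will also never tell you that you misread the spec.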
I will admit there's a big chance I'm just approaching all of this with the wrong mindset. Perhaps XML really is the right tool for the job, and there's a way to design a deserializer that takes every little detail into account. Perhaps it's because there isn't really a good streaming XML library for my language that could do the whole job for me, and I sure wasn't going to put in the effort to write an idiomatic, universal solution. Who knows? Certainly not me.
So what can one even do about this? For now I think my best option is to simply leave the project on hold as it exists right now and come back when I feel more willing to tackle it. I'm even strongly considering forgetting about the XMPP plumbing and just using libpurple or something along with the UI work that's done.
But there's one idea I'm wondering about: what WOULD my ideal IM protocol look like? The sheer amount of work that implementing XMPP takes does not sit well with me, and Drew DeVault's post about chat apps got me thinking about the question.
What should the next chat app look like?, by Drew DeVault
Another issue I have with XMPP, separate from any XML woes, is that it's not secure by design. E2EE extensions like OMEMO are great, but when (for example) your contact data is stored in plaintext on the server anyway, it just feels like a quick fix for a largely structural issue. This is something that feels completely fixable if you're designing something from scratch.
Perhaps I'll write another post once I've given it some more thought. For now, I'm going to figure out what else would be cool to make. Until next time!!