💾 Archived View for gemi.dev › gemlog › 2022-07-23-multi-language-gemipedia.gmi captured on 2024-09-29 at 00:05:32. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
2022-07-23 | #gemipedia | @Acidus
I've updated Gemipedia so it now supports accessing Wikipedia in other languages!
"F-11 Tiger" article in Polish.
"Wikipedia" article in German.
"Esperanto" article in Esperanto (of course).
While the Gemipedia interface will continue to be in English, article content, image captions, search results, and featured content of the day will be in the language you choose. When you select a language, it changes which instance of Wikipedia Gemipedia talks to. For example, the "Featured Content" for German Wikipedia isn't simply the popular content from English Wikipedia translated into German. It's the featured article, and the 25 most-read articles, from German Wikipedia!
English Wikipedia: Featured content
German Wikipedia: Featured content
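Because featured content comes from whichever Wikipedia is selected, only the host in the request has to change. Here is a minimal sketch, assuming Wikimedia's REST feed endpoint `/api/rest_v1/feed/featured/{yyyy}/{mm}/{dd}` (a real endpoint, though the post doesn't say which API Gemipedia calls); `FeaturedFeed.Url` is a hypothetical helper:

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not Gemipedia's code: builds the URL of one language's
// daily featured feed. Assumes the Wikimedia REST endpoint
// /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd}.
public static class FeaturedFeed
{
    public static string Url(string lang, DateTime date) =>
        // Only the hostname depends on the language; the path is the same everywhere.
        $"https://{lang}.wikipedia.org/api/rest_v1/feed/featured/" +
        date.ToString("yyyy'/'MM'/'dd", CultureInfo.InvariantCulture);
}
```

Passing `"de"` targets German Wikipedia's own feed, whose response carries that wiki's featured article (`tfa`) and most-read list (`mostread`), not translations of the English ones.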
Each language-specific instance of Wikipedia is different. Some articles exist in one language's Wikipedia but not in others, and the amount of content varies as well.
There is now a link on Gemipedia's home page that tells you the currently selected language and takes you to a page where you can change it.
I've included quick links for the languages whose Wikipedias have the largest number of articles. If you want to read Wikipedia in a language that's not listed, a link at the bottom lets you enter any two-letter code.
Gemipedia article about ISO 3166-1 country codes
I've also started using an API that tells you whether an article is available in other languages. At the bottom of every article in Gemipedia you'll see a "Read this article in another language" link.
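The post doesn't name the API behind that link. One API that answers this question is MediaWiki's Action API with `prop=langlinks`, which lists an article's interlanguage links. A sketch under that assumption, with `LangLinks.QueryUrl` as a hypothetical helper:

```csharp
using System;

// Hypothetical helper: builds a MediaWiki Action API request listing the
// interlanguage links of one article. Assumes prop=langlinks, which may not be
// the API Gemipedia actually uses.
public static class LangLinks
{
    public static string QueryUrl(string lang, string title)
    {
        // Wikipedia titles use underscores for spaces; percent-encode the rest.
        string encodedTitle = Uri.EscapeDataString(title.Replace(' ', '_'));
        return $"https://{lang}.wikipedia.org/w/api.php" +
               "?action=query&prop=langlinks&lllimit=max&format=json" +
               $"&titles={encodedTitle}";
    }
}
```

Each returned link pairs a language code with the article's title in that language, which is enough to build a link into the other Wikipedia.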
Wikipedia has a pretty elegant design when it comes to supporting other languages. Each language has its own subdomain of "wikipedia.org":
en.wikipedia.org ⬅️ English Wikipedia
de.wikipedia.org ⬅️ German Wikipedia
[language code].wikipedia.org ⬅️ any other language code
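The subdomain scheme means a user-entered code can be turned straight into a hostname. The code-entry link asks for two-letter codes, but a few Wikipedias use longer ones (for example `simple` or `zh-yue`), so this hypothetical `WikipediaHost.FromCode` sketch assumes any lowercase letters and inner hyphens are valid, and rejects everything else rather than putting arbitrary input into a hostname:

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper: turns a user-entered language code into a Wikipedia host.
// Assumption: valid codes are ASCII letters with optional inner hyphens
// (e.g. "de", "simple", "zh-yue"); anything else returns null.
public static class WikipediaHost
{
    public static string? FromCode(string input)
    {
        string code = input.Trim().ToLowerInvariant();
        bool valid = code.Length >= 2
            && code.All(c => (c >= 'a' && c <= 'z') || c == '-')
            && code[0] != '-'
            && code[code.Length - 1] != '-';
        return valid ? code + ".wikipedia.org" : null;
    }
}
```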
Wikipedia (and MediaWiki, the software that runs Wikipedia) has had various API versions. However, all the APIs are accessed via different paths on the same hostname. So the only change I had to make to talk to other Wikipedias was to add a language option to my API wrapper:
```csharp
// Old version: only talks to English Wikipedia
var client = new WikipediaClient();
client.GetArticle(title);

// New version: talks to German Wikipedia
var client = new WikipediaClient("de");
client.GetArticle(title);
```
Modifying the client and setting a default language took about 2 minutes. The rest of the work involved creating a UI to change the language, persisting the language choice between pages, and adding some creative features to switch between versions of Wikipedia.
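Since every API lives under the same language-specific hostname, a language-aware client only needs to store the code and build its URLs from it. The sketch below is hypothetical (it is not Gemipedia's actual `WikipediaClient`), only builds URLs rather than making HTTP requests, and assumes articles come from the Wikimedia REST endpoint `/api/rest_v1/page/html/{title}`; the default argument plays the role of the default language:

```csharp
using System;

// Hypothetical language-aware client, not Gemipedia's actual WikipediaClient.
public class WikipediaClient
{
    public string Language { get; }

    // Defaulting to English keeps existing `new WikipediaClient()` calls working.
    public WikipediaClient(string language = "en")
    {
        Language = language;
    }

    // The language only changes the hostname; API paths stay the same.
    private string BaseUrl => $"https://{Language}.wikipedia.org";

    // Assumption: article HTML is fetched from the Wikimedia REST API.
    public string ArticleUrl(string title) =>
        $"{BaseUrl}/api/rest_v1/page/html/{Uri.EscapeDataString(title.Replace(' ', '_'))}";
}
```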
Honestly, persisting the language between pages was the most work. This is because Gemini makes working with query strings a little complicated. The only way to get user input in Gemini is to send a "10 Input" response, which prompts the user to enter some text. The client then requests the same URL again with that text as the query string, replacing whatever was there before. Since Gemipedia accepts user input in a few places (going to an article, searching), I can't use the query string to hold state the way I would in a typical web application, because it would be clobbered.
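To see why the query string can't hold state, here is a sketch of what a Gemini client does after a `10` (INPUT) response: per the Gemini specification, it requests the same URL again with the user's percent-encoded text as the query. `GeminiInput.ApplyInput` is a hypothetical helper, and `example.org` plus the `lang=de` parameter below are invented for illustration:

```csharp
using System;

// Sketch of a Gemini client's behavior after a "10" (INPUT) response: request
// the same URL again with the user's text, percent-encoded, as the whole query.
// Any data the server had put in the query string is lost.
public static class GeminiInput
{
    public static string ApplyInput(string currentUrl, string userText)
    {
        // Drop the existing query; this sketch assumes the URL has no fragment.
        int q = currentUrl.IndexOf('?');
        string withoutQuery = q >= 0 ? currentUrl.Substring(0, q) : currentUrl;
        return withoutQuery + "?" + Uri.EscapeDataString(userText);
    }
}
```

For example, `ApplyInput("gemini://example.org/cgi-bin/wp.cgi/view?lang=de", "cats")` returns `gemini://example.org/cgi-bin/wp.cgi/view?cats`: the `lang=de` is gone.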
Instead, I include the language choice at the end of the URL path, after the MVC route info:
/cgi-bin/wp.cgi/view/en?cats ⬅️ View the article for "cats" on English Wikipedia
/cgi-bin/wp.cgi/view/de?cats ⬅️ View the article for "cats" on German Wikipedia
/cgi-bin/wp.cgi/images/en?Cat ⬅️ View the image gallery for the "Cat" article on English Wikipedia
/cgi-bin/wp.cgi/images/de?Cat ⬅️ View the image gallery for the "Cat" article on German Wikipedia
This required refactoring every part of the code that created Gemipedia links into a single place, which ensures the current language is always included in a URL and parses the language out of incoming URLs from users. The code is better and cleaner in the end, so it was good work to do.
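A minimal sketch of that single place, assuming the route layout shown above and a server that passes the path after the script name and the query separately (as CGI's `PATH_INFO` and `QUERY_STRING` do). `Links` and `Route` are hypothetical names, and falling back to English when no language segment is present is an assumption:

```csharp
using System;

// Hypothetical link builder/parser for routes of the form
// /cgi-bin/wp.cgi/{action}/{lang}?{user input}; the query holds only user input.
public record Route(string Action, string Language, string Query);

public static class Links
{
    private const string Script = "/cgi-bin/wp.cgi";

    // Every outgoing link goes through here, so the language is never forgotten.
    public static string Build(string action, string lang, string query) =>
        $"{Script}/{action}/{lang}?{Uri.EscapeDataString(query)}";

    // pathInfo is the part after the script name (e.g. "/view/de");
    // queryString is the raw, still percent-encoded query.
    public static Route Parse(string pathInfo, string queryString)
    {
        string[] parts = pathInfo.Split('/', StringSplitOptions.RemoveEmptyEntries);
        string action = parts.Length > 0 ? parts[0] : "";
        string lang = parts.Length > 1 ? parts[1] : "en"; // assumed default
        return new Route(action, lang, Uri.UnescapeDataString(queryString));
    }
}
```

Because both directions live in one class, `Links.Parse("/view/de", "cats")` recovers exactly what `Links.Build("view", "de", "cats")` wrote.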
I think the big reason Gemipedia is so successful is that it recognizes specific HTML that Wikipedia uses for things like image callouts, information boxes, and math formulas. It also knows which things can be skipped, like the "references" section or the bibliography. Most of this involves a dozen or so CSS selectors that match certain elements and tell the HTML converter what it can ignore.
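The post doesn't list Gemipedia's selectors, so the ones below are illustrative: `reflist`, `reference`, `mw-editsection`, and `navbox` are class names found in English Wikipedia's HTML, but this is not Gemipedia's list. The sketch also replaces a real HTML parser's DOM with a tiny hypothetical `Element` record and only understands `tag.class` and `.class` selectors:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for a parsed HTML element (a real converter would walk
// an HTML parser's DOM instead).
public record Element(string Tag, IReadOnlyCollection<string> Classes);

public static class SkipFilter
{
    // Illustrative selectors for content to drop; not Gemipedia's actual list.
    public static readonly string[] SkipSelectors =
    {
        "div.reflist", "sup.reference", "span.mw-editsection", ".navbox",
    };

    // Matches only the "tag.class" / ".class" subset of CSS used above.
    public static bool Matches(Element e, string selector)
    {
        int dot = selector.IndexOf('.');
        string tag = dot >= 0 ? selector.Substring(0, dot) : selector;
        string? cls = dot >= 0 ? selector.Substring(dot + 1) : null;
        bool tagOk = tag.Length == 0 || string.Equals(tag, e.Tag, StringComparison.OrdinalIgnoreCase);
        bool classOk = cls == null || e.Classes.Contains(cls);
        return tagOk && classOk;
    }

    // An element is skipped if any selector matches it.
    public static bool ShouldSkip(Element e) => SkipSelectors.Any(s => Matches(e, s));
}
```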
For the most part, a "<table class='infobox'>" in English Wikipedia is also a "<table class='infobox'>" in Spanish Wikipedia, but not always. I've seen language-specific CSS classes, or entirely different HTML structures, used in Wikipedias for different languages. This means that Gemipedia's awesome rendering and filtering sometimes doesn't work. I'm trying to knock these down and write more "cross-language" detection logic, but there is no easy way to do this at scale. So if you encounter a rendering issue, please drop me an email.
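One way to structure cross-language detection is a shared selector list plus optional per-language additions, with every other language falling back to the defaults. A sketch under that assumption; the German entry is a made-up placeholder, since the post doesn't name the language-specific classes it has seen:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of per-language selector overrides: every language shares the default
// list, and a language may add its own extras.
public static class LanguageSelectors
{
    private static readonly string[] Default = { "table.infobox" };

    private static readonly Dictionary<string, string[]> Extras = new()
    {
        // Hypothetical placeholder, not a real German Wikipedia class.
        ["de"] = new[] { "table.placeholder-de-infobox" },
    };

    // Languages without an entry fall back to (a copy of) the default selectors.
    public static string[] InfoboxSelectors(string lang) =>
        Extras.TryGetValue(lang, out var extra)
            ? Default.Concat(extra).ToArray()
            : Default.ToArray();
}
```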