💾 Archived View for dioskouroi.xyz › thread › 29397267 captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content
________________________________________________________________________________
What I like most about it is how easy it is to achieve something useful with a very moderate amount of code.
100%. One of the best things about both Wikipedia and Python, IMO: neither may deliver perfect results, but they get you WORKABLE results very quickly.
I was also delighted to read this article about writing a Python parser for Wikipedia on a Jekyll blog, because I did an eerily similar thing ~5 years ago and it's still my most-starred repo -
https://roche.io/2016/05/scrape-wikipedia-with-python
. Small world :)
Best of luck with the project! On one hand it seems impossible, given all the irregularities in article structure and the difficulty of QA-ing the long tail of niche topics. But on the other, if you can manage to wrangle 99% of it into a reliable query language, that could mean a lot to many other side projects!
Wikidata is the way to go. If you manage to get a machine readable form of Wikipedia knowledge which is not yet present in Wikidata, please consider contributing to Wikidata.
Good point on this too. I think there's value in allowing and exploring one-off / alternative views into Wikipedia especially where the data isn't accessible already - but long term any serious effort should be merged back into (or at least offered) to the official source.
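(For context on what "machine readable" already looks like on the Wikidata side: the Wikidata Query Service accepts SPARQL over HTTP. A minimal sketch below builds such a request URL; the query itself is an illustrative example using the real Wikidata identifiers P31 "instance of", Q6256 "country", and P36 "capital".)

```python
import urllib.parse

# Public Wikidata Query Service endpoint.
WDQS = "https://query.wikidata.org/sparql"

# Example query: a few countries and their capitals.
QUERY = """
SELECT ?country ?capital WHERE {
  ?country wdt:P31 wd:Q6256 ;   # instance of: country
           wdt:P36 ?capital .   # capital
} LIMIT 5
"""

def build_request_url(query):
    """Build a WDQS GET URL asking for JSON results.
    (Fetching it, e.g. with urllib.request, returns SPARQL JSON.)"""
    return WDQS + "?" + urllib.parse.urlencode({"query": query, "format": "json"})

print(build_request_url(QUERY))
```

Anything scraped out of article prose that is not yet queryable this way is a candidate for contributing back to Wikidata.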
I actually hope that WikipediaQL, once it becomes a bit more mature, will be helpful in parsing Wikipedia data _into_ Wikidata. As of now, Wikidata still lacks a lot of knowledge (the article talks about that, too).