Oddmu indexes all the Markdown pages when it starts up. It uses a trigram index implemented in Go. The index is maintained as pages are edited and forgotten when the wiki is stopped.
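To get a feel for what a trigram index does, here is a minimal sketch in Go. It is not Oddmu's actual code: the `Index` type and the `trigrams`, `Add` and `Candidates` names are made up for illustration, and a real index would also handle case folding and verify the candidates against the page text.

```go
package main

import (
	"fmt"
	"sort"
)

// trigrams returns the distinct three-rune substrings of s.
func trigrams(s string) []string {
	r := []rune(s)
	seen := make(map[string]bool)
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		t := string(r[i : i+3])
		if !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

// Index maps each trigram to the set of page names containing it.
// (Hypothetical type, for illustration only.)
type Index map[string]map[string]bool

// Add indexes the text of a page under the given name.
func (idx Index) Add(name, text string) {
	for _, t := range trigrams(text) {
		if idx[t] == nil {
			idx[t] = make(map[string]bool)
		}
		idx[t][name] = true
	}
}

// Candidates returns the sorted names of all pages that contain every
// trigram of the query. These are only candidates: the trigrams might
// occur in different places on the page.
func (idx Index) Candidates(query string) []string {
	ts := trigrams(query)
	var out []string
	if len(ts) == 0 {
		return out
	}
	for name := range idx[ts[0]] {
		ok := true
		for _, t := range ts[1:] {
			if !idx[t][name] {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := Index{}
	idx.Add("travel", "The airports were busy.")
	idx.Add("garden", "The roses are in bloom.")
	fmt.Println(idx.Candidates("airport"))
}
```

In this sketch every distinct trigram of every page gets its own entry, so the index stores a lot more than the text itself.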
Two key numbers are important:
And I now have some data:
This is a small wiki:
    ● oddmu.service - Oddmu
         Loaded: loaded (/etc/systemd/system/oddmu.service; enabled; preset: enabled)
         Active: active (running) since Mon 2023-09-11 16:03:01 CEST; 6h ago
       Main PID: 844316 (Oddµ)
          Tasks: 5 (limit: 7065)
         Memory: 19.9M (high: 120.0M max: 100.0M available: 80.0M)
            CPU: 1.891s
         CGroup: /system.slice/oddmu.service
                 └─844316 /home/oddmu/oddmu

    Sep 11 16:03:01 sibirocobombus systemd[1]: Started oddmu.service - Oddmu.
    Sep 11 16:03:01 sibirocobombus oddmu[844316]: Indexing all pages
    Sep 11 16:03:01 sibirocobombus oddmu[844316]: Serving a wiki on port 8080
The Markdown files add up to 47371 bytes, or 46 KiB.
This was computed using the following command:
find . -name '*.md' -printf "%s\n" | awk '{sum+=$1} END{print sum+0}'
What about a bigger wiki? My personal wiki comes to 15567422 bytes, or 14.8 MiB:
    ● alex.service - Oddmu for Alex Schroeder
         Loaded: loaded (/etc/systemd/system/alex.service; enabled; preset: enabled)
         Active: active (running) since Mon 2023-09-11 22:52:58 CEST; 10s ago
       Main PID: 949213 (Oddµ)
          Tasks: 5 (limit: 7065)
         Memory: 98.1M (high: 120.0M max: 150.0M available: 21.8M)
            CPU: 2.306s
         CGroup: /system.slice/alex.service
                 └─949213 /home/oddmu/oddmu

    Sep 11 22:52:58 sibirocobombus systemd[1]: Started alex.service - Oddmu for Alex Schroeder.
    Sep 11 22:52:58 sibirocobombus oddmu[949213]: Indexing all pages
    Sep 11 22:53:01 sibirocobombus oddmu[949213]: Serving a wiki on port 8081
So the 15 MiB of Markdown files seem to have generated an index of 70 MiB or more: memory use went from 19.9M for the small wiki to 98.1M for this one.
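The arithmetic behind that estimate can be sketched in a few lines of Go. Two assumptions are baked in: that the "M" in the systemd output means MiB, and that the 19.9M of the small wiki is a fair baseline for the process without an index worth mentioning. The `mib` and `indexEstimate` names are made up for this example.

```go
package main

import "fmt"

// mib converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
func mib(bytes int64) float64 {
	return float64(bytes) / (1024 * 1024)
}

// indexEstimate guesses the index size as the difference in memory use
// between the big wiki and the small wiki (an assumption, not a measurement).
func indexEstimate(bigMem, smallMem float64) float64 {
	return bigMem - smallMem
}

func main() {
	source := mib(15567422)            // size of the Markdown files
	index := indexEstimate(98.1, 19.9) // memory figures from systemctl status
	fmt.Printf("source: %.1f MiB\n", source)
	fmt.Printf("index estimate: %.1f MiB\n", index)
	fmt.Printf("index per MiB of source: %.1f MiB\n", index/source)
}
```

By this rough measure the index takes about five times as much memory as the text it indexes.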
That's odd. 🤔
I still have a branch with some full text search code. Perhaps it would use less memory? But then we get into the stemming dilemma. If you want to do stemming, you need to know the text languages. For trigram search, only the user doing the search needs to do the "stemming". If you're looking for "airport", for example, that'll find "airports", too. Not so when using full text search. There, you need to "stem" the word "airports" and only index "airport" – but then again, I guess things are tricky in German one way or another: one "Flughafen", two "Flughäfen" … oops!
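The difference between the two examples can be shown with a small Go sketch. It only checks whether all the trigrams of the search term occur in a word, which is the necessary condition for a trigram search to find it. The `trigrams` and `covers` helpers are made up for illustration and are not taken from Oddmu.

```go
package main

import "fmt"

// trigrams returns the set of three-rune substrings of s.
// Working on runes keeps "ä" together as a single character.
func trigrams(s string) map[string]bool {
	r := []rune(s)
	set := make(map[string]bool)
	for i := 0; i+3 <= len(r); i++ {
		set[string(r[i:i+3])] = true
	}
	return set
}

// covers reports whether every trigram of query also occurs in text.
// A query shorter than three runes has no trigrams and is always covered.
func covers(text, query string) bool {
	have := trigrams(text)
	for t := range trigrams(query) {
		if !have[t] {
			return false
		}
	}
	return true
}

func main() {
	// "airport" is a prefix of "airports", so all its trigrams are there.
	fmt.Println(covers("airports", "airport"))
	// The umlaut changes the trigrams "gha", "haf" and "afe", so the
	// singular no longer matches the plural.
	fmt.Println(covers("Flughäfen", "Flughafen"))
}
```

Searching for a shorter piece that both forms share, such as "Flugh", would still find both, which is the kind of "stemming" the person searching has to do themselves.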
#Oddµ #Wiki #Software #Programming