2023-09-11 Oddµ memory consumption

Oddmu indexes all the Markdown pages when it starts up. It uses a trigram index implemented in Go. The index is kept in memory: it is updated as pages are edited and discarded when the wiki is stopped.
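For readers who haven't met trigram indexes before, here is a minimal sketch of the idea in Go. This is not Oddmu's code: the names `trigrams`, `index`, `add` and `candidates` are made up for illustration, and the sketch is case-sensitive and ignores ranking entirely. It only shows how every three-character window of a page ends up as a key pointing back at that page.

```go
package main

import (
	"fmt"
	"sort"
)

// trigrams returns the distinct three-rune substrings of s.
func trigrams(s string) map[string]bool {
	r := []rune(s)
	set := make(map[string]bool)
	for i := 0; i+3 <= len(r); i++ {
		set[string(r[i:i+3])] = true
	}
	return set
}

// index maps each trigram to the set of page names containing it (hypothetical type).
type index map[string]map[string]bool

// add registers every trigram of text under the given page name.
func (idx index) add(page, text string) {
	for t := range trigrams(text) {
		if idx[t] == nil {
			idx[t] = make(map[string]bool)
		}
		idx[t][page] = true
	}
}

// candidates returns the pages that contain every trigram of query, sorted by name.
// They are only candidates: a real search would still check the page text itself.
func (idx index) candidates(query string) []string {
	ts := trigrams(query)
	counts := make(map[string]int)
	for t := range ts {
		for page := range idx[t] {
			counts[page]++
		}
	}
	var pages []string
	for page, n := range counts {
		if n == len(ts) {
			pages = append(pages, page)
		}
	}
	sort.Strings(pages)
	return pages
}

func main() {
	idx := make(index)
	idx.add("travel", "The airports were busy")
	idx.add("garden", "Tomatoes need sun")
	fmt.Println(idx.candidates("airport"))
}
```

Even in this toy version, every page contributes almost one index entry per character, which hints at why such an index can end up larger than the text it covers.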

Two key numbers are important: the total size of the Markdown files and the memory used by the process.

And I now have some data:

This is a small wiki:

● oddmu.service - Oddmu
     Loaded: loaded (/etc/systemd/system/oddmu.service; enabled; preset: enabled)
     Active: active (running) since Mon 2023-09-11 16:03:01 CEST; 6h ago
   Main PID: 844316 (Oddµ)
      Tasks: 5 (limit: 7065)
     Memory: 19.9M (high: 120.0M max: 100.0M available: 80.0M)
        CPU: 1.891s
     CGroup: /system.slice/oddmu.service
             └─844316 /home/oddmu/oddmu

Sep 11 16:03:01 sibirocobombus systemd[1]: Started oddmu.service - Oddmu.
Sep 11 16:03:01 sibirocobombus oddmu[844316]: Indexing all pages
Sep 11 16:03:01 sibirocobombus oddmu[844316]: Serving a wiki on port 8080

The Markdown files add up to 47371 bytes, or 46 KiB.

I computed this with the following command:

find . -name '*.md' -printf "%s\n" | awk '{sum+=$1} END{print sum+0}'

What about a bigger wiki? This is my personal wiki: 15567422 bytes, or 14.8 MiB.

● alex.service - Oddmu for Alex Schroeder
     Loaded: loaded (/etc/systemd/system/alex.service; enabled; preset: enabled)
     Active: active (running) since Mon 2023-09-11 22:52:58 CEST; 10s ago
   Main PID: 949213 (Oddµ)
      Tasks: 5 (limit: 7065)
     Memory: 98.1M (high: 120.0M max: 150.0M available: 21.8M)
        CPU: 2.306s
     CGroup: /system.slice/alex.service
             └─949213 /home/oddmu/oddmu

Sep 11 22:52:58 sibirocobombus systemd[1]: Started alex.service - Oddmu for Alex Schroeder.
Sep 11 22:52:58 sibirocobombus oddmu[949213]: Indexing all pages
Sep 11 22:53:01 sibirocobombus oddmu[949213]: Serving a wiki on port 8081

So the 15 MiB of Markdown files seem to have generated an index of 70 to 80 MiB: the process uses 98.1M, compared to 19.9M for the small wiki.

That's odd. 🤔
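As a rough check of the arithmetic, here is a small Go sketch. It rests on two assumptions: that the 19.9M of the small wiki is a fair baseline for the process without any index worth mentioning, and that the "M" in the systemd output means MiB. The function names `toMiB` and `indexEstimate` are made up for this example.

```go
package main

import "fmt"

// toMiB converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
func toMiB(bytes int64) float64 {
	return float64(bytes) / (1024 * 1024)
}

// indexEstimate guesses the index size by subtracting a baseline from the
// total memory of the process. Treating the small wiki as that baseline
// is an assumption, not a measurement.
func indexEstimate(totalMiB, baselineMiB float64) float64 {
	return totalMiB - baselineMiB
}

func main() {
	markdown := toMiB(15567422)        // size of the Markdown files of the big wiki
	index := indexEstimate(98.1, 19.9) // memory of the big wiki minus the small wiki
	fmt.Printf("Markdown: %.1f MiB\n", markdown)
	fmt.Printf("Index estimate: %.1f MiB\n", index)
	fmt.Printf("Ratio: %.1f\n", index/markdown)
}
```

Under these assumptions the index takes roughly five times as much memory as the text it indexes.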

I still have a branch with some full text search code. Perhaps it would use less memory? But then we get into the stemming dilemma. If you want to do stemming, you need to know the text languages. For trigram search, only the user doing the search needs to do the "stemming". If you're looking for "airport", for example, that'll find "airports", too. Not so when using full text search. There, you need to "stem" the word "airports" and only index "airport" – but then again, I guess things are tricky in German one way or another: one "Flughafen", two "Flughäfen" … oops!
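The airport example can be sketched in Go. Again, this is an illustration and not Oddmu's search code: `trigrams` and `missing` are hypothetical helpers that list which trigrams of a query are absent from a text. If nothing is missing, the text is a candidate match.

```go
package main

import (
	"fmt"
	"strings"
)

// trigrams returns the three-rune substrings of s in order of appearance.
// Working on runes keeps "ä" as one character instead of two bytes.
func trigrams(s string) []string {
	r := []rune(s)
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		out = append(out, string(r[i:i+3]))
	}
	return out
}

// missing lists the trigrams of query that do not occur in text.
// An empty result means text is a candidate match for query.
func missing(query, text string) []string {
	var out []string
	for _, t := range trigrams(query) {
		if !strings.Contains(text, t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Nothing is missing: the plural contains every trigram of the singular.
	fmt.Println(missing("airport", "two airports"))
	// The umlaut breaks the trigrams "gha", "haf" and "afe".
	fmt.Println(missing("Flughafen", "zwei Flughäfen"))
}
```

So the "free stemming" of trigram search only works where the inflected form still contains the search term. A plural that changes a vowel defeats it, just as it would trip up a naive stemmer.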

​#Oddµ ​#Wiki ​#Software ​#Programming