💾 Archived View for mizik.eu › blog › strong-vs-weak-data-linking › index.gmi captured on 2024-05-10 at 10:42:41. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2024-02-05)
-=-=-=-=-=-=-
Strong vs Weak data linking - Marián Mižik
2021-09-18 | 5 minutes reading | tags: Personal, Zettelkasten
I have been using a customized zettelkasten method for my personal knowledge database since university, but recently I have deleted all strong (hard) links from the data and I like it. Here is why...
Zettelkasten is basically a card index. Like the ones you could find in most of the bureaus and doctor offices in the past. Every card has some unique identification, some data, and is stored in the drawer. The drawer keeps together cards with some common semantics. That can be a starting letter, or field of work, address, or whatever else. On top of this, zettelkasten has added strong links. So you are adding IDs of some related cards to the footnote of the other card. This helped people in the past to find more content without necessity of going through it all.
In modern days, we have digitalized paper cards and organized them into the databases. Databases have this great ability to be queried based on the information that database tables contain. One additional data we can provide to our cards are weak links. In the world of social media, you would call them #tags. These keywords are adding additional semantic information to the data and of course, they can be queried too.
I started my zettelkasten in digital era, but before raise of major social networks. On the other hand, as an IT student, I was still familiar with the concept of weak linking. I also knew, I would be able to query my data with them, so I started adding tags since the beginning as a secondary linking mechanism. I used strong links (IDs of other cards in the footnote) as a primary one to do it the zettelkasten way, and also I thought it would be nice to have them, especially when they could be used as an interactive hyperlink (just like when reading stuff on the web)
When you have some data collection, and you want to keep it up to date and relevant, you have to go through it from time to time (for me maybe once a year), and add/remove links and keywords, to keep maximum relevant links alive and without false positives. Thanks to this, you can get more precise results when using the dataset. And as we know from Pulp Fiction movie, it takes time and effort :) So I began to analyse how important it is for me to have both strong and weak links in my data.
I have around 10k cards these days. This is not much, but not that little too. My empiric analysis shown, that I almost exclusively use full-text search and tag search to get subset of relevant content and I get very accurate results. I almost never used to continue to relating cards via hyperlinks, especially because after the search I already have all relevant card titles in front of me on one screen, and I am able to seek to the stuff I am looking for faster this way, than to move through the hyperlinks down the rabbit holes.
Relying on weak links only can cut you off from some types of content, that are semantically far from the subset you filtered, but in some cases, still relevant to be shown. I have had this problem from time to time. I remembered I have some other piece of information there and had to alter my query to find it. If I had much bigger set of cards (e.g. 100k), or my memory would be worse, I wouldn't get it. This problem can be worse if you would have much more data than me, data with many semantics, or your querying is not strong enough, like for example you don't know how to create complex queries, or your software is not able to query with logical operands like AND, OR and using parenthesis. Luckily, I don't have these issues. I would say, even without advanced queries, most of the people wouldn't encounter "missing cards" problem, if they have up to date and robust enough weak links.
Currently, I am confident, that setting up personal/work knowledge database using only weak linking is optimal for up to 15-20k cards if the data are homogenous enough and well maintained. You can save a couple of days a year of manual data and link optimizations. Also, your way of thinking about linking information between each other will be simplified, because you have to consider only one mechanism.
2024 Marian Mizik | License: CC BY-NC-SA 4.0 | marian at mizik dot sk | marian_mizik@bsd.network (mastodon)