👽 krixano

Yesterday I read this abridged paper on SALSA (an "improved" version of HITS), for those who want to condescend to me about rejecting other people's work. It's fine if you want popularity-based searching, and it does a *better* job of fixing some of HITS's core problems that result in the TKC (Tightly Knit Community) effect. As far as I can tell, instead of picking the single "best" authority to put at the top, it basically spreads the top results a bit more evenly among the other top authorities. The authors fully admit that anything below the top 10 results is probably not ordered well, but they justify not caring about that.

I definitely won't be using it for AuraGem.

https://www.ra.ethz.ch/cdstore/www9/175/175.html

2 years ago

11 Replies

👽 krixano

@marginalia If you are just not going to read any of my actual arguments, then I have nothing further to say. Just because PageRank might have been better than the alternatives at the time doesn't mean it's the best that we can get.

I judge algorithms based on how they actually work and whether they make logically sound decisions, unlike you, who clearly judge them by their corporate success or popularity. · 2 years ago

👽 marginalia

You use a lot of adjectives like "terrible" and "bad" and so on, but I don't see much concrete evidence backing them up beyond the fact that you don't like these algorithms. Which is fine I guess, but I do think you're missing out by having this judgemental attitude toward algorithms. Gemini is probably as close to the ideal case for PageRank as it gets, given it still operates as a web-of-documents. The reason PageRank has fallen apart on the clearnet is that the clearnet no longer works that way. The reason Google became so dominant was that their algorithm was *amazing* back when the internet largely looked like Gemini does today. · 2 years ago

👽 krixano

It's also noteworthy that PageRank can be modified to skew toward arbitrary cliques as influence sources (as noted by Page and Brin in the original paper), such as your favorite bookmarks.

I'm also well aware of this. I didn't cover it in my post because it's pointless. You still have to deal with the terrible characteristics of PageRank.

Anyways, AuraGem doesn't need to provide SALSA or popularity-based searching because it's already provided by other Gemini search engines. People can just choose what search engine they want to use. · 2 years ago

👽 krixano

You seem very caught up on this notion of discrimination, but even the null ranking algorithm rewards some sites more than others.

@marginalia I don't know what the "null ranking algorithm" is, but if you are using the excuse that all discrimination is not that bad because you always get some discrimination anyways, then that's an extremely poor argument, imo. Some forms of "discrimination" are actually desired because... that's the whole point of search queries. FTS is at the very base, but what's on top of that can cause more problems.

Content farming is still a problem, but it's much more manageable than dealing with TKC and link farms. · 2 years ago

👽 krixano

@marginalia

It is in many ways more fair for everyone else if your ranking is based on the opinions of others about your website, than the opinion of yourself about your website.

This is not necessarily true, and I cover why in my new devlog. PageRank, HITS, and SALSA don't just rank based on the number of links; they do much more than that. They rank a site based on the ranks of the sites that link to it. This is a major problem that ends up rewarding big corporations, link farms, pay-to-link advertising, etc.

Search engines can't understand the "opinions of others" because the semantic markup for links doesn't convey the reasoning behind the link. · 2 years ago
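
To make the "ranked by the ranks of the sites that link to you" point concrete, here is a minimal sketch of plain PageRank as a power iteration. It is only an illustration under assumptions: the function name, the damping value of 0.85, the iteration count, and the tiny link graph are all made up for this example and are not taken from any search engine discussed in this thread.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank per page; `links` maps a page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its own rank on, split among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Pages with no outlinks spread their rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" each have exactly ONE inbound link,
# but "a" is linked from a well-linked hub and "b" from an unlinked page.
links = {
    "p1": ["hub"], "p2": ["hub"], "p3": ["hub"],
    "hub": ["a"],
    "x": ["b"],
    "a": [], "b": [],
}
ranks = pagerank(links)
print(ranks["a"] > ranks["b"])
```

In this toy graph both pages have the same number of inbound links, yet "a" ends up ranked above "b" purely because of who links to it, which is the behaviour being argued about above.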

👽 marginalia

It's also noteworthy that PageRank can be modified to skew toward arbitrary cliques as influence sources (as noted by Page and Brin in the original paper), such as your favorite bookmarks. I think rather than going on a fool's quest to create a fair ranking algorithm, it may be a better option to create one that is configurable to the searcher's taste. That way, the bias becomes a feature. · 2 years ago
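
A rough sketch of what that "skew toward your bookmarks" variant could look like: the random jump goes only to a chosen set of pages instead of to every page uniformly. This is an assumption-laden illustration, not the method of any particular search engine; `personalized_pagerank`, the `bookmarks` parameter, the damping value, and the example capsule names are all hypothetical.

```python
def personalized_pagerank(links, bookmarks, damping=0.85, iterations=50):
    """PageRank whose random jump lands only on the pages in `bookmarks`."""
    bookmarks = set(bookmarks)
    if not bookmarks:
        raise ValueError("bookmarks must not be empty")
    pages = set(links) | {t for ts in links.values() for t in ts} | bookmarks
    # Jump distribution: uniform over the bookmarks, zero everywhere else.
    jump = {p: (1.0 / len(bookmarks) if p in bookmarks else 0.0) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Pages with no outlinks hand their rank back to the bookmarks.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


# Two hypothetical cliques that never link to each other.
links = {
    "cats1": ["cats2"], "cats2": ["cats1"],
    "dogs1": ["dogs2"], "dogs2": ["dogs1"],
}
ranks = personalized_pagerank(links, bookmarks={"cats1"})
print(ranks)
```

With only "cats1" bookmarked, the unreachable "dogs" clique receives no rank at all, so the bias is entirely under the searcher's control.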

👽 marginalia

You seem very caught up on this notion of discrimination, but even the null ranking algorithm rewards some sites more than others. The benefit of having something like HITS or PageRank is that it's harder for you, yourself, to decide how highly you rank by stuffing your site with misleading keywords. It is in many ways more fair for everyone else if your ranking is based on the opinions of others about your website, than the opinion of yourself about your website. · 2 years ago

👽 krixano

@marginalia

they don't serve to discriminate the less-linked documents as much as present them in an order that probably makes more sense.

I see how you are trying to justify this. Basically, we can't say it discriminates because instead of removing those results, it just places them lower down in the results. Except what you are missing is that that is discrimination itself, and it's literally the whole point of the algorithm to distinguish/discriminate based on how many links a page has, if those links have many backlinks, and how many backlinks any other page has.

I'm dismissing the algorithms because I think that the very nature of how the algorithm works is bad. · 2 years ago

👽 krixano

@marginalia I'm not dismissing them "based on the words used to describe them". I'm dismissing them because they make very very bad assumptions about what the user wants, when what the user wants can only be given to you with the search query.

I do not care about ranking being cheap. I'm not going to sell out my own search engine for terrible "search algorithms" that think pages with more links to more heavily linked content should be prioritized over articles that have absolutely no links. That's how bad information gets boosted in the results, and this is WELL KNOWN, because link farms have existed to abuse the very algorithm you say I'm not understanding properly! · 2 years ago

👽 marginalia

I would also suggest that, instead of dismissing these algorithms based on the words used to describe them, you try them out and see how they make your search results feel. Why not offer all of them, as well as none? Ranking is pretty cheap. · 2 years ago

👽 marginalia

I think you are overthinking ranking algorithms a bit. They really only enter into the picture in the underspecified case, which is relatively rare, especially for a smaller search engine; and in the underspecified case, they don't serve to discriminate the less-linked documents as much as present them in an order that probably makes more sense. · 2 years ago