< WIP search engine

~moonsheep

Well, there were certainly improved versions later, but PageRank was one of the first attempts to counteract this issue. The key point is that ranking uses not only the backlink count but also the quality of each of those backlinks. That is, a single good-quality backlink is often worth much more than many poor ones, which renders at least small-scale link farming ineffective.

http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

Web pages vary greatly in terms of the number of backlinks they have. For example, the Netscape home page has 62,804 backlinks in our current database compared to most pages which have just a few backlinks. Generally, highly linked pages are more important than pages with few links. Simple citation counting has been used to speculate on the future winners of the Nobel Prize [San95]. PageRank provides a more sophisticated method for doing citation counting. The reason that PageRank is interesting is that there are many cases where simple citation counting does not correspond to our common sense notion of importance. For example, if a web page has a link to the Yahoo home page, it may be just one link but it is a very important one. This page should be ranked higher than many pages with more links but from obscure places. PageRank is an attempt to see how good an approximation to importance can be obtained just from the link structure.
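To make the quoted point concrete, here is a rough sketch of my own (not code from the paper) of a simplified PageRank by power iteration. The toy graph, the page names, the 0.85 damping factor, and the uniform handling of pages without outgoing links are all assumptions chosen for illustration. Page "a" has a single backlink from a well-linked "hub", while page "b" has three backlinks from obscure pages that nobody links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank via power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages (an assumption).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: ten pages link to "hub", and "hub" links to "a".
links = {f"fan{i}": ["hub"] for i in range(10)}
links["hub"] = ["a"]
# Three obscure pages (no backlinks of their own) link to "b".
links.update({f"obscure{i}": ["b"] for i in range(3)})

# Plain backlink counting, for comparison with the rank.
backlinks = {}
for targets in links.values():
    for target in targets:
        backlinks[target] = backlinks.get(target, 0) + 1

ranks = pagerank(links)
print("backlinks:", backlinks["a"], "vs", backlinks["b"])
print("a outranks b:", ranks["a"] > ranks["b"])
```

By backlink count alone "b" wins three to one, but in this sketch "a" comes out ahead because its one link comes from a page that is itself heavily linked.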

These types of personalized PageRanks are virtually immune to manipulation by commercial interests. For a page to get a high PageRank, it must convince an important page, or a lot of non-important pages to link to it. At worst, you can have manipulation in the form of buying advertisements (links) on important sites. But, this seems well under control since it costs money. This immunity to manipulation is an extremely important property. This kind of commercial manipulation is causing search engines a great deal of trouble, and making features that would be great to have very difficult to implement.

I'm not sure how much that applies to Gemini, since its smaller size may make exploitation easier, but on the other hand it also makes manual blacklisting significantly more manageable.

Replies

~tetris wrote:

I never thanked you for this added context. Cheers both for the info and the correction -- and I hope your engine takes off!