gemini://gemi.dev/cgi-bin/wp.cgi/view?Link%20farm Gemipedia on Link Farms
Since people want to act like page-ranking algorithms are not broken, I give you the article above. Not only are there link farms that artificially boost a page in search results, there is also a chance of false positives, where sites are penalized for looking like a link farm, *and* whole domains become unsearchable because they are removed for being a link farm. Ranking may only be used/visible in "underspecified" search queries, but the majority of all search queries are underspecified, especially if you use FTS of page contents, where even common words can be matched.
2 years ago
@acidus - it is not too late to fix the URL syntax for Gemipedia. Implement the one you want and redirect any legacy links to the new ones. · 2 years ago
That said, I see/saw a lot of bias in the search engine field, especially when the real goal is making money.
Anyway, at the beginning the hyperlink was the mantra, so when the internet was perceived as a spider web of hyperlinks, the pages with more links pointing to them were considered more authoritative.
Then search engines became the preferred gateway to the WWW, which had already lost its association with the spider web, and pages simply tried to please the (only) search engine(s), which in turn tried to please advertisers.
As usual, if Gugl shit really worked out in a positive manner, we would never be here in Gemini space... · 2 years ago
@acidus Gemipedia is very well done! I linked to it a bunch of times in my new post as well :D · 2 years ago
As an aside, it makes my day seeing people link to Wikipedia content via Gemipedia! But wow, do I wish I had done some URL rewriting so the URLs looked better · 2 years ago
@acidus Right, that's sorta why I called it a fool's errand, lol. Anyways, my thought was that link text doesn't add as much to the ranking of a page as other things. This means collusion wouldn't be prioritized over page content, but content farming is still going to be a big problem. · 2 years ago
I don't think gemini is the same hostile environment, because it's virtually impossible to profit from tricking the search engines about what your content is in order to drive large amounts of traffic: you can't really monetize here, and we don't have large amounts of traffic. But people do make mistakes: having broken gemtext where half their page is in a preformatted block, or sending the wrong charset (like all the textfile.com mirrors do), which messes up reading the content. So as you said @krixano, link text could help there · 2 years ago
It's also funny that PageRank was designed, in part, to get around the original SEO abuse: keyword stuffing. If I control a page's content, I can lie about what it is about. Early search engines that just looked at the page metadata or content got fooled. Instead of only trusting the page, look at what (hopefully independent) 3rd parties say a page is about, via their link text. This just pushed the SEO abuse out to people colluding about links, and link farms. Adding to this was the concept of "authority" as well. But I can sympathize with the challenge. How do you determine what something is about in a hostile environment where people lie? It's interesting · 2 years ago
@acidus I agree. Link text was always something I wanted to include in AuraGem Search. · 2 years ago
Where I think links are valuable is in helping to classify content. If I write a sci-fi short story, how does the search engine know it's science fiction? Indexing the text probably won't reveal it. But if people link to it saying "great hard sci-fi read" or "As good as an Asimov tale", and you index that, you can better classify it, completely skipping the "authority" or "popularity" side of things. It's also a great way to tell if a piece is satire, or ironic. That's probably not in the content itself, but can be determined from the text that links to it. TLDR: in a medium like gemini without much metadata, I think link text can help a lot. · 2 years ago
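The idea in the comment above can be sketched in a few lines of Python. This is only an illustration, not how Kennedy or AuraGem actually index anything: the function names, the tokenizer, and the example URLs are all made up for the sketch. It reads gemtext link lines (`=> URL label`), skips preformatted blocks, and records for each target which label words were used by how many distinct linking pages. Relative URLs are not resolved here; targets are compared as raw strings.

```python
import re
from collections import defaultdict


def parse_gemtext_links(text):
    """Yield (url, label) for each gemtext link line outside preformatted blocks."""
    in_pre = False
    for line in text.splitlines():
        if line.startswith("```"):
            in_pre = not in_pre  # a ``` line toggles preformatted mode
            continue
        if in_pre or not line.startswith("=>"):
            continue
        parts = line[2:].strip().split(maxsplit=1)
        if not parts:
            continue  # "=>" with no URL
        url = parts[0]
        label = parts[1] if len(parts) > 1 else ""
        yield url, label


def build_link_text_index(pages):
    """Map each link target to {term: number of distinct pages using that term}.

    `pages` is a dict of source URL -> gemtext body (hypothetical input shape).
    """
    seen = defaultdict(lambda: defaultdict(set))
    for source, body in pages.items():
        for target, label in parse_gemtext_links(body):
            if target == source:
                continue  # a page describing itself is not a third-party opinion
            # Simple tokenizer (an assumption): lowercase words, hyphens kept.
            for term in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", label.lower()):
                seen[target][term].add(source)
    return {
        target: {term: len(sources) for term, sources in terms.items()}
        for target, terms in seen.items()
    }


# Made-up example pages.
pages = {
    "gemini://a.example/links.gmi":
        "# Reads\n=> gemini://c.example/story.gmi great hard sci-fi read\n",
    "gemini://b.example/log.gmi":
        "=> gemini://c.example/story.gmi As good as an Asimov tale, pure sci-fi\n"
        "```\n=> gemini://c.example/story.gmi ignored inside preformatted\n```\n",
}
index = build_link_text_index(pages)
```

Counting distinct linking pages rather than raw occurrences means one page repeating a word ten times still counts once, though it does nothing against many colluding pages.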
@acidus Right, this is exactly why I rejected HITS, SALSA, and PageRank :P
But searching for "Station" kept bringing up pages that used the word "Station" a lot in their titles or body text (so just FTS-ing titles/meta won't help). So this was a stop gap.
This is also why I don't do FTS on body text either. Although, yeah, this is still a problem with metadata, and I'm aware of this problem and thinking about fixing it. · 2 years ago
I think using links for popularity or authority isn't the best approach (though yes, I'm using a primitive popularity constant as a workaround now). Doing that has downsides, like ranking less valuable content higher than new content, simply because it's older and has more links. Older content can have out-of-date info. Or it can just be inflammatory and generating hate-links. So links have more nuance than just a blunt "With this link I vote that page X is important" · 2 years ago
@acidus I agree btw with everything you wrote in this comment. I just spent a couple hours writing a long article that talks about not only the problems with popularity-based ranking, but also content-based ranking (which I never meant to suggest wasn't a problem).
I'm releasing it now. · 2 years ago
@acidus Oh! Sorry, I swear I saw somewhere that you used a PageRank-derived ranking system. · 2 years ago
I have some experience in the SEO field, and the only thing I really understood is that if you pay big G, directly or indirectly, your organic search results perform mysteriously better... · 2 years ago
Kennedy doesn't use PageRank. It's open source, you can see for yourself. It uses a primitive popularity rank:
https://github.com/acidus99/Kennedy/blob/main/Crawler/Support/PopularityCalculator.cs
I don't think this is the best approach, because I don't think popularity is valuable. But searching for "Station" kept bringing up pages that used the word "Station" a lot in their titles or body text (so just FTS-ing titles/meta won't help). So this was a stop gap. · 2 years ago
@freezr Oh, also, Kennedy uses PageRank. I'm not sure what GUS uses, but I doubt it uses any link-based ranking systems, because the results are worse than even auragem's, imo, lol · 2 years ago
@freezr Also, TLGS uses SALSA, which is prone to the Link Farms and Spamdexing described on the Gemipedia page I linked. In fact, any algorithm that takes into account the *number* of links to any page is prone to this, including PageRank and SALSA and HITS. Google released their Panda system in 2011 to try to detect content spam. And PageRank also tries to detect link spam.
gemini://gemi.dev/cgi-bin/wp.cgi/view?Spamdexing
gemini://gemi.dev/cgi-bin/wp.cgi/view?PageRank
gemini://gemi.dev/cgi-bin/wp.cgi/view?Google%20Panda
https://web.archive.org/web/20111104131332/https://www.google.com/competition/howgooglesearchworks.html · 2 years ago
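To make the link-farm point in the comment above concrete, here is a toy sketch using the simplified textbook form of PageRank (power iteration with a damping factor). It is not Google's production system, nor what Kennedy, TLGS, or any other Gemini search engine runs, and the page names are invented. It only shows that adding a batch of pages that all link to one target is enough to push that target past a page with a couple of independent links.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Simplified textbook PageRank.

    `graph` maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical capsule graph: two independent pages link to "honest".
base = {"blog1": ["honest"], "blog2": ["honest"], "honest": [], "spam": []}
before = pagerank(base)

# The same graph plus a 20-page link farm that only links to "spam".
farmed = dict(base)
for i in range(20):
    farmed[f"farm{i}"] = ["spam"]
after = pagerank(farmed)

print("before:", before["honest"] > before["spam"])  # honest outranks spam
print("after: ", after["spam"] > after["honest"])    # the farm flips the order
```

Nothing about the farm pages needs to be trusted or popular for this to work; in this simplified model, sheer count is enough.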
@freezr PageRank's patent has already expired, and Google has admitted that the core of their search engine still uses PageRank. There was also a paper written on how it works. Google is also sometimes open with how certain things sorta work in their search engine, but some of it can't be used because of the patents. · 2 years ago
You don't need to go that far... the simple fact that nobody can actually take a look at the Big G algorithms just makes the whole question silly. · 2 years ago