👽 krixano

One of the biggest criticisms of Google and other big search engines is that they put too much emphasis on popularity. To call this searching by *relevance* is disingenuous, imo, because popularity has nothing to do with the relevance of the search. Instead, these types of algorithms assume that people *want* the most popular or "authoritative" pages.

But the biggest mistake is presuming that more authoritative pages are ones that are linked to the most, especially when making this presumption on the internet where false information spreads like wildfire.

2 years ago · 👍 superfxchip, smokey

14 Replies

👽 freezr

I don't know how search engines work but the major search engines work based on expectations IMHO.

What they expect you are looking for.

What they expect to monetize with your searches.

The latter heavily influences the former because, for instance, Gugl's clients are the ones who pay for advertising, and the end users are the product.

Anything you find through a search is just the "bait" to keep you on the screen as long as possible, to make revenue from a display ad or a click...

This atrocious behavior, under the bible of SEO and its god PageRank (or whatever it is called today), simply shaped the WWW in its image; that's why all websites look identical. · 2 years ago

👽 smokey

The goal of mainstream search engines is no longer to aggregate search results but to sell you things (or maybe better put, to sell your information and attention space to advertisers), as well as to disseminate instant answers to any question in as few clicks as possible. Using a meta-search engine like searx helps, but it still uses Google and friends to aggregate results. search.marginalia.nu and YaCy are the closest I have come to search engines that are truly novel and free of the SEO bullshit, but both also have their own set of flaws. marginalia is great for exploring information but not so good at disseminating it. YaCy is cool in theory but shitty to use in practice. · 2 years ago

👽 superfxchip

SEO pumping out problematic garbage, operating under misguided programming, in an effort to feed back similar instances regardless of context? sounds good, seems legit. · 2 years ago

👽 krixano

The whole problem is that it's a fool's errand to think that you can determine the "authority" of a page with an algorithm... · 2 years ago

👽 krixano

The assumption is faulty because a link could just as well point to something the author thinks is absolutely not authoritative, where the whole reason for linking is to heavily criticize the page. For example, if someone posts a conspiracy theory, and then you have a ton of pages linking to it and saying how bad it is, that then boosts the "influence" or "authority" of the bad page. Like... it's completely broken because of these assumptions.

[2/2] · 2 years ago
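
To make the point above concrete, here is a minimal sketch of why critical links still count as votes. The page names and the `stance` labels are hypothetical; a real crawler only sees that a link exists, not why the author added it, and working out the stance automatically is its own hard problem.

```python
from collections import Counter

# Hypothetical link records: (source page, target page, stance of the source).
# The stance column is an assumption added for illustration only.
links = [
    ("debunk-1", "conspiracy", "critical"),
    ("debunk-2", "conspiracy", "critical"),
    ("debunk-3", "conspiracy", "critical"),
    ("fan-page", "careful-article", "endorsing"),
]

def inlink_score(links):
    """Count incoming links per page, ignoring why the link was made."""
    return Counter(target for _source, target, _stance in links)

def endorsed_score(links):
    """Count only the links whose (hypothetical) stance is endorsing."""
    return Counter(
        target for _source, target, stance in links if stance == "endorsing"
    )

scores = inlink_score(links)
endorsed = endorsed_score(links)
```

With plain link counting, the criticized page comes out ahead of the article that was actually recommended; only the stance-aware count separates the two.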

👽 krixano

@superfxchip Right, but they end up having to build band-aids to fix the brokenness of the core, lmao. For example, links that go to the same domain (intra-links) are not used to boost the authority. The search engine would also ideally detect links for ads, and other "non-informational" links. However, there's still a big problem with the assumption below:

With such a link p suggests, or even recommends, that surfers visiting p follow the link and visit q. This may reflect the fact that pages p and q share a common topic of interest, and that the author of p thinks highly of q's contents.

[1/2] · 2 years ago
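
The same-domain band-aid mentioned above could look roughly like this. It is only a sketch: the function name `external_links` and the example URLs are made up, and comparing hostnames exactly is a simplifying assumption (it treats `www.example.org` and `example.org` as different sites).

```python
from urllib.parse import urlparse

def external_links(page_url, outgoing_urls):
    """Keep only links that leave the page's own host.

    Links within the same host (intra-links) are dropped so they
    cannot be used to boost the site's own authority.
    """
    page_host = urlparse(page_url).hostname
    return [
        url for url in outgoing_urls
        if urlparse(url).hostname != page_host
    ]

page = "https://example.org/blog/post"
outgoing = [
    "https://example.org/about",
    "https://other.example/review",
    "https://example.org/tags",
]
kept = external_links(page, outgoing)
```

Note that this only removes self-promotion within one host. It does nothing about the deeper assumption that every remaining link is a recommendation.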

👽 superfxchip

@krixano The worst from Google, Bing, or even the search engines built into sites like Facebook or Twitter is when they feed you back next to nothing, or a limited number of results, because they can't match anything that's actually relevant to your search. I even notice broken or outdated results from sources that either don't exist anymore or have been overtaken by spam, page squatters, etc. · 2 years ago

👽 superfxchip

@krixano I love the way indexing and queries work on the engines here, like your very own auragem, or @acidus's kennedy, and the gemspace.info search bar, which bring back a variety of results diversified between various pages and articles that mention exactly what I'm searching for, without needlessly feeding me back tons and tons of recycled results across pages and pages. That's REAL optimization, true efficiency. · 2 years ago

👽 superfxchip

@krixano this is starting to sound a lot like a very convoluted link farm dependent on the number of roots connected between pages that link back to each other, so all in all, getting absolutely tf nowhere lol. · 2 years ago

👽 krixano

@superfxchip And then there's the whole thing that Google (and probably Bing) does where it weighs pages based on the "quality" of the website, or really, whether the website uses modern HTML5, how fast or slow it is to load, whether it has metadata, whether there's a mobile version, accessibility, etc. This allows people to come up with methods to game the system, calling it a fancy term like "SEO Optimization". · 2 years ago

👽 krixano

@superfxchip And, imo, the whole reason why this effect is even in there in SALSA, HITS, and PageRank, is because they are all built on faulty assumptions. This faulty assumption is given explicitly on an old page on google.com that explained how the searching worked (https://web.archive.org/web/20111104131332/https://www.google.com/competition/howgooglesearchworks.html):

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. · 2 years ago
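
For readers who want to see what "counting the number and quality of links" means in practice, here is a minimal sketch of the textbook PageRank power iteration. It is not Google's production code: the damping factor of 0.85, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Toy graph (hypothetical): three pages link to "conspiracy" to debunk it,
# one page links to "good" to recommend it.
toy_links = {
    "debunk1": ["conspiracy"],
    "debunk2": ["conspiracy"],
    "debunk3": ["conspiracy"],
    "fan": ["good"],
    "conspiracy": [],
    "good": [],
}
ranks = pagerank(toy_links)
```

In this toy graph the debunked page ends up with a higher rank than the recommended one, because the algorithm only sees that links exist, not what they say.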

👽 krixano

@superfxchip In fact, if you look at the wiki page for the SALSA algorithm, which is based on HITS, it mentions the TKC (tightly knit community) effect, which is where pages that link to each other reinforce and raise each other's authority level. Imo, having even a little bit of this is absolutely unacceptable. It's basically the echo-chamber effect of search engines.

SALSA is less vulnerable to the Tightly Knit Community (TKC) effect than HITS. A TKC is a topological structure within the Web that consists of a small set of highly interconnected pages. The presence of TKCs in a focused subgraph is known to negatively affect the detection of meaningful authorities by HITS. · 2 years ago
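
The TKC effect can be reproduced on a toy graph. Below is a minimal sketch of the basic HITS iteration (hub and authority scores updated from each other, then normalized); the graph, the page names, and the iteration count are assumptions for illustration. Three hub pages all link to the same three pages (the tightly knit community), while five unrelated pages each link to a single page called `popular`.

```python
import math

def hits(links, iterations=50):
    """Basic HITS: returns (hub, authority) score dictionaries."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both vectors to unit length so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

graph = {}
for h in ("tkc-h1", "tkc-h2", "tkc-h3"):
    graph[h] = ["tkc-a1", "tkc-a2", "tkc-a3"]   # tightly knit community
for i in range(5):
    graph[f"fan-{i}"] = ["popular"]              # 5 independent in-links

hub_scores, auth_scores = hits(graph)
```

Even though `popular` has five incoming links and each community page has only three, the community pages reinforce each other on every round, and the authority score of `popular` shrinks toward zero.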

👽 krixano

@superfxchip Well, right. Google and Bing do both - they prioritize ads as well as popularity, lol. Afaik, they both use PageRank, but tweaked and expanded. They first get a base result set by using Full-Text Search to search through all pages for the keywords you are querying for, and then they weigh all of the pages based on how many keywords they have that match with your query, as well as the "authority" or "influence" (read: popularity) these pages have. Both HITS and PageRank do this.

These algorithms do have uses in other types of search engines, just not in general internet search engines when you want to find good information, imo. · 2 years ago
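
The two-step process described above (find pages containing the keywords, then weigh them by keyword matches and "authority") can be sketched as follows. The scoring formula is an assumption: the real formulas used by Google and Bing are not public, so this simply adds the match count to a weighted authority score. The page texts and authority values are made up.

```python
def search(query, pages, authority, authority_weight=0.5):
    """Rank pages by keyword matches plus a weighted authority score.

    pages maps a URL to its text; authority maps a URL to a
    precomputed link-based score (for example from PageRank or HITS).
    """
    terms = query.lower().split()
    results = []
    for url, text in pages.items():
        words = text.lower().split()
        matches = sum(words.count(term) for term in terms)
        if matches == 0:
            continue  # not in the base result set at all
        score = matches + authority_weight * authority.get(url, 0.0)
        results.append((score, url))
    results.sort(reverse=True)
    return [url for _score, url in results]

pages = {
    "niche": "gemini protocol search engines explained gemini search",
    "popular": "search tips",
    "unrelated": "cooking recipes",
}
authority = {"niche": 0.1, "popular": 10.0}

with_authority = search("gemini search", pages, authority)
keywords_only = search("gemini search", pages, authority, authority_weight=0.0)
```

With the authority term switched on, the popular page outranks the page that matches the query far better; with the weight set to zero, the order flips.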

👽 superfxchip

What's worse, in my experience, is knowing that you're being fed results that were paid to show up first, prioritized over other, probably more relevant sites that are buried far deeper in the pages of that same search.

All because those sites didn't have enough money or weren't considered "commercially relevant" enough to be included in the top results, over less relevant, unhelpful results that were only pushed to the top because they had a bigger monetary value to 'em, smfh.

I wouldn't call that "popularity", I'd call that digital payola. · 2 years ago