
A look at search engines with their own indexes

Originally posted 2021-03-10. Last updated 2021-10-21.

This is a cursory review of all the indexing search engines I have been able to find. Gemini engines are at the bottom; the rest of this post is about Web search engines.

The three dominant English search engines with their own indexes¹ are Google, Bing, and Yandex (GBY). Many alternatives to GBY exist, but almost none of them have their own results; instead, they just source their results from GBY.

With that in mind, I decided to test and catalog all the different indexing search engines I could find. I prioritized breadth over depth, and encourage readers to try the engines out themselves if they’d like more information.

I primarily evaluated English-language search engines because English is my primary language. With some difficulty, I could probably evaluate a Spanish one; however, I wasn’t able to find many Spanish-language engines powered by their own crawlers.

This page is a “living document” that I plan on updating indefinitely. Check for updates once in a while if you find this page interesting. Feel free to send me suggestions, updates, and corrections; I’d especially appreciate help from those who speak languages besides English and can evaluate a non-English indexing search engine. Contact info is in the article footer.

Rationale

Google, Microsoft (the company behind Bing), and Yandex aren't just search engine companies; they're content and ad companies as well. For example, Google hosts video content on YouTube and Microsoft hosts social media content on LinkedIn. This gives these companies a powerful incentive to prioritize their own content. They are able to do so even if they claim that they treat their own content the same as any other: since they have complete access to their search engines' inner workings, they can tailor their content pages to better fit their algorithms and tailor their algorithms to work well on their own content. They can also index their own content without limitations but throttle indexing for other crawlers.²

One way to avoid this conflict of interest is to *use search engines that aren't linked to major content providers;* i.e., use engines with their own independent indexes.

There's also a practical, non-ideological reason to try other engines: different providers have different results. Websites that are hard to find on one search engine might be easy to find on another, so using more indexes and ranking algorithms results in access to more content.

Methodology

I focused almost entirely on "organic results" (the classic link results), and didn't focus too much on (often glaring) privacy issues, "enhanced" or "instant" results (e.g. Wikipedia sidebars, related searches, Stack Exchange answers), or other elements.

I compared results for esoteric queries side-by-side; if the first 20 results were (nearly) identical to another engine’s results (though perhaps in a slightly different order), they were likely sourced externally and not from an independent index.
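The side-by-side check above boils down to measuring how much two engines' top results overlap while ignoring order. The sketch below is one hypothetical way to express it; it is not the tool used for this article. It assumes you have already collected each engine's result URLs for a query as a list, and the URL normalization rules and the 90% cutoff for "nearly identical" are illustrative assumptions, not figures from this review.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a result URL to host + path so trivial differences
    (http vs https, a leading "www.", a trailing slash, host letter case)
    don't count as different results. These rules are an assumption."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def top_n_overlap(results_a: list[str], results_b: list[str], n: int = 20) -> float:
    """Fraction of engine A's top-n results that also appear somewhere in
    engine B's top-n results. Order is deliberately ignored."""
    top_a = {normalize(u) for u in results_a[:n]}
    top_b = {normalize(u) for u in results_b[:n]}
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


def likely_sourced(results_a: list[str], results_b: list[str],
                   n: int = 20, threshold: float = 0.9) -> bool:
    """Flag engine A as probably proxying engine B when their top-n results
    are nearly identical. The 0.9 threshold is a hypothetical cutoff."""
    return top_n_overlap(results_a, results_b, n) >= threshold


# Hypothetical result lists: same pages, different order and URL spelling.
engine_x = ["https://example.com/a", "https://example.org/b/", "http://www.example.net/c"]
bing_like = ["https://example.net/c", "https://example.com/a", "https://example.org/b"]
print(top_n_overlap(engine_x, bing_like))  # 1.0
```

Using sets rather than ranked lists mirrors the "perhaps in a slightly different order" caveat: a reshuffled copy of another engine's results still scores as a match. A single query proves little, so in practice you would repeat this across many esoteric queries before drawing conclusions.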

I tried to pick queries that should have a good number of results and show variance between search engines. An incomplete selection of queries I tested:

General indexing search-engines

Large indexes, good results

These are large engines that pass all the above tests and more.

1. Google: the biggest index. Allows submitting pages and sitemaps for crawling, but requires login. Powers a few other engines:

2. Bing: the runner-up. Allows submitting pages and sitemaps for crawling, but requires login. Its index powers many other engines:

3. Yandex: originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Allows submitting pages and sitemaps for crawling, but requires login. Powers:

4. Mojeek: Claims to be privacy-oriented. Quality isn’t at Google/Bing/Yandex’s level, but it’s not bad either. If I had to use Mojeek as my default general search engine, I’d live. Partially powers eTools.ch.

5. Petal Search: a search engine by Huawei that recently switched from searching for Android apps to general search. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns. Requires an account to submit sites. I discovered this via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia and some EU regions it uses Yandex and Qwant, respectively.

petalsearch.com

Smaller indexes, relevant results

These engines pass most of the tests listed in the “methodology” section.

Right Dao

Gigablast

Private.sh

Gowiki

Smaller indexes, hit-and-miss

These engines fail badly at a few important tests. Otherwise, they seem to work well enough.

seekport (HTTP only)

Exalead

Curlie

ExactSeek

Meorca Search Engine

Infotiger

search.tl

Kozmonavt

ChatNoir

Common Crawl

ChatNoir source code (GitHub)

ChatNoir Announcement

Unusable engines, irrelevant results

Results from these search engines don’t seem at all useful.

MetaGer

Active Search Results

Crawlson

Anoox

Plumb CPO

Yioop!

Semi-independent indexes

Engines in this category fall back to GBY when their own indexes don't have enough results. As their own indexes grow, they claim that this should happen less often.

Brave Search

Plumb

Neeva

Non-generalist search

These indexing search engines don’t have a Google-like “ask me anything” endgame; they’re trying to do something different. You aren't supposed to use these engines the same way you use GBY.

wiby.me

wiby.org

Search My site

Ninfex

Other languages

I’m unable to evaluate these engines properly since I don’t speak the necessary languages. English searches on these are hit-or-miss. I might have made a few mistakes in this category.

Big indexes

Naver

Seznam

Cốc Cốc

go.mail.ru

Smaller indexes

Parsijoo

search.ch

fastbot

Moose.at

Misc

uk.ask.com

Infinity Search

Infinity Decentralized

Upcoming engines

These engines aren’t ready yet; their indexes are either in a proof-of-concept phase with a handful of sites or aren’t available yet.

Gemini search engines

Time for my first Gemini-exclusive content! A Gemini page about search engines wouldn't be complete without a few search engines for the Gemini space.

gus.guru

geminispace.info

Graveyard

These engines were originally included in the article, but have since been discontinued.

The Wbsrch Experiment

Acknowledgements

Some of this content came from the Search Engine Map and Search Engine Party. A few web directories also proved useful.

Search Engine Map

Search Engine Party

Matt from Gigablast also gave me some helpful information on GBY which I included in the "Rationale" section. He's written more about big tech in the Gigablast blog:

Gigablast blog

Nicholas A. Ferrell of The New Leaf Journal wrote a great post on alternative search engines.

A 2021 List of Alternative Search Engines and Search Resources

N.A. Ferrell's Gemlog

He also gave me some useful details about Seznam, Naver, Baidu, and Goo:

Re: Editor of The New Leaf Journal - Added Your Guestbook Comment Info to My Post + Feedback

Notes

¹ Yes, “indexes” is an acceptable plural form of the word “index”. The word “indices” sounds weird to me outside a math class.

² Matt from Gigablast told me that indexing YouTube or LinkedIn will get you blocked if you aren't Google or Microsoft. I imagine that you could do so by getting special permission if you're a megacorporation.

³ DuckDuckGo has a crawler called DuckDuckBot. This crawler doesn't impact the linked results displayed; it just grabs favicons and scrapes data for a few instant answers. DuckDuckGo's help pages claim that the engine uses over 400 sources; my interpretation is that at least 398 sources don't impact organic results. I don't think DuckDuckGo is transparent enough about the fact that their organic results are proxied. Compare DuckDuckGo side-by-side with Bing and Yandex and you'll see it's sourcing organic results from one of them (probably Bing).

⁴ Qwant claims to also use its own crawler for results, but it’s still mostly Bing. Try a side-by-side comparison; I found that it doesn’t seem to have anything besides Bing results.

⁵ Disconnect Search allows users to have results proxied from Bing or Yahoo, but Yahoo sources its results from Bing.

⁶ Yippy claims to be powered by a certain IBM brand (a brand that could correspond to any number of products) and annotates results with the phrase “Yippy Index”, but a side-by-side comparison with Bing and other Bing-based engines revealed results to be nearly identical.

⁷ Ask.moe was working on a FLOSS indexer; its search page stated an intention to switch to it from Bing at one point. This statement has since been removed.

FLOSS indexer

⁸ This is based on a statement Right Dao made on Reddit:

Right Dao on Reddit

Archive of the Reddit thread

âč Some search engines support the "site:" search operator to limit searches to subpages/subdomains of a single site or TLD. "site:.one", for instance, limits searches to websites with the ".one" TLD.

¹⁰ More information can be found in an HN subthread and the Cliqz tech blog:

HN comment thread for "Introducing Brave Search Beta"

Tech @ Cliqz: Building a search engine from scratch

Tech @ Cliqz: Search quality at Cliqz

---

Article changelog

Homepage

View “A look at search engines with their own indexes” on the WWW

Gemini capsule source code

Copyright © 2021 Rohan Kumar