A look at search engines with their own indexes

Originally posted 2021-03-10. Last updated 2022-04-23.

This is a cursory review of all the indexing search engines I have been able to find. Gemini engines are at the bottom; the rest of this post is about Web search engines.

The three dominant English search engines with their own indexes¹ are Google, Bing, and Yandex (GBY). Many alternatives to GBY exist, but almost none of them have their own results; instead, they just source their results from GBY.

With that in mind, I decided to test and catalog all the different indexing search engines I could find. I prioritized breadth over depth, and encourage readers to try the engines out themselves if they’d like more information.

This page is a “living document” that I plan on updating indefinitely. Check for updates once in a while if you find this page interesting. Feel free to send me suggestions, updates, and corrections; I’d especially appreciate help from those who speak languages besides English and can evaluate a non-English indexing search engine. Contact info is in the article footer.

I plan on updating the engines in the top two categories with more info comparing the structured/linked data the engines leverage (RDFa vocabularies, microdata, microformats, JSON-LD, etc.) to help authors determine which formats to use.

About the list

I discuss my motivation for making this page in the "Rationale" section.

I primarily evaluated English-language search engines because English is my primary language. With some difficulty, I could probably evaluate a Spanish one; however, I wasn't able to find many Spanish-language engines powered by their own crawlers.

Where I can, I mention details like "allows site submissions" and structured data support. These are only there to inform authors about their options; they are not points in the engines' favor.

See the "Methodology" section at the bottom to learn how I evaluated each one.

General indexing search-engines

Large indexes, good results

These are large engines that pass all my standard tests and more.

1. Google: the biggest index. Allows submitting pages and sitemaps for crawling, and even supports WebSub to automate the process. Powers a few other engines:

2. Bing: the runner-up. Allows submitting pages and sitemaps for crawling without login using the IndexNow API. Its index powers many other engines:

3. Yandex: originally a Russian search engine, it now has an English version. Some Russian results bleed into its English site. Like Bing, it allows submitting pages and sitemaps for crawling using the IndexNow API. Powers:

4. Mojeek: seems privacy-oriented, with a large index containing billions of pages. Quality isn’t at Google/Bing/Yandex’s level, but it’s not bad either. If I had to use Mojeek as my default general search engine, I’d live. Partially powers eTools.ch. At the moment, I think that Mojeek is the best alternative to GBY for general web search.

5. Petal Search: a search engine by Huawei that recently switched from searching for Android apps to general search. Despite its surprisingly good results, I wouldn't recommend it due to privacy concerns. Requires an account to submit sites. I discovered it via my access logs. Be aware that in some jurisdictions, it doesn't use its own index: in Russia it uses Yandex, and in some EU regions it uses Qwant.

petalsearch.com
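
For authors curious about what an IndexNow submission looks like, here is a minimal Python sketch. It only builds the request URL and sends nothing. The shared endpoint and the "url" and "key" parameters follow the public IndexNow protocol as I understand it; the function name, page address, and key are placeholders I made up for illustration. A real submission also requires hosting a text file containing the key on the submitted site.

```python
from urllib.parse import urlencode

# Shared IndexNow endpoint. Participating engines also expose their own
# /indexnow paths; check each engine's documentation before relying on one.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_url(page_url: str, key: str,
                       endpoint: str = INDEXNOW_ENDPOINT) -> str:
    """Return the GET URL that would notify an engine about one changed page.

    build_indexnow_url is a hypothetical helper, not part of any library.
    """
    # urlencode percent-encodes the page address so it survives as a
    # single query parameter.
    query = urlencode({"url": page_url, "key": key})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    # Placeholder page and key; substitute your own before submitting.
    print(build_indexnow_url("https://example.com/new-post", "0123456789abcdef"))
```

Fetching the resulting URL with any HTTP client performs the submission. No account or login is involved, only the key file.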

Google, Bing, and Yandex support structured data such as microformats1, microdata, RDFa, Open Graph markup, and JSON-LD. Yandex's support for microformats1 is limited; for instance, it can parse h-card metadata for organizations but not people. Open Graph and Schema.org are the only supported vocabularies I'm aware of. Mojeek is evaluating structured data; it's interested in Open Graph and Schema.org vocabularies.
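
To make the formats above more concrete, here is a rough sketch of generating a Schema.org "Article" description as JSON-LD, one of the vocabulary and format pairs mentioned. The property names come from Schema.org; the function name, author, and address are placeholders, and which properties a given engine actually reads is something I have not verified.

```python
import json


def article_jsonld(headline: str, author: str,
                   date_published: str, url: str) -> str:
    """Return a <script> element holding Schema.org Article metadata.

    article_jsonld is a hypothetical helper written for this example.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "url": url,
    }
    # Escape "</" so a value containing "</script>" cannot end the element
    # early; "\/" is a valid JSON escape for "/".
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


if __name__ == "__main__":
    print(article_jsonld(
        "A look at search engines with their own indexes",
        "Example Author",            # placeholder name
        "2021-03-10",
        "https://example.com/post",  # placeholder address
    ))
```

The printed element would go inside a page's head. Engines that support JSON-LD can then read the metadata without parsing the visible markup.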

Smaller indexes, relevant results

These engines pass most of the tests listed in the "Methodology" section. All of them seem relatively privacy-friendly.

Right Dao

Gigablast

Private.sh

Alexandria

Alexandria engine source code

Fairsearch

FairSearch supports Open Graph and some JSON-LD at the moment. A look through the source code for Alexandria and Gigablast didn't seem to reveal the use of any structured data.
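
For a sense of what reading Open Graph markup involves, here is a small sketch using only Python's standard library. It illustrates the general idea and is not how FairSearch or any other engine listed here does it; the class and function names are my own.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect Open Graph properties from <meta property="og:..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html: str) -> dict:
    """Return a dict mapping Open Graph property names to their content."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


if __name__ == "__main__":
    sample = (
        '<head><meta property="og:title" content="Example page">'
        '<meta property="og:type" content="article">'
        '<meta name="description" content="Not Open Graph"></head>'
    )
    print(extract_open_graph(sample))
```

A real crawler would also need to handle malformed markup, repeated properties, and character encodings, all of which this sketch ignores.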

Smaller indexes, hit-and-miss

These engines fail badly at a few important tests. Otherwise, they seem to work well enough.

seekport (HTTP only)

Exalead

Curlie

ExactSeek

Infotiger

Burf.co

Entfer

Siik

inetdex.com

Meorca Search Engine

ChatNoir

Common Crawl

ChatNoir source code (GitHub)

ChatNoir Announcement

Secret Search Engine Labs

CashRank Algorithm

Unusable engines, irrelevant results

Results from these search engines don’t seem at all useful.

Bloopish

MetaGer

Artado Search

Active Search Results

Crawlson

Anoox

Plumb CPO

Yioop!

Semi-independent indexes

Engines in this category fall back to GBY when their own indexes don't have enough results. Some claim that this should happen less often as their own indexes grow.

Brave Search

Plumb

Neeva

Qwant

Kagi Search

Kagi.ai

TinyGem

Non-generalist search

These indexing search engines don’t have a Google-like “ask me anything” endgame; they’re trying to do something different. You aren't supposed to use these engines the same way you use GBY.

Small/non-commercial Web