💾 Archived View for dioskouroi.xyz › thread › 29421898 captured on 2021-12-04 at 18:04:22. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Gigablast Search Engine

Author: SubiculumCode

Score: 232

Comments: 47

Date: 2021-12-02 20:19:08


________________________________________________________________________________

eatbitseveryday wrote at 2021-12-03 00:20:22:

Source of this post likely from [1]

[1]

https://news.ycombinator.com/item?id=29417296

SubiculumCode wrote at 2021-12-03 05:37:01:

Yep, I posted this today after the author of Gigablast posted a comment in this thread:

https://news.ycombinator.com/item?id=29417061

boyter wrote at 2021-12-03 01:40:54:

Matt Wells who wrote the majority of the code for gigablast is someone I have been following online for a long time. I used to live for

http://gigablast.com/rants.html

updates. Gigablast is an amazing example of what one person can do given the time and effort. If you look around for articles and interviews by him, you get some nice insights that would never come from the likes of Google, although Bing has some very good in-depth technical discussions, such as how BitFunnel works. It’s also nice to look through its code and see how things like the porn filters were implemented.

It’s also nice to know that for a while in internet history Gigablast was mentioned in the same breath as Google: an amazing achievement at the time for a single person going up against Google's core product.

I wish that someone with some design chops could work on it for a few weeks, though, or that it were rolled back to the design from around 2006 with the rocket logo. I really liked that design.

I really wish I had the courage to strike out on my own like Matt has. I have written a few search engines, but a general purpose one from scratch on my own hardware is unlikely to ever happen, as much as I would love to do it.

Thanks for inspiring me so much, Matt, if you do read this (notice me senpai!). Oh, and sorry for abusing your XML API so much. I was poor at the time and needed some search results.

A few choice articles:

https://queue.acm.org/detail.cfm?id=988401

https://www.abc.net.au/news/science/2021-02-14/google-news-m...

gbmatt wrote at 2021-12-03 05:33:23:

thanks ben, you are too kind.

samhw wrote at 2021-12-03 09:31:56:

I second this - thanks for building this. It's an unbelievably inspiring achievement. It's my default search engine, and I'm really glad it exists.

boyter wrote at 2021-12-03 10:35:58:

Wow, he even knows me by my first name. Very humbled. Once again, thanks for being so open about what you have done.

gbmatt wrote at 2021-12-03 02:24:15:

hey thanks for the recognition, people. :) finally, all my problems are solved. this comment is here for hacker news karma points.

ivanche wrote at 2021-12-03 08:29:53:

Hey Matt, would you consider making the search box (input with id="q") a bit wider? I can type only around 15 characters before the beginning of the query gets cut off.

dzdt wrote at 2021-12-03 10:55:18:

Seconded.

I went to check out a few example searches and the too-narrow search bar is the first annoyance I found.

The next annoyance was that the crawled index seems much smaller than Google's or Bing's. I looked for things I know exist on Twitter, on an old WordPress blog, and on obscure websites I frequent: forcing terms not to be skipped using +, I could see that none of my test cases were in the index.

InfiniteRand wrote at 2021-12-03 12:04:41:

The too-small text box is also a pain when deleting the query to type a new one on mobile. A clear button would mitigate some of this pain, although making the field larger would probably be sufficient.

benwills wrote at 2021-12-03 15:55:32:

I noticed there were IPs in the source code that seemed to reference your home IP address, and maybe others'. I'm curious whether you run any part of the crawling, indexing, or searching from home networks.

I'm asking because I'm working on related (if different) crawling problems that would be easier to handle on the hardware I have at home, but I've always assumed the provider would shut it down. Have you had any issues with that?

kingcharles wrote at 2021-12-03 02:50:05:

*throws karma at the screen*

nixgeek wrote at 2021-12-02 23:55:50:

“Gigablast has teamed up with Imperial Family Companies to create a next generation private search engine, private.sh.”

Imperial Family Companies are the people who essentially destroyed [1] the freenode IRC network, aren’t they?

[1]

https://netsplit.de/networks/history/top10_2021u.png

humanistbot wrote at 2021-12-03 00:20:50:

Yes, the same ones [1]. They are an investment firm that was formerly known as London Trust Media. You can see they have listed both "IRC" (links to irc.com) and "freenode" in their portfolio. [2]

[1]

https://lists.ubuntu.com/archives/ubuntu-irc/2021-May/001923...

[2]

https://imperialfamily.com/

(hover over "Technology")

mrtweetyhack wrote at 2021-12-03 03:57:33:

let's just make sure we destroy everything they invest in

superkuh wrote at 2021-12-03 04:05:44:

It's hard to believe they'd take money from someone that attacked so many open source projects earlier this year by leveraging "donations". They should be careful.

ludamad wrote at 2021-12-03 13:51:11:

At the same time, they're in a position to consent to a transition; I'm not sure there's a community of collective ownership here like there was with freenode.

superasn wrote at 2021-12-03 04:05:15:

I used to donate my idle CPU to SETI@home back in the day. I wonder if the same could be done to create an open search engine to compete with Google.

Also, since the resources are crowdsourced, it could make it easy to get around rate limits and anti-scraping measures too.

tigerlily wrote at 2021-12-03 04:20:51:

Perhaps not weirdly, I had the same thought yesterday [1].

https://yacy.net

was suggested in response.

[1]

https://news.ycombinator.com/item?id=29417925

boyter wrote at 2021-12-03 04:16:37:

You can do this with YaCy right now,

https://yacy.net

but it's not great for results generally.

I have often wondered whether something built on ActivityPub or similar would be an option, allowing people to group servers with peers they like or trust. It's something I want to implement, actually, and may get around to doing one of these days.

zdkl wrote at 2021-12-03 08:25:45:

Well, for starters, one could implement the API to return ActivityStreams [0] formatted responses. That would be a good start toward being compatible with the fediverse without going insane implementing a full and proper ActivityPub service.

Been there, done that, way better tool for "lower level" features.

[0]

https://www.w3.org/TR/activitystreams-core/
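To make this suggestion concrete, here is a minimal sketch, assuming the search API already has a ranked list of hits, of wrapping results in an ActivityStreams 2.0 `OrderedCollection`. The `to_activitystreams` function and the `(url, title, snippet)` hit shape are hypothetical; only the `@context` URL and the vocabulary terms (`OrderedCollection`, `orderedItems`, `totalItems`, `Page`, `url`, `name`, `summary`) come from the ActivityStreams spec.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical hit shape: (url, title, snippet). A real engine would
# pull these from its index; nothing here reflects Gigablast's own API.
Hit = Tuple[str, str, str]


def to_activitystreams(query: str, hits: List[Hit]) -> Dict:
    """Wrap ranked search hits in an ActivityStreams 2.0 OrderedCollection."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "OrderedCollection",
        "summary": f"Search results for {query!r}",
        "totalItems": len(hits),
        # orderedItems keeps the engine's ranking order.
        "orderedItems": [
            {"type": "Page", "url": url, "name": title, "summary": snippet}
            for url, title, snippet in hits
        ],
    }


if __name__ == "__main__":
    # Made-up example hit, for illustration only.
    demo = [("https://example.com/", "Example Domain", "An illustrative page.")]
    print(json.dumps(to_activitystreams("example", demo), indent=2))
```

Using an `OrderedCollection` rather than a plain `Collection` is a deliberate choice here: search results are ranked, and `orderedItems` signals that their order is meaningful.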

boyter wrote at 2021-12-03 10:33:47:

That's roughly what I thought. I’m not familiar with ActivityPub at all. I will probably investigate this more deeply.

You seem to have some knowledge in this area. Do you have any suggestions for places to look to achieve something like this?

R0b0t1 wrote at 2021-12-03 07:21:01:

I don't think it would be that easy. You need to distribute the indexing data. But you could federate search servers and have them send queries to others.
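The federation idea can be sketched as a fan-out: send the query to every peer server in parallel, then merge and deduplicate what comes back. This is a minimal illustration, not how Gigablast or YaCy actually work; each peer is modeled as a plain Python function (hypothetical) standing in for a network call, and `federated_search` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A result is (url, score). A peer is modeled as a function from a query
# to its own ranked results; in a real federation this would be an HTTP
# request to another search server (hypothetical here).
Result = Tuple[str, float]
Peer = Callable[[str], List[Result]]


def federated_search(query: str, peers: List[Peer], limit: int = 10) -> List[Result]:
    """Query every peer in parallel and merge their results by score."""
    best: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        futures = [pool.submit(peer, query) for peer in peers]
        for fut in futures:
            try:
                results = fut.result()
            except Exception:
                continue  # a failing or unreachable peer is simply skipped
            for url, score in results:
                # Deduplicate by URL, keeping the highest score any peer reported.
                if score > best.get(url, float("-inf")):
                    best[url] = score
    merged = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return merged[:limit]
```

Raw scores from different peers are not necessarily comparable, so a real federation would need some way to normalize them before merging; this sketch just trusts them as-is.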

SubiculumCode wrote at 2021-12-03 07:22:24:

I know the usual thing against crypto, but I wonder whether a gridcoin model would work.

https://gridcoin.us/

SubiculumCode wrote at 2021-12-03 05:39:17:

That's a really cool idea!

ronenlh wrote at 2021-12-03 11:14:03:

Hi @gbmatt, amazing work!

What is its business history? Have companies or investors shown interest in acquiring it? What do you think needs to be done in terms of business development to extend the index to cover the modern internet, as well as to get properly whitelisted (and shortlisted) by CDNs?

dash2 wrote at 2021-12-03 11:51:59:

Competition in search would be great.

This needs to be quicker. Nobody wants to wait 3 seconds watching cogs spin.
It needs a UX designer. The search box jumps around the page when you type into it. The left-orientation of the search results is ugly and distracting.

If this is a one-person project, then that is really cool, but if it wants to be a serious contender in consumer-facing search, then it is probably time to hire an employee with complementary skills.

fibbberMEN wrote at 2021-12-03 14:49:36:

It's been around since the early Google days, and that seems to be when most of the HTML was written. Every single page has several HTML errors, so it definitely needs help there: basic things like HTML tables not being closed, <td> and <tr> being swapped, etc.

maverick74 wrote at 2021-12-03 10:42:54:

Matt is a great guy and does not get the credit he deserves!

Same thing for Gigablast!

It is amazing what one person alone can accomplish.

Congratulations for that, Matt!

You've done impossible things with few resources!

It's a shame that so much money has been given to so many projects and no one ever remembers Gigablast.

(About private.sh: I think it would be nice to have image search on private.sh)

jll29 wrote at 2021-12-03 20:58:39:

> It is amazing what one person alone can accomplish

I was also wondering how Matt did all this mostly alone until I discovered he joined HN only nine months ago. ;)

twofornone wrote at 2021-12-03 00:12:56:

Well, pirate bay is returned in search results, so that's a good start...

zandorg wrote at 2021-12-03 13:16:17:

I tried to submit my website to Gigablast, but apparently it costs 25 cents.

This doesn't make any sense to me for a search engine.

ohiovr wrote at 2021-12-03 01:41:44:

it found this:

https://gigablast.com/search?c=main&qlangcountry=en-us&q=how...

Which is definitely a good sign of a competent search engine.

lkramer wrote at 2021-12-03 11:34:30:

Initial searches are very promising. Is there a good way to add this as my default search engine in Firefox?

avery42 wrote at 2021-12-03 15:11:34:

If you don't want an extension, another option is to find it on Mycroft Project [0], choose Gigablast, and on the "Install plugin" page, right click the address bar and choose "Add Gigablast". Then you can set it as your default from the Firefox search settings.

[0]:

https://mycroftproject.com/search-engines.html?name=gigablas...

blobcode wrote at 2021-12-03 14:11:38:

You could give

https://addons.mozilla.org/en-CA/firefox/addon/gigablast-sea...

a try.

1cvmask wrote at 2021-12-02 20:49:55:

I saw this in a thread earlier today. I couldn't understand why it has logins and accounts. It seems to be the anti-Google and anti-personalization search engine.

GistNoesis wrote at 2021-12-02 22:57:54:

You don't need an account to search, though:

https://gigablast.com/index.html

If you have an account you can probably log your queries.

aquarin wrote at 2021-12-03 07:13:35:

It looks like you need an account to add URLs.

"You need to login to use the add url tool. "

"Each added url is $0.25."

SubiculumCode wrote at 2021-12-02 20:56:08:

good question

webZero wrote at 2021-12-03 02:57:35:

I can't go back to the search results from private.sh.

lepouet wrote at 2021-12-03 11:41:27:

"Clients" --> Error = Not Found

:')

musicale wrote at 2021-12-03 06:59:22:

I like the idea of a web search engine that works for searching the web.

SubiculumCode wrote at 2021-12-03 07:30:50:

Honestly, I find this search engine pretty dang usable. I've thrown everything from technical to frivolous queries at it, and I like the mix of results.

bigyellow wrote at 2021-12-03 01:27:03:

Furthermore, the client-side javascript on private.sh encrypts any query done on private.sh so that only Gigablast can read it. Therefore, no single party has access to both the IP address and the query. This is something that is truly unique and truly powerful, and, right now, only private.sh can supply this level of privacy.

Running proprietary JavaScript for privacy: what a fallacious concept. I'm going to assume this service is a honeypot or run by incompetent staff. Pass.

gbmatt wrote at 2021-12-03 02:33:37:

the javascript is run by your browser, so you can fully audit it.

bigyellow wrote at 2021-12-03 03:38:38:

It's still served by the site, and I doubt most people are interested in or capable of auditing software to perform routine online tasks.

eftychis wrote at 2021-12-03 03:51:28:

I am not sure there are good solutions besides going off browser.

P.S. I was involved in user authorization, attestation and privacy flows for a particular product recently and the browser was always where shit hit the fan. The web features are just not made with simplicity and privacy in mind. Then again we had more complex constraints.

rasengan wrote at 2021-12-03 03:46:31:

There's an extension as well [1]. This means that the code is not being served by the server in this use case.

[1]

https://private.sh/extension.html

mrtweetyhack wrote at 2021-12-03 03:58:17:

"Only Gigablast can read it" means it is not private.