Vision

📅 2007-04-16

📑 Tech

🏷 #idea

🏷 #www

This is a condensed version of a few conversations with a few people, containing some of their ideas as well, which I tried to clean up and turn into something coherent. While I don’t see the harm in disseminating it _(if it gets implemented without me having to do the work, so much the better)_, for the moment I’ll be keeping it password-protected, just in case… because I do want at least some glory out of this. :)

Web 2.0 has put the emphasis on a limited number of large Service Sites — just as the dotcom bubble tried to make every site your single portal to the internet as a whole. Unlike those earlier creations, they’re interoperable: they have exposed APIs you can use to make your own toys, and you do. However — and that’s the important bit — they keep your data halfway across the world, God knows where. They even went so far as to suck your actual applications away. They’ve reduced your computer to a brainless terminal — all in good faith, yes, in good faith and legitimate commercial interest, but that’s just web 2.0. Which leaves you with CPU power to spare.

This is just a vision of what Web 3.0 should be like.

Let’s call it your External Information Manager. It’s a local application, not a service — though there might be external sites offering hosting tailor-made for it. It might even have a personality, if you want — in some ways it certainly does, though essentially, this personality is also yours. It sits between your browser and the rest of the internet, and its job is:

1. To cut the crap.

2. To disseminate the non-crap to your friends.

3. To make sure that if there’s anything non-crap that you want, you’re the first to know.

The methods of doing so are as follows:

┄┄

1. Probably, using XMPP, even.

And even if large chunks of the net keel over and die, it keeps working just as it did.

There are still a few important chunks of this vision missing, however; that’s what the implementation is all about:

1. While separating text into ham and spam is trivial to implement with existing Bayesian filtering software (a minimal sketch of such a token-level filter follows this list), I don’t think analyzing words as tokens will be sufficient. Eventually, it has to be able to tell a WWN report of a falling asteroid from a real report of a falling asteroid. It has to be able to tell good porn from bad, even. And it has to work with every language known to man, tell them apart, and adjust to their grammatical peculiarities.

2. This robot network will result in useless duplication of effort, so some sensible concept of sharing filter profiles is needed… but you can’t exactly share a multi-megabyte database easily. I don’t yet see how this could be done.

3. Markup-sensitive filtering to remove unwanted stuff will have to somehow fit between the browser and the net without wrecking the very delicate things people are so fond of these days, like AJAX. And with the way people still treat web standards, it’s going to be extremely tough.

4. It is in no way clear how to determine what I might find interesting in the next moment, what keywords to search for and what to spider. This, however, is a very important bit.
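To make the first point concrete, here is a minimal sketch of the token-level Bayesian approach that existing filters are built on. Everything in it is hypothetical (the class name, the training snippets, the equal class priors); it’s a toy naive Bayes classifier, not anybody’s shipping code. Note that it only ever sees word frequencies, never meaning, which is exactly why I suspect tokens alone won’t be enough.

```python
# A toy token-level naive Bayes filter (hypothetical names throughout).
# It classifies text purely by word frequencies; it has no notion of
# meaning, so it cannot tell a tabloid asteroid story from a real one.
import math
import re
from collections import defaultdict

class TokenFilter:
    def __init__(self):
        self.counts = {"ham": defaultdict(int), "spam": defaultdict(int)}
        self.totals = {"ham": 0, "spam": 0}

    def tokenize(self, text):
        # Words as opaque tokens: this is all the filter ever sees.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        for token in self.tokenize(text):
            self.counts[label][token] += 1
            self.totals[label] += 1

    def spam_probability(self, text):
        # Log-space naive Bayes with add-one smoothing; class priors
        # are assumed equal for brevity.
        vocab = len(set(self.counts["ham"]) | set(self.counts["spam"]))
        scores = {}
        for label in ("ham", "spam"):
            score = 0.0
            for token in self.tokenize(text):
                score += math.log((self.counts[label].get(token, 0) + 1)
                                  / (self.totals[label] + vocab))
            scores[label] = score
        # Normalize the two log scores into a probability of spam.
        peak = max(scores.values())
        ham, spam = (math.exp(scores[l] - peak) for l in ("ham", "spam"))
        return spam / (ham + spam)

f = TokenFilter()
f.train("buy cheap pills online now", "spam")
f.train("asteroid survey observation report for tonight", "ham")
print(f.spam_probability("cheap pills"))       # close to 1.0
print(f.spam_probability("asteroid report"))   # close to 0.0
```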

So… thoughts?

1. I don’t know yet whether it should be a CGI-style application visible through your browser only, or a proxy/server-style standalone application (a bare sketch of the proxy variant follows this list). Both approaches have their merits. It definitely should expose an HTTP interface somewhere, though. It’s possible that both approaches should be used at the same time.

2. I think the most important part, which everything else will be based on, is the [my-world-is-spam] text analysis module. It should form a core library (and a core database) used everywhere else. Most other things listed above involve it in one way or another.

3. The development, obviously, will be open source, because this thing will be so much more useful and reality-changing if it’s widespread. This posting will go public once we have at least some running, marginally useful code — by then it will no longer matter. :)

4. Besides donations, which aren’t as easy to come by as one would hope, we might be able to earn money by providing paid hosting for EIMs in the form of a conventional web 2.0 service — with the added value that you can always grab your data and use it locally, which the smarter people will do. Notice that while this approach scales linearly, the line will be quite steep, so optimizations for hosting multiple EIMs should be thought about early. That will also help with enterprise deployment — and I suspect corporations might find EIMs just as useful.

5. One interesting side effect of using EIMs is that advertising will get filtered out along with all the other spam. There will be some interesting publicity consequences, since, while there is software like Adblock and Junkbuster that does this for you, I don’t think there have been instances of a web 2.0 service trying to make ad removal a business.

6. …besides web content, it could, and probably should, also filter e-mail, and maybe even instant messages. :)
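And since the proxy/server option in thought 1 may sound abstract, here is a bare-bones sketch of its shape: a local HTTP proxy that fetches pages on the browser’s behalf and gets a chance to drop the crap before it reaches the screen. It is hypothetical and grossly simplified (plain HTTP GET only, no HTTPS, no POST, upstream errors unhandled, and the filtering hook is a stub standing in for the my-world-is-spam module):

```python
# A bare-bones sketch of the proxy-style approach (hypothetical throughout):
# a local HTTP proxy that filters pages before the browser sees them.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

def looks_like_crap(body: bytes) -> bool:
    # Stub standing in for the my-world-is-spam text analysis module.
    return False

class EIMProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # In proxy mode the browser sends the full URL as the request path.
        with urlopen(self.path) as upstream:
            body = upstream.read()
        if looks_like_crap(body):
            body = b"<html><body>Filtered by your EIM.</body></html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # The port is arbitrary; point the browser's HTTP proxy setting at it.
    HTTPServer(("127.0.0.1", 8118), EIMProxy).serve_forever()
```

Point the browser’s HTTP proxy setting at 127.0.0.1:8118 and every page passes through the filter first; the CGI-style interface could plausibly be served by the same process on another path, which is what using both approaches at once might look like.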

