I came from a third world country and internet was pretty expensive. For some reason, my provider made Facebook completely free. So in my free college days I used the Facebook developer echo API to make an HTTP proxy so I could browse the internet for free. It was terrible: HTTP/1 only, so no WebSockets, videos stopped randomly, etc. But hey, I could read Reddit.
This is exactly the reason I love reading this site
What API is this? I am curious as to what this API was meant to be used for.
Same story here. Data was so expensive, and Airtel let users open airtel.in etc. at zero balance. We used to use all kinds of Opera and UC "handler mods" with custom HTTP headers like Host or X-Online-Host to fool the ISP. First on Nokia S40 and Symbian, and later on Android. Someone made a handler mod of the Psiphon VPN, and man, it was slow but so cool. And then Jio happened!
Any chance you could share that? I think some flights still don't charge for Facebook messenger, so HTTP over messenger might still be useful.
Less technically knowledgeable people would probably do something similar, using Facebook as an even slower (and lossier) "layer 8 proxy" as opposed to your "layer 7 proxy".
It would also be a decent anti-tracking mechanism.
May I ask which country?
There are a lot of countries where Facebook is "free", for example India and the Philippines.
I'm guessing that in these countries the cost of Internet traffic is dominated by the undersea cables at their borders.
Facebook has been paying to have new undersea cables laid. This is done as part of a consortium, but those cables only have 6-12 strands in them (the repeaters are bulky) so owning even just one whole strand of fiber in an undersea cable is still an obscene amount of bandwidth for a single company that isn't in the business of reselling bandwidth.
In The Philippines, my understanding is that they have ample bandwidth via Korea and other countries in the region. But the reason they have such expensive terrible internet is because of a lack of net neutrality and deregulation.
The cellphone duopoly sells "YouTube passes", that entitle you to get unthrottled YouTube for brief periods of time.
Net neutrality isn't related to Internet speeds. Good speeds are just driven by having competition.
Comcast was suddenly able to provide 1gbps for the same price as an 80mbps package when a fiber competitor entered the market.
Even with net neutrality, there is no incentive to make the internet better as an operator if you're operating in a government-granted monopoly/duopoly market.
Net neutrality eliminates the ability of an operator to discriminate and offer uncapped data or higher speed passes to _just_ YouTube.
I wonder if a proxy could be made to encode data as video to put in a YouTube livestream. You'd still need an uplink but the upload bandwidth usage is a fraction of the download one for typical Internet usage.
Yes, that is definitely possible and might make for a fun project. Use stego to make the livestream look innocuous and apply heavy ECC, so you resist censorship without arousing suspicion. I think this is the closest I've seen to a public implementation of that idea:
https://news.ycombinator.com/item?id=12166332
_You'd still need an uplink but the upload bandwidth usage is a fraction of the download one for typical Internet usage._
Perhaps the chat/comments (once again with heavy stego/encryption) would work?
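And for the downlink side, here's a minimal sketch of the frame-encoding idea (hypothetical; a real system would add framing, real ECC and the stego layer on top). Each bit becomes an 8x8 black/white block, which survives video compression far better than single pixels:

# Hypothetical sketch: pack bytes into a video frame as 8x8 black/white blocks.
import numpy as np

BLOCK = 8          # pixels per bit cell; big cells survive lossy compression
W, H = 1280, 720   # frame size

def bytes_to_frame(data: bytes) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    cols, rows = W // BLOCK, H // BLOCK
    assert len(bits) <= cols * rows, "payload too large for one frame"
    grid = np.zeros(cols * rows, dtype=np.uint8)
    grid[:len(bits)] = bits
    grid = grid.reshape(rows, cols)
    # expand each bit cell into a BLOCK x BLOCK patch of 0 or 255
    return np.kron(grid, np.ones((BLOCK, BLOCK), dtype=np.uint8)) * 255

def frame_to_bytes(frame: np.ndarray, n: int) -> bytes:
    # sample the centre of each cell and threshold back to bits
    cells = frame[BLOCK // 2::BLOCK, BLOCK // 2::BLOCK]
    bits = (cells > 127).astype(np.uint8).ravel()[:n * 8]
    return np.packbits(bits).tobytes()

payload = b"GET /r/all HTTP/1.1\r\nHost: reddit.com\r\n\r\n"
frame = bytes_to_frame(payload)
assert frame_to_bytes(frame, len(payload)) == payload

The decoder only needs whatever video the platform serves back, so the asymmetry works in your favour: even at this extremely conservative density, one 720p frame carries about 1.8 KB.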
Related and older idea:
https://en.wikipedia.org/wiki/ArVid
> the cost of Internet traffic is dominated by the undersea cables at their borders.
This is rarely the case. It's usually monopoly providers and/or people speaking historically when it _was_ expensive because it was rarer.
India and the Philippines both have more than adequate international bandwidth.
Free Basics was not allowed in India by the Telecom Regulatory Authority of India[0]. The list of countries where Free Basics is currently operated is listed on Internet.org website[1].
[0]
https://www.theverge.com/2016/2/8/10913398/free-basics-india...
[1]
https://info.internet.org/en/story/where-weve-launched/
Indeed, I must have confused it with Bangladesh, I visited both in the same month. Thanks for linking to the actual list.
You can't create your own link previewer; Cloudflare will put a captcha in front of every website. All I want is a freaking <title> tag. They don't seem eager to fix it either: their proposed solution is to contact every website owner (seriously) to ask them to whitelist you [1].
Frankly, I wish Facebook or Cloudflare offered their previewer as a free service, since most websites have them whitelisted.
1.
https://community.cloudflare.com/t/attention-required-messag...
I've long said Cloudflare is a dangerous threat to the open internet, as well as to some privacy tools like Tor.
But it doesn't always get much traction on here, because both the founder and employees of Cloudflare are quite popular users on HN. Some have given me brief, half-assed counter-answers that conveniently skip the harder questions, like a good PR person does (and which you seem to have gotten in your reply).
I hope every web admin gives it a serious second thought before adopting Cloudflare. Just as with cellphone OSes/operators, the one thing I'd dream of is a tool that offers a limited subset of what Cloudflare does (DDoS protection, a hosting privacy layer) but is pro-internet and pro-privacy. They seem hostile to that in many ways, likely because it directly affects their bottom line.
The bigger question is whether such a tool could be created without all the downsides. For the two features I listed, I think yes. But their web app security system is overly strict and bad for the internet, IMO.
And I say that knowing they protect some serious defenders of human rights and face a lot of abuse from the "bad guys". I just wish there were a better middle ground.
> But it doesn't always get much traction on here, because both the founder and employees of Cloudflare are quite popular users on HN.
I don't think it gets much traction because you're barking up the wrong tree. Also, suggesting that YC is out to silence you and that nobody actually has a counter argument isn't very good for traction, either.
Until my website can't get taken offline by a $5 rental of an internet-of-shit botnet, Cloudflare gives me and my users recourse against the bad actors of the world. (I also enjoy its host cloaking, for my privacy.)
You simply gloss over bad actors and attack one of the only solutions that works. The biggest threat to the open internet was its naive "there are no bad actors" design, not the people giving us one of the only bulwarks against bad design.
I agree with your last sentence that it would be nice to have a better middle ground, but notice that's not the "cloudflare bad" thesis of your comment.
The internet needs to be improved so that Cloudflare is redundant. It's not Cloudflare's fault that fundamental design oversights (like optional ISP egress filtering) have created a lucrative niche. And things like faster, unlimited data plans accessible to smart toasters and smart doorbells on top of the internet's naive architecture only entrench Cloudflare further.
I hosted a server over a Comcast connection that was attacked all the time, and I was always able to figure it out without Cloudflare's proxy blocking things for me.
Your tone suggests that perhaps you are one of those Cloudflare employees/fanboys who will drown out a warning.
The parent poster had a point, and your reply is reinforcing it.
Cloudflare even puts multiple captcha challenges for any request from the default browser on the Samsung S7 Edge. Granted it's an old phone at this point, and most users install Chrome on their phones, but I end up skipping a lot of websites on my phone rather than participate in furthering the misconception that "Chrome is the only browser".
_because both the founder and employees of cloudflare are quite popular users on HN._
It seems a lot more likely that people aren't finding your argument as convincing as you'd like. There are plenty of well-known users here (including ones who identify their employer) whose companies' HN-perception fortunes change quite a bit over time.
I normally skip sites that ask for a Cloudflare captcha if the site isn't too important. Luckily this is the case most of the time.
It would be annoying if online banking or government sites started asking for them.
Hey, try
if you want an ethical DDoS protection service.
Edit: my bad. Misinterpreted your comment.
Can you elaborate on how Tor is a threat to the open internet? That's a non-obvious statement to me. I'm aware that it's compromisable via controlling exit nodes (NSA, various nations) but that's not really the threat profile for the average person. Are there any other reasons?
Because despite its flaws, AFAIK Tor is an attempt to make the internet _more_ open to those who are being surveilled.
What am I missing?
I think OP is suggesting that Cloudflare is a threat to TOR, not that TOR is a threat to the internet.
Website owners can actually whitelist Tor traffic as a "country", but not a lot of them know/care/want to do that.
My read of it was that CloudFlare is a threat to both the open internet and TOR, not that TOR was also a threat to the open internet.
I read that as Cloudflare is a threat to tools like Tor
Any company through which a high percentage of web traffic is not just routed but fully reverse-proxied should of course be a significant concern and subject to extreme scrutiny. But why exactly do you think they're anti-internet and anti-privacy? To me it seems like being pro-internet and pro-privacy aligns with both their general incentives and their monetary incentives.
I genuinely think they're a net positive for, and supporter of, Tor users. Before, site owners and security providers who faced issues with abusive/malicious traffic behind Tor connections (spam, illicit content, security scanning, password stuffing) nearly always resorted to outright blocking all Tor exit node IPs, because they had no other feasible option. I've been in that position. Cloudflare at least gives any site owner the ability to easily allow the traffic, with just a fairly quick occasional bot check.
Additionally, as of 2018 they now have an "Onion Routing" option which site owners can enable, which results in Tor users being able to access your site 100% through the Tor network. As a result, Tor users no longer experience any captchas, load your site faster, and never have to touch the clearnet.
>But their web app security system is overly strict and bad for the internet IMO.
Their WAF seems to have a pretty low false positive rate, compared to others I've seen. (Though the flipside of that is it also has a pretty high false negative rate and isn't very helpful against a dedicated non-automated attacker, like many other WAFs.)
>But it doesn't always get much traction on here because both the founder and employees of Cloudflare are quite popular users on HN.
They do post a lot here, but I doubt that's really responsible for defensive responses from other HN users. The most common criticism I see here (presenting a captcha for people using Tor, which site owners can now disable) makes me think the majority of people making the criticism have never run large websites or worked infosec for any organization with a large website.
Tor is of course not a threat itself, but anecdotally I'd estimate 90 - 95% of traffic that the average website owner receives from Tor is highly abusive/malicious, and Cloudflare empirically estimated 94% as of 2016 (
https://blog.cloudflare.com/the-trouble-with-tor/
). And anecdotally, not only is a high percentage of Tor traffic malicious, in many cases a significant percentage of all malicious traffic is Tor traffic. Naturally, due to Tor by design making it impossible to distinguish the ~94% connections from the ~6%, it's extremely difficult to mitigate this without just blocking 100% of Tor traffic. This is obviously not Tor or anyone's fault; it's just a practical reality for website owners. This sort of situation will always be the case for any kind of robust privacy-protecting application.
Cloudflare is possibly the first free service that actually enables anyone to easily allow normal traffic from Tor without much increase in security/abuse risk. They seem explicitly pro-Tor, especially with the explicit Onion Routing feature that lets Tor users access your site 100% through the Tor network without ever experiencing captchas, and statements like in
https://blog.cloudflare.com/the-trouble-with-tor/
and
https://blog.cloudflare.com/cloudflare-onion-service/
One may certainly have lots of other justified, legitimate concerns regarding the company and their disproportionate control of a huge chunk of the internet and web, but I'm not sure how someone could read those, see how the traffic is handled in practice, and conclude they're anti-Tor or a dangerous threat to Tor.
And unfortunately, cloudflare is everywhere. This trend will make it even harder for projects like a new search engine to enter the game.
Because if you don't have it, some a-hole will go and DDoS your site, or you want to prevent a hug of death, because of reasons.
It seems a lot of issues happen because bad actors are continually allowed to thrive. For example: everybody uses a big provider because they're the only ones that have solved the spam issue.
Cloudflare could just allow a fair crawl rate instead of a captcha on the first request.
The problem is that bad actors can masquerade as a lot of independent clients (The first D in DDoS stands for "distributed").
Figuring out whether a site is under a DDoS attack or getting legitimate requests from many sources is a very hard problem, and can just be worded "telling good actors from bad actors" -- no simple solution works; also, who YOU consider a good actor and who the website owner considers a good actor may be at odds.
Most people (and Cloudflare by default) consider Facebook a good actor; but as far as I'm concerned, Facebook is as evil an actor as one can be.
> sources is a very hard problem
We're talking about virtually unknown blogs that get one HTTP request from my server's IP, which is not blacklisted anywhere. It's not hard at all; I just think Cloudflare's tech is not that good.
You're really pulling a "how hard could it really be??" to DDoS prevention?
You should at least be humbled by how few services can even offer DDoS protection that works against volumetric attacks and isn't just based on null-routing. The people with skin and money in the game might know something you don't.
Here's how simple it is:
if (!website.underDDoS && website.requestedTimesToday[ip] < 10) showCaptcha = 0;
How do you implement "website.underDDoS"?
Through a proxy, mind you: Cloudflare makes its decision without access to your CPU or DB metrics, and doesn't know which page load times are legitimately slow and which aren't supposed to be.
How about "hasn't had requests for the past 2 minutes"? Again, I'm talking about links to obscure blogs that barely anyone reads, let alone DDoSes.
I think another comment here may be closer to the truth, CF may only be running heuristics on the user agent
If hardly anyone reads or DDoSes them, why did they go to the trouble of setting up Cloudflare? It's free for those obscure blogs, but it's definitely a non-trivial hassle. Usually people set it up only after they experienced their first attack.
I get that you are upset Google gets to scrape them and you don't. But bad actors really are making it difficult for everyone to just "be" on the internet.
I don't know! But they do it; everyone does it because everyone else does it. It's not unusual.
I got around it by just making sure the user agent is set to the latest version of Chrome, rather than a version from a few years ago that I had hardcoded before. It seems Cloudflare's protection is pretty much "is your user agent in the top 10 user agents?".
Did you try that?
I have; IIRC it worked sometimes, but not always. Is it a reliable solution for you?
It's at least a 95% reliable solution, which seems to be about the same as a real user sees.
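Concretely it's nothing more than a header swap. A rough sketch in Python (the requests library is just one way to do it, and the UA string below is only an example; keep it current):

# Minimal sketch: fetch a page title while presenting a modern browser UA.
import re
import requests

UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36")

def fetch_title(url):
    resp = requests.get(url, headers={"User-Agent": UA}, timeout=10)
    resp.raise_for_status()
    match = re.search(r"<title[^>]*>(.*?)</title>", resp.text,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

print(fetch_title("https://example.com/"))

Whether this keeps working is entirely up to whatever heuristics sit in front of the site, of course.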
Well if you have an easy solution that you think would work, why don't you put up a website, commission a DDOS attack from a skilled actor and try to demonstrate mitigation?
Companies pay big money to CloudFlare. If a simpler and cheaper solution is workable, they'll pay you instead.
Just like telling whether it's raining is easy but stopping rain once it has started is hard, the claim is that it's not hard to detect whether a site is being DDoSed.
I use Zoho.com and I rarely get spam, if ever.
Zoho isn't Google-size, but it isn't irrelevant, either. Sending mail from a self-hosted email server is far harder since the big providers might put it in spam or drop it even earlier.
To add to the sibling comment: running your own mail server is the only way to ensure your email is not read by someone else, which is so messed up.
> running your own mail server is the only way to ensure your email is not read by someone else
But any mail you send to someone else probably ends up read by Google/Microsoft anyway, since that's where their mailbox is.
Also, email security is a joke. It's 2020, and even TLS encrypted SMTP connections tend not to check for a valid certificate, making them trivial to MITM.
Practically speaking how does one MITM an SMTP connection? For example, from Google to Microsoft. They connect directly to the IP addresses they get from MX records + lookup. What's the actual threat vector/execution here?
Anyone with hardware on the network path can do it... Or anyone who can inject BGP routes can do it too.
I use it as well, and I get sooo much more spam than I get on Gmail.
At host.io we scrape every registered domain once a month and make the metadata available freely over an API. You could use that to get a title for a domain (although not for a URL that isn't the main domain), e.g.:
$ curl https://host.io/api/web/facebook.com?token=$TOKEN
{
  "domain": "facebook.com",
  "rank": 2,
  "url": "https://www.facebook.com/",
  "ip": "157.240.11.35",
  "date": "2020-08-26T17:39:17.981Z",
  "length": 160817,
  "encoding": "utf8",
  "copyright": "Facebook © 2020",
  "title": "Facebook - Log In or Sign Up",
  "description": "Create an account or log into Facebook. Connect with friends, family and other people you know. Share photos and videos, send messages and get updates.",
  "links": [
    "messenger.com",
    "oculus.com"
  ]
}
See the docs for more details about the API and what else you can do with it (e.g. finding backlinks to domains, domains with the same AdSense ID, etc.).
Long term, a new HTTP META method would be interesting. I wonder if something like that has ever been considered. Providers like Cloudflare would hopefully be more lenient with these requests.
Huh. It's certainly an interesting idea! Strictly speaking, individual people could implement this today, since nonstandard HTTP verbs don't break anything that doesn't know to request with them. (It wouldn't be of much use, because clients wouldn't know to use it, but still -- something that could easily be prototyped).
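It really can be prototyped in a few lines. A toy sketch (the META verb and the response shape here are entirely made up, just to show that nonstandard verbs work today):

# Hypothetical sketch of a server answering a nonstandard META verb with
# only the metadata a link previewer would want.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class MetaHandler(BaseHTTPRequestHandler):
    def do_META(self):  # BaseHTTPRequestHandler dispatches on do_<VERB>
        body = json.dumps({
            "title": "Example page",
            "description": "What a link preview would show",
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8080), MetaHandler).serve_forever()

Then "curl -X META http://localhost:8080/" returns just the preview JSON. The hard part isn't the mechanics, it's getting clients and intermediaries to agree on it.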
I don't think FAANG (or any other big players) would have much interest in making it happen in the standard, though, since it would undercut their big-player advantage.
Doesn't the oEmbed spec [1] already solve this? I think the OP could solve their problem by simply creating an oEmbed endpoint with all the necessary metadata.
[1]
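For a plain link the oEmbed response is tiny; something like this (illustrative values, following the oEmbed 1.0 fields):

{
  "version": "1.0",
  "type": "link",
  "title": "My Post Title",
  "provider_name": "example.com",
  "provider_url": "https://example.com/"
}

The catch is discovery: consumers find the endpoint via a <link rel="alternate" type="application/json+oembed" ...> tag in the page head, so unless they already know the endpoint they still have to fetch the HTML once.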
Yea but when your request to fetch the oembed data is blocked by a CAPTCHA...
This is a real problem, we experience it in the Fediverse
I wonder if "Accept: application/json" would be a reasonable alternative? Wasn't this supposed to be the point of content negotiation?
Maybe, but not really; this thread seems to be more about intent ("I just want a preview") while content type is more about representation ("I want the content as JSON"). I can imagine there are websites that actively use the Accept header to distinguish "regular visitors" and serve their APIs at the same paths (didn't Reddit do this at some point?), so your approach would break in those cases.
I guess what this is really about is, I hate to say it, something in the direction of the semantic web, where web servers (and in this case, Cloudflare et al.) actually gain a deeper understanding of the content they serve, and a web browser or crawler can query that content directly.
It seems to me that what "previews" really want is an API for the page's content in a structured format: OpenGraph tags and other microformats are one representation, but it's annoying to have to load _all_ the HTML just to grab title and the OG tags.
Accept: text/preview
In what content type? JSON? XML? HTML?
> Frankly, I wish Facebook or Cloudflare offered their previewer as a free service, since most websites have them whitelisted.
Yup, and exposing just a few key pieces of information (the title and some of the meta/OG tags) without the body would limit the potential for abuse, while still being fairly useful for legitimate uses.
There are hardly any "illegitimate" uses. The web is meant to be machine-readable (we wouldn't have Google or anything nearly as convenient in the first place if it wasn't). Whatever has been published is public and should not come with artificial limitations on how you read and process it. Blocking crawling should be outlawed, as it is clearly a monopolistic practice. E.g. I want to build my own crawler to index and categorize the subset of the web I choose, for myself. I believe this is a perfectly legitimate use. But they will probably try to stop me.
> Blocking crawling should be outlawed
That's overly broad. But maybe it should be illegal to have exceptions only for major monopolies.
Turn it around at least for a few minutes. Does a website operator _have to handle_ whatever arbitrary traffic you want to throw at them from your crawler?
_They're_ the ones choosing to use tech that's blocking you. Proposing to make it illegal for them to make that choice or to speak to you differently than they speak to other users of their site may give you some idea of the resistance you're likely to face to this proposal.
I don't get what value link previews add. Someone shares a link with me (on Skype, Slack, Teams... whatever) and I care about the content because the person sharing it thinks I could/should care about it; or someone shares a link on an aggregator, and then I don't think it's too much to ask for that someone to write a summary. If the link is worth sharing, writing one sentence to explain why isn't too much to ask.
What is the value a link preview adds? And why should I, as a content provider care about the value you add? Cloudflare does something for me, what is your service doing for me and why should I whitelist you (or care about you)?
They're sending you traffic.
Imagine Twitter or Facebook without link previews: it's much harder to use and overall reduces the chance I'll click on a link. Do you think only Twitter and Facebook should be allowed to publish previews?
Half the time the link preview picks the wrong picture and sometimes even the quote. Twitter and Facebook would both be improved by disabling it. Hell, it might even stop people from thinking they need a hero image for their 2 paragraph medium shitpost.
I'd place that blame on website owners. Both Facebook and Twitter are pretty open about where they read that info from, and an owner can pretty easily set those fields (it's just some <meta> tags in the <head> element).
They also have their own validators:
https://cards-dev.twitter.com/validator
and
https://developers.facebook.com/tools/debug/
The only issue I'm aware of is that Facebook's crawler breaks about every two months or so.
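For reference, the tags being read look roughly like this (values are illustrative; og:* is Open Graph, twitter:* is Twitter's card markup):

<head>
  <meta property="og:title" content="My Post Title">
  <meta property="og:description" content="One-sentence summary shown in the preview.">
  <meta property="og:image" content="https://example.com/preview.png">
  <meta property="og:url" content="https://example.com/my-post">
  <meta name="twitter:card" content="summary_large_image">
</head>

When og:image is missing, previewers tend to guess an image from the page, which is usually where the wrong-picture complaints come from.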
What meta tags do I have to fill in, and why is Twitter's/FB's preview suddenly my problem?
>
https://developer.twitter.com/en/docs/twitter-for-websites/c...
So I should have to include Twitter-specific meta tags even though I personally don't care about Twitter? Maybe Twitter should make it clear which tags they read? Maybe it's SEO bullshit I don't care about? Maybe even the og: tags don't work all the time and result in dumb previews?
If you don't want to fill them out, don't... Filling them out lets you customize your link preview on twitter. If you don't care about Twitter, why would this affect you at all?
They're used by instant messengers too: slack, iMessage, WhatsApp, telegram, signal, ...
> it's much harder to use
Actually it's easier to use, in that the preview doesn't take up screen real estate. Perhaps you mean the experience is less pleasant?
>They're sending you traffic.
Irrelevant traffic for every metric I care about.
>Imagine Twitter or Facebook without link preview
That's exactly what I'm saying. Either I care about what that person thinks might interest me or I don't. The link preview abstract is shit anyway. Does the site title and the 2 sentence abstract really sway you? If someone wants to send traffic my way, writing an interesting abstract is not too much to ask.
>it's much harder to use and overall reduces the change I'll click on a link
Maybe you should re-evaluate who you follow on Twitter? I frankly couldn't care less about Facebook.
>Do you think only Twitter and Facebook should be allowed publish previews?
I think previews are worthless regardless, I thought I made that clear. Either you care about me linking it to you or you do not.
*EDIT: And just for fun, here is the link preview stuff from my latest skype call with my brother:
Look at all the value those previews added.
When you paste a link on Reddit and it autocompletes the title.
Updating a bookmark title, or checking if it still exists.
Is it not self-evident that a link being crawlable is useful?
>when you paste a link on reddit and it autocompletes the title
Oh no, you have to copy/paste the title?
>update a bookmark title, or check if it exists.
I can access the site without a captcha, my browser can fetch the title.
>is it not self-evident that a link being crawlable is useful?
No, it is not. Maybe a site owner does not want crawlers to index the site?
Me being able to access the title and any HTML meta tags is not the same as some crawler being able to access them. It seems like your beef is with Cloudflare, and that is fine, but please state that that is your issue and don't try to frame it as something else. What I don't get is how everybody places the blame at Cloudflare's feet. It is my choice as a host to use Cloudflare and its protection features.
I'm not sure if you're being serious.
CF is so widespread that it breaks a significant part of the web for simple things like getting the page title. That's all. The End.
'I' can get the page title, though. That's all. The End.
I don't care about your crawler, or your ability to post the link to my site to Twitter/FB; and if I did, maybe I'd revise my Cloudflare settings.
The Cloudflare and Google captchas are terrible. It's so bad that at this point I just close the tab if they challenge me with one. I use Brave and always have Shields up; it seems having them up makes the captchas extremely difficult. Mission accomplished, I guess.
Is this the case for any web crawler?
Not sure, but it's a very common problem:
https://www.google.com/search?q=cloudflare+attention+require...
So this was a very long web page to say: Facebook forgot to rate limit their web scraper on a per user basis, but we told them and they fixed it.
Also it's not really a scraper. Just a JSON response with the website preview stuff like meta tags, <title/>, and other basic information which could be useful for some bots.
But it does not give you the whole HTML. Or anything close.
Exactly, they literally didn't do anything but file a Facebook bug report.
In the same way your comment is a very long way to say: something happened.
I've used Yahoo's YQL. While I would hit rate limits and other crap when trying to scrape data off some sites directly, YQL would give me nicely structured data without those stupid limits, as many sites don't see Yahoo's bot as a scraper.
That's pretty interesting: Facebook as a "web scale / hundreds of pages per second" batch web page summarizer. I imagine you could build a pretty decent general-purpose search engine that way... a free crawler.
As long as they are using Open Graph meta tags.
Why can't you just make your web crawler look like FB or Googlebot (via user agent)?
Do website owners actually check the ip?
Sounds like this company is checking that the IP is from Facebook. That would probably work on less secure sites, though.
In the best-case scenario, Google has a monopoly on scraping. Imagine trying to create a global search engine: how can you possibly even crawl sites that are behind Cloudflare or that only allow Google/FB/Bing bots?
Can you crawl Twitter in real time? Pretty sure they have a special deal with Google to ping it instantly on new tweets.
How many websites actually ping Google on new content?
And don't you dare scrape Google results. That's against their TOS! Rules for thee, not for me.
Isn't it weird there is no machine-readable API to Google search results?
I thought this is exactly how DuckDuckGo worked?
No, DuckDuckGo purchases search results from Bing:
https://azure.microsoft.com/en-us/services/cognitive-service...
You mean startpage?
This is so fucked. We've encrusted ourselves into this walled fiefdom, and there's no way to break free.
It's only going to get worse.
Chrome will be the only browser. AMP the only delivery mechanism. Video will require DRM. Eventually, text content will too. Binary blobs with no ad blocking.
Well, I think you're being overly alarmist, at least in the short term: DRM on Video has not really caught on, at least on-line; Non-Chrome browsers continue to have a significant share (mostly on Desktops); and ad blocking remains rather effective.
In the long run, I'm definitely worried: Capitalist economies tend to see a concentration of capital, generally and in most sectors individually. And this seems to be a real danger with computing technology. Coupled with mass surveillance and the pushing of people to have their personal information held by those large tech companies, a dystopia is not inconceivable.
PS - By AMP, do you mean Amazon Prime?
> DRM on Video has not really caught on, at least on-line
This seems like a weird statement. All of the paid streaming services use DRM on Video, so all major browsers include the requisite black-box DRM modules. I'm actually surprised YouTube has not added Widevine DRM for all videos yet, but I'm sure it'll happen if RIAA/MPAA get annoyed enough with youtube-dl and the like.
> PS - By AMP, do you mean Amazon Prime?
I think he means Google AMP[1], which is slowly infecting more of the top search results on Google.
[1]
https://developers.google.com/amp/
I've never heard of this AMP thing. Is it really that popular?
Valid point about paid streaming services - which I don't use.
AMP is incredibly popular. Every news site has enabled it. You have a 100% chance of seeing an AMP page in the top results for anything.
Google had two options: make websites faster the normal way (remove bloat), or make websites faster by introducing AMP. AMP is controlled by Google. What do you think they did? They said they would reduce a site's ranking if it didn't use AMP. Within weeks, everybody except Wikipedia was introducing AMP.
You don't crawl unless you have to; Twitter [1], Facebook, WordPress.com [2] and other big services have a firehose you can apply for and get real-time changes from. If you're crawling the web, you're probably doing it wrong or only servicing a particular niche.
[1]
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/...
[2]
https://developer.wordpress.com/docs/firehose/
You missed the pricing. This is what you're doing wrong.
And you missed the cloudflare part too.
Someone who created a scraping API that site owners could embed in their projects and get paid for feeding crawlers their data could make billions.
Would just like to give an honorable mention to Google Translate, the most accessible HTTP proxy of all time. It's especially good for bypassing corporate access controls. I've used it many times for accessing solution threads on technical subreddits at work.
datadome.co is blocked for me:
_datadome.co is being blocked by AdGuard DNS filter, AdGuard Tracking Protection filter, EasyPrivacy, Goodbye Ads and oisd._
Dunno what they do, but it can't be good.
How is DataDome different from Cloudflare? The latter offers bot protection for free if you are already a Cloudflare customer
They're a French/EU company; that alone can be an advantage for some businesses.
Pretty sure they have an office in New York as well. Site shows POPs all over the map: datadome.co
My understanding is that it's more "advanced". Take that at face value, I've not used the service.
The website doesn't open for me.
Check your DNS; if you are running a Pi-hole or something similar, you will need to disable it or allow this site.
Enable spamware? Big red flag nokthx
interesting!