Michael Tremante here. I'd like to address some points openly as I'm personally mentioned in the forum. I reached out to the Pale Moon community on behalf of the team to try to resolve the issue with the Pale Moon browser.
- We sent our standard NDA to speed things up. I explicitly said in the message that it may not be required, but in the interest of moving fast we sent it to them so they could review it just in case.
- We are committed to making our challenge system work on all browsers by clearly documenting what APIs need to be supported. For example, part of the issue with Pale Moon is that it does not support CSPs correctly (a brief CSP illustration follows this comment).
- Notwithstanding the above, to resolve the issue quickly we are willing to lower some of our checks if, and only if, we find the right approach. Of course this would introduce some security issues that bot developers may quickly leverage.
- Contrary to what many have said in this forum, our challenge has no logic that relies on the user agent strings. We rely on browser APIs. We don't have any special checks for any specific browser.
- To address this longer term, we are internally discussing a program that would give browser developers a direct channel to our team, and we hope to have something to share with the browser developer community soon.
I am happy to answer any constructive questions.
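(For context on the CSP point above: a Content-Security-Policy is a response header the browser is expected to enforce, and a challenge page can depend on that enforcement. Below is a minimal sketch, in Python, of inspecting whatever policy a site sends; the URL and the directives shown in the comment are placeholders, not Cloudflare's actual policy.)

    # Minimal sketch: fetch a page and print the Content-Security-Policy header
    # it sends. "https://example.com/" is a placeholder for any site behind a
    # challenge; the directives shown below are illustrative only.
    import requests

    resp = requests.get("https://example.com/", timeout=10)
    print(resp.headers.get("Content-Security-Policy", "<no CSP header sent>"))

    # A policy typically looks something like:
    #   script-src 'self' 'nonce-abc123'; frame-ancestors 'none'
    # A browser that ignores or mis-enforces such directives can break pages
    # (challenge pages included) that assume they are honoured.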
"Contrary to what many have said in this forum, our challenge has no logic that relies on the user agent strings."
If that were true then it would be possible to satisfy the challenge without sending a user agent header. But omitting this header will result in blocking. Perhaps the user agent string is being collected for other commercial purposes, e.g., as part of a "fingerprint" used to support a CDN/cybersecurity services business.
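(One way to check this empirically: send the same request with and without a User-Agent header and compare status codes. A rough sketch using only Python's standard library, which adds no User-Agent of its own; example.com stands in for any Cloudflare-fronted site, and the outcome will of course depend on that site's configuration.)

    # Rough sketch: compare status codes with and without a User-Agent header.
    # http.client is used because it does not add a User-Agent by itself.
    import http.client

    def fetch_status(host: str, user_agent: str | None = None) -> int:
        headers = {"User-Agent": user_agent} if user_agent else {}
        conn = http.client.HTTPSConnection(host, timeout=10)
        conn.request("GET", "/", headers=headers)
        status = conn.getresponse().status
        conn.close()
        return status

    host = "example.com"  # placeholder for a Cloudflare-fronted site
    print("with UA:   ", fetch_status(host, "Mozilla/5.0 (X11; Linux x86_64)"))
    print("without UA:", fetch_status(host))  # a 403 here would support the parent's point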
We expect the user agent string to be present, yes. We don't have any logic based on its contents though (except blocking known bad ones), and we don't have any exceptions for the major browsers.
No commercial uses around this.
> We don't have any logic based on its contents
> blocking known bad ones
These contradict. Blocking "bad ones" is logic. Also, such claims are disingenuous without defining what "bad ones" are... For all I know (and it surely seems so), you could be defining "bad ones" as "anything that is not 'the latest Chrome without adblock and with JavaScript on'".
That's what the word "except" that you quoted around means.
Yes, and I'm pointing out that phrasing it that way makes the whole statement meaningless. E.g.: I don't eat foods, except some that I consider edible. I don't kill kittens, except those I think are evil. See how it works? Adding a vague "except" to an absolute-sounding sentence destroys its very meaning.
The purpose of a system is what it does.
We speak of an arms race between Cloudflare and "bad actors" that results in unintended consequences for end users and independent browsers... and we need to stop it.
There is an arms race: between end users and Cloudflare.
The fact that a human chimes in on an HN discussion carries no information.
We continuously scrape a sizable number of e-commerce sites and have had no trouble whatsoever bypassing CloudFlare's anti-bot technologies.
CloudFlare representatives often defend user-hostile behaviour with the justification that it is necessary to stop bad actors, but considering how ineffective CloudFlare is at that goal in practice, it seems like security theatre.
I disagree.
We've worked across a number of equivalent anti-bot technologies and Cloudflare _is_ the AWS of 2016. Kasada and Akamai are great alternatives and are certainly more suitable to some organisations and industries - but by and large, Cloudflare is the most effective option for the majority of organisations.
That being said, this is a rapidly changing field. In my opinion, regardless of where you stand as a business, ensure abstraction from each of these providers is in place where possible; being able to onboard and to migrate away should be table stakes for any project or business adopting one of them (a rough sketch of such an abstraction follows this comment).
As we’ve seen over the last 3 years, platform providers are turning the revenue dial up on their existing clientele.
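(A rough illustration of the abstraction advice above, assuming nothing about any vendor's real API; the class and method names are made up for the sketch.)

    # Hypothetical provider-agnostic interface; the methods and provider
    # classes are illustrative, not any vendor's actual API surface.
    from abc import ABC, abstractmethod

    class EdgeProvider(ABC):
        @abstractmethod
        def purge_cache(self, paths: list[str]) -> None: ...

        @abstractmethod
        def block_ip(self, ip: str, reason: str) -> None: ...

        @abstractmethod
        def set_challenge_level(self, level: str) -> None: ...

    class CloudflareEdge(EdgeProvider):
        # Would wrap Cloudflare's API client here.
        def purge_cache(self, paths: list[str]) -> None: ...
        def block_ip(self, ip: str, reason: str) -> None: ...
        def set_challenge_level(self, level: str) -> None: ...

    class AkamaiEdge(EdgeProvider):
        # Would wrap Akamai's API client here.
        def purge_cache(self, paths: list[str]) -> None: ...
        def block_ip(self, ip: str, reason: str) -> None: ...
        def set_challenge_level(self, level: str) -> None: ...

    # Application code depends only on EdgeProvider, so migrating providers
    # becomes a configuration change rather than a rewrite.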
Its success as a business aside, at a technical level neither Cloudflare nor its competitors provide any real protection against large-scale scraping.
Bypassing it is quite straightforward for most software engineers of average competency.
I'm not saying that CloudFlare is any better or worse at this than Akamai, Imperva, etc.; I'm saying that in practice none of these companies provide an effective anti-bot tool, and as far as I can tell, as someone who does a lot of scraping, the entire anti-bot industry is selling a product that simply doesn't work.
In practice they only lock out "good" bots. "Bad" bots have their residential proxy botnets and run real browsers in virtual machines, so there's not much of a signature.
This often suits businesses just fine, since "good" bots are often the ones they want to block. A bot that would transcribe comments from your website to RSS, for example, reduces the ad revenue on your website, so it's bad. But the spammer is posting more comments and they look like legit page views, so you get more ad revenue.
I don't believe that distinction really exists anymore.
These days everyone is using real browsers and residential/mobile proxies, regardless of whether they are a spammer, a Fortune 500, a retailer doing price comparison, or an AI company looking for training data.
Just because some evil is a standard policy does not mean it's excused. Sending a broad NDA just to address a problem with Cloudflare itself is more of Cloudflare throwing its weight around again, à la:
"I woke up this morning in a bad mood and decided to kick them off the Internet. … It was a decision I could make because I’m the CEO of a major Internet infrastructure company. ... Literally, I woke up in a bad mood and decided someone shouldn’t be allowed on the Internet. No one should have that power." - Cloudflare CEO Matthew Prince
Requiring every web browser to support every bleeding-edge feature to be allowed to access websites is not the status quo of how the web has been for its entire existence. Promoting this radical ideology as the status quo is also seemingly shady, but perhaps the above corporate rep has just been in so deep for so long that they've forgotten they're underwater. Corporate use cases are not the entire web's use cases. And a monopoly like Cloudflare has to take such things into consideration.
But they keep forgetting. And they keep hurting people. The simple solution is for them to make Cloudflare's defaults much less dependent on bleeding-edge features for the captchas. If sites need those extra levels of insulation from the bandwidth/CPU time it takes to fulfill HTTP requests, it should be opt-in. Not opt-out.
The solution for the rest of us humans who can no longer read bills on congress.gov or play the nationstates.net game we've been playing for the last 20 years is to contact the site owners when we get blocked by Cloudflare and hopefully have them add a whitelist entry manually. It's important to show them, through tedious whitelist maintenance, that Cloudflare is no longer doing its job.
There is no such intent from us to throw our weight around. The team is challenged with the very hard task of balancing protecting web assets versus ensuring that those same assets remain accessible to everyone. It's not an easy problem.
The features you refer to are not bleeding edge, and not only that, they are security features. We are still discussing internally, but I hope we can publish the details soon so that point can be addressed.
Last but not least, this only affects our challenge system, which is never issued by us as a blanket action across Internet traffic. It's normally a configuration a Cloudflare user implements in response to an ongoing issue they have (like a bot problem). We do report challenge pass rates and error rates, but we can certainly always improve that feedback loop.
One of the things I really appreciated when I worked for Mozilla was their legal department's policy that Mozilla employees not sign over-reaching NDAs [0]. Some of the points they insisted on:
* It has to be limited in scope. It cannot just be "everything we give or tell you is confidential information."
* Confidential information has to be clearly marked or indicated.
* It has to be limited in duration. Not, "You are required to take this information to your grave."
If your project does not have lawyers backing you up, you might not know to ask for these things, or might not think you have the negotiating leverage to get them. But I think they make a real difference to a developer working on an open-source project, and I encourage anyone presented with an NDA to insist on them.
[0] https://wiki.mozilla.org/Legal/Confidential_Information
Every interaction I've ever had with CloudFlare has left me feeling like I needed a bath. The vertical desperately needs some competition, but I don't know how that could happen at this point.
Things like Cloudflare are a natural monopoly. They are most useful when they have servers in datacenters worldwide, in every possible location. So it takes a lot of capital to start, and competitors are few to none.
Personally, I'd like to see browsers moving away from HTTP for the web, towards something more P2P, so that there is less need for Cloudflare. Something like: look up your site key in DNS, then look up things signed by it in the BitTorrent DHT, and go from there.
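(A rough sketch of that lookup flow, assuming the site publishes an ed25519 public key in a DNS TXT record and serves signed content as BEP 44 mutable items in the BitTorrent DHT. The _sitekey record name and the dht_get_mutable helper are hypothetical placeholders; dnspython provides the DNS lookup.)

    # Sketch of the proposed flow; _sitekey and dht_get_mutable() are
    # hypothetical placeholders, not an existing standard or library call.
    import dns.resolver  # dnspython

    def dht_get_mutable(pubkey: bytes) -> bytes:
        # Placeholder for a BEP 44 mutable-item lookup (items are keyed by an
        # ed25519 public key and carry a sequence number plus signature).
        raise NotImplementedError("BEP 44 DHT client goes here")

    def resolve_site(domain: str) -> bytes:
        # 1. Look up the site's public key in DNS.
        answers = dns.resolver.resolve(f"_sitekey.{domain}", "TXT")
        pubkey = bytes.fromhex(answers[0].strings[0].decode())
        # 2. Fetch the latest signed manifest for that key from the DHT.
        manifest = dht_get_mutable(pubkey)
        # 3. Verify the signature against the DNS-published key, then fetch
        #    the content the manifest points to from peers.
        return manifest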
It's not a monopoly; there are lots of CDNs. Volunteer-run P2P networks are vastly more vulnerable to DDoS. CDNs basically are P2P networks of a kind; they're just run by one organization and use dedicated network links for nodes to talk to each other, so you can't disrupt the internal network comms too badly by doing DoS.
And the core issue here is that the site owners want it, so a P2P network that couldn't offer bot protection wouldn't get adopted.
If we went to P2P, how would you get around caching issues/slow propagation of new versions when updates are pushed to a given website? That seems like a dealbreaker unless I’m overlooking something.
Same as in the non-P2P Cloudflare world: get the data from the only node that has a copy of it, which would be the HTTP server or the P2P node run by the website owner.
So CDN with extra steps? In your world Cloudflare or anything like it would be in the best position to make itself indispensable for such a network.
Regular client nodes won't be the backbone of your P2P network these days since many of them are going to be mobile devices. So you are back to a tiered system where you have nodes that are more suitable for hosting (servers) and nodes that mostly consume (clients).
It's nowhere near as much as you think it is.
We think of the internet as one big flat network, but it's actually a conglomerate of separate networks (interconnected by peering and transit agreements). There are a finite number of networks on the internet. Of those, only some are good CDN locations, as you don't need a CDN node on every single network. The number of places where you could possibly ever want a CDN location is finite, in the three-to-four-digit range.
Cloudflare has a presence in 335 cities - a lot, but not an impossible lot. We're not talking about ten million locations. Ten million dollars, maybe. (Ten million dollars would be about $30k per city - respectable.)
How many of Cloudflare's customers even care about all 335 cities? If you're a European business with European customers, you only care about the ~10 mainstream internet exchange sites in Europe (e.g. Frankfurt, London). Cloudflare has 59, but I don't think they need 59. If you want to be a Cloudflare competitor and support European businesses, you only need ~10 physical locations. That's an extremely manageable number.
What you want is at least one peering connection to every major European network, and ideally, a hotline to their NOC or a detailed BGP community agreement, to block attack traffic as close to the source as possible.
I should point out that due to the ongoing collapse of US hegemony, a lot of European institutions would like to reduce their dependence on Cloudflare right now.
Back in the day, around ~2014, there were multiple alternatives with meaningful market share. However, all of these products:
- Lacked a free trial
- Had a price point several times more expensive for the first tier ($2000/mo)
- Were just worse (bad UI, documentation, etc.)
Cloudflare won and grew so big because it was just a better product.
Yes, I love Cloudflare’s products, but the way they interact with the community and the internet ecosystem at large leaves a lot to be desired.
> Every interaction I've ever had with CloudFlare has left me feeling like I needed a bath.
And let's not forget they are MITM'ing all internet traffic that passes through them, which is a lot of it.
If a company wants to succeed in this space, they have to be killers. Cloudflare is positioning for domination.
From what I can see, there are reasonable competitors for CF's offerings, but extremely limited parallels to their free tier. The free tier is the killer.
Exactly this. If I’m doing something big enough to pay for it, I would almost never choose Cloudflare. But as much as I dislike them, for my small projects there just isn’t an option better than their free tier.
Like who?
Akamai or Imperva maybe? No personal experience, but they seem to offer similar suites of WAF/DDoS/CDN products.
What about Workers?
https://fly.io/