You don’t want to be on Cloudflare’s naughty list (ctrl.blog)
484 points by _fnqu on Sept 20, 2022 | 330 comments



Well into the second day of Cloudflare’s blockade of my home internet connection, Google Search also began blocking requests. It required me to resolve a CAPTCHA challenge for every other search. This luckily only lasted a day.

Cloudflare shares IP reputation data with partners like Google, coordinated through a program called the Bandwidth Alliance. So, my original offense might not even have been against Cloudflare. It might have received the reputation data from a partner, and it just propagated through the Bandwidth Alliance network.

That's not what Bandwidth Alliance is at all. It's about reducing or eliminating egress fees between a cloud provider and Cloudflare. Not sure where the idea that it's about sharing IP reputation data comes from.

https://www.cloudflare.com/bandwidth-alliance/

So, if Google Search started showing a CAPTCHA that's not Cloudflare.


FYI, this guy is far from alone, your "protection" has given me a lot of grief over the past few years, particularly on highly NATed mobile networks.

I've been gradually removing cloudflare based CDNs from services I develop and control because I don't want my users being arbitrarily discriminated against.

There was a good article posted on HN recently titled "The ideal level of fraud is non-zero", which I think is highly relevant here. In essence, any mechanism employed to prevent illegitimate use comes with a cost to legitimate users, and if that cost is too high it defeats the purpose. I.e., what's the point of a website that is completely immune to a botnet but also cannot be accessed by anyone else? Unplugging the Ethernet cable also effectively protects against botnets. More subtly, the cost of outright rejecting some legitimate users is usually not worth the savings of rejecting 100% of illegitimate ones. I think Cloudflare's service has it the wrong way around: it currently accepts blocking legitimate users far too easily, which is not an acceptable cost; instead, it should let a higher level of bots through to avoid pissing off legitimate users. If it's not obviously a DDoS, it's probably worth the bandwidth cost.

Consider the bigger picture: if you save a sliver of a penny by blocking a bot, but also end up blocking or seriously inconveniencing 10 real users... is it worth it?
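To put rough numbers on that trade-off, here is a minimal sketch. The per-bot saving and per-human value are made-up placeholders, not measurements; plug in your own.

```python
def net_benefit(bots_blocked, saving_per_bot, humans_blocked, value_per_human):
    """Money saved on blocked bots minus value lost on wrongly blocked humans."""
    return bots_blocked * saving_per_bot - humans_blocked * value_per_human


def bots_needed_per_human(saving_per_bot, value_per_human):
    """Bots a rule must stop per wrongly blocked human just to break even."""
    return value_per_human / saving_per_bot


# Hypothetical inputs: each blocked bot saves $0.001 in bandwidth,
# each wrongly blocked human costs $0.50 in lost visits or sales.
gain = net_benefit(bots_blocked=1000, saving_per_bot=0.001,
                   humans_blocked=10, value_per_human=0.50)
print(f"net benefit: ${gain:.2f}")  # prints: net benefit: $-4.00
print(f"break-even: {bots_needed_per_human(0.001, 0.50):.0f} bots per blocked human")
```

Under these assumed numbers, a rule has to stop about 500 bots for every real user it turns away before it pays for itself.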


Cloudflare just isn't worth the tradeoffs: the risks associated with their centralization, how they made Tor basically unusable on non-onion sites, the lack of transparency when content-moderating the internet, etc.

The space is in need of solid competitors to break the stranglehold they have on the internet, whether through the right combination of services, documentation, etc.


Tor made Tor unusable on non-onion sites. I feed a netfilter table with the list of exit node IPs that Tor publishes (https://check.torproject.org/torbulkexitlist) as a standard part of server deployment, and it's the single most effective way to reduce form and login abuse on hosted sites. I like the idea of Tor, but there's no denying that it's a huge source of nuisances.
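A minimal sketch of that kind of deployment step, assuming nftables (rather than legacy iptables/ipset) and a set that already exists. The table name `filter`, the set name `tor_exits`, and the helper function names are hypothetical:

```python
import ipaddress
import urllib.request

EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"


def fetch_exit_list(url=EXIT_LIST_URL):
    """Download the published exit list (one IP address per line)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_exit_list(text):
    """Return sorted, de-duplicated IPv4 exit addresses, skipping anything else."""
    addrs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            addr = ipaddress.ip_address(line)
        except ValueError:
            continue  # not a bare IP address
        if addr.version == 4:  # this sketch's set is ipv4_addr only
            addrs.add(addr)
    return sorted(addrs)


def nft_script(addrs, family="inet", table="filter", set_name="tor_exits"):
    """Build an `nft -f` script that replaces the contents of an existing set.

    Assumes the set was created beforehand with `type ipv4_addr` and is
    referenced by a rule such as `ip saddr @tor_exits drop`.
    """
    lines = [f"flush set {family} {table} {set_name}"]
    if addrs:
        elements = ", ".join(str(a) for a in addrs)
        lines.append(f"add element {family} {table} {set_name} {{ {elements} }}")
    return "\n".join(lines) + "\n"
```

From a cron job, you could write `nft_script(parse_exit_list(fetch_exit_list()))` to a file and load it with `nft -f`. Flushing and refilling the set keeps it in step with the published list instead of accumulating stale exits.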


I live in a country with censored internet. What you are doing is harmful. I can only hope whatever you provide is irrelevant enough.


I'm sorry. I have a colleague based out of Venezuela. We've had to work together to get tunnels and vpns configured so that he can get uncensored and secure internet access.

But Tor is an enormous source of abusive traffic and if I don't filter it, then that's harmful to site owners. I'm being forced to choose between the needs of people that I know, work with, and depend on financially, and the needs of people in countries with issues that are far outside my ability to resolve. It's not a hard decision.


There are probably more sophisticated options that would solve your problems than simply blocking it.


Is using CAPTCHAs one of those?


captchas are fine. recaptcha is not.

between 2015 and ~2020, my home ISP was blessed with every recaptcha being 3 rounds of slow fade-in bullshit. I have also seen infuriating gaslighting of "please try again" after certainly correct solutions, as well as 5+ rounds followed by a notification that my network is entirely blocked.

I've developed a reflex to hit Ctrl+W upon seeing it, unless getting past it is absolutely vital, which is exceedingly rare.

if I had a genie lamp, I'd waste one of my 3 wishes to do terrible things to the people responsible for that shit.


Most captcha services are just used to force users identified as having few other options into giving free tagging labour.


Such as?


The answer depends on the type of service you host. I don't know what you need to do, but I do know that filtering IP space is merely security-by-obscurity, it is a cheap and broken solution to the hard problems of sybil resistance. If you need IP filtering to operate on a day-to-day basis, then the security of your service is fundamentally broken.

Tor users do not have any special properties over clear-net users besides low accountability for their IP space. There are other ways to acquire this type of setup that don't involve broadcasting a public list of known exit nodes as an act of good faith. Any sophisticated attacker will be able to easily get ahold of the IP space and bandwidth they need to do their work, whether it's through a botnet or simply because they operate out of some less-accountable country like China or Russia.

IP filtering: now you have two problems!


This is why I'm strongly against spam filtering for email. Spam filters are fundamentally security-through-obscurity. I mean, they don't protect your email from targeted bombing attacks or phishing. If you need spam filters to operate your email on a day-to-day basis, then the security of your email is fundamentally broken.

/s, obviously, I hope.

Blocking Tor isn't a security measure, it's a nuisance reduction measure.


>This is why I'm strongly against spam filtering for email. Spam filters are fundamentally security-through-obscurity. I mean, they don't protect your email from targeted bombing attacks or phishing. If you need spam filters to operate your email on a day-to-day basis, then the security of your email is fundamentally broken.

You kid, but this is completely true. Email is simply an incredibly flawed, outdated, and broken system, especially when used without PGP. Phishing is a massive problem, and it has only continued to grow in scale because spam, uh... finds a way. At the same time, spam filters regularly create false positives, making email an unreliable transport (leading to "oops, it got lost in my spam folder").

>Blocking Tor isn't a security measure, it's a nuisance reduction measure.

You should block all IP space, this will reduce nuisances by 100%. In fact, this will save you from having to consider any real security practices or do your job properly.


The correct analogy here would be implementing spam filtering by blocking large segments of email addresses, e.g. dropping mail from all non-Microsoft/Gmail domains (as a nuisance reduction measure!), with predictable impact on smaller providers and self-hosted email.


You're reframing this to make Tor look a lot better than it is. The signal:noise ratio for Tor is epsilon. It's almost entirely garbage. If a network generated spam at rates analogous to network traffic from Tor, yes, I guarantee that network would be on every single email service's block list.

Tor's advocates in this thread keep trying to argue it from ideology, as though anybody's obligated to deal with Tor traffic on principle alone, and not one of them so far has tried to argue that Tor is not 90+% bots and garbage. Funny, that.


With all the blocks in place, is it ever possible to know whether the 90% is still an innate effect of Tor, or actually an effect of sites blocking Tor?

I have Tor installed, figured it would be worth adding my boring browsing to the mix sometimes, but since most sites I try to load block Tor exits, Tor browser now sits unused.

On the other hand, if I woke up tomorrow deciding to start a bot farm or whatever other malicious thing, of course I'd be interested in hiding through Tor and might try it again (don't worry, I won't wake up that way).

So even if a hypothetical 100% of global internet users really wanted to do all their browsing through Tor, they might all reach the same conclusion as me that too many sites are blocked and therefore leave Tor to mostly bad traffic. Of course it's nowhere near 100%, but hopefully you see my point that the sites blocking Tor IPs (and I absolutely appreciate why) can become a self-fulfilling prophecy - and I'm not sure how you'd get out of that loop?


And if everyone blocks all non-Gmail addresses, then soon enough the SNR of non-Gmail addresses will also be garbage, because you are actively preventing any legitimate user from using them.


I think it would be sensible to block new account registrations with addresses from email address aliasing services (e.g. duck.com) or disposable email address services (e.g. mailinator.com).
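A minimal sketch of such a check, matching the address's domain and any parent domain against a blocklist. The two-entry `BLOCKED_DOMAINS` set just reuses the examples above and is not a real, maintained list:

```python
# Hypothetical blocklist; a real deployment would load a maintained list.
BLOCKED_DOMAINS = {"duck.com", "mailinator.com"}


def email_domain(address):
    """Lower-cased domain part of an address, or "" if there is no '@'."""
    _, sep, domain = address.strip().rpartition("@")
    return domain.lower().rstrip(".") if sep else ""


def is_blocked_signup(address, blocked=BLOCKED_DOMAINS):
    """True if the domain, or any parent domain, is on the blocklist."""
    domain = email_domain(address)
    if not domain:
        return True  # malformed address: reject outright
    labels = domain.split(".")
    # Check "a.b.mailinator.com", "b.mailinator.com", "mailinator.com", ...
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

Matching parent domains catches subdomain variants. The false-positive trade-off discussed in this thread still applies: legitimate users of aliasing services get rejected along with the abusers.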


I am strongly against any kind of spam filtering that drops/rejects messages that the recipient did not intentionally configure for those kinds of messages. Sorting suspicious mail into a separate folder is fine; preventing two humans from communicating based on heuristics, IP block reputation and other such bs is not.


Outlook did that, actually (preventing two humans from communicating without reason):

https://www.linode.com/community/questions/22305/entire-ip-r...


> It's not a hard decision.

Depends on what you mean by 'hard'.

As an IaaS provider, I endured all the hurdles with that, and ten years later I don't care, at least not until my outbound bill is bigger than usual.

Like, some of the clients are running CentOS 6 on public-facing machines.


I'm a noob, can you give me a pointer?

What kind of abusive traffic is coming through Tor and why do they do it?


Mainly forms -- login forms, comment forms, signup forms. Bots use Tor pretty heavily because it's anonymous and hard to block them without blocking the entire network. Login form abuse is mildly irritating but not a huge deal if you have other measures in place. Comment spam is annoying but there are some options that deal with it pretty well.

But the signup spam was a headache. I didn't want to just blackhole Tor traffic, and tried to reduce the abuse with other tools, including some custom stuff. The final straw was a customer's small business site that had a MailChimp or Constant Contact signup form. Those vendors want you to embed their code by default to render the form, so you have less control over the form itself. There were workarounds, but they all sucked.

Tor bots would sign up email addresses through this newsletter form, and then I'd have to go through and manually scrub them before newsletters went out, or the service would penalize my client for too many bounces/unsubscribes/complaints. Very nearly 100% of the abuse on that particular form came from Tor IPs.

I do not want to spend my limited time on this Earth manually sorting out bots from humans because of one particular network. Blackholing Tor made that problem disappear immediately.

VPNs are dime-a-dozen now, cheap VPSs are available from lots of vendors, there's Wireguard, there's ssh, a clever person could even set up Apache or nginx as a forward proxy with ssl from LetsEncrypt. Tor is well over 90% abusive traffic (https://blog.cloudflare.com/the-trouble-with-tor/). This is a Tor problem, not a me problem. There are better alternatives available.


I think the workflow is the issue with http(s)-based email list sign-ups.

Solution: Require sign-ups by email, so the end account must actively send your mailserver a registration message. This also turns an open-loop control system into a closed loop control system, which is inherently easier to secure / keep safe.
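A minimal sketch of that closed loop, under several assumptions. The web form only hands out a one-time address (`subscribe+<token>@newsletter.example`, a hypothetical domain), and the subscription is activated only when a message from the same address arrives there. Because a `From:` header is easy to spoof, the sketch also requires `dmarc=pass` in an `Authentication-Results` header, assuming your own receiving MTA adds that header (RFC 8601). The substring check and the use of `To:` instead of the SMTP envelope recipient are simplifications.

```python
import secrets
from email import message_from_string
from email.utils import parseaddr

SIGNUP_DOMAIN = "newsletter.example"  # hypothetical receiving domain
pending = {}       # token -> address awaiting confirmation
subscribers = set()


def start_signup(address):
    """Web form step: record the request, return the address the user must email."""
    token = secrets.token_hex(8)
    pending[token] = address.strip().lower()
    return f"subscribe+{token}@{SIGNUP_DOMAIN}"


def handle_inbound(raw_message):
    """Mail server step: activate only if the subscriber really wrote to us."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    rcpt = parseaddr(msg.get("To", ""))[1].lower()
    local, _, domain = rcpt.partition("@")
    if domain != SIGNUP_DOMAIN or not local.startswith("subscribe+"):
        return False
    token = local[len("subscribe+"):]
    # Simplified: a real check would parse Authentication-Results properly
    # and only trust the instance added by our own MTA.
    auth = msg.get("Authentication-Results", "").lower()
    if pending.get(token) != sender or "dmarc=pass" not in auth:
        return False
    subscribers.add(pending.pop(token))  # token is single-use
    return True
```

A bot that submits someone else's address to the form never completes the loop, because it would have to send a DMARC-passing message from that person's domain.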


How would this be better? It's trivially easy to spoof email addresses. Someone could sign you up easily, for example.

It's also easy to send "from" an address that passes SPF/DKIM but bounces inbound mail. I'm not sure what reason someone would have for this other than hurting the service's reputation or acting as a DoS of sorts, but it can be done.


> It's trivially easy to spoof email addresses. Someone could sign you up easily, for example.

Proper DMARC configuration is table stakes to send e-mail, which makes that anything but trivial.


But neither the newsletter host nor the email user has any input into how dmarc/dkim/spf are implemented. Only the user's email provider does. And if that's a small business domain, it's likely not very strict with the rules.


I thought DMARC/DKIM was necessary for delivering to Gmail for years now; in any case, there should be few who can't use a backup email to subscribe, as your newsletter won't be the only thing that has these anti-spoof requirements.


Not necessary. Just very highly recommended. I can still deliver my cron emails from a rando host successfully.


That doesn't rule out DKIM, which only requires the `From:` header's domain to publish a public key and the email to carry a DKIM signature made with the matching private key. SPF is the one that regulates which hosts are allowed to send mail for a domain.
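To make the SPF side concrete: an SPF policy is a TXT record such as `v=spf1 ip4:192.0.2.0/24 ~all` listing which hosts may send for a domain. The sketch below evaluates only `ip4`/`ip6` mechanisms and `all`. Real evaluation per RFC 7208 also resolves `include`, `a`, `mx`, `redirect=` and so on via DNS, which this deliberately skips.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_result(record, client_ip):
    """Simplified SPF check: only ip4/ip6 mechanisms and 'all' are evaluated."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
        # Other mechanisms and modifiers are ignored in this sketch.
    return "neutral"  # nothing matched
```

For example, a cron job mailing from a host outside the listed ranges would get `softfail` under a `~all` policy. That is often still delivered, which fits the experience described above.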


> Mainly forms -- login forms, comment forms, signup forms. Bots use Tor pretty heavily because it's anonymous and hard to block them without blocking the entire network. Login form abuse is mildly irritating but not a huge deal if you have other measures in place. Comment spam is annoying but there are some options that deal with it pretty well.

Then put the form behind your monopolistic internet gatekeeper. There's no reason for a GET to redirect to a Sisyphean captcha treadmill.


https://blog.cloudflare.com/the-trouble-with-tor/

> Based on data across the CloudFlare network, 94% of requests that we see across the Tor network are per se malicious. That doesn’t mean they are visiting controversial content, but instead that they are automated requests designed to harm our customers. A large percentage of the comment spam, vulnerability scanning, ad click fraud, content scraping, and login scanning comes via the Tor network. To give you some sense, based on data from Project Honey Pot, 18% of global email spam, or approximately 6.5 trillion unwanted messages per year, begin with an automated bot harvesting email addresses via the Tor network.


Say you're running an account take over script that spams login forms with a list of known username and password combos. If a website owner sees thousands of login attempts coming from a single IP address they're likely to block you to prevent abuse on their website. This is annoying for you as you then need to rotate your IP address.

Using tor hides your IP address from the website and makes switching exit nodes very straightforward, so you can run your account take over script in peace.


That's not that easy in practice. There are normally fewer than 2k exits, and not all of them are usable. Your abuse script competes with other malicious traffic for those exits, and their reputation gets burned pretty much immediately.

So yes, you can switch exits easily, but effectively you're switching from one known bad IP to another.
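The arithmetic behind that is simple. If a site blocks an IP after some number of failed logins, rotating exits only multiplies the attacker's budget by the number of exits. Here is a toy simulation with a hypothetical per-IP threshold, assuming every stuffed login fails:

```python
from collections import Counter


class FailureTracker:
    """Block an IP once it accumulates `threshold` failed logins (hypothetical policy)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = Counter()

    def is_blocked(self, ip):
        return self.failures[ip] >= self.threshold

    def record_failure(self, ip):
        self.failures[ip] += 1


def attempts_before_all_burned(exit_ips, threshold):
    """Round-robin a credential-stuffing run across exits until every exit is blocked."""
    tracker = FailureTracker(threshold)
    attempts = 0
    while not all(tracker.is_blocked(ip) for ip in exit_ips):
        for ip in exit_ips:
            if not tracker.is_blocked(ip):
                tracker.record_failure(ip)  # assume every attempt fails
                attempts += 1
    return attempts
```

With roughly 2,000 exits and a threshold of 5, that comes to about 10,000 attempts in total against one site. That budget is also shared with everyone else abusing the same exits.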


How often is the list of exit nodes updated?


Daily, I believe. I don't have the file git-controlled. That would be a good idea, though.


> how they made Tor basically unusable on non-onion sites

I wonder if that's such a bad thing. Tor is safer when the traffic never leaves the network. In the ideal world, everything that matters would be inside the Tor network instead of being merely accessible through it.


there are many solid competitors: Amazon, Fastly, Akamai, Imperva to name a few


Fuck Akamai. Have you worked with them? They are the most archaic internet company you can think of. Their UI is stuck in 2000. Just like their procedures.


There is an easy way to get the banhammer from Amazon, and it is possible to host a JavaScript page that triggers it for any visiting user.

I did tell Amazon about it, but it fell on deaf ears. The ban lasts for about a week, and the internet is mostly unusable in that period.


Bunny


Just 10 minutes ago, I got the following email from a housemate (I'm not home at the moment):

> The past few weeks I've been getting tons of redirects to verify my humanity before being allowed to view a webpage. Usually I just have to click the box that says human, not find all the ladders in a photo. SoFi is doing it every single time I log in. Petco, too, along with others who are more sporadic. This is happening with and without uBlock on. Same browser I've always used. ...

SoFi and Petco both use Cloudflare. I do exactly zero web crawling / scraping / abusive anything from my home connection.

I'm noticing a recent increase in volume of complaints about Cloudflare's human verification filter. I'm starting to wonder if they touched a dial.

I had already started pulling some infra back from Cloudflare after their last appearance in the tech news cycle. Now I've got an additional reason to continue doing that.


> I do exactly zero web crawling / scraping / abusive anything from my home connection.

That you know about. Your house mates share the internet connection.

I’m guessing you have WiFi, so you may have unintended guests.

You probably have lots of devices, one of which may be infected.

Your ISP may have issued you a different IP which may have a negative reputation score.

You could be using a malware infected browser or browser extension.

There are lots of variables. You haven’t isolated all of the ones in your control, so assuming CloudFlare is the only possible cause isn’t rational.


It's simple, really. Cloudflare is the single root cause of the issue. All the others are not a huge issue until Cloudflare notices. It's perfectly rational and reasonable to blame the company trying to gatekeep the whole internet, without taking any responsibility for false positives.

Scaling to infinity isn't a right, it is a privilege. Any company that builds these sorts of no-human-decision systems is abusing that privilege and hoping that anyone who suffers wrongly under them doesn't have enough of a voice (Google seems to be the worst for this, though Cloudflare seems set to follow).


I'm not sure how to phrase this without sounding like a prick, but I'm not exactly new at this stuff. You missed on all of those examples. I appreciate the point you're trying to make, but Cloudflare is in fact the primary factor here.


> That you know about. Your house mates share the internet connection.

This is actually the least likely these days. The No. 1 cause would be CGNAT: the vast majority of residential endpoints share an IPv4 address with a huge number of users, and mobile networks are even worse. That's before we even get to IP recycling for dynamic IPs, which again happens at high frequency on mobile networks, so you will inevitably get affected eventually.

This is why it's a bad idea to block IPs outright: today one IP address never equates to a single individual, or to the same set of individuals over time. The other problem with blocking IP addresses based on abuse is treating them as equal in user weight, yet one IP might have 2 users and another might have 10,000. Blocking a Tor exit node is a good extreme example of this. People think of it as an effective defence because of the concentration of abuse on that single IP address, but they fail to consider the concentration of users behind it. Tor exit nodes are probably a slightly higher source of abuse per user, but nowhere near as high as per IP; if you measure abuse per IP on highly NATed addresses, you are more likely looking at a rough picture of users per IP.
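Here is a small illustration of the per-IP versus per-user point, using made-up numbers. The 10,000 users behind one shared address mirror the example above, and real sites rarely know user counts precisely.

```python
def rank_by_abuse(ips):
    """Rank addresses worst-first, once by abuse per IP and once by abuse per user.

    `ips` maps an IP label to (abusive_requests, estimated_users).
    """
    rows = [(ip, abusive, abusive / users) for ip, (abusive, users) in ips.items()]
    by_ip = [r[0] for r in sorted(rows, key=lambda r: r[1], reverse=True)]
    by_user = [r[0] for r in sorted(rows, key=lambda r: r[2], reverse=True)]
    return by_ip, by_user


# Hypothetical numbers: one heavily shared address vs. one home connection.
sample = {
    "tor-exit": (500, 10_000),  # lots of abuse, spread over many users
    "home-line": (20, 2),       # little abuse, but only two people behind it
}
by_ip, by_user = rank_by_abuse(sample)
print("worst per IP:  ", by_ip[0])    # tor-exit
print("worst per user:", by_user[0])  # home-line
```

The two metrics disagree. A per-IP blocklist punishes the shared address, even though each person behind it is, on these assumed numbers, far less abusive than the home line.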


> I had already started pulling some infra back from Cloudflare after their last appearance in the tech news cycle.

What triggered your reaction? That they terminated a customer with zero notice?


You're looking at it all wrong. From Cloudflare's point of view, this kind of blocking is a feature. Anyone doing legitimate web crawling, or offering alternative web services such as Starlink, now needs Cloudflare's permission.

Essentially, for a broad class of web-based businesses, they have made themselves gatekeepers. I'm sure they'll find a profitable use for this position. Charging outright would look bad, but investing in businesses that just happen to not run into Cloudflare-based trouble, but whose competitors do...


I'm familiar with that perspective, and biased towards it... Cloudflare is certainly in such a position, but they are a relatively young company (for their size and reach) and I've seen good things come from them.

I'd guess the intent is unlikely to be anti-competitive or monopolistic, just over-aggressive. However regardless of intent their position does cause an absence of market forces to put pressure on fixing such issues - Similar to how it's become acceptable to have downtime when it's on AWS, because "everyone is affected".


It's true that any wall around you that protects you from something unavoidably comes with a gate that someone else guards, unless you want to guard your gate personally (and that entails filtering out armies of spambots, worst case manually).

As with any power and control you delegate to any entity, only time and good behavior will earn them your trust. That's theoretically what companies are competing for, your loyalty.


Isn’t there a config option to dial down the anti-bot stuff so that you still get the benefit of Cloudflare’s caching but with much less chance of dropping legit traffic from schools, VPNs, etc.? I think their lowest setting only really kicks in if they think an IP is participating in a DDoS attack.


My dude, it isn't about money. At least not directly.

I encourage those of you attempting to block Cloudflare to try and host your own website for a bit. Make sure you don't do it on a metered/paid connection. I know one eCommerce site with 1,300 employees that went bankrupt overnight thanks to the AWS bill (and lack of options to get back online, this was prior to companies such as CF). Bankruptcy as in the company filed for bankruptcy and no longer exists. They were profitable for a decade prior. One DDoS attack...

Also make sure you don't have a democratic opinion if you are in the US, like a 50-person manufacturing company. They were shut down completely for saying a single wrong thing about Republicans. CF existed then, but they weren't aware of it because they had no IT folks. They were a nonprofit.

CF may be evil to some, but there is a reason they exist. I use CF. I don't like throwing money at them every month, however, many of my websites have also been attacked, usually via competitors. We can either deanonymize the internet or allow companies like CF to exist. There is really no other way.


> the company filed for bankruptcy and no longer exists. They were profitable for a decade prior. One DDoS attack...

Being milked dry by a single DDoS is a hosting issue in my opinion, there should be sensible limits in place, AWS is notorious for making it very hard to understand and control this...

Even if you disagree and consider it a problem that must be solved with a separate DDoS protection service, this is not what I am talking about. I think that's a good idea: if there is a clear, ongoing targeted DDoS attack, that system needs to engage and do its best to let through only legitimate users (which is the only time it makes sense to potentially block regular users, because the alternative is that no one can access the site).

The problem is this is not how cloudflare's protection operates, there is no throughput trigger, it's always on, it attempts to block bots at all times and has a very high false positive rate.
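Here is a minimal sketch of the throughput-triggered behaviour described above: nobody is challenged until the site-wide request rate in a sliding window exceeds a limit. It illustrates the proposal, not how Cloudflare actually works, and `limit_rps` and `window_s` are hypothetical knobs.

```python
import time
from collections import deque


class SurgeTrigger:
    """Challenge visitors only while the site-wide request rate is above a limit."""

    def __init__(self, limit_rps, window_s=10.0, clock=time.monotonic):
        self.limit = limit_rps * window_s  # max requests allowed per window
        self.window = window_s
        self.clock = clock
        self.stamps = deque()  # arrival times of requests inside the window

    def should_challenge(self):
        now = self.clock()
        self.stamps.append(now)
        # Drop timestamps that have slid out of the window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        return len(self.stamps) > self.limit
```

Storing one timestamp per request is fine for a sketch. Under a real flood you would switch to fixed per-second counters so that memory stays bounded.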


> I know one eCommerce site with 1,300 employees that went bankrupt overnight thanks to the AWS bill

Had they just called AWS and explained the situation, they would likely still be in business.

I keep a backup DDoS mitigation service for my entire network that costs me less than $200/mo to mitigate up to 100 Gbps.


> That's not what Bandwidth Alliance is at all. It's about reducing or eliminating egress fees between a cloud provider and Cloudflare. Not sure where the idea that it's about sharing IP reputation data comes from.

It comes from the Cloudflare blog. https://blog.cloudflare.com/cleaning-up-bad-bots/

There’s a support page about it too. https://developers.cloudflare.com/bots/get-started/free/


I need to look into that. Thanks for pointing it out. I had totally forgotten about that post.

Edit: team tells me this idea never got off the ground. Did talk with some potential partners (which did NOT include Google) but didn’t happen. So if Google was throwing CAPTCHAs it wasn’t because of our IP reputation.


Dear John. What am I, as a normal human being/end-user, supposed to do in this situation? People can’t do anything without any information about why they’re blocked. Who do you contact? Where do you go? What do you do? The challenge page doesn’t help the end user understand why this is happening to them. It’s okay if you only see it for two seconds, but the page stays on screen for over a minute. When this happens on every website, what do you do? You’d be furious if this had happened to you. I’m just trying to read my online comics and look up some stuff about my interests and hobbies. It reduced my quality of life/sanity for a week. In the last two days, I started worrying that this was going to be the new normal. I even looked into switching ISPs to get a new IP address.

PS: I love all the innovation and engineering stuff you guys regularly share on the Cloudflare blog. It’s [almost] always an interesting read. Even though I’m no fan of the massive centralization your company has caused.


Once upon a time Matthew made us set the IP reputation of every Cloudflare office to bad so that we experienced the worst case scenario. Helped a lot.

I don’t understand why you saw one minute block screens. That’s not right. Should be seconds.

I’m talking with the team about your other points.


The main problem of course, and it isn’t limited to Cloudflare and I won’t pretend to have the solution, is that if you are caught in this kind of web, you have no recourse but go public and hope the spotlight lands on you. For every problem we see in an upvoted post there’s tons that nobody sees.


What about answering his actual question?


I haven't been getting challenges that last that long, but I have noticed that the redesigned "security check" challenge pages with the spinner do seem much slower than the old design with the loader that was made of 3 orange dots.


> People can’t do anything without any information about why they’re blocked. Who do you contact? Where do you go? What to do?

This is the most serious problem with all of the major companies these days. Cloudflare, Google, Apple, etc. When you get on their "bad side", you're just screwed. You'll never even know what got them mad at you, and there's nothing you can do to recover.

The only reasonable way to deal with this is to avoid them all to the greatest extent possible. You have no control over whether or not you deal with Cloudflare, unfortunately, which makes them the worst of the lot.


> It’s okay if you only see it for two seconds. But the page stays on screen for over a minute.

That doesn't sound right. You shouldn't see a loading page for over a minute. If you're open to providing more details privately I'd love to help troubleshoot. You can drop me an email at amartinetti @ cloudflare.


What I do when I want a new IP is change my router's MAC address and reboot the modem.
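If your router lets you set a custom WAN MAC, a random one should be a unicast, locally administered address, with the lowest bit of the first octet cleared and the second-lowest bit set. This is a minimal sketch. Whether a new MAC actually gets you a new IP depends on how your ISP hands out DHCP leases, and behind CGNAT you may still end up sharing an address.

```python
import secrets


def random_local_mac():
    """Random unicast (bit 0 clear), locally administered (bit 1 set) MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] = (octets[0] & 0b11111100) | 0b00000010
    return ":".join(f"{b:02x}" for b in octets)


print(random_local_mac())  # e.g. something like 02:... or 3a:...; random each run
```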


I edited and added a second link to a support page that mentions it too.


Thanks. I'm talking with the team.

Edit: see comment above.


You block this guy from the internet for a week, for no apparent reason, and then you come in here with a nitpick about how another related system works?

Really?


The point is that Cloudflare does not beam IP reputation data to Google. If Google and CF are blocking this IP separately, what's the chance there's some malicious device or hacked IoT device on the network, participating in DDOS attacks or unauthorized vulnerability scanning of random websites?


According to another comment, it's a wrong point: https://blog.cloudflare.com/cleaning-up-bad-bots/

> Once enabled, when we detect a bad bot, we will do three things: (1) we’re going to disincentivize the bot maker economically by tarpitting them, including requiring them to solve a computationally intensive challenge that will require more of their bot’s CPU; (2) for Bandwidth Alliance partners, we’re going to hand the IP of the bot to the partner and get the bot kicked offline; and (3) we’re going to plant trees to make up for the bot’s carbon cost.
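For readers unfamiliar with the idea in point (1), a generic hashcash-style proof-of-work looks roughly like the sketch below. It is not Cloudflare's actual challenge. It only shows the asymmetry: solving costs about 2**difficulty hashes on average, while verifying costs one. The helper names and the SHA-256 leading-zero-bits scheme are assumptions.

```python
import hashlib
import secrets


def leading_zero_bits(digest):
    """Count leading zero bits in a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge():
    """Server side: a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def solve(challenge, difficulty):
    """Client side: brute-force a nonce whose SHA-256 has enough leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge, nonce, difficulty):
    """Server side: one hash, so checking stays cheap while solving does not."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

A real deployment would also remember which challenges it issued and reject reused ones. Otherwise a bot could solve once and replay the answer.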


I'm pretty sure this was for a situation like Digitalocean themselves hosting a bot, but such IP sharing very well might be currently (ab)used by partners, if it's happening here.


Yeah. I'm looking into that.


Yeah, if for example Spamhaus (which both Cloudflare and Google consult) has detected that a subnet is bad then that could be the cause.

Still, it doesn't excuse Cloudflare that there's no redress if you are caught in a block, or even a clue about what you can do to reduce it (especially since Spamhaus does have redress procedures).


Fair point


A wrong nitpick, even! Way to look like the asshole.


[flagged]


[flagged]


It makes a very broad claim which makes it sound like an extortion racket but doesn't have anything to back it up. I would bet that if it included some evidence it would fare much better. For example, they have a ton of large organizations as customers. The very first question the average reader is going to have is whether these sites are really predominantly attacked by booter services that use Cloudflare for hosting. That seems unlikely, and as a general rule here, the broader the claim, the more people are going to expect you to show that you did your homework first.


The claim was discussed in this post: https://news.ycombinator.com/item?id=32709329

Basically DDOS booters use Cloudflare to protect their websites from competitors, since Cloudflare is one of the best. The same people Cloudflare is protecting (and claims to do so on an ethical neutrality basis) is furthering the need for Cloudflare to exist.


Note that I’m not saying whether or not this is true, only that a comment which links to something like that will generally fare better than one which begs the question.


It's like finding the worst videos on YouTube and saying that's their business model.


The tone of this reply is a bit shit from a PR perspective.

How about _also_ pointing to a knowledge base article for how an end user could go about working out what network activity from their IP might be flagging Cloudflare’s systems?


>"Not sure where the idea that it's about sharing IP reputation data comes from."

One source of that would be a blog post on your company's website that was actually authored by you! Point 2 below:

>"Once enabled, when we detect a bad bot, we will do three things: (1) we’re going to disincentivize the bot maker economically by tarpitting them, including requiring them to solve a computationally intensive challenge that will require more of their bot’s CPU; (2) for Bandwidth Alliance partners, we’re going to hand the IP of the bot to the partner and get the bot kicked offline; and (3) we’re going to plant trees to make up for the bot’s carbon cost." [1]

So it's not such a far-fetched notion is it?

[1] https://blog.cloudflare.com/cleaning-up-bad-bots/


They do have a threat score

https://developers.cloudflare.com/firewall/recipes/block-ip-...

I was surprised to learn Cloudflare was born out of Project Honeypot, so I am guessing Cloudflare does share data with them:

https://www.projecthoneypot.org/cloudflare_beta.html


FYI, you're responding to the Cloudflare CTO.


It’s naive to assume the Cloudflare CTO wouldn’t lie if it benefited him or Cloudflare.


I wonder if HN posters have ever held a job before. Can you explain why it's beneficial for Cloudflare to block legitimate users? Why is the simplest explanation "Cloudflare just hates this one user in particular?"


The story I've heard is that, because their direct customers are websites rather than end users, Cloudflare loves to be ostentatious with these branded blocks and has a vested interest in offering services that punish users, because it makes people feel like the product really, really does something. Do you constantly hear about sites hosted by Akamai or CDNetworks or whatever going down due to DDoS attacks? No. However, despite a bajillion websites being hosted by Akamai (including, for example, virtually everything from Apple), have you ever accidentally been blocked or severely rate limited by one, or been given a CAPTCHA... even behind Tor? I doubt it (and this is coming from someone who nigh unto tried to cause themselves problems with Apple, and all I ever got was a subtle speed cap); and yet, I feel like everyone I know has experienced being stuck behind Cloudflare at various points in their lives :/.


Well, apparently they scared this user into installing their browser extension, so it sounds like this incident was a win for them.


That is indeed their goal - this kind of targeted harassment is done deliberately to collect more personal data of the user.

This tactic is quite common among BigTech and something I've experienced with both Google and Amazon: once you are hooked on their product, one day they will suddenly deny some aspect of their service to you and force you to share more personal data with them to get access to it. For example, Amazon will one day start asking you to click a link sent to your mobile to access your account, or Google or Microsoft will block your account and ask for your mobile number to "verify" you, etc. When you are suddenly blocked from using a service, however privacy-conscious you are, in your desperation you will be forced to comply.

I have experienced this with CloudFlare too, once when many websites were suddenly blocked for me by CloudFlare on all browsers and I was forced to install their extension to access some information I needed from a website urgently. I have no doubt in my mind that even otherwise, they just deliberately and randomly blocked access to some sites and displayed their "captcha" page just for PR and "brand awareness". Now that CloudFlare has realised this is backfiring on them because of the negative emotions being associated with their brand, they have redesigned their "captcha harassment" page to give less prominence to their branding than before.


Privacy pass uses [VOPRFs](https://datatracker.ietf.org/doc/draft-irtf-cfrg-voprf/) with the express goal of avoiding tracking, so all this talk about "targeted harassment" is a bit much.
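To show why blinding breaks the link between issuing and redeeming a token, here is a toy sketch of the oblivious PRF core (a 2HashDH-style blind, evaluate, unblind flow). Assumptions, stated loudly: a tiny safe-prime group (p = 2039, q = 1019) chosen only for readability, a naive hash-to-group, and no DLEQ proof, so this is only the non-verifiable part. Privacy Pass and the VOPRF draft use elliptic-curve groups and add a proof that the server used its committed key. All function names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumption, NOT secure): P = 2Q + 1 is a safe prime, and the
# squares mod P form a subgroup of prime order Q.
P = 2039
Q = 1019

def hash_to_group(token: bytes) -> int:
    """Naive hash-to-group: map bytes into the order-Q subgroup by squaring."""
    h = int.from_bytes(hashlib.sha256(token).digest(), "big") % (P - 1) + 1  # in [1, P-1]
    return pow(h, 2, P)

def blind(token: bytes) -> tuple[int, int]:
    """Client: hide the token behind a fresh random exponent r before sending it."""
    r = secrets.randbelow(Q - 1) + 1
    return r, pow(hash_to_group(token), r, P)

def evaluate(server_key: int, blinded: int) -> int:
    """Server: apply its secret key k without ever seeing the raw token."""
    return pow(blinded, server_key, P)

def unblind(r: int, evaluated: int) -> int:
    """Client: raise to r^-1 mod Q to strip the blinding, leaving H(token)^k."""
    return pow(evaluated, pow(r, -1, Q), P)
```

The server only ever sees `blinded`, a fresh random-looking group element per issuance, so when the client later presents the token and `H(token)^k`, the server cannot tell which issuance it came from.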


And they also got them to write a blog post, giving massively negative press on HN. I doubt his PII (assuming they collect it) is worth the trouble this thread is causing.


That is easy to explain: because it is easier/cheaper for Cloudflare to build a solution that works for 99.99% of the people and simply throw that extra 0.01% under the bus. So the simplest explanation is "Cloudflare knows random users will be locked off the internet, and is happy with the trade-off".


It's even more naive to assume Cloudflare's CTO would tell lies that can be trivially shown to be untrue.


How would you show they are untrue? Ask? :-D


I don't assume anything. The previous comment was just trying to teach something about Cloudflare to its CTO.


Can you acknowledge the main point of the article? What should someone do if they find themselves misclassified by Cloudflare's systems?


(not the parent commenter)

That person should start with the assumption they haven't been misclassified and eliminate the possibility that a device on their network is compromised.
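One low-effort first pass at "eliminate the possibility that a device is compromised" is to look for internal hosts contacting an unusual number of distinct destinations, since bots scanning or sending spam (e.g. to many port-25 servers) tend to fan out widely. A minimal sketch, assuming a hypothetical connection log with one whitespace-separated `src_ip dst_ip dst_port` entry per line (for example, exported from a router or firewall); the threshold of 100 is arbitrary.

```python
from collections import defaultdict
from typing import Iterable

def high_fan_out_hosts(log_lines: Iterable[str], threshold: int = 100) -> dict[str, int]:
    """Return {internal_host: distinct (dst, port) pairs} for hosts above `threshold`.

    Assumes a hypothetical log format: "src_ip dst_ip dst_port" per line.
    """
    destinations = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        src, dst, port = parts[:3]
        destinations[src].add((dst, port))
    return {src: len(d) for src, d in destinations.items() if len(d) > threshold}
```

A clean result doesn't prove the network is clean (a bot could be low-volume), but a device with hundreds of distinct outbound SMTP destinations is worth a closer look.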


(Author here.) That’s missing from the article. But I have logs of the network. There’s nothing out of the ordinary. “I don’t know what I did wrong,” as I started the article, means “I’ve checked logs and such and there’s no indication of anything wrong on my end.”


A task that would be made much easier and less likely to miss something if the affected person had some indication as to what the problem was.


Devil's advocate: wouldn't it then be pretty easy to engineer malicious bots to avoid detection?


Depends on the level of detail provided. That much detail isn't necessary in order to provide a helpful pointer to innocent bystanders.


Do you expect the average user to know how to "eliminate the possibility that a device on their network is compromised"? That is untenable.


No, but I wouldn't expect the average user to write a blog post with unsubstantiated technical claims, either.

I do think Cloudflare could do better here to let the owner of an IP know why they're suffering from poor reputation.

However, it's not immediately clear to me how they could accomplish this without weakening their side of the cat-and-mouse game.


But that's beside the main point. You guys are essentially the "single point of failure" for half the internet. [1] Being competent and smart doesn't really help too much, as demonstrated by how you guys had to give in to the pressure to censor recently.

[1]: https://easydns.com/blog/2020/07/20/turns-out-half-the-inter...


What happened to PrivacyPass? It seems to have stopped working completely when connecting over Tor several months back. I say this having spent several hours trying to get it to work on multiple devices with different OS/client software (Chromium/FF), both with the store versions and with the extension built from source.

We did have it working mostly fine for some time back in 2021 but haven't been able to since.

There are multiple open issues reporting this on the GH repo with no real follow-up from maintainers apart from maybe a "should be fixed, open again if still an issue".

e.g. https://github.com/privacypass/challenge-bypass-extension/is...


> Not sure where the idea that it's about sharing IP reputation data comes from.

Probably from the scam called mail blacklists.


It is interesting that the Bandwidth Alliance partners list shows pretty much every big cloud provider except AWS and Akamai [0]

[0] https://www.cloudflare.com/bandwidth-alliance/


Yep. That paragraph made me pause and consider that maybe OP is the victim of some compromised device running on their network.

If two independent sites believe you are a bot, you or something at your address just might be.


If you'd like to experience this treatment first-hand, try surfing the web using the Tor Browser.

Spoiler alert: many websites simply refuse to load at all (e.g. any Google service, and lots of websites "protected" by CF). Captchas are everywhere: in many cases, you can't even complete simple GETs of blogs without donating free labor to CF.

And the most infuriating part, you get CF marketing messages right in your face while your browser is calculating hashcash (I guess?)... At this point I can recognize every single one of them: something about bots making up 40% of all internet traffic, something about their web scraper protection racket, something about small businesses (???), etc etc...
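The commenter's guess of "hashcash" refers to a proof-of-work scheme in which the client burns CPU to find a nonce and the server verifies it with a single hash. Cloudflare's actual challenge is not described in this thread, so the following is only a generic hashcash-style sketch, assuming SHA-256 and a "leading zero bits" difficulty; the names and the 12-bit difficulty are hypothetical.

```python
import hashlib
from itertools import count

def _digest_value(challenge: str, nonce: int) -> int:
    """SHA-256 of "challenge:nonce", read as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(f"{challenge}:{nonce}".encode()).digest(), "big")

def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force the first nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(challenge, nonce) < target:
            return nonce

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: checking costs one hash, however long solving took."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))
```

Expected client work is about 2^difficulty_bits hashes (roughly 4,096 at 12 bits), which is why such challenges show up as a few seconds of spinning in the browser.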

To be fair, Tor exit nodes have an awful reputation for sure. Nevertheless, I have a hard time forgiving how CF makes browsing the Internet hell for those who actually need Tor.


> And the most infuriating part, you get CF marketing messages right in your face while your browser is calculating hashcash (I guess?)... At this point I can recognize every single one of them: something about bots making up 40% of all internet traffic,

Yeah, there's something amazingly aggravating about CF telling you how much traffic is bots while showing that they can't distinguish you from a bot.


CloudFlare are creating a new division for advertising to bots. They have projected that in the near future, bots will account for 90% of spending, so the bot demographic is the most important to target, marketing-wise.

The fact that humans are seeing the traffic meant for bots is an unfortunate side-effect.

I personally welcome our future bot overlords (not only because being unwelcome might be unhealthy for me — why would I publicly disagree with an overlord or not want to be their friend?).


Someone has seen a basilisk...


Cloudflare has mixed up the definitions of "bot" and "abuse". Tor users may or may not be bots, but as long as they aren't abusive (spamming or DoS), they ought to be treated the same as everyone else.


Citation needed.


I think this is more of an opinion than a matter of fact


It wasn’t framed as an opinion. And even if it was, I’m saying I think it is wrong and I want to know why I should change my mind.

The fact is that CloudFlare handles abuse (DDoS at network layers 3 and 4) completely separately from bot detection. And it gives domain owners controls to allow some bots, like the Google Search crawler.

So my statement stands: I want to see a citation of evidence that CloudFlare doesn’t have the ability to distinguish abuse.


You don't even need Tor. Try a public Wi-Fi network that is not in a "preferred geographical location" (i.e. the US or Europe). The gaming cafes in Southeast Asia are probably responsible for 90% of all AI training datasets lol


I routinely use Youtube with Tor. I will occasionally get kicked off with a "suspicious traffic" message, but it isn't my experience that it "refuses to load at all".


Harsh blocking/limiting/challenging is way too valuable to sites that are actually trying to make money online. It's not going away short of legislation banning it. Losing 1/10,000 legitimate customers to cut fraud attempts, spam, exploit attempts, and so on, by 90% or more, is just too good a trade-off.

I have bad news about the most-likely fix for it, longer term, so we can lay off the IP-based reputation stuff and the geo-blocking: it's tying some form of personal ID to your browsing activity, so that bears the reputation instead of the address.

Sorry. Said it was bad news.


An alternative that preserves some privacy also doesn't seem that hard to imagine... though it probably has its own can of worms*.

Basically, the core problem is digital identities (accounts, IPs, phone #s etc.) are cheap to create (even considering captchas and all) so fraud is easy. The solution could be just to make it "costly" to create new digital identities. For example, you could get a "verified but anonymous" identity issued by locking some assets (could be real world money, or maybe something intangible like community reputation) as collateral with a trusted party (or, for the crypto people, the blockchain). If you misbehave, you lose your reputation on that identity (and essentially your collateral) and have to start over. This lets anyone bootstrap a "minimal" level of trust at the beginning before they can use time to prove themselves trustworthy.

Note: This model might remind some of things like staking in crypto. However the idea is really not anything new... Putting money on the line is really how most low-trust bootstrapping happens.

*: To name a few: (1) this can result in participation being gated by wealth, which can be unfair; (2) it makes accounts more valuable to hack, so people need better security practices [re: Twitter checkmark]; (3) one would need some authority to decide how accounts lose their collateral, or maybe the collateral is just burned to create that initial credibility...
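The collateral idea can be sketched as a small registry: an identity is issued only against a deposit, good behavior accumulates reputation, and misbehavior forfeits the deposit so starting over costs money again. This is a toy model under assumptions not in the comment: a single trusted operator, integer deposits, and hypothetical method names (`vouch`, `slash`); it deliberately ignores footnote (3), i.e. who decides when to slash.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class CollateralRegistry:
    """Toy trusted party that issues pseudonymous identities backed by a deposit."""
    min_deposit: int
    deposits: dict[str, int] = field(default_factory=dict)
    reputation: dict[str, int] = field(default_factory=dict)

    def register(self, deposit: int) -> str:
        if deposit < self.min_deposit:
            raise ValueError("deposit below minimum")
        identity = secrets.token_hex(8)  # random pseudonym; no real-world name is stored
        self.deposits[identity] = deposit
        self.reputation[identity] = 0
        return identity

    def vouch(self, identity: str) -> None:
        """Good behaviour slowly builds reputation on top of the bootstrap deposit."""
        self.reputation[identity] += 1

    def slash(self, identity: str) -> int:
        """Misbehaviour: drop the identity and forfeit its deposit; returns the amount forfeited."""
        self.reputation.pop(identity, None)
        return self.deposits.pop(identity)

    def is_trusted(self, identity: str) -> bool:
        return identity in self.deposits
```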


> Basically, the core problem is digital identities [...] are cheap to create [...] so fraud is easy. The solution could be just to make it "costly" to create new digital identities.

We already use this model in practice. It's why so many services require a phone number verification now - they are hard enough to get en-masse, especially if you block things like Google Voice. They even have a big advantage in that they are comparatively hard to hack, as the SIM card is effectively a weak form of physical security key.

I think the big problems this causes is discussed on HN quite often.


Your idea comes from a good place, but identity theft is already a thing in the real world. Digital identities would also be very stealable, which makes malware more harmful in the long term. Imagine if your Twitter gets hacked and your digital identity makes it so your Gmail gets blocked.

Similarly, the internet is already very difficult for people with limited means. This would make it even harder.


Easy solution.

Go down to your local post office.

They physically hand you an identity token on a physical $2 2FA device if you give some evidence that you live nearby. You can put down the deposit, or hand back the device holding an old ID, which is cleared and reused.

It's traceable to the post office but no further, nothing is recorded other than that the token is deployed and roughly when.

Local communities can be responsible for cleaning up local messes. No need for the scammers two cities over to affect your reputation. No need for a corrupt employee handing out tokens to affect the reputation of the token you got ten years ago.


So every country in the world should simultaneously roll out this $2 2FA token?

And the governments of the world are going to do this is an anonymous way?

Who is going to manufacture these 8 billion (or at least 3 billion if we only count Facebook's MAU) tokens?

And there still needs to be a global database of valid identifiers, else anyone could just create a software token that they can reprogram every second.

And we expect all people to carry these 2FA tokens perfectly?

And what happens when someone loses this token? In your proposal, the post office has no way to prove you owned that token.

Same thing for revoking a token. There is no identity out of the token, so how do you revoke it after it is lost? People are not willing to store a piece of paper in a security deposit box.

This "easy" solution is impossible in practice.


You're projecting use cases that weren't proposed.

The only purpose is to provide evidence of not being a bot. Not to log in or verify identity. You don't need a server or proof that a particular token is owned by a particular person, just a cert chain and a list of postcodes with current public keys. The post office has a private key. They sign a message saying 'the holder of this token walked into the store'. Let servers make whatever judgements they wish about the chain's credibility. If a particular key signs lots of bots then you know where to look for the source of the bot farm and the people that live there know where to look to fix their reputation.
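A minimal sketch of the verification side, assuming each post office holds a signing key and publishes its public key in a directory keyed by postcode. To stay standard-library-only, this swaps in textbook Schnorr signatures over a toy group (p = 2039, q = 1019), which are trivially forgeable at this size; a real deployment would use a standard scheme such as Ed25519 from a vetted library. The directory, message format, and function names are hypothetical, and this is a single-level chain (office key to attestation).

```python
import hashlib
import secrets

# Toy Schnorr group (assumption, NOT secure): P = 2Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4

def _challenge(commitment: int, message: bytes) -> int:
    data = commitment.to_bytes(2, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen() -> tuple[int, int]:
    x = secrets.randbelow(Q - 1) + 1          # private key, kept by the post office
    return x, pow(G, x, P)                     # (private, public)

def sign(x: int, message: bytes) -> tuple[int, int]:
    k = secrets.randbelow(Q - 1) + 1          # fresh nonce per signature
    e = _challenge(pow(G, k, P), message)
    return e, (k + x * e) % Q

def verify(y: int, message: bytes, sig: tuple[int, int]) -> bool:
    e, s = sig
    commitment = pow(G, s, P) * pow(y, Q - e, P) % P   # equals G^k for a valid signature
    return _challenge(commitment, message) == e

# Hypothetical public directory: postcode -> that office's current public key.
OFFICE_KEYS: dict[str, int] = {}

def attest(office_private: int, token_id: str) -> tuple[int, int]:
    """The office signs only 'the holder of this token walked in'; no name is recorded."""
    return sign(office_private, f"walked-in:{token_id}".encode())

def site_accepts(postcode: str, token_id: str, attestation: tuple[int, int]) -> bool:
    y = OFFICE_KEYS.get(postcode)
    return y is not None and verify(y, f"walked-in:{token_id}".encode(), attestation)
```

Because each attestation names its issuing office, a site that sees many bots carrying one office's signatures knows which locality to distrust, which is the property the comment relies on.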

It doesn't need to roll out simultaneously. Just be an alternative to captcha that isn't as abusive as device attestation.

The manufacturers will be the same ones that manufacture the hundreds of billions of usb drives and phones and smart light bulbs.

The only problems are it's not as useful for abusing users or spying on citizens as revoking access to general purpose computing, and idiots who project problems onto it that come from use cases that are not proposed or say 'big number make thing impossible'.


It's not really my idea (it has been proposed multiple times, long ago, to the point that it has even been implemented in many places).

As for identity theft, it's actually not that common/easy except in the US (which has no centralized national ID issuer and largely depends on hacks building on the SSN).

And besides, protecting digital identity is already important even without this bootstrapping.

> Imagine if your Twitter gets hacked and your digital identity makes it so your Gmail gets blocked.

Flip the services around and you have the reality of today.


> Basically, the core problem is digital identities (accounts, IPs, phone #s etc.) are cheap to create (even considering captchas and all) so fraud is easy. The solution could be just to make it "costly" to create new digital identities. For example, you could get a "verified but anonymous" identity issued by locking some assets (could be real world money, or maybe something intangible like community reputation) as collateral with a trusted party (or, for the crypto people, the blockchain). If you misbehave, you lose your reputation on that identity (and essentially your collateral) and have to start over. This lets anyone bootstrap a "minimal" level of trust at the beginning before they can use time to prove themselves trustworthy.

I've always thought that client certs would be an interesting solution to this problem. Any given certificate can carry signatures from multiple signing authorities, right? So we could imagine a world where there are many different certificate authorities, each of whom have their own criteria for signing a particular certificate and each of whom offer different varieties of assurance regarding the signature-holder's identity.

From here, the question of "should I allow the user identified by this client cert to use my service" simply becomes a question of 1.) checking the validity of the signatures of the client cert and 2.) deciding if the CA's criteria for signing certs aligns with my desired userbase.

For example, a particular CA might insist that their users go through some real-world process to renew their certification every few years, but when they sign a cert it means that the bearer has been strongly vetted as a real person.

An interesting side effect of this auth model is that a service provider accepting certs from a particular CA has someone to complain to if a user bearing their signature acts improperly on their platform. You could imagine a CA which has a code of conduct expected of the users whose certs they sign, and would perhaps revoke a user's certification if too many websites complain.
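Step 1 of that model (checking that a client cert chains to a CA you accept) is what ordinary mutual TLS already does. A minimal sketch using Python's standard `ssl` module, assuming hypothetical file paths for the site's own certificate and for a bundle of accepted CA roots. Step 2 (deciding whether that CA's policy fits your userbase) stays in application code, here by reading the issuer out of `getpeercert()`. One caveat on the premise: a standard X.509 certificate carries a single issuer signature, so "several CAs vouching" would in practice mean several certificates or cross-signed intermediates.

```python
import ssl

def make_client_cert_context(cert_file: str, key_file: str, accepted_ca_bundle: str) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a cert from an accepted CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # the site's own certificate
    ctx.verify_mode = ssl.CERT_REQUIRED                         # client certs are mandatory
    ctx.load_verify_locations(cafile=accepted_ca_bundle)        # only these CAs count
    return ctx

def issuer_fields(peercert: dict) -> dict:
    """Flatten SSLSocket.getpeercert()['issuer'] (a tuple of RDN tuples) into {attribute: value}."""
    return {name: value for rdn in peercert.get("issuer", ()) for name, value in rdn}
```

After a connection is accepted on a socket wrapped with this context, `issuer_fields(conn.getpeercert()).get("organizationName")` tells the application which CA vouched for the user, and it can apply whatever per-CA policy it likes.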


That's not safe for a lot of sites, though.

I hear that porn tends to be officially frowned on in a fair number of places.

Reading non-approved news is dangerous in some places.

Honestly debating political topics can be super dangerous if you're identifiable.

Sometimes even having a login on a site is dangerous; I think I heard about this after a non-mainstream discussion site got hacked about a year and a half ago.


So, my thought process here was: given a fairly robust selection of certificate authorities, a given user could have a number of different client certs for use in different trust scenarios. Contrast the following:

- A user bearing a client cert with the name "Jonathan Grant", signed by a U.S. government agency which is known to verify that its signees' certs are a citizen of the United States.

- A user bearing a client cert with the name "Michael Black", signed by Alice, who is known to only sign certs after verifying that the real-world identity of the signee matches the name on the cert.

- A user bearing a client cert with the pseudonym "c00ln4m3", signed by Bob, who is known to only sign a single cert for any given real-world person. (To do so, he verifies the person's real-world identity but does not reveal which cert corresponds to which person.)

- A user bearing a client cert with the pseudonym "hunter217", signed by Charlie, who is known to sign certs without verifying the real-world identity of his signees at all, but who is also known to revoke his signature on certificates if a service provider complains about the user bearing that cert.

- A user bearing a client cert with the pseudonym "cypr3ss", signed by David, who is known to charge $1000/year for a cert bearing his signature but performs no other identity verification.

The point of listing out these different scenarios is that the underlying mechanism (client certs) is the same, but the end-user and the service provider don't actually have to trust each other: they only have to agree on a CA with mutually acceptable policies.
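To make "agree on a CA with mutually acceptable policies" concrete, the CA descriptions above can be modelled as data and matched against a site's rule. The attribute names and the rule itself are hypothetical: a site might require that identities be costly to mint (identity-verified, one per person, or above a fee) or revocable on complaint, and a site where being identifiable is dangerous might refuse certs that carry a real name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CAPolicy:
    name: str
    shows_real_name: bool = False       # cert bears the holder's legal name
    verifies_identity: bool = False     # CA checks real-world identity before signing
    one_cert_per_person: bool = False
    revokes_on_complaint: bool = False
    annual_fee_usd: int = 0

def ca_meets_site_policy(ca: CAPolicy, *, pseudonymous_only: bool, min_fee_usd: int) -> bool:
    """Hypothetical site rule: identities must be costly to mint or revocable,
    and sites where being identifiable is dangerous can demand pseudonyms."""
    if pseudonymous_only and ca.shows_real_name:
        return False
    costly_to_mint = (ca.verifies_identity or ca.one_cert_per_person
                      or ca.annual_fee_usd >= min_fee_usd)
    return costly_to_mint or ca.revokes_on_complaint

# Four of the CAs described above.
ALICE = CAPolicy("Alice", shows_real_name=True, verifies_identity=True)
BOB = CAPolicy("Bob", verifies_identity=True, one_cert_per_person=True)
CHARLIE = CAPolicy("Charlie", revokes_on_complaint=True)
DAVID = CAPolicy("David", annual_fee_usd=1000)
```

A political discussion forum with `pseudonymous_only=True` would then accept Bob, Charlie, and David but not Alice, while the same mechanism lets a bank accept only Alice.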


I think this is true. It also reminds me of one possible purpose of regulation and government, given the majority will usually be happy to throw any sort of minority under the bus for the "greater good."

This also reminds me of the anxiety of Google deciding to just ban my account for some reason. They can't be bothered to commit resources to making sure mistakes can be resolved. They don't care to lose a fleetingly small percentage of customers.

Not sure I have an answer. Just a thought.


> Harsh blocking/limiting/challenging is way too valuable to sites that are actually trying to make money online.

I'm not understanding the generalized sentiment here. How would, for example, a retailer benefit from this strategy? How does it protect their bottom line?

I can see how a particular kind of "facilitated user economy," such as games, gambling and promotional companies could benefit, but it doesn't seem that broadly applicable to what most people would consider a "mainstream" business.

> so we can lay off the IP-based reputation stuff and the geo-blocking: it's tying some form of personal ID to your browsing activity

And a new market for identity theft is born.

Also, as someone who serves content and geo blocks it, that's not up to me, that's up to the owner of the content or whoever happens to be licensing it for them. So, even if you sent me a picture of your government ID, it changes nothing.


> I'm not understanding the generalized sentiment here. How would, for example, a retailer benefit from this strategy? How does it protect their bottom line?

The amount of automated and apparently-manual attempted credit card fraud (and exploit attempts, for that matter) any halfway-prominent site with a CC form is subjected to is hard to appreciate if you've never seen it. It's a whole lot. They aren't even necessarily trying to buy what you have, but to validate that their stolen cards work. And they're quite busy. If too much of that gets through—really, any more than a very tiny amount of it gets through—you're gonna have an extremely bad time.

Various CC service providers like Stripe do provide tools to try to block those attempts, but defense in depth is usually a very good idea, including fairly aggressive firewall-level blocking.
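As one illustration of an application-level layer in that defense in depth, card testing shows up as many declined authorizations from the same source in a short time. A minimal sketch of a sliding-window decline counter, assuming the key is the client IP and that the thresholds (3 declines per 60 seconds in the usage) are arbitrary; this does not model any payment provider's actual tooling, and the class name is hypothetical.

```python
from collections import defaultdict, deque

class DeclineVelocity:
    """Block a key (e.g. a client IP) after too many declined card authorizations in a window."""

    def __init__(self, max_declines: int, window_s: float):
        self.max_declines = max_declines
        self.window_s = window_s
        self.events = defaultdict(deque)  # key -> timestamps of recent declines

    def _prune(self, key: str, now: float) -> deque:
        q = self.events[key]
        while q and now - q[0] > self.window_s:
            q.popleft()  # forget declines that fell out of the window
        return q

    def record_decline(self, key: str, now: float) -> None:
        self._prune(key, now).append(now)

    def is_blocked(self, key: str, now: float) -> bool:
        return len(self._prune(key, now)) >= self.max_declines
```

The window is the design trade-off the thread keeps returning to: a long window catches slow card testers but also punishes a legitimate customer who fumbles their card number a few times.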


> a retailer benefit from this strategy? How does it protect their bottom line?

A couple of examples I can think of are blocking bots from scraping their site for pricing and details, and blocking resellers from buying up all of the stock (see sneakers, electronics, etc.). The last example doesn't directly impact their bottom line, but it will make customers go elsewhere.


That's not a solution, it would be way worse. Companies would then make automated decisions and associate them with your personal ID, and spammers/DDOSers would be spending serious effort to hack their way to using the IDs of innocents. So rather than just your home network or whatever getting a sh*t reputation with no recourse, you would.


> it's tying some form of personal ID to your browsing activity

That wouldn't just be bad news, it would be disastrous news. It would immediately render the entirety of the web worthless to me.


How does having a personal ID tied to browsing activity help with spam? Are spammers not real people with IDs?


Of course, but the theory is it's restricting 1 real person to 1 account, versus 1 spammer creating 1,000 accounts via automation.

And once your spammer has been identified then that's them banned/removed, unable to sign up again.


What's to stop them from using fake IDs?


Airports seem to be able to spot fake passports pretty reliably.


Try forcing 100% of online traffic through an airport security checkpoint.


Presumably there can be more than one, like real life airport checkpoints? What are you even trying to say?


Spammers typically implement bots to carry out tasks. I mean, technically at some point a spammer is a real person, but when you're automating tasks and using bots, it's not at the same scale.


So what happens when your ID gets hacked and reused for fraudulent activity?

Would you have to submit a dispute with the internet credit agencies? Maybe join a class action suit against the entity that leaked your ID so that they're forced to give you a year of free internet identity monitoring?


The same thing that happens now when somebody steals your identity and ruins your credit history. You'll have to live in a bureaucratic hell for the next couple of years. And yes, as compensation, you'll get $6.99 worth of services from the guilty party. If you win the class action suit, that is.


Exactly. Why on earth would we want to replicate such a terrible system online?

We should be reforming our current credit agency system, not empowering it with a new mandate of judging somebody's social or political creditworthiness.


Then you need rate limits that are fine for individuals but make spamming unfeasible.

Keeping with the Cloudflare topic, if Cloudflare only permits you 10 requests per second (HTML + JS/images), that's still usable for web browsing, but someone running a cloud of hundreds of bots would effectively be shut down. Similarly with email: an individual probably doesn't need to send more than one email per 10 seconds, but email spammers wouldn't find any ROI at that rate. Business needs being different might necessitate a different registry or something in that case.
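A token bucket is one common way to enforce per-identity limits like these: each identity gets a bucket that refills at a fixed rate and allows short bursts up to its capacity. The numbers below come from the comment (10 requests/second for browsing, one email per 10 seconds, which still allows 8,640 emails per day per identity); the class and helper names are hypothetical, and the clock is passed in explicitly to keep the sketch deterministic.

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    rate: float       # tokens added per second
    capacity: float   # maximum burst size
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so the first burst is allowed

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def allow_request(buckets: dict, identity: str, now: float) -> bool:
    """Per-identity browsing limit: 10 requests/second with bursts of up to 10."""
    if identity not in buckets:
        buckets[identity] = TokenBucket(rate=10.0, capacity=10.0)
    return buckets[identity].allow(now)
```

Because the limit is attached to the identity rather than the IP, a hundred bots sharing one identity share one 10 requests/second budget, while a hundred humans behind one CGNAT address each get their own.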


Nobody said it wouldn't suck. The only question is whether it sucks less than the alternatives.


If you have a better solution, I'm sure it would be very lucrative.


Looks like Cloudflare beat us to it.


They are already testing out digital IDs. Now link that to the social score... and make the browsers and the sites exchange this data in the background, and make frontend service providers refuse connections from non-supporting browsers as "bots"...


The other not-so-great approach is to act like a normal user. This stuff doesn't tend to happen to the average Joe who browses the WWW. It's when you're doing unusual (albeit harmless) things.


Cloudflare is a regular problem for Starlink users. We're on CGNAT so users share IPv4 addresses. I see CAPTCHAs when using Starlink ten times as often as on my other ISP. I don't think it actually breaks things the way this article describes, it seems like a gentler behavior, but it's annoying.
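A back-of-the-envelope model shows why CGNAT amplifies this. Assume (illustratively, not from any Starlink data) that each subscriber independently runs something abusive with probability p. An address shared by n subscribers then carries at least one bad actor with probability 1 - (1 - p)^n, and every user behind it inherits that address's reputation.

```python
def shared_ip_flag_probability(p_bad: float, subscribers: int) -> float:
    """P(at least one of `subscribers` independent users behind one address misbehaves)."""
    return 1.0 - (1.0 - p_bad) ** subscribers
```

With p = 0.01, a dedicated address (n = 1) is dirty 1% of the time, but an address shared by 100 subscribers is dirty about 63% of the time, even though 99% of its users are clean.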

A few months ago I got on Akamai's naughty list (with my other ISP) for some very light automated website downloading. That was a straight block with HTTP errors and I had to use a proxy to access the Web. It cleared up after a few days.

The lack of any user feedback or support for this situation is really annoying. Reminds you how much power the CDNs have. It'd be really bad if loading websites got as difficult as sending email through all the layers of spam filtering.


I feel like Starlink could at least partially mitigate this by supporting IPv6. T-Mobile US supports IPv6, and I hardly notice this as an issue on my phone, or the time my workplace ran the business over a 4G mobile connection while waiting for an ISP install.
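Part of why IPv6 would help: a subscriber typically gets a whole prefix, so a reputation system can score a /64 instead of one IPv4 address shared by many CGNAT customers. A minimal sketch of that keying with Python's standard `ipaddress` module, assuming the /64 heuristic (a common convention, not documented Cloudflare behavior); the function name is hypothetical.

```python
import ipaddress

def reputation_key(address: str) -> str:
    """Group IPv6 clients by their /64 prefix and IPv4 clients by the single address."""
    ip = ipaddress.ip_address(address)
    prefix = 64 if ip.version == 6 else 32
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
```

Under CGNAT, everyone behind `203.0.113.7` collapses onto the single key `203.0.113.7/32`, while two IPv6 subscribers on different /64s never share a key.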


A genuine question from an ignoramus: how on earth did Starlink launch a brand new ISP in 2020 which doesn't support IPv6? Is IPv6 really so difficult? Does actually nobody care about IPv6 still, after all these years?


Not an answer to your question, but an indicator of shared culture: Tesla vehicles also don’t support IPv6 whatsoever.

Things you might use an internet connection for in your Tesla include triggering air con remotely, live traffic and satellite maps, streaming music or online radio, web browsing, or YouTube/Netflix/Disney+ clients.

It completely refuses to use IPv6 over mobile or wi-fi. Also it refuses to access anything over IPv4 (apart from DNS) which resolves to an RFC1918 address, even if it's connected to said RFC1918 network.

So yes, Starlink and Tesla are different companies, but I see cultural parallels which I'm sure surprises nobody.


That's wild. I wonder if Musk has some wealth tied up in the scarcity of IPv4 addresses, like an IP NIMBY.


I've been wondering that too, it's kind of confounding. Particularly since Starlink started with CGNAT to manage a limited IPv4 address allocation.

FWIW there've been hints from time to time that Starlink was working on IPv6; users reported being given working addresses. That mostly stopped though when they handed over ISP operations to Google last year.


> Cloudflare is a regular problem for Starlink users. We're on CGNAT so users share IPv4 addresses. I see CAPTCHAs when using Starlink ten times as often as on my other ISP. I don't think it actually breaks things the way this article describes, it seems like a gentler behavior, but it's annoying.

I've been noticing this too, and it's why Starlink remains my secondary ISP/bulk transfer connection. If I had to drop one connection, I'd drop Starlink for this reason alone.

There are some sites that I simply can't browse, and it's not Cloudflare errors, either. Lowe's, in particular, regularly returns error pages for anything but the main landing page. Of course, my observed public IP changes, so it's not consistent, but it's genuinely annoying.


> I've been noticing this too, and it's why Starlink remains my secondary ISP/bulk transfer connection. If I had to drop one connection, I'd drop Starlink for this reason alone.

Could Cloudflare legally charge them a bribe to CAPTCHA their users less? If so, it isn't good to have a company in this position of power.


> If I had to drop one connection, I'd drop Starlink for this reason alone.

Why are you using Starlink at all if you have other options?


Because my other connection is a 25/3 WISP link that mostly doesn't deliver 25/3. I generally see about 5/1 in the evenings, if that.

I've had several area WISP connections, as there's no wired infrastructure to my area, and they vary in quality. I work full time remote, so I need two connections as a general habit - I can work with one, but when that one is down for a week straight, I have problems. I like being able to fail over.

I typically keep one connection for "interactive" traffic, and one for "bulk transfer/failover" - things like my local Ubuntu repo mirror, offsite backup traffic, etc. And I can fail to it if needed, which I do often enough.

On a good day, Starlink is far better than my WISP connection, and I have some machines routed out it persistently. On a bad day, I can't hit much from it, because that particular public IP has been blocked from large parts of the internet. It's very hit and miss, and overall bandwidth has definitely dropped from the early days, though reliability of getting packets where they need to go is drastically improved.


I wonder if an IPv6 tunnel broker to get IPv6 addresses would help with your Starlink problems.


Starlink supports IPv6 addresses; I'm sure it would help. My network infrastructure is just lacking general IPv6 support, as I've not cared to get it set up in great detail, and my testing has demonstrated that my IPv6 addresses behind my router are publicly reachable, so... I'll get around to proper firewalling at some point.

I could do a range of things to solve it, but as I have two ISPs, I typically just switch to whichever one works better for the task. I'm aware it's not a technically fancy solution, but it's quick and easy (change the gateway on the machine), and works fine.


What archival tool were you using? I've been looking for a replacement for HTTrack forever.


A combination of shot-scraper and metascraper; really more web previews than archives. And in a single thread, to different hostnames, maybe one every 10 seconds? Honestly, I'm surprised Akamai or anyone even noticed. I fake my user agent now, lesson learned.


But no automated tool will work. I have a similar problem with my self-hosted feed reader: my VPS's IP doesn't have a perfect reputation with Cloudflare, so I can't download some feeds.

Edit: spelling


I moved from a local CGNAT'ed WISP to starlink.

Starlink is at least 10x better (fewer captchas).

I'm really hoping Cloudflare gets busted for having backroom deals with big ISPs or something. (For instance, if the CGNAT had a Cloudflare CDN cache endpoint behind it or associated with it, I suspect the IP would be whitelisted.)


Cloudflare said they're working on this- https://blog.cloudflare.com/eliminating-captchas-on-iphones-...


That's... The opposite of working on this. It's moving the internet further away from being an interoperable, endpoint-agnostic medium.


They're working on double dipping by providing both the problem and the solution. Somehow this is not a recurrent issue for every other CDN / ddos shield. They're not even mentioning any other hosting company collaborating on this open solution that requires hardware from a specific company they totally don't have a deal with...


Anecdote: I've been using Starlink for about a year now, and I've had no trouble with Cloudflare.


If you surf desktop sites from the Philippines on a mobile phone plan (which is often the best Internet connection in that country), you also get Cloudflare's captchas everywhere.

I've said it before and I'll say it again: Cloudflare is dividing the world between first and second/third world countries with their captchas. I call it discrimination against second/third world countries! If you are from the US or Europe you will never notice it, but if you travel a bit more you see these blocking captchas everywhere.


I’ve had a similar experience in India with wired internet from a local ISP: CGNAT is used so there are who knows how many customers on the same IPv4 address, https://iknowwhatyoudownload.com/ shows at least forty hours of movies being downloaded every day, the IP address is on half the blacklists out there because someone is part of an email-sending botnet, and yeah, Cloudflare hates you.


Is there even any way to reliably identify individual users behind a CGNAT without invasive fingerprinting?


I am from Europe and I notice it if I use a non-residential IP. The captchas are extremely annoying, especially when trying to access a site I have already logged into with 2FA. Who is protected in this case?


I get it browsing from a major ISP in the US. I have the gall to browse in private mode and to block trackers and ads because of all the malware they contain. (And I don't use a browser that requires me to login just to browse the web - gasp!) And apparently, that means I'm worthy of this sort of punishment as well.


> I have the gall to browse in private mode and to block trackers and ads because of all the malware they contain.

I do these things as well. It’s been months since I’ve seen a CloudFlare challenge page.


I don't get them as frequently as discussed in the article, but they come up a few times per week.


The other side of this story is that PLDT stands out from other residential networks as a persistent source of web form spam. I’d love to learn what’s going on differently there.


I get these a lot and I'm from EU. But it's "seasonal".


Maybe your mobile ISPs don't do enough to stop malicious/spam traffic. That's not Cloudflare's fault.


It only affects Cloudflare hosted sites though.


That's true, but it's the Cloudflare customer who decides what to block by default, through their firewall mode setting, custom rules, etc.


The default is to block most malicious sites or something, which in their opinion is everything outside of regular US and European networks. And that's wrong. Many people also don't change the defaults.


No, it affects visitors of “Cloudflare hosted sites”. It also affects all sites not hosted by CloudFlare.

You are complaining about the use of IP as an imperfect signal. Everyone involved knows that. But it’s still better than the current alternatives.


There is a chance you might’ve been hacked.

You would be surprised to see how easy it is to hack domestic routers.

1. Find and disinfect the devices, including the router. If you don’t have enough technical knowledge, then buy a new router.

2. Use a 30-character random password on the router.

3. Disable UPnP.

4. Anything with Wi-Fi and a weak password can be hacked within minutes, so check your other devices as well, especially IoT ones.
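For step 2, any password manager will do. As a minimal sketch, assuming a Python 3 interpreter is handy, the standard library's `secrets` module can generate one. The 30-character length comes from the advice above. The alphabet is an assumption: some router admin pages reject certain punctuation, and dropping `string.punctuation` still leaves about 179 bits of entropy (versus just under 200 bits with the full 94-character set). `random_password` is a hypothetical helper name.

```python
import secrets
import string

# Full printable-ASCII alphabet (94 characters). If a router's admin page rejects
# some symbols, pass alphabet=string.ascii_letters + string.digits instead.
DEFAULT_ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 30, alphabet: str = DEFAULT_ALPHABET) -> str:
    """Return `length` characters chosen independently with a CSPRNG (secrets.choice)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(random_password())
```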


My assumption is also that something on his network is compromised, and getting his IP into reputation issues.

Tarpitting (serving content slowly from the edge, in order to slow down bots) is necessarily one of the most expensive tools in a WAF/CDN's toolbox.

It's much more likely that something on his network is sending sketchy traffic to CF-fronted/Google sites, and the slow loading he's experiencing elsewhere is because his upstream is being saturated by whatever is happening on his network.
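To make the cost argument concrete, here is a minimal Python sketch of what tarpitting means at the response level: the body is dribbled out in small chunks, with a pause before every chunk after the first. `tarpit_chunks` is a hypothetical helper and not how Cloudflare or any other CDN actually implements it. The point it illustrates is that every slowed response keeps a connection open at the edge for roughly `(number_of_chunks - 1) * delay` seconds.

```python
import time
from typing import Callable, Iterator


def tarpit_chunks(
    body: bytes,
    chunk_size: int = 16,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield `body` in `chunk_size` pieces, sleeping `delay` seconds before every piece after the first."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(body), chunk_size):
        if offset:
            # The connection sits idle here, but the server still holds it open.
            sleep(delay)
        yield body[offset:offset + chunk_size]
```

For example, a 1,000-byte body in 16-byte chunks with a 1-second delay is 63 chunks, so the connection stays open for about 62 seconds instead of milliseconds.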


(Author here.) My router isn’t a domestic router. It’s a MikroTik running RouterOS, completely unsupported by the ISP. Outgoing connections and DNS queries are logged. UPnP is only allowed for the Xbox, PS4, and the off-most-of-the-time gaming PC. Nothing out of the ordinary in the logs.


> It’s a MikroTik running RouterOS

https://google.com/search?q=mikrotik+botnet

These things are the absolute scourge of the internet.


They're a powerful tool that lets you shoot off your foot and half your brains with the same bullet. However, my router isn't compromised. MikroTik routers can easily be misconfigured to be insecure or misbehave. It's a Cisco clone, so that is the product you're buying.

I don't recommend them to anyone who doesn't enjoy, and isn't familiar with, the lower-level intricacies of network operations.


> It’s a MikroTik running RouterOS

It's almost certainly compromised.


No. It isn't.


Why would you disable UPnP? You're gonna break most collaboration tools/video games/etc.


Disabling UPnP doesn't break much. I've used enterprise firewalls at home for years, none of them have UPnP, and I've never noticed a problem arising from that lack. I don't have a problem with video games or collaboration tools.

UPnP allows devices inside your network to open ports to the outside world without your knowledge. I think everyone should avoid it if they can get by without it


It’s absolutely required for most multiplayer games. Many need random ports and some even refuse to work if UPnP is blocked even if you manually open a port for them.


I've never had UPnP enabled and I don't have any problems doing online gaming / flight sim / video chatting / etc.


Same. I've found the biggest problem was SNAT rewriting the (source) port number. netfilter, by default, doesn't do this. pf does but you can configure it not to.


On the Series X you can set up port forwarding really easily. I had to do it on OpenWrt.


What's your solution for the grandmother who just wants to make a zoom call to her grandson? Have her log into her router portal and setup a static ip for her laptop and then port forwarding routes for zoom?


I don't think zoom uses UPnP. If it did, that would cause problems on corporate networks that typically have UPnP disabled.


STUN servers? Also, while I (not GP) do think UPnP is dangerous, I also think it's only something you disable if you know you can live without.


To be frank, that's exactly the problem with NAT-PMP et al., even assuming there are no router bugs: the ability to forward ports has been abused to set up bot relays on hacked IoT devices. This is why I predict that even in the IPv6 era we would still have to rely on a TURN equivalent.


That's exactly the problem with NAT-PMP?

So what's your alternative for peer to peer connections? Static routing that the common end user can't figure out? Re-centralize connections?

UPnP is necessary.


I'm simply pointing out the problem, a real-world and realistic problem, and you're acting like it's a non-issue. Show me a CGNATted network that has enabled port forwarding. Does that break a lot of things? Oh, absolutely. Have the carriers still not enabled it? Yes. Automatic port forwarding is only beautiful when you know how your devices will react. It's ugly when you're a network administrator who doesn't control all the devices.

There is no "perfect" solution here because the real world is a messy place with devices that you cannot personally vouch for.


So this gets me thinking. We know Cloudflare will boot a site if they really don't like them. Now, what happens if Cloudflare doesn't like you? I mean, really really doesn't like. Maybe, you said something wrong online or participated in a wrong group activity, or something like that. Is it the case that they have the power to essentially deny you (provided you have a static IP and don't use VPN, say) access to a major part of the Internet? And you can do absolutely nothing about it?

I know they haven't done anything like that yet. But the technical capability is there, and we all know how short is the distance between technical capability and doing it, when the appropriate pressure is applied. So I wonder, how long before activists start demanding for CF to boot people from the internet, and how long before CF caves in to that...


> and we all know how short is the distance between technical capability and doing it

Fact-less conspiranoia.

The CIA has the operators, equipment, and info to be able to kill almost any US citizen in a couple of hours for arbitrary reasons. How many times have they done it?

You are overweighing how much technical capability factors in and very much underweighing the costs of doing something like that. Opportunity costs, collateral damage, unintended consequences, reputation costs, brand harm.

Hell even ethics and morals of those involved. Who do you know would want to work for a company that did that? Who do you know would program that feature and not say anything about it? Why do you believe that CloudFlare would have so many of those kinds of people working there, but you know so few?

Why not make the same complaint about your ISP, your hardware manufacturer, your OS manufacturer? You have exactly the same amount of evidence they are doing this or could do this.

Remember that US criminal system attributes 3 elements to a crime: {means, motive, and opportunity} and even then we use evidence and an assumption of innocence. You just threw out every part except “means”.

I’m not defending CloudFlare here so much as tired of conspiracy theories and paranoia and social panics. We have enough of those things right now.


> Who do you know would want to work for a company that did that?

Pretty much anyone who works for Twitter, Facebook, Google, Paypal, Venmo, Amazon, Microsoft, Gofundme, Mailchimp, Tiktok, Reddit, Nextdoor, and many other tech companies routinely engaging in censorship and unpersoning. The idea that people in tech are some kind of high-morals freedom lovers who would never work for a company that censors doesn't survive even minimal scrutiny. If anything, they'd refuse to work for a company that doesn't censor enough - Twitter workers were in utter screaming panic when they thought Musk could buy Twitter and relax the censorship a bit. So if anything you just disproved your own argument - maybe what will force CF to censor is not external pressure but internal pressure. I don't see why Cloudflare workers would be any better than Twitter ones.


> The CIA has the operators, equipment, and info to be able to kill almost any US citizen in a couple of hours for arbitrary reasons. How many times have they done it?

How would we know?


How would we know if a homeless person was the second coming of Jesus Christ?

Non-falsifiable claims are easy and limitless. Their credibility without evidence should be proportional.


> How would we know if a homeless person was the second coming of Jesus Christ?

I think the eschaton would be pretty hard to miss, what with the rapture and the final battle at Armageddon and the horsemen and whatnot.

The CIA’s entire bailiwick is covert operations. A government agency that’s been specially trained to engage in illegal covert operations involving lethal force without getting caught isn’t going to leave too much evidence, which means a lack of evidence doesn’t really tell you anything either way.

That having been said, the CIA was involved in the Obama-era drone strike program that ended up killing four US citizens (only one of whom was deliberately targeted). Frank Olson also died in suspicious circumstances connected to MKULTRA in 1953.


> Fact-less conspiranoia.

I love how people reflexively answer with cries of "no evidence!" to something that presents evidence about exactly the thing they are claiming has no evidence. I get a distinct impression that the only person they're trying to convince is themselves, by self-hypnotically denying the reality in public.

There's a fact of CF booting sites, there's a fact of CF having IP blacklist, there's a fact of getting into IP blacklist being a very frustrating experience, there's a fact of various activists itching to make the lives of their political enemies a very unpleasant experience and launching successful pressure campaigns to do exactly that.

Did that happen with CF and IP blocking? No, I explicitly said it didn't, at least - I don't know any cases of it. But there's a lot of facts confirming there's a capability and motivation to do so. You may not believe it would happen, and you have a right to believe so, but when you are denying known facts, I don't think your beliefs are based on anything but wishful thinking. Your argument would be strong if you showed that, despite the known facts, it still couldn't happen. But instead to claim it couldn't happen you have to deny the facts.

> How many times have they done it?

Probably more than I know, but it's too big to bother with me, so I'm not too concerned about it right now. Maybe if I was in the same business as Assange, I'd be worried more.

> very much underweighing the costs of doing something like that.

Like what costs? You mean to say, no major provider would dare to boot a person from the Internet? Like Facebook, Twitter, Paypal, Venmo, Gofundme, Google, Amazon, Microsoft, Mailchimp, Tiktok, etc. would not dare to block people for political dissent and expressing unpopular opinions? Because, you know, opportunity costs, collateral damage, unintended consequences, reputation costs, brand harm. That just couldn't happen. All that is fact-less conspiranoia.

> Why not make the same complaint about your ISP, your hardware manufacturer, your OS manufacturer

I can buy different hardware. I can install a different OS. With some effort, I can connect to a different ISP. None of that will help if Cloudflare refuses to talk to me.

> Remember that US criminal system attributes 3 elements to a crime

Oh, but that's not a crime. That's the beauty of it - remember, it's a private action of a free enterprise, and you have no rights there. And even if the government held weekly meetings with Cloudflare suggesting to them exactly who needs to be banned, it's still free enterprise, right? I mean, excluding the fact that the government would never do something like that, because reputation costs, brand harm, etc. That's another instance of fact-less conspiranoia, of course.

> I’m not defending CloudFlare here so much as tired of conspiracy theories and paranoia and social panics.

That's nothing. Imagine how tired you'd be when it turns out everything you thought is "paranoia" is actually happening. Of course, it would never happen to you - you'd never disagree with the government, or any people in power, or voice any unpopular opinions in public, would you now?


My entire post was about targeting specific visitors. Zero evidence.

> That's nothing. Imagine how tired you'd be when it turns out everything you thought is "paranoia" is actually happening.

Every genius and every crackpot experiences this. The sad fact is that crackpots outnumber geniuses by a factor of hundreds.

You missed my point about means, motive, and opportunity by a mile.

None of the web services/apps you mention block specific visitors; only accounts. Again, my entire comment was specific to the hypothetical capability and desire to block per visitor, not domain/account or IP.


You don't have to be a genius to see the facts. Just somebody who is not working very hard to not notice what is going on.

> None of the web services/app you mention block specific visitors; only accounts

So? How is the situation with visitors specifically different? Yes, they don't want to block visitors for now, but what if they feel like it?


There’s a real lack of education I’ve seen in developers for small projects who go directly to cloudflare for anything and everything. They don’t understand that they are immediately losing a large chunk of their user base who is either from the third world or is privacy literate. Devs working on projects that are targeting those groups need to understand the tradeoffs from using cloudflare.


Obviously they don't. Cloudflare markets itself as a super easy set-it-and-forget-it solution. The problem is that it isn't. The defaults are broken and it requires careful configuration and monitoring. Of course this isn't good marketing, so the only way a user can find out is from posts like these, or by accidentally blocking their users and hearing the reports. (Obviously Cloudflare's UI will only tell you how many evil bots it blocked.)


The rise of Cloudflare is the first real threat I've seen to ordinary people running webcrawlers.


Which is in turn a threat to the open web in general. Could not agree more.


Tragedy of the commons, unfortunately. There were a bunch of cases where web crawlers and scrapers built competitive services on the back of the services they scraped, some of these ending up in courts [1].

[1] https://www.derstandard.at/story/1389860104020/eu-gerichtsho...


Maybe there are but the specific example you linked is about real-time API use and unrelated to scraping/crawling.


Reputation systems should be based on /abuse/, not on automation. I also ended up on the naughty list for running an archival scraping program. Trying to preserve part of the Internet is apparently against the rules. It's really a shame because my code honors rate limits, doesn't spam, and is completely docile.
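We don't know exactly what that program does, but one common way to make a crawler "docile" in a machine-checkable sense is to honor the site's robots.txt, including any `Crawl-delay`. Python's standard `urllib.robotparser` handles the parsing. Below is a minimal sketch that parses an already-downloaded robots.txt body (so it needs no network access) and falls back to a 10-second delay when none is specified; `polite_policy` and that default are assumptions for illustration. As this thread shows, honoring these rules does not by itself keep an IP off a reputation list.

```python
from typing import Tuple
from urllib import robotparser


def polite_policy(
    robots_txt: str, user_agent: str, url: str, default_delay: float = 10.0
) -> Tuple[bool, float]:
    """Return (allowed, delay_seconds) for fetching `url`, per an already-downloaded robots.txt."""
    parser = robotparser.RobotFileParser()
    # parse() also marks the rules as loaded, so can_fetch() won't default to False.
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return allowed, (float(delay) if delay is not None else default_delay)
```

An empty robots.txt allows everything, in which case the sketch falls back to `default_delay` between requests.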


> Trying to preserve part of the Internet is apparently against the rules.

It’s against copyright laws too, unless you get the rights holders’ go-ahead first.

Some regional differences, but it’s mostly not allowed with a few exceptions for some institutions.


Imagine all the people in countries deemed less desirable by Cloudflare that go through this all the time. Cloudflare, whether it's their stated goal or not, is re-stratifying and re-centralizing the Internet because of their desire to be a monopoly, and we'll all suffer as a result.


There are multiple other large CDNs out there... it's a lot more like 5 market leaders tbh.


But how many of them:

1) refuse to take responsibility for content they host by claiming they don't host

2) discriminate against huge parts of the Internet with no publicly known rules, nor methods to change that discrimination

3) make the abuse reporting process intentionally difficult and time-consuming

4) want to aggregate all the DNS data they can by making a deal with Firefox to turn on DNS-over-https by default without asking or even informing end users

5) want to re-centralize the Internet, in part so they can mix bad actors with good, in ways that make blocking next to impossible

How many of them do the discrimination we're all writing about here?


> 1) refuse to take responsibility for content they host by claiming they don't host

CDNs don't host content, they proxy it.

> 2) discriminate against huge parts of the Internet with no publicly known rules, nor methods to change that discrimination

Not large parts of the internet, scammy and attacky parts of the internet. If the rules were public they wouldn't be effective.

> 3) make the abuse reporting process intentionally difficult and time-consuming

Simply untrue; every abuse report I have filed has had an answer back within 24 hours.

> 4) want to aggregate all the DNS data they can by making a deal with Firefox to turn on DNS-over-https by default without asking or even informing end users

This is a good thing, as they have been audited as not keeping logs of DNS queries.

> 5) want to re-centralize the Internet, in part so they can mix bad actors with good, in ways that make blocking next to impossible

Again, every CDN centralizes the internet, and many sites need this protection.


Wow. You've really drunk the Flavor Aid. You didn't even offer any new, actually useful information.

Hosting is providing services without which a presence on the Internet won't work. Hosting was around before the web, so how is it that you think you can magically come along and declare, "this is now the definition of hosting"? Only through bullshit.

If by "scammy and attacky parts of the internet" you mean whole countries, good for you for being an elitist.

I'm happy that you've "had an answer back within 24 hours", but that doesn't address the fact that it's time consuming and arduous. Notice that you didn't respond to that part at all. Their reporting site doesn't have an option for spam (because they don't care), the Javascript allows more text to be entered than the form will accept (so you have to know to go and delete some), and it doesn't allow nearly enough in the first place. For someone who wants to forward abuse to abuse@cloudflare, it's shitty and it's a way to discourage abuse reporting.

So tell me about how Google, Facebook and Amazon have never lied about what they're doing with data. Then go ahead and explain to me how it is that we're just supposed to trust Cloudflare. Audited by whom? When? How is there conclusive, testable proof that the data isn't analyzed or siphoned off somewhere else? Are you ignoring the fact that this is in part an attempt to become a monopoly, and in part an attempt to make network-level filtering next to impossible? You didn't reply to any of this, which makes you seem all the more like a paid shill than someone who actually cares about an exchange of ideas.

But then you say, "every CDN centralizes the internet", which means you're either willfully ignoring the points brought up here, or you're really, really clueless and don't know how to respond to the points brought up, so you talk about other things instead.

We don't need any more paid shills. If you really don't understand the points brought up about how Cloudflare is working tirelessly to become a monopoly in ways that are measurably different from regular CDNs, then ASK. If you don't understand how we (the Internet collectively) are going to assume that Cloudflare cares more about making money than about doing the right thing, then please look at all the privacy nightmares we've learned about Amazon, Google, Microsoft, Facebook, et cetera.

If you're just here to tell us how much you love Cloudflare, that's fine, too, but you don't do that by just randomly disputing points with irrelevant responses.


tbh I think one of the very few positives of having so many sites going through a few CDNs is that you can make it impossible to block a protocol or site without significant collateral damage, which can be a good thing, things like Tor's meek bridge rely on that.


If an ordinary user had to deal with Google/CF BS every day as I do, they'd burn their computer.

PS: Proud user of Firefox + resistFingerprinting=true

PPS: Ain't nothing better than the CF guard page constantly reloading on 20% of sites if you open some URL :( No, fella, you first have to open the root '/' page so that the guard page can finally either pass me through or show the Cloudflare captcha. Ugh. Progress, they say.


It's amazing how Cloudflare became another tech monopoly that can decide the lives of ordinary people in a totally unregulated, private fashion.


Is it plausible some ISP shared some IP address that was on Cloudflare's list of suspicious IPs, or that some IoT device on this person's network created a burst of suspicious traffic?

I get that this sucks for the end user, but I wonder how much we should blame Cloudflare vs the wider systemic challenges of managing DDOS protection on the web.


I believe that might happen, but then I also believe it's the ISP's responsibility to ensure that its IP addresses are kept clean


For sure, the point I'm making is that there's a multi party transaction here, with systemic complexity. Makes it hard to pin responsibility on just Cloudflare (or just the user or just the ISP, etc).


Cloudflare is the one blocking a user based on things that aren't their fault; I'm happy to blame them.


That's fine, but you are ignoring the broader picture if you do. You've correctly identified a detail, but haven't placed that detail in context.


I'm not ignoring the context, I'm saying that it's irrelevant. Cloudflare made the choice to block real people based on factors outside of their control, and then to market that product as a panacea; they don't get to pass the buck, doubly so when they don't expose enough information to let other people fix the things they broke.


I'm used to getting assaulted by Cloudflare's browser check interstitials along with random Cloudflare and Google CAPTCHAs because (presumably) I run Firefox and an ad-blocker instead of vanilla Google Chrome. It's already tremendously inconvenient to wait multiple seconds on many page loads and click 20 bicycles, I can only imagine how infuriating it would be if every page load started taking 60 seconds because your IP ended up on some random algorithmic blacklist....


I use Firefox and an ad blocker and I don't see these CAPTCHAs (except for a few rare instances that I can recall). Something else must be going on to get you flagged.


I'm also on firefox w/ adblockers and had similar issues... the 'privacy pass' plugin solved this for me.


I actually think that Cloudflare is setting up the foundation of Chinese-style (but privately outsourced in the US case) censorship machinery in the US. Between their AI erroneously flexing its power, the Kiwifarms scandal and similar, they are emerging as a rival to Google in its censorship effort. One of the most dangerous companies on the internet.


Yeah, this just continues to reinforce my opinion of Cloudflare. It's not something I would ever recommend, and there are numerous other superior options out there. I see Cloudflare failing frequently enough that if it were something I was responsible for, I'd be embarrassed at the very least.


I'm curious if you've had experience with their enterprise package?

I can understand people's gripes about things on the free/cheap packages, where Cloudflare makes decisions for you, sometimes ones you don't like.

But as an enterprise customer, I've never found it to be anything short of fantastic - I can tailor it to behave exactly how I want, and not interfere with my customers.


Your response seems to ignore the very article being discussed.

Or are you suggesting that if you're having trouble visiting sites because of Cloudflare, you should become an enterprise customer? (slightly sarcastic, but not completely)


My response is simply trying to understand where you are coming from. You've mentioned there are numerous superior options and you would never recommend it.

I'm wondering (genuinely!) if you are speaking as an enterprise customer or a free plan, or what.... both for the sake of meaningful discussion and potentially learning about even better options for my own work.

As to the article - I fully believe the responsibility lies with site owners to pick and choose how they want to serve their sites. Nobody is forcing them to use Cloudflare on a free plan, or to ignore any analytics it provides and make sure it is serving their customers correctly. Cloudflare is one piece of a delivery solution, and only works as well as you configure it. If your decision for your app is "I'll just use the free plan, and let Cloudflare decide everything for me" then you get what you pay for.

If Cloudflare is getting in their way, they can go somewhere else.


What superior options would you recommend that are privacy focused and free?


You can't have privacy-focused and free services. You're either the paying customer or the product being sold.


Yea, but it doesn't address the original poster's statement that there are numerous other superior options. I just wonder which ones they mean.


Not saying that this is the case here, but this may be possible due to having a bad tab open. Especially over cellular. Haven’t looked into it with any depth, but I’ve had correlations on a much shorter timeframe. Suddenly, CloudFlare and/or Google start questioning my humanity, so I close all tabs. Then okay. Sloppy hypothesis with no evidence: JS gone haywire


Here's a handy list of correct uses for IP addresses:

1. Packet routing

In other words, I wish services like Cloudflare were made illegal.


I have two dedicated home internet IPs (one iCable fibre and a China Mobile 5G fallback/quarantine WiFi) and get these "checking if your internet connection is secure" interstitials all the time now. Also see them on my HKBN work connection.

I'm from Hong Kong and suspect the whole territory is on the naughty list.


If you haven't done anything, someone else might have. Check your router logs for strange devices and activity in your network, also check your machine/s for malware.


(Author here.) Plenty of logging of outgoing connections and DNS. Nothing out of the ordinary.


Is your IP address listed on https://www.abuseipdb.com/ or any other spam blocklists?
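AbuseIPDB is checked through its website or API, but many of the classic spam blocklists are DNS-based (DNSBLs): you reverse the IPv4 octets, append the list's zone, and look the name up. An answer means "listed" and NXDOMAIN means "not listed". A minimal Python sketch follows; the zone is whichever list you want to query, and `dnsbl_query_name`/`is_listed` are hypothetical helper names. Some list operators refuse queries arriving via large public resolvers or answer them with special error codes, so an unexpected "listed" result is worth double-checking on the operator's own lookup page.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything that isn't IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for `ip`, False if the name doesn't resolve."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN or any other resolution failure is treated as "not listed" (a simplification).
        return False
    return True
```

For example, `is_listed("203.0.113.7", "zen.spamhaus.org")` looks up `7.113.0.203.zen.spamhaus.org`. That address is from a documentation range, so substitute your own public IP.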

