Cloudflare, now offering to be your Single Point of Failure

There have been many articles about the downtime issue with Cloudflare last week, so I won’t get into the technical details of that. However, there’s the fine print to remember. Consider this a subtle reminder that core Internet infrastructure services like Cloudflare’s DNS-based “Always Online” caching and packet inspection security services do not come with Service Level Agreements even at the “Pro” account level. Even with a Pro account you are paying for a service with no uptime guarantee and you must only hope that it resolves your sites the majority of the time. This is fine, this is what the contract says: no SLA unless you pay for the Business account. An odd naming convention given that most Professionals are using their websites for business and would want the SLA, but I digress.

So, the SLA is not really the issue once you look at how an architecture is normally designed to stay available when its primary and secondary DNS servers go offline. The typical design uses more than one, and preferably more than two, DNS servers for your domain so that its addresses still resolve if the primary and even the secondary go offline. Typically these servers sit on separate subnets and even in separate geographical regions so that events like tsunamis and datacenter fires do not take out both your primary and secondary name servers. Most DNS services therefore give you the option of a third and fourth name server, but not Cloudflare.
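To make the "separate subnets, separate providers" design concrete, here is a minimal sketch of a diversity check over a domain's name server set. Everything in it is an assumption for illustration: the hostnames and addresses are made-up documentation values, the function names are hypothetical, and "provider" is crudely guessed from the last two labels of each hostname.

```python
import ipaddress

# Hypothetical delegations: hostnames and addresses are illustrative
# (documentation ranges), not real name servers.
MIXED_SET = {
    "ns1.provider-a.example": "192.0.2.10",
    "ns2.provider-a.example": "192.0.2.20",
    "ns1.provider-b.example": "198.51.100.5",
    "ns1.self-hosted.example": "203.0.113.7",
}

SINGLE_PROVIDER_SET = {
    "ns1.provider-a.example": "192.0.2.10",
    "ns2.provider-a.example": "192.0.2.20",
}


def provider_of(hostname):
    """Crude provider guess (an assumption): the last two labels of the hostname."""
    return ".".join(hostname.rstrip(".").split(".")[-2:])


def subnet_of(address, prefix=24):
    """Return the enclosing network (a /24 by default) for an address."""
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


def diversity_report(name_servers):
    """Count distinct providers and subnets in a {hostname: address} mapping."""
    providers = {provider_of(host) for host in name_servers}
    subnets = {subnet_of(addr) for addr in name_servers.values()}
    return {
        "name_servers": len(name_servers),
        "providers": len(providers),
        "subnets": len(subnets),
        "single_provider": len(providers) == 1,
    }


if __name__ == "__main__":
    for label, ns_set in (("mixed", MIXED_SET), ("single", SINGLE_PROVIDER_SET)):
        print(label, diversity_report(ns_set))
```

The subnet count says nothing about anycast, where one address can front many machines in many locations, so treat it as a rough signal only. The provider count is the figure that matters for this argument: a set that reports `single_provider` as true depends entirely on one company.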

Cloudflare limits you to using only their DNS servers for your domain, and they provide only two name servers, not three or four like most DNS services. So if you wish to add a third or fourth name server entry so that the domain keeps resolving even if the primary and secondary Cloudflare DNS servers go offline, well sorry, you cannot do so. Cloudflare will disable your domain in their system if you use any DNS entries that are not their own, which includes a third or fourth entry. So now you have your “Professional” websites using a DNS and security service that has no SLA, even though you are paying for “Professional” level services. If your “Professional” grade sites go offline because Cloudflare botched the router upgrade or was hacked, you’re SOL and you do not get downtime credits, sorry. You can’t even design your architecture to resolve with alternate name servers or they will disable your domain. So if Cloudflare ever goes offline your sites will go offline with them and there are no alternatives. If you use Cloudflare then their service becomes your Single Point of Failure.
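The single-point-of-failure argument can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic under loud assumptions: providers fail independently of each other, resolvers retry the remaining name servers when one provider does not answer, and the 99.9% figure is an illustrative input, not a measured availability for Cloudflare or anyone else. It models DNS resolution only and says nothing about traffic that is proxied through a provider.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(availabilities):
    """Probability that at least one provider answers, assuming independent failures."""
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down


def downtime_minutes_per_year(availability):
    """Expected minutes per year during which nothing answers."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Illustrative inputs only: each provider assumed to be up 99.9% of the time.
    one_provider = combined_availability([0.999])
    two_providers = combined_availability([0.999, 0.999])
    print(f"one provider:  {downtime_minutes_per_year(one_provider):.1f} min/year")
    print(f"two providers: {downtime_minutes_per_year(two_providers):.1f} min/year")
```

Under those assumptions a second, independent DNS provider cuts the expected resolution downtime by roughly three orders of magnitude, which is exactly the design option that a Cloudflare-only delegation takes away.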

I am not one to create drama, but this is an issue that none of the other Cloudflare “Pro” account users I’ve talked to were aware of. So, here is a recent email exchange with Cloudflare regarding a credit for having caused all of my sites to be offline on more than one occasion; this is not limited to the recent event with their routers.

Cloudflare: “I’ve reviewed your account and note that you currently have 3 Pro subscriptions with us. At this time we do not offer a guaranteed level of service or SLA for our Free or Pro plans… We are also investing a great deal of time and resources to ensure the resiliency of the network even in the event of localized failures that may happen from time to time.” So not only is there no SLA, but there could be localized failures from time to time that you also do not get credit for. That explains the monitoring failures for some of my sites: they sit in the same rack, some even on the same servers, yet only the Cloudflare-enabled domains show as offline while the others remain available.

My response to Cloudflare: “I will not keep Cloudflare running for my sites. There are many reasons, but another one recently made itself known to me when I decided to add tertiary and quaternary DNS servers and ran into the following technical limitation, which precludes my relying on Cloudflare for my back-end infrastructure domain: if I want to specify a 3rd or 4th DNS server as a backup resolver (like Route53 or my own servers running PowerDNS, for example) then the Cloudflare system complains and disables my domain. I understand why the system is designed this way: you want all traffic going through the CF system so that the features are executed and so forth. However, in the event that an issue occurs like the previous outage, there is no fallback for users/systems to resolve my domain via an alternate DNS system. I am limited to two Cloudflare DNS servers and nothing more.

The way that the DNS requirement is set up makes Cloudflare an all-or-nothing solution: you either use Cloudflare for the domain or you do not. And, as 785,000+ sites experienced, this makes Cloudflare (no matter how resilient and improved after this incident) a single point of failure that system engineers and architects cannot design failover services around.

This is the second time that I have had issues with Cloudflare services not working correctly. The first was when one of my servers went offline and the “Always Online” feature didn’t do anything: the site was not kept online via cache even though there had been plenty of time for the crawlers to get the content (which is static, unchanging, and not database-driven; it is simply a front page that is supposed to load fast and act as a click portal to our primary systems).

And now I have been seeing users connect to my site from countries that I have set up in the block list. I have a number of ‘trouble’ countries configured in Cloudflare to disallow access to my site, yet these users are connecting anyway. Clearly the country blocking feature is broken as well.

I want to use Cloudflare. I want to love it. I want to tell everyone I know how great and useful it is. But after six months of using it on several sites it has done nothing more than cost me a lot of time trying out different configurations and wondering why feature x/y/z isn’t working as stated. Then there have been the outages from human error and incorrect ITIL process adherence.

So I will be setting up some alternate caching servers at different datacenters and moving some of my content onto a CDN. Cloudflare has failed and I am tired of wanting to like it.”


13 thoughts on “Cloudflare, now offering to be your Single Point of Failure”

  1. Matic says:

    Business and Enterprise plans both have 100% uptime SLA. Also, Cloudflare uses DNS servers with anycast addresses, so there are more than just 2 servers. Considering what Cloudflare is offering, it’s cheap.

  2. admin says:

    Considering that their country blocking feature is broken, that even their SLA accounts suffer downtimes (more than just the recent one), and considering the solutions available from other DNS providers combined with a better CDN (like Akamai), I would argue that Cloudflare isn’t cheap at all. Additionally, due to the constraining requirements of their service (you cannot use other DNS services in addition), you are allowing them to be your single point of failure. No one should ever be forced to rely on a single DNS provider for their enterprise services.

  3. Bryan says:

    You’re not accurate about having only two DNS servers with Cloudflare. They use anycast, so technically you have two name servers in every POP, which is currently 25 x 2 across geographically dispersed data centers. These are, I’m sure, highly available as well, so each data center may have more than two DNS servers serving your domain. I don’t know anywhere else you can get 50+ anycast DNS servers for free. The SPOF still exists for an issue affecting all their data centers, but they are quite resilient.

  4. admin says:

    I understand anycast technology and how the two DNS entries can translate into more than two physical servers. However, if I want to have a non-Cloudflare DNS server as my 3rd or 4th entry at the registrar, that results in CF disabling the services for the domain. This limits the control the user has over their DNS architecture if they want to use CF services. It’s very simple to implement secondary DNS service for a domain so that the next inevitable Cloudflare failure doesn’t bring down the customer’s site, but their automated processes check registrar entries and disallow any alternative DNS servers. Anycast doesn’t make a difference when Cloudflare fails to resolve entries, and that is the situation where one would benefit from having 3rd or 4th non-Cloudflare DNS servers on their domain.

    Overall CF is a fine idea, and it combines many previously existing technologies into a ‘one stop shop’ sort of product. However, the requirement to have all DNS run through their service for a TLD implementation but then not offer an SLA equates to quite high risks to the customer when things go wrong – and which have been going wrong far too often for a core technology that they’re selling. Overall my experience is that CF has good ideas but their implementation leaves much to be desired and many items still to fix. Cloudflare needs to fix their basic services and offer an SLA to Pro accounts or remove that account type as being labelled “professional” since sites engaged in professional services generally want their sites online and expect something for their money aside from “oh sorry, but that $25/m doesn’t mean it will work all the time”.

  5. Bryan says:

    I agree 100% especially considering their recent 100% network outage due to a misconfig causing a juniper bug. The foundation of their product starts at the dns level so they do not permit AXFR or secondary dns. Also it would contradict their business model as a security and acceleration offering. If they wanted to offer a dns only product they could but they are in another niche. The best combo I’ve found is linode dns + dns.he.net (secondary) or AWS route 53.

    But what would they offer in the SLA ? It would prorate to pennies for downtime.

  6. Unknown says:

    Hi, I had a really difficult time using CloudFlare. I think CF hijacked my DNS, even after I changed my personal DNS setup. I’m not sure how it did it yet, but I think CF is not doing right.

  7. tester says:

    Hi, I had a really difficult time using CloudFlare. I think CF hijacked my DNS, even after I changed my personal DNS setup. I’m not sure how it did it yet, but I think CF is not doing right.

  8. Gary B says:

    Thanks for sharing this article. I agree that it is crucial not to have a single point of failure. I have used DNS Made Easy in combination with a secondary DNS provider for years, which has worked out well. Never had any problems.

  9. Gary says:

    Cloudflare was a huge headache for me. Every day I was getting DNS errors stopping my pages from loading for about an hour. It was a Pro account; I cancelled it yesterday and am still trying to get straightened out today. Their page says it’s the host’s fault and the host says they can’t find an issue. Bye Bye Cloudflare.

  10. Greg says:

    Even if Cloud Flare fixed their dashboard to not detect third party name servers incorrectly, that really won’t improve your uptime in many cases.

    The reason it won’t improve your uptime is that when cloud flare is having trouble, customers will get error pages from Cloud Flare rather than your third name server being used.

    Browsers aren’t smart enough to request from alternative name servers when they get bad results. The only time the third name server would get used is when Cloud Flare doesn’t respond at all.

    Basically, we’re talking about a feature that would provide 20 minutes more uptime over the past 2-3 years? If 20 minutes of uptime is really, really important to you, then you should be paying more than $20.

  11. Glenn says:

    Your webhost could / should help here. If they are a cloudflare partner – you can utilize THEIR DNS servers

    Then should an outage happen – simply disable the CNAME and you’re back up and running in seconds.

    Tons of hosting partners make this available.

    :-)

  12. admin says:

    That may work for some users, but in this particular case my infrastructure was the host, i.e. multiple global datacenters with lots of bandwidth and our own DNS server for backup, with Route 53 as the primary DNS.

  13. admin says:

    I agree with all of that, except that if Cloudflare is going to name their accounts “pro” or “business” or similar terms then they should include features that professionals/businesses actually need. With that naming convention from other vendors you generally get an SLA, and not a response from support along the lines of “yeah sorry, it’s down and we don’t credit or really help much with trying to get your site online at all”. My uptime was significantly worse due to using CF services, so it’s not even worth the $20, or the $200 for business level. If one is actually under a large DDoS then CF is reportedly good, but their standard/pro/business plans are basically a waste of money IMHO.
