Hacker News
Netlify Status – CDN Issues (netlifystatus.com)
91 points by yabones on March 25, 2021 | 85 comments


Netlify CEO here. I'll try to answer the questions from the thread so far:

Some of our customers are affected by an outage of Google's Load Balancer.

These customers are not taking advantage of our DNS management, or they are not using a DNS provider that supports CNAME flattening and are using their root domain name for their website (i.e., no www prefix).

While we don't recommend the setup, we do provide a single IP address that customers who want it can point an A record at.

In general we run our edge infrastructure as a large multicloud setup spanning several different network providers, and offer two separate networks, one for free/self-serve customers that will get newer features faster and one for enterprise customers running mission critical projects where we guarantee very high uptime and reliability through formal SLAs.

The single IP mentioned above however corresponds to a Google Load Balancer, and they are unfortunately currently having an outage for all load balancers in the relevant region. Read more on https://status.cloud.google.com/

Again, while we generally don't recommend using the A record setup for anything mission critical, we are currently doing everything we possibly can to help enterprise customers that have chosen this setup change their configuration.

Really sorry for all the trouble this is causing for our users; a full RCA will be forthcoming.


> These customers are not taking advantage of our DNS management

I think I understand the point you are trying to make, that customers who are utilizing Netlify DNS Management are unaffected because reasons, but this is phrased in a way that implies that it is your users' fault for this downtime because they didn't choose to use your related service.


Full RCA with the steps the team has taken to improve this setup will be coming soon. The main issue with AWS's DNS solution, in this context, is that they don't support ALIAS records or similar techniques (CNAME flattening, etc.) for A records pointing to any external provider. That limits our options a lot in terms of what we can do, since anyone using this setup needs to point all their traffic to one or more fixed IP addresses.

Our current solution for the free/self-serve tier of Netlify has been to rely on Google's load balancer product to give people a stable IP pointing to a highly available solution. In light of recent issues, our team has set up a new permanent IP for A records (75.2.60.5) backed by a different solution, but due to the way DNS providers with no ALIAS record support work, it does require our customers to manually change their A records.

I totally get that moving DNS providers is a big deal and we want to give the best experience we can regardless of what provider you're on, but we have to work within the technical limitations of those providers and it's the nature of things that we do have more options to deliver a completely seamless experience when we operate both the DNS and the edge layer for customers.


Route 53 General Manager here. Flattening of external provider CNAMEs has a number of availability and accuracy risks. Route 53 offers a 100% availability SLA, and we really mean it. We’ve heard over and over from customers that reliability is our most valuable feature. We can’t provide that same reliability when external queries are in the mix; if we query asynchronously then features such as geo-based routing don’t work as expected for customers. If we query synchronously, then latency and availability are impacted directly.

We do offer ALIAS records between Route 53 hosted zones, and this capability is open to providers such as Netlify. We’d be happy to have customers ALIAS to a hosted zone managed and updated by Netlify. It sounds like your IP addresses are relatively stable, keeping these in sync doesn’t sound like it would be a big deal, and would give you a lever you could pull to change your customer DNS quickly in an event such as this. You could also configure health checks on your own DNS records, which any customer ALIAS records that point to your DNS records in Route 53 would inherit.

If you’re interested in going this route, please contact me at alecpete <at> amazon <dot> com.


If each Route 53 POP is already close to the querying DNS client, then things like geo routing with cached answers might just work well enough in most cases? With each POP having its own cache.

Auto-refreshing the popular records in the background before the TTL expires to help smooth over any temporary issues?

Other big name DNS providers have ALIAS type records. I imagine according to the SLA, AWS Route 53 is still "available", even if it can't resolve a "target address record" (as the ANAME draft calls them) but Route 53 is still able to respond.
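
For readers unfamiliar with the "refresh popular records before the TTL expires" idea above, here is a minimal sketch of the decision rule. It is purely illustrative: the function name, the hit-count threshold, and the 80% refresh point are assumptions made up for this example, and nothing here describes how Route 53 or any other provider actually behaves.

```python
def should_refresh(age_s: float, ttl_s: float, hits: int,
                   min_hits: int = 10, refresh_fraction: float = 0.8) -> bool:
    """Refresh ahead of expiry only for records that are popular and near expiry.

    min_hits and refresh_fraction are hypothetical tuning knobs.
    """
    return hits >= min_hits and age_s >= ttl_s * refresh_fraction


print(should_refresh(age_s=250, ttl_s=300, hits=50))  # popular and near expiry
print(should_refresh(age_s=100, ttl_s=300, hits=50))  # popular but still fresh
print(should_refresh(age_s=250, ttl_s=300, hits=2))   # near expiry but unpopular
```

A cache using a rule like this would re-query the target in the background while still answering from the cached copy, so a brief upstream failure would not immediately surface to clients.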


Phrasing can always be better but the point is that there's a way to map your DNS to Netlify which is risky and Netlify hasn't made the aggressive decision of blocking it. They outline in their docs all the reasons why you shouldn't do it, provide instructions for how to avoid it and also offer (but do not require) a hosted DNS setup which avoids this pitfall by design.

Some folks still choose to use this way, some have no other choice for various reasons and some don't care/comprehend the potential pitfalls. I do believe most users avoid using a root domain name for their website.


> I do believe most users avoid using a root domain name for their website.

This is where you're definitely wrong.


I could be. Are you saying this based on data or intuition?


As someone who is a little clueless about network infrastructure: if I own "dwrodri.com", and I'm not running a bunch of other services which need to point to this domain, is there any reason why I wouldn't have my root domain pointed to my personal website?

I would personally imagine that any individual or SOHO business hosting their website on GitHub/GitLab would just buy "MomAndPopShop.com" and point it there. I guess I don't know off the top of my head how many of those sorts of places on the web still exist...


The problem is not that they're pointing their apex domain to a personal website; the problem is that they have a CNAME record in place for their apex domain, which is not actually allowed per the DNS standards
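
To make the rule concrete: a name that holds a CNAME may not hold other record types, and the zone apex always carries SOA and NS records, so a CNAME there conflicts. Below is a small sketch that flags such conflicts in a list of records. The `Record` type, the function name, and the example zone (example.com, ns1.example.net, sitename.netlify.app as a stand-in site name) are all hypothetical and only meant for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Record(NamedTuple):
    name: str   # owner name, e.g. "example.com"
    rtype: str  # record type, e.g. "A", "CNAME", "NS"
    value: str


def cname_conflicts(records: Iterable[Record]) -> List[str]:
    """Return owner names that hold a CNAME alongside any other record type."""
    types_by_name: Dict[str, Set[str]] = {}
    for rec in records:
        owner = rec.name.lower().rstrip(".")
        types_by_name.setdefault(owner, set()).add(rec.rtype.upper())
    return sorted(
        name for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


# Hypothetical zone: the apex has SOA and NS (as every apex does) plus a CNAME.
zone = [
    Record("example.com", "SOA", "ns1.example.net. admin.example.com. 1 3600 600 86400 60"),
    Record("example.com", "NS", "ns1.example.net"),
    Record("example.com", "CNAME", "sitename.netlify.app"),
    Record("www.example.com", "CNAME", "sitename.netlify.app"),
]
print(cname_conflicts(zone))  # the apex is flagged, the www subdomain is not
```

The `www` name is fine because the CNAME is the only record there; the apex is flagged because it cannot drop its SOA and NS records.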


Sadly, even after switching to their DNS I am still affected.


This should not be the case; if you'd like, Netlify's Support team will be happy to review your settings to help discover why it didn't help you out (start from https://netlify.com/support) and ensure that you are "futureproofed"!


I can heartily recommend contacting _fool for support at Netlify. Always an absolute pleasure.


I switched to using your DNS to resolve this issue, but https://js.la is still busted and because I'm using your DNS, I can't manually set the A record to go to the workaround IP address.


Hi Bob, just want to say, I like your service a lot.


Thanks! Appreciate the kind words!


Seconded, I use it for all my static hosting. Great service.


"These customers are not taking advantage of our DNS management"

You're right. I'm using Cloudflare's DNS. I trust them more than I trust Netlify and that's just a function of their size vs Netlify's size. This response needed better wording.


Cloudflare DNS supports CNAME flattening and you won't be needing the fixed IP address if setting up DNS with them.


More details for folks who are curious about optimal config using Cloudflare's DNS hosting, can be found here: https://answers.netlify.com/t/support-guide-which-are-some-g...


Depending on your config, another DNS related issue with Netlify is the way NS1.com (their vendor) handles domain names. A domain can only be added to one NS1 account. So if Netlify adds to their account internally, you can't use NS1 and vice versa.


Are you all having shake ups within the company? I'm not going to deep dive, but I heard some rumors about some higher ups leaving.

After the Cloudflare Pages release, I'd be curious of what your future road map looks like and how you all plan to compete and grow.

Thanks for all you and your team do. What you have done for front-end development and the community has been nothing but awesome and inspiring.


Honestly, "not taking advantage of our DNS management" is a garbage response. We use AWS for our DNS management. If you offer a configuration, you should support it fully.

Our sites have been down for 3 hours now, and you're blaming someone else? We have 5 properties on Netlify now and will have 0 this time next week.


> Our sites have been down for 3 hours now, and you're blaming someone else?

Well if the issue is at Google then maybe "blaming" isn't really the right word. No need to be rude.

I might as well make the same argument for your sites.

- Your sites have been down for 3 hours now, and you're blaming someone else?


Yes, it is our fault for believing Netlify had contingency plans as hosting is their core business. We're fixing this mistake now so that our customers don't have the same experience.


By the same line of reasoning, your customers could be faulted for believing you had a contingency plan.


Nobody is telling parent's customers how to feel. But the OP suggests that Netlify customers should be faulted for choosing the wrong setup. Broken trust goes all the way down the chain, which is why the middle links have every reason to get ticked off.


The difference is that Netlify communicated the risks to its customers, something other parts of the chain apparently did not do, in addition to not evaluating the risks presented to them by Netlify.


Did you read the docs [1] before writing this? Putting a "(recommended)" on one branch of configuration instructions isn't the same as saying that the other option has a single point of failure. Also, people on both sides of a service don't have the same responsibilities - that's the whole point of the service.

Communicating about risks OR outages are both hard, and every company has both. I'm actually a happy (though impacted) Netlify customer. But it's completely bizarre to me to try to invalidate this customer's complaint.

[1] https://web.archive.org/web/20200303050851/https://docs.netl... (search "flattening")


Yes, I’ve visited that page before today. I admit my familiarity with these DNS setups may have made the tradeoff jump out at me. No problem invalidating the complaint.


Point your apex domain to 75.2.60.5, Netlify recommends it here [0] and in their documentation now [1].

I just did for a site that's hosted by Netlify and it solved the issue. Thankfully I had a short TTL, I hope you do too.

[0] https://www.netlifystatus.com/

[1] https://docs.netlify.com/domains-https/custom-domains/config...
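
If you want to confirm the change has taken effect from your own machine, something like the following sketch could help. It only checks what your local resolver currently returns, so cached answers may lag until the old TTL runs out. The function names are made up for this example, and 192.0.2.10 is a documentation-only address standing in for "some other IP".

```python
import socket
from typing import List

NEW_APEX_IP = "75.2.60.5"  # the replacement A record value quoted in this thread


def resolve_a_records(domain: str) -> List[str]:
    """Ask the system resolver for the IPv4 addresses of `domain` (needs network)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(domain)
    return addresses


def apex_is_migrated(addresses: List[str], expected: str = NEW_APEX_IP) -> bool:
    """True when the resolver returned at least one address and all match `expected`."""
    return bool(addresses) and all(addr == expected for addr in addresses)


# Offline examples with made-up resolver answers.
print(apex_is_migrated(["75.2.60.5"]))   # only the new IP is returned
print(apex_is_migrated(["192.0.2.10"]))  # still pointing somewhere else
print(apex_is_migrated([]))              # no answer at all
```

For a live check you would call `apex_is_migrated(resolve_a_records("yourdomain.example"))`, with your own apex domain in place of the placeholder.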


I'm not sure your organization's setup with Netlify but isn't the whole point of Serverless to be... "serverless"? I could migrate twice the amount of properties you have to another provider in less than 3 hours...

I get your frustration but maybe cut some slack. If anything is mission critical, you should have had a backup plan, whether that's Netlify, Vercel, Cloudflare, or something else.


We use(d) Netlify for the frontend. I agree, our mistake was believing Netlify could be used for more than toy websites and took care of backup plans for us. Clearly they do not.


I do believe you to be trolling now by saying that. If not, congrats on the valuable lesson!


Not trolling, just very frustrated. But yes a valuable lesson.


What's keeping you from migrating your frontends? Shouldn't that take a couple of hours at worst?


It's not just migrating the front-end if they're also using other functionalities like Netlify functions, forms, authentications etc. Netlify is not just static file hosting.


This could've been avoided with an HTTP LB, vs a L4 one...


This has been the third major outage for Netlify in the last few weeks.

I like the company, they have good people on their team, and their interface and functionality are great (deploy previews are so nice!).

But this is probably the last straw, as the static portion of our company's website has been down for 45 minutes now.

Fortunately, the beauty of a static site is they're quite easy to host anywhere.

We're already on AWS, and it's easy enough to set up CloudFront. It won't be _quite_ as quick to deploy but it will probably rarely if ever break. Guess that's my task for the day :(


We're in the same boat, and when Netlify blames an "upstream provider", what I hear is that they don't have a backup plan.

They're giving us plenty of time to research alternatives while our site is down though.


Based on the comment from bobfunk, it seems they do have a backup plan but not all customers use a configuration which takes advantage of it.


A backup plan that does not service apex domains is not really a backup plan.


It sounds like it does handle apex domains, but only if you're using Netlify DNS or a provider which supports CNAME flattening. Assuming the potential problems with not doing so are disclosed during setup (not sure if they were) that actually seems pretty reasonable to me.


I run the Netlify Support team, and this statement from @michaelmior is correct: apex domains are served using a redundant, global CDN if you use Netlify's DNS hosting, or Flattened CNAMEs from Cloudflare.


It seems that a lot of people didn't have backup plans either. I'm not sure if it's a good idea to rely on just one provider for something critical.


You should check out Cloudflare Pages. For static stuff it’s a dream to set up, and you get previews out of the box.


The main advantage that AWS will have for us over anything else is that, since AWS already manages our DNS, we are going to be able to offer our visitors the best performance by using geo-specific IP addresses.

The static site in question for us lives at the apex record (mywebsite.com), so it's generally not possible for other providers to do this without having them manage our entire DNS infrastructure, which we aren't willing to do.

In fact I think this is part of why we've had so many issues with Netlify. It's clear their preferred way to host apex domain sites is to manage the DNS completely.


I think Cloudflare can do it. They give you the option to set A records for your domain, in your own DNS.

Cloudflare runs an Anycast[0] network with multiple peers, so even though you're using static IPs, the traffic will still get routed to Cloudflare's nearest PoP, and Pages is served by their edge network, so your site will be served from the location nearest to your customer. All without DNS shenanigans.

[0] https://www.cloudflare.com/en-gb/learning/cdn/glossary/anyca...


Cloudflare Pages is in beta and right now lacks many features that Netlify has, while also containing some showstopping issues (https://developers.cloudflare.com/pages/platform/known-issue...)


Why wouldn't CloudFront be quite as quick, out of interest?


I think they're talking about the setup/configuration time. Netlify is pretty much one click.


This got me today. Probably could be characterized as the classic case of: Company says a certain use case is unsupported, but tries hard to accommodate users who are stuck with the unsupported use case, so they hack up a decent work around under the hood. Then the technically unsupported use case blows up, so they then have to scramble to support it with a quick work around...for the workaround.

The workaround worked. I think at this point it makes me more likely to keep using Netlify. I love the product. And I think I love the support for unsupported un-recommended feature that they supported today.

Thanks Netlify Ops!


Update on the Netlify Status page [0] -- TLDR anyone experiencing this issue should point their apex domain to 75.2.60.5

---

Full announcement:

Our team have created a new load balancer instance which is not associated with the upstream provider who is currently experiencing issues. Please update A record values for your site(s) bare domain to 75.2.60.5 to mitigate against this outage.

---

Their documentation page [1] now includes the same IP.

[0] https://www.netlifystatus.com/

[1] https://docs.netlify.com/domains-https/custom-domains/config...


thanks for the TLDR - i pointed it over and can verify that it fixes the problem. annoying to wait out the caching on mobile devices though, i am not sure how to clear DNS cache on mobile but am not too bothered.


"We have identified the issue and it is attributed to an upstream provider."

The upstream issue is probably at Google:

"We are experiencing an issue with L4 load balancers in us-west1-c. Multiple managed services relying on LB and located in this zone might be affected."

https://status.cloud.google.com

This has been going on for at least an hour.


"To make sure you can minimize the impact of our single-homed loadbalancer being down"[1]

Interesting. I'm surprised that's how their CDN works.

[1] https://answers.netlify.com/t/support-guide-minimizing-impac...


This hit me, as someone using A records to point to the Netlify IPs. If you are using GitHub, I found switching over to GitHub Pages quite easy for my static site. I used this guide: https://docs.github.com/en/github/working-with-github-pages/...


What is Netlify?

"An intuitive Git-based workflow and powerful serverless platform to build, deploy, and collaborate on web apps"


A quick infrastructure service (from what I understand.) If you are building a JS single page app and want to be able to deploy it and some backing cloud functions without really worrying about CDNs, gateways, etc. Git push, code runs tests, code is deployed, done.


My startup has also suffered from the recent Netlify outages with our main landing page.

Last time I already did the research for potential alternatives:

- Cloudflare Pages is now available in public beta

- even more interesting seemed this offering by PerfOps to put my CDN behind a Load Balancer that can monitor uptime and dynamically shift traffic between multiple CDN sources: https://perfops.net/flexbalancer

What do you think?

- it seems like the multi cloud approach to CDN

- but at the same time I'll have a problem if this Load Balancer fails (single point of failure)


> CDN behind a Load Balancer

This sounds crazy to me. Besides the obvious superfluous network/layer hops, complexity and points of failure, it would also partition the cache, right? So working against the very thing CDNs optimize for.


There's also Vercel and Firebase (Hosting), which are stable (not in beta, as Cloudflare Pages is).


I wish Netlify all the best! In the mean time, I just hopped on to Cloudflare and saw their Pages product is in public beta. Seems to work the same as Netlify for static pages, just tried it out for my personal site and it worked great! I was already using Cloudflare as my CDN and to manage DNS, it's actually really nice to have my entire website configuration live there.


oof, I got bit by this issue this morning. if you're using cloudflare, set your domain's apex (`@`) as a CNAME pointing to the default subdomain (sitename.netlify.app) and use CNAME Flattening. It's the A record pointing to the CDN IP address that's broken.
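
For anyone wondering what CNAME flattening does conceptually: the DNS provider follows the CNAME chain itself and answers apex queries with the resulting A records, so clients never see a CNAME at the apex. The sketch below simulates that with an in-memory table. The record data (edge.example.net and the 192.0.2.x documentation addresses) and the function names are invented for illustration and do not reflect Cloudflare's or Netlify's real records or implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical upstream records: name -> (type, values).
UPSTREAM: Dict[str, Tuple[str, List[str]]] = {
    "sitename.netlify.app": ("CNAME", ["edge.example.net"]),
    "edge.example.net": ("A", ["192.0.2.10", "192.0.2.11"]),
}


def flatten(target: str, records: Dict[str, Tuple[str, List[str]]],
            max_hops: int = 8) -> List[str]:
    """Follow CNAMEs starting at `target` and return the final A record values."""
    name = target
    for _ in range(max_hops):
        rtype, values = records[name]
        if rtype == "A":
            return values
        name = values[0]  # a CNAME has exactly one target
    raise RuntimeError("CNAME chain too long or looping")


def answer_apex_query(apex: str, cname_target: str,
                      records: Dict[str, Tuple[str, List[str]]]) -> List[Tuple[str, str, str]]:
    """Answer an A query for the apex as if it held the A records directly."""
    return [(apex, "A", ip) for ip in flatten(cname_target, records)]


print(answer_apex_query("example.com", "sitename.netlify.app", UPSTREAM))
```

Because the provider re-resolves the target on the customer's behalf, the hosting side can change the underlying IPs without customers editing their records, which is the lever that a hard-coded A record lacks.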


Hmmm, lots of recent outages.

Has anyone else noticed erratic response times in recent months? My web vitals score sometimes dips heavily because "response time from server" (or whatever it's called).

Can anyone recommend another place where I can host? (except Vercel, which has similar results)


Cloudflare Pages, begin.com, fly.io, and surge.sh are a few I've come across as alternatives (depending on what features you're after).


Firebase?


Whom are they using? In 2021, static sites behind a CDN being down is... odd. Pretty much all CDNs by now should support an equivalent of serve-stale.
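
As a rough illustration of what "serve stale" means: when a cached entry has expired and the origin cannot be reached, the cache returns the expired copy instead of an error. The class below is a toy sketch under that assumption, with a fake origin and a fake clock. All names are hypothetical, and real CDNs implement this with far more nuance (for example, limits on how long stale content may be served).

```python
import time
from typing import Any, Callable, Dict, Tuple


class ServeStaleCache:
    """Cache that falls back to an expired entry when the origin fails."""

    def __init__(self, fetch: Callable[[str], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[1] < self._ttl_s:
            return entry[0]  # still fresh
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]  # origin failed: serve the stale copy
            raise  # nothing cached, so the failure is visible
        self._store[key] = (value, now)
        return value


# Demo with a fake origin and a fake clock (both hypothetical).
state = {"up": True, "now": 0.0}


def origin(path: str) -> str:
    if not state["up"]:
        raise ConnectionError("origin unreachable")
    return f"<html>{path}</html>"


cache = ServeStaleCache(origin, ttl_s=60, clock=lambda: state["now"])
first = cache.get("/index.html")   # fetched from the origin
state["up"], state["now"] = False, 120.0
second = cache.get("/index.html")  # expired and origin down: stale copy returned
print(first == second)
```

Note that this only helps for content that has been cached at least once; a request for a path the cache has never seen still fails while the origin is down.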


think this is a DNS issue, not CDN. bobfunk noted it is Google, scroll up


So they implemented their own CDN-ish thing on top of GCP without doing anycast and serve stale, and they have a non-trivial number of non-mom-and-pop customers?!


It must be very stressful to run a production hosting company - DigitalOcean and Cloudflare both went through similar issues in the early days.


No comment.


Glad you overcame it all in the end - we're happy users, especially given the outstanding uptime for the past few years.


They and Vercel both seem to be going thru a lot of growing pains. Quite a few outages over the past year.

PS: we use both (for different sites). Probably should consolidate to one.


I always assumed there is an 'underwriter' (like fastly or akamai) for these CDNs. Is that not the case?


I know Vercel uses AWS Lambdas behind the scenes to process web requests at least. I'd assume caching is also handled through CloudFront by default. The place I currently work uses Fastly for caching and Vercel for hosting and it's definitely caused some issues (and much finger pointing on both of their sides) when one of those services makes a breaking change.


Perhaps they're using CloudFront and providing an end user CDN on top of it. Both are partners with AWS. Both were multicloud, and it seems, aren't anymore.


I’m not sure about the future of this space, but I kind of just wish Cloudflare and Vercel would join forces. It would make sense to me.


Cloudflare is already on the way to building out all of the features Vercel has which is exciting. Eventually their Pages product (static hosting) will integrate Workers [1] for serverside APIs.

[1] https://blog.cloudflare.com/cloudflare-pages/#oh-and-one-mor...


They lack a framework and crystal clear way to start developing apps. Nothing exists that taps their features beyond maybe Flarereact.


Shameless plug, I can suggest StaticDeploy (https://staticdeploy.io/) as an open source, self-hosted alternative to Netlify, which can give you a similar deployment workflow.

It's definitely possible to just host directly on S3/CloudFront, but StaticDeploy sets you up quickly with a workflow and a dedicated management interface.

Disclaimer: I'm the main developer of the project.


Honestly, don't do this. Netlify is having a bad day and it's not fun for them. The great wheel of karma turns around slowly and one day it'll be your turn to have a bad day.

Submit StaticDeploy to HN some other day and tell us about it. Sounds cool.


Cloudflare CTO telling off the open source project guy, while ignoring all the other comments suggesting cloudflare?


There's a difference between self-promotion and other people making a suggestion.


Thanks for pointing this out, I admit I did not consider their point of view (the plug was shameless, but in the self-promotion-is-inherently-shameful sense), and I agree it's in bad taste.



