Merry Christmas everyone!
What a day. It was fun (mostly), but I'm glad it's over.
I was up until about 1:30am last night finishing wrapping presents (yay procrastination). My youngest woke me up at 5:30am this morning, screaming to ask if Santa had come. "Yes, he did. Please go back to sleep." (Last night we told him Santa wouldn't come unless he was in bed. I've never seen him run to get in bed so fast in my life.)
Then we had the Steam debacle. Wow. I... I can't even imagine being on the other end of that. Doing web-based stuff, I can imagine some bad failure modes where things like that could happen. But that's a nightmare. Seeing stuff like that makes me incredibly thankful we don't ever do anything with credit cards, addresses, personal information, etc.
Then later in the afternoon I got the ping that our site was inaccessible. Sure enough, all of Linode's Dallas datacenter was essentially offline. We serve a good portion of east and west coast traffic, so the bulk of our infrastructure splits the difference and sits in the Dallas Linode region. Today, it seems, amidst the DDoS on Steam and attempts against other services, Linode Dallas was also hit. It didn't last too long (maybe 15 min?), but it had some lingering effects that left the site pretty sluggish. I checked what I could on our end and things looked OK. So, thinking the coast was clear, I went on a bike ride with my kids. At literally the furthest point on the loop from the house, I got another notification that the site was down again. So we rode back, and my oldest's bike chain fell off. Three times. Oh, no worries, no rush... lol. Got home, investigated, poked around a bit, and got things back to normal.
I think it's safe to say we're probably going to migrate off Linode in the near future. In the beginning (for us) they were pretty robust, with minimal downtime, but over the last year we've seen a number of issues, particularly in the Dallas datacenter. A few years ago AWS's US-East region had several major outages, so we opted not to migrate there at the time. Since then their reliability has improved (though it still isn't perfect). The difference, though, is that when Linode Dallas goes under, people think it's your fault. When AWS US-East goes under, half the internet is failing and everyone knows it's Amazon's fault and not yours. Either way, I think it's about time, infrastructure-wise, that we start looking at what it would take to survive AWS availability zone outages and keep running. That's a pretty big shift, but it could be fun to do.
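For the curious, a first sanity check in that direction is just knowing where your instances actually live. Here's a minimal sketch using boto3 (the region here is a placeholder, and credentials are assumed to be configured; this isn't our actual setup) that tallies running EC2 instances per availability zone and flags single-AZ concentration:

```python
from collections import Counter

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Page through all running instances and tally their availability zones.
az_counts = Counter()
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            az_counts[instance["Placement"]["AvailabilityZone"]] += 1

for az, count in sorted(az_counts.items()):
    print(f"{az}: {count} instance(s)")

if len(az_counts) < 2:
    print("Warning: everything is in one AZ; a single-AZ outage takes the site down.")
```

Spreading instances across zones is only step one, of course; actually surviving an AZ outage also means replicating data and balancing traffic across them, which is where the real work (and fun) would be.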
So now the site is running smoothly again, the kids are in bed, and the day is almost over. Phew. About time to start working on the code again, gearing up for the next big go-live push.