Beginning December 25th, our main upstream provider, Linode, began experiencing large DDoS (Distributed Denial of Service) attacks aimed at their network. To give you some additional background: Linode have a presence in multiple data centers around the world, and the attacks that began on December 25th targeted every network link they had to each of those data centers. In short, this was a directed attempt to take Linode down worldwide. As a result, anyone utilising Linode’s network was also affected and taken offline.
At this point, a very significant part (read: almost all) of our infrastructure was affected due to how we utilise Linode. This included our client area/website as well as our shared and semi-dedicated hosting servers. It should be noted that not all of our infrastructure utilises Linode, and we have spread out parts of our infrastructure as much as we reasonably can without complicating matters further, but these attacks certainly had a significant negative impact on our services.
Once the DDoS attack(s) were detected, they were relatively quickly mitigated (stopped) by Linode and service was restored. Unfortunately, the attacks continued for several days in what I can only describe as a game of cat and mouse, with Linode stopping the attacks only for them to start again shortly thereafter. While I am sure Linode were doing their best, sadly we saw no sign that they would be able to contain the problem on a permanent basis. As the downtime continued, we knew we would have to choose between waiting to see if the attacks could be stopped for good and migrating to a new provider that was not under attack, thereby distancing ourselves from the attacks altogether. With significant downtime already accrued, we very regrettably decided to move to a different network provider in the hope of avoiding further disruption. After all, we had no idea how long the attacks might last, and postponing any migration could have been extremely costly, not just to ourselves but also to our customers.
Approaching January 1st 2016, the attacks appeared to have been largely mitigated, with the remainder hitting primarily the Atlanta data center and parts of Dallas. With the UK networks and some of the US locations no longer under attack (or the attacks being fully mitigated), we began migrating customer accounts to new servers that we had already set up with a new provider in another location. We did this as quickly as we possibly could, as we were unsure whether the attacks on those locations would return. Unfortunately for us, two of our US shared hosting servers were based in the Atlanta data center, which was heavily under attack. We had no access to these servers to migrate accounts directly, and the servers would only come up for very short periods of time – nowhere near long enough for us to reliably transfer any data to the provider we had waiting on standby.
The backups that we regularly take of our servers are block-level backups, which can be used to perform “bare metal” recovery. That is, if a server were to suffer a catastrophic hardware failure and lose all data, we would (in theory) be able to restore a completely identical copy of the server as of the last backup snapshot. Unfortunately, these backups only work when restoring to an identical server configuration – the same specifications and the same overall setup. Because we had moved to a new provider, these backups could not be restored in this fashion. This meant that we had to wait for access to the servers we had in the Atlanta data center.
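To illustrate the distinction, here is a small sketch contrasting a block-level image (a byte-for-byte copy that embeds disk geometry and layout, and so assumes an identical target) with a file-level archive that can be extracted on any host. Everything below is a hypothetical stand-in, not our actual backup tooling:

```shell
# Illustrative sketch: block-level image vs portable file-level archive.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a server's raw disk. A real block-level backup copies the
# whole device, partition table and all, so restoring it assumes a target
# disk with matching size and layout.
dd if=/dev/zero of=disk.img bs=1M count=4 2>/dev/null
dd if=disk.img of=disk.backup.img bs=1M 2>/dev/null
cmp disk.img disk.backup.img && echo "block copy is byte-identical"

# A file-level backup, by contrast, can be extracted on any host.
mkdir -p site/public_html
echo "hello" > site/public_html/index.html
tar -czf site.tar.gz site
mkdir restore && tar -xzf site.tar.gz -C restore
cmp site/public_html/index.html restore/site/public_html/index.html \
  && echo "file-level restore verified"
```

This is why the block-level snapshots could not simply be replayed onto the new provider's hardware.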
Towards the end of 3rd January 2016, Linode began to mitigate the attacks long enough for us to start moving data out of the Atlanta network to a new provider. We are still migrating the final accounts to new servers now, but I can happily report that most customers in Atlanta have already been moved across. For the time being, the Atlanta network no longer appears to be under attack, but as a precaution, and to fit in with our new infrastructure strategy, we will continue the migrations until we have moved all customers in all locations across to our new provider.
Why did we continue to migrate accounts to a new provider once the attacks had been stopped?
This is fundamentally a combination of decisions. While the attacks were obviously our primary motivation, we had already made the decision to migrate to the new provider and had to see that through regardless of the outcome on Linode’s side.
What are you going to do to ensure that this doesn’t happen again?
While the DDoS attack was aimed specifically at Linode’s networks, and not at us, we are by no means going to lay all of the blame on Linode. We have learned a lot throughout this situation and have identified, and acknowledged, some of our own shortcomings. We know we can do better, and we are going to investigate options that can make us much more resilient not only to attacks like this, but to other potential points of failure in our infrastructure.
Our new provider, theoretically, brings us several significant advantages. It is now much, much easier for us to perform bare metal recovery. Should we ever again anticipate lengthy downtime (whether due to DDoS or any other reason), we would be able to relatively quickly spin up a new server and perform a bare metal restore much faster than before. Some post-restore tasks (such as updating DNS) would still need to be actioned to get sites back up and running, but overall we are much more in control of our backups than we were previously.
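As a concrete example of the kind of post-restore DNS task mentioned above: once a server comes back under a new IP address, every zone that pointed at the old address needs repointing. The sketch below rewrites A records in a stand-in zone file; the IP addresses (drawn from the TEST-NET documentation ranges) and the zone path are purely hypothetical:

```shell
# Hypothetical post-restore step: repoint A records at the restored server.
set -eu
old_ip="203.0.113.10"   # old server address (illustrative)
new_ip="198.51.100.20"  # restored server address (illustrative)
zonedir=$(mktemp -d)

# Stand-in zone file as it might look before the restore.
cat > "$zonedir/example.com.zone" <<EOF
www IN A $old_ip
@   IN A $old_ip
EOF

# Rewrite every A record that still points at the old address.
sed -i "s/$old_ip/$new_ip/g" "$zonedir/example.com.zone"
grep "IN A" "$zonedir/example.com.zone"
```

In practice this would be followed by reloading the nameserver and waiting out DNS TTLs, which is why sites can take a little while to come back even after a restore completes.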
Finally, our new provider specifically offers and advertises DDoS protection to their customers. There are limits to this protection, and it does not claim to prevent or block every single DDoS attack out there, but Linode, by contrast, do not offer such a service at all. While we were not the target of the attacks that began on December 25th, having the option of some DDoS protection does suggest that our new provider are in a better position to respond to such attacks if need be in the future.
There were faults on several sides that led to the problems many of you experienced, and we are working to identify and improve upon these in time. Going forward, we will be investigating tools and approaches that may help us distribute customer sites across multiple geographical locations to provide further redundancy. I can’t say much about this at the moment, but rest assured that if we find this viable in the future we will certainly let our customers know.
I would like to apologize to those affected by this recent downtime. An attack of this nature is extremely rare and something I think a lot of people in the industry, us included, have never seen before. We have learnt a lot from the situation and hope to implement some positive changes to make us more resilient in the future.
If you have any questions for us, please respond to this blog post and we will happily try to address them.