I very much wanted to take the time to write a blog post about the downtime users recently experienced, and to give you more information about the (yet again) server migrations. Firstly, the account moves/migrations.
Approximately 4 months ago we started evaluating our options in terms of both server performance vs currently available server hardware, and the cost of this performance vs the cost of what is currently offered. As you are probably aware from the consumer market, hardware is constantly changing and server hardware too is becoming increasingly more powerful as the years progress. Not only does server hardware become more powerful, it also becomes cheaper over time – in no small part due to the decrease in power consumption of components. After this evaluation we decided that we could achieve better server performance vs cost by performing a hardware refresh. In simple terms, this meant moving accounts from one server to a newer and more powerful server. At the same time as the hardware refresh we also made the decision to try a new (to us, not the industry) provider who could offer us a much more competitive pricing structure.
This new provider, whilst offering a fantastic service, had a significantly different operating structure and implementation than we are used to working with. I guess we had become accustomed to our old provider and the way they do things. We soon realised that one of these fundamental differences was significant enough that it actually jeopardised the operation of our servers – and in turn, the operation of your websites. We knew then that we would need to migrate customers back to prevent what could turn into significant problems and downtime further down the line. It had been some months since we had migrated our US shared servers away, and during this time our old provider had increased their offerings so they could match the server performance that we wanted. It was time to move back.
Several days ago we began moving US shared and reseller customers to identical specification servers with our old provider. We made the decision to do this without prior notification for two reasons. Firstly, time is and was a very important factor in performing these migrations (which are currently almost 50% complete). Every day that passes increases the risk of further downtime, so it was essential this started immediately. We felt that being contacted minutes or even hours before your account is transferred is almost a kick in the teeth, as it were. Secondly, and perhaps more importantly from a communications point of view, the feedback we received from the last hardware refresh we performed overwhelmingly told us that being warned of possible downtime (which in the vast majority of cases is never even experienced) actually causes more panic than is justified. That said, we have always sent, and always will send, post-transfer communications to inform you what has taken place and what you may need to investigate as a customer.
I honestly cannot convey how sorry we are that a lot of you have to go through transfers again. Believe me, we understand how difficult it can be to handle any post-transfer tasks and for that we are truly sorry. We made a business decision that simply did not work for us, and our customers (you) were affected. This is a huge deal and something we need to reflect on and learn from. This is not simply a case of “We have to transfer your accounts again. Sorry, it happens.”, but more a case of “We made a decision that negatively impacted a large part of our customer base. How do we not do this again?”. Of course, before the hardware refresh occurred we spent a lot of time researching our options. In the end, it was one of the minor procedures, which we had overlooked as something relatively trivial, that turned out to be our undoing. It is a mistake we will certainly try not to make again.
As a customer it is almost certainly unnerving and worrying to experience 2 server moves in such a short space of time. I write this blog post in the hope that it sheds some light on just why these transfers were performed, and puts any possible fears or caution to rest. This recent activity should not be indicative of our future uptime or downtime. We have no plans and absolutely no intention of doing any more transfers once these are complete. If you were concerned that ThisWebHost was heading “down the tubes” (a quote from a naturally worried customer), then please don’t be. We made a mistake and are in the process of trying to rectify it. This is a temporary situation and you can expect the same level of reliability and performance that you experienced many months ago.
To close this post I also wanted to discuss the downtime incurred on the last batch of site moves/transfers. We recently had a new server built and deployed. Typically these go through preliminary checks to ensure they are operating correctly and no obvious faults are found. It is, however, impossible to test a server in a production environment – that is, to replicate the typical activity and workload of a server. After deployment and some batches of transfers we noticed that the server rebooted itself a single time. No cause of this fault was determined and we merely put it down to a one-time event. Sadly, this happened a second and then a third time. After being unable to locate the cause of this we decided it was necessary to replace the entire server (with the exception of the HDDs containing your data). Perhaps if this was an internal environment we would have attempted to replace and test individual components, but of course uptime and your websites are our priority, so we felt that replacing all components would be the best bet. The hardware has now been replaced and we are actively monitoring the server for any possible issues. At this stage we cannot say for certain that the problem is solved, but things are looking very promising.
Why am I mentioning the above? I feel it is important to make the distinction that today’s downtime, though of course closely related to the transfers, is not an indication of service to come. This is/was a hardware failure (albeit not fully diagnosed) and is sadly one of the things that can occur in this industry. As a small company it’s something we do not experience as often as those with hundreds or even thousands of servers, but it is indeed something that does happen from time to time. It is just very unfortunate timing.
I hope that this blog post answers some questions you may have had in the back of your mind. As always, you are more than welcome to get in touch with us via the client area if you have any questions, comments or concerns. We are very sorry for the inconvenience of these transfers and hope that, despite the recent setbacks, you will still consider this* at the top of your list when it comes to hosting services.