How to do a bad comparison of cloud vs colocation
There's this post by Ahrefs, an SEO company from Singapore, which has been getting quite a lot of shares recently.
While nobody ever said that the cloud was cheap, it's a terrible comparison. Basically, the article enumerates what they have bought (a number of Dell servers + network equipment and colocation costs) and roughly translates those to AWS cloud services.
Some problems:
- Apparently, all of Ahrefs' servers sit in a single datacenter. Good luck with disaster recovery and availability.
- The cloud is nice because, you know, it's a cloud. Scalability. Flexibility. On-demand resources. Nobody forces any customer to always rent 850 EC2 servers. Ahrefs could have used EC2 machines and cloud databases with autoscaling to pay less without running into performance problems. They could have used fully managed solutions (e.g. Lambda) to forget about managing servers altogether and to get dynamic pricing.
- No operational cost is included. It's true, cloud engineers aren't cheap; but a few of them can automate and manage quite a large fleet of cloud resources. By contrast, deploying and maintaining such a large hardware fleet is probably more complex. What was the setup cost?
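To put the autoscaling point in rough numbers, here is a minimal sketch comparing a fleet that is always sized for peak with one that follows the load. Only the 850-server figure comes from the article; the hourly rate and the load profile are made-up placeholders, so the output illustrates the shape of the argument and is not an estimate of Ahrefs' bill.

```python
import math

def fixed_fleet_cost(servers: int, hourly_rate: float, hours: int) -> float:
    """Cost of keeping the full fleet running for every hour."""
    return servers * hourly_rate * hours

def autoscaled_cost(load_percent_by_hour, peak_servers: int, hourly_rate: float) -> float:
    """Cost when the fleet is resized each hour to match the load.

    load_percent_by_hour: load for each hour, as an integer percentage of peak.
    """
    total = 0.0
    for pct in load_percent_by_hour:
        # Round up: a fraction of a server still means renting a whole one.
        servers = math.ceil(pct * peak_servers / 100)
        total += servers * hourly_rate
    return total

PEAK_SERVERS = 850   # fleet size quoted in the article
HOURLY_RATE = 1.0    # hypothetical price per server-hour, not a real AWS price

# Hypothetical day: 8 hours at peak, 16 hours at 40% of peak.
daily_load = [100] * 8 + [40] * 16

fixed = fixed_fleet_cost(PEAK_SERVERS, HOURLY_RATE, len(daily_load))
scaled = autoscaled_cost(daily_load, PEAK_SERVERS, HOURLY_RATE)
print(f"always-on: {fixed:.0f}, autoscaled: {scaled:.0f}")
```

How much this saves depends entirely on how spiky the real workload is; a flat, always-busy workload gains nothing from autoscaling.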
Don't do those comparisons. Compare scenarios instead, and put some numbers on the risk of problems, the cost of downtime (which is quite easy to prevent on the cloud), and the availability of relevant resources (are you keeping your datacenter technicians on payroll even when they don't have any work to do?).
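A scenario comparison along those lines could be sketched as follows. The `Scenario` class and every figure in it are hypothetical placeholders; the point is only that staff and expected downtime cost belong in the same sum as the infrastructure bill.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    infra_cost: float          # yearly hardware + colocation, or yearly cloud bill
    staff_cost: float          # yearly cost of the people running it
    outage_probability: float  # estimated chance of a major outage in a year
    outage_cost: float         # estimated business cost of one major outage

    def expected_yearly_cost(self) -> float:
        """Infrastructure + staff + probability-weighted cost of downtime."""
        return (self.infra_cost
                + self.staff_cost
                + self.outage_probability * self.outage_cost)

# All numbers below are invented for illustration only.
scenarios = [
    Scenario("single-DC colocation", infra_cost=100, staff_cost=50,
             outage_probability=0.5, outage_cost=1000),
    Scenario("multi-region cloud", infra_cost=400, staff_cost=30,
             outage_probability=0.25, outage_cost=1000),
]

for s in scenarios:
    print(f"{s.name}: {s.expected_yearly_cost():.0f}")
```

Which scenario wins depends on the estimates you plug in, which is exactly why they need to be written down instead of left out.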
The cloud will almost inevitably come out more expensive than raw server/colocation costs, but quite often it's going to be cheaper than a lot of "enterprise" managed servers that I've seen in my life (like, some agency managing servers in a DC for a customer).
Towards the end of the article, Ahrefs acknowledges that there are tradeoffs between cloud and on-premises, which makes the first part of the article look even more insincere to me. They know what they're doing: it's a hybrid approach where the predictable workload is kept in a private DC, but there's probably an underlying cloud infrastructure that, if everything goes wrong (e.g. the DC burns down), would be able to take 100% of the required workload. Quite a different story.