  • admin-head

Redundant Cloud Services? Really?


Twin towers depicting the nature of 1:1 redundant cloud services

Yes, because data loss is a real threat that has been known to cost businesses millions of dollars per incident. The modern technology landscape and the rapid shift to always-on operations make it critical for organizations of all sizes to have redundancy in their cloud services and backups. Oh, and we're just here to provide some hard facts about the cloud, and some softer ones too.


Do you really need to spend more on redundancy?

Maybe. It depends on many factors, such as the criticality of the hosted application, compliance requirements, uptime requirements, business requirements, and the complexity involved.

Some general examples:

  • Geographically redundant database? Probably not.

  • Critical database or web application in a single EC2 instance? Yes.

  • Critical web application or authentication servers being backed up in the same data center as they are physically hosted in? Definitely yes.
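The three examples above can be read as a rough audit check. The sketch below is illustrative only: the `Workload` fields and the rules in `redundancy_findings` are our own hypothetical names and simplifications, not a standard or any provider's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    # All field names are hypothetical, chosen for this illustration.
    name: str
    critical: bool
    instance_count: int          # number of running copies of the workload
    primary_site: str            # data center hosting the workload
    backup_site: str             # data center holding its backups
    geo_redundant: bool = False  # already replicated across regions?


def redundancy_findings(w: Workload) -> List[str]:
    """Return redundancy gaps for one workload, mirroring the examples above."""
    findings: List[str] = []
    # Already geographically redundant, or not critical: nothing to flag.
    if w.geo_redundant or not w.critical:
        return findings
    if w.instance_count < 2:
        findings.append("critical workload runs on a single instance")
    if w.backup_site == w.primary_site:
        findings.append("backups are stored in the same data center as the workload")
    return findings


if __name__ == "__main__":
    auth = Workload("auth-server", critical=True, instance_count=1,
                    primary_site="dc-a", backup_site="dc-a")
    for finding in redundancy_findings(auth):
        print(f"{auth.name}: {finding}")
```

A real assessment would also weigh compliance and uptime requirements, which this sketch leaves out.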

But isn't the cloud supposed to be redundant and always-on?

Yes and no.


The cloud is designed to give the highest uptime possible, and its services are built in a standardized manner so that organizations can replicate workloads across the cloud service provider's sites and reduce downtime in the event of a site or service outage. That doesn't mean every service is redundant by default.


A CSP (Cloud Service Provider) designs its networks and infrastructure to be highly resilient, and most CSPs provide some form of protection against failures, but that protection extends only to the infrastructure underlying the service, not to your data. It is common practice for a CSP to state expressly that the redundancy and data protection of a given workload are the customer's responsibility. CSPs operate on a "Shared Responsibility Model", under which the customer and the CSP share responsibility for the availability, redundancy, performance, etc. of the workload and any data related to it.


A CSP can, and will, redeploy your services if a host failure occurs, but it cannot restore your data if it was encrypted by ransomware or accidentally deleted, at least not unless you have bought additional services such as cloud backup from the CSP or a third-party vendor.


Here's an example of what provides redundancy by default and what doesn't:

A geographically distributed database provides redundancy against many forms of disaster, such as networking outages and entire site or region failures.

In contrast, a single compute instance runs in a single data center on a single host, providing zero redundancy. It doesn't protect against networking outages, and it surely doesn't protect against site or region level failures.
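To put rough numbers on that contrast, here is a minimal sketch that assumes every site fails independently and has the same availability. Independence is a simplifying assumption (real outages can be correlated), and the 99.5% per-site figure is made up for illustration; it is not any provider's SLA.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up."""
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The service is down only when every site is down at the same time.
    return 1.0 - (1.0 - site_availability) ** sites


for n in (1, 2, 3):
    availability = combined_availability(0.995, n)
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    print(f"{n} site(s): availability {availability:.6f}, "
          f"about {downtime_hours:.2f} hours of downtime per year")
```

Under these assumptions, each added site multiplies the chance of a total outage by the single-site failure probability, which is why geographic distribution helps so much more than hardening one instance.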


Fires, power outages, networking outages, and site or region level failures do happen, and recently the data center industry has been plagued by them. The current fiasco at Google Cloud's Paris data centers (europe-west9), the Maxnod data center fire, and the OVHcloud incident in 2021 all prove that site-wide outages happen and cause significant financial and reputational damage to organizations without redundancy in their workloads or backups.


The GCP outage knocked the entire europe-west9 region offline for 11 days. Imagine your only database instance was hosted there. Now what if the cloud backup service your organization uses stores your backups in a single region with no geo-redundancy (8 out of 10 online backup providers do this)?
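For a sense of scale, the short calculation below converts an outage of that length into availability over one year. It takes the 11-day figure above at face value and assumes a 365-day year with no other downtime.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def annual_availability(outage_days: float) -> float:
    """Fraction of a 365-day year a service was up, given total outage days."""
    if not 0 <= outage_days <= DAYS_PER_YEAR:
        raise ValueError("outage_days must be between 0 and 365")
    return 1.0 - outage_days / DAYS_PER_YEAR


outage_days = 11
print(f"Outage: {outage_days * HOURS_PER_DAY} hours")
# prints "Outage: 264 hours"
print(f"Availability for that year: {annual_availability(outage_days):.2%}")
# prints "Availability for that year: 96.99%"
```

A workload with no copy outside the affected region would have been unavailable for that entire window.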


Keep in mind that most cloud storage services and cloud backup providers do offer replication, but it is intended to protect only against disk level failures on the provider side; it provides zero protection against site level or region level failures. Geo-redundancy is available for many storage services, but it is provided as an add-on and billed separately due to the associated costs. Newer providers like us, however, are baking it right into their solutions because, when all is said and done, every extra mile we go counts.
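The difference between replication and geo-redundancy comes down to where the copies live. The sketch below classifies a set of backup copies by the largest failure scope they can survive. The `(region, site)` pairs and the level names are hypothetical, and it assumes each copy already sits on storage that the provider replicates across disks.

```python
from typing import List, Tuple

# Each backup copy is described by a hypothetical (region, site) pair.
Copy = Tuple[str, str]


def protection_level(copies: List[Copy]) -> str:
    """Return the largest failure scope this set of backup copies survives."""
    if not copies:
        return "none"
    regions = {region for region, _site in copies}
    sites = set(copies)
    if len(regions) > 1:
        return "region"  # survives the loss of an entire region
    if len(sites) > 1:
        return "site"    # survives the loss of one data center
    return "disk"        # all copies share one site: disk failures only


if __name__ == "__main__":
    print(protection_level([("region-1", "dc-a"), ("region-1", "dc-a")]))
    print(protection_level([("region-1", "dc-a"), ("region-2", "dc-c")]))
```

Two replicas in the same data center still come back as "disk", which is the gap the paragraph above describes.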


What's the net result of not having redundant cloud services and backups?


Your critical data stays unsafe. In fact, two organizations lost a critical amount of data in the OVHcloud incident, which led to millions in damages, because their backups and disaster recovery services were not geo-redundant. The fire at the SBG2 data center knocked their production workloads offline while also erasing all of their backups.

This incident created a lot of fire (get it?) in the data center industry, and while CSPs have largely recovered from it, the key takeaway remains:


Not having redundancy in your cloud services and backups is a recipe for disaster.

Organizations are moving to cloud services at a record pace, but they have yet to realize that the cloud doesn't guarantee protection against data loss. It's built to be resilient, not to protect your critical data from every possible threat. A fire will take out clusters' worth of data very quickly, and a water leak will fry even the best UPS systems out there. These are, and will remain, events that no one can predict. You cannot know when a site will fail or when data loss will occur, but you can plan for it.


The GCP Paris incident, combined with other major incidents like the Maxnod data center fire, serves as proof that you cannot rely on non-redundant services for critical applications, databases and backups, because non-redundant services work well until they don't. That is why we follow a redundant-by-design architecture for building and deploying our services, such as BootCloud | BNR, which protects your backups against network outages, disk failures, and site level and region level failures by providing geo-redundant backups by default.


We'll close with this:


Most organizations today are aware that migrating to the cloud isn't a small undertaking and that it requires careful, thoughtful planning, with dedicated cloud architects guiding them on their digital transformation journey. The planning stage is critical to the success of an organization's cloud migration or expansion, as it defines the IT infrastructure and services the organization will use for years to come. Remember, the cost of redundancy is often much lower than the cost of data loss, which is why at BootCloud we design our architecture by planning for failure. With cloud services, it's not a matter of ‘if’ failure occurs but ‘when’.
