Role of AMIs and EBS in designing Fault Tolerant Systems

What is Fault Tolerance?

Simply put, fault tolerance is the ability of a system to remain operational, or available, even if one or more of its components fail. So what happens if a system is not fault tolerant? The cost of downtime depends on the business; in some cases a few minutes of downtime can cost millions.

How Can AWS Help Prevent or Reduce Downtime?

AWS provides tools and techniques for building fault tolerant systems that recover automatically or with minimal human intervention. There are many such tools in AWS, but this article looks at the role of AMIs and EBS.

What is AMI?

An AMI (Amazon Machine Image) is a prebuilt software configuration used to launch EC2 instances. Apart from security patches, it can include the operating system, application servers, and applications.

As an enterprise, it is important to create your own library of AMIs from which EC2 instances are launched. For each AMI you decide which OS, patches, application servers, and applications it contains. EC2 instances created from these AMIs can then be used in production as well as in lower environments.
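A minimal sketch of adding an image to such a library, assuming Python with a boto3 EC2 client (`boto3.client("ec2")`) passed in as `ec2`, and an instance that has already been configured with the chosen OS, patches, and application stack. The function name `bake_ami`, the image name format, and the `App`/`Version` tags are hypothetical conventions for illustration, not something AWS requires.

```python
from datetime import datetime, timezone


def bake_ami(ec2, instance_id, app_name, version):
    """Create an AMI from an already-configured instance and tag it.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The name format and the App/Version tags are hypothetical conventions.
    """
    # A UTC timestamp keeps image names unique within the account and region.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=f"{app_name}-{version}-{stamp}",
        Description=f"{app_name} {version} with OS, patches and application stack",
        # NoReboot=False (the default) lets EC2 reboot the instance so the
        # file systems are consistent when the image is taken.
        NoReboot=False,
    )
    image_id = response["ImageId"]
    # Tag the image so the newest one for an application can be found later.
    ec2.create_tags(
        Resources=[image_id],
        Tags=[{"Key": "App", "Value": app_name}, {"Key": "Version", "Value": version}],
    )
    return image_id
```

Tagging each image makes it straightforward to look up the newest image for an application when a replacement instance is needed.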

This is one of the first steps towards building fault tolerant systems.

When a server or EC2 instance goes down, you can replace it by simply launching a new EC2 instance from the AMI. Depending on the AMI, the new system can be up and running within minutes.
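As a sketch of the replacement step, again assuming Python with a boto3 EC2 client passed in as `ec2`: `latest_ami` picks the newest self-owned image carrying a hypothetical `App` tag, and `launch_replacement` starts one instance from it and waits until it is running. Both function names are hypothetical, and the instance type, subnet, and security groups are placeholders you would take from the failed instance's configuration.

```python
def latest_ami(ec2, app_name):
    """Return the ID of the newest self-owned AMI tagged App=<app_name>."""
    images = ec2.describe_images(
        Owners=["self"],
        Filters=[{"Name": "tag:App", "Values": [app_name]}],
    )["Images"]
    if not images:
        raise LookupError(f"no AMI tagged App={app_name}")
    # CreationDate is an ISO 8601 string, so string order is chronological.
    return max(images, key=lambda image: image["CreationDate"])["ImageId"]


def launch_replacement(ec2, image_id, instance_type, subnet_id, security_group_ids):
    """Launch one instance from `image_id` and wait until it is running."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        SubnetId=subnet_id,
        SecurityGroupIds=security_group_ids,
    )
    instance_id = response["Instances"][0]["InstanceId"]
    # The waiter polls describe_instances until the state is "running".
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id
```

Note that "running" only means the instance has started; status checks or an application health check are what confirm it is ready to take traffic.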

To avoid waiting for a new EC2 instance to start up, you can keep a spare instance running. When a problem occurs, the problematic system's private IP address can be moved to the spare instance (this works for a secondary private IP, or for the network interface that carries it, since an instance's primary private IP cannot be reassigned). You can also remap the Elastic IP address to the spare EC2 instance.
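The two remapping options could look roughly like this, assuming Python with a boto3 EC2 client passed in as `ec2`; the function names are hypothetical. `associate_address` with `AllowReassociation=True` moves an Elastic IP even though it is still attached to the failed instance, and `assign_private_ip_addresses` with `AllowReassignment=True` moves a secondary private IP onto the spare instance's network interface.

```python
def fail_over_eip(ec2, allocation_id, standby_instance_id):
    """Point an Elastic IP at the standby instance; return the association ID."""
    response = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        # Take the address from the instance it is currently attached to
        # (the failed one) instead of raising an error.
        AllowReassociation=True,
    )
    return response["AssociationId"]


def move_secondary_ip(ec2, standby_eni_id, private_ip):
    """Move a secondary private IP to the standby instance's network interface."""
    ec2.assign_private_ip_addresses(
        NetworkInterfaceId=standby_eni_id,
        PrivateIpAddresses=[private_ip],
        # Allow the IP to be taken over while still assigned to another interface.
        AllowReassignment=True,
    )
```

Because clients reach the service through the same address before and after the switch, no DNS or configuration change is needed on their side.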

The ability to quickly replace an impaired system by launching a new instance from an AMI is one of the critical steps in designing fault tolerant systems.

What is EBS?

EBS, or Elastic Block Store, is used for persistent storage. EBS volumes host databases, file systems, or raw block data, and they are highly reliable. To make them fault tolerant, you need to create snapshots, which are point-in-time copies of a volume. Snapshots can be used to recover a volume or to create multiple copies of it. By default, snapshots are region-specific and EBS volumes are AZ-specific: a snapshot can be used to create a new volume in any Availability Zone of its region, and to create replicas in another region you first copy the snapshot to that region.
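A sketch of taking a snapshot and replicating it, assuming Python with two boto3 EC2 clients: `ec2_source` in the volume's region and `ec2_dest` in the target region, because `copy_snapshot` is called from the destination region. `restore_volume` shows how a snapshot becomes a new volume in a chosen Availability Zone. All three function names are hypothetical.

```python
def snapshot_and_replicate(ec2_source, ec2_dest, volume_id, source_region):
    """Snapshot a volume and copy the snapshot to the destination client's region.

    Returns (source_snapshot_id, copied_snapshot_id).
    """
    snapshot_id = ec2_source.create_snapshot(
        VolumeId=volume_id,
        Description=f"backup of {volume_id}",
    )["SnapshotId"]
    # Wait until the snapshot has completed before copying it elsewhere.
    ec2_source.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    copy = ec2_dest.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"copy of {snapshot_id} from {source_region}",
    )
    return snapshot_id, copy["SnapshotId"]


def restore_volume(ec2, snapshot_id, availability_zone):
    """Create a new EBS volume from a snapshot in the given Availability Zone."""
    return ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )["VolumeId"]
```

The Availability Zone passed to `restore_volume` must belong to the region of the client, which is why a cross-region recovery copies the snapshot first.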

When you create a snapshot, I/O on the volume should be paused and in-memory data should first be flushed to disk, so that the snapshot captures a consistent state.
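One way to pause I/O on Linux is `fsfreeze`, which flushes dirty file system buffers to disk and blocks new writes until the file system is unfrozen. The sketch below assumes Python, root privileges, a boto3 EC2 client passed in as `ec2`, and a non-root data volume mounted at `mount_point`; applications that cache data in their own memory, such as databases, still need to flush it themselves before the freeze. The file system is unfrozen as soon as `create_snapshot` returns, on the understanding that EBS fixes the point in time when the snapshot is initiated and finishes copying in the background. The function name is hypothetical, and `run` is a parameter only so the command runner can be swapped out in tests.

```python
import subprocess


def consistent_snapshot(ec2, volume_id, mount_point, run=subprocess.run):
    """Freeze the file system, start an EBS snapshot, then unfreeze.

    Assumes Linux, root privileges, a non-root data volume mounted at
    `mount_point`, and a boto3 EC2 client passed in as `ec2`.
    """
    # fsfreeze flushes dirty buffers to disk and blocks new writes.
    run(["fsfreeze", "--freeze", mount_point], check=True)
    try:
        response = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"consistent snapshot of {volume_id}",
        )
    finally:
        # Always unfreeze, even if the snapshot request failed.
        run(["fsfreeze", "--unfreeze", mount_point], check=True)
    return response["SnapshotId"]
```

Keeping the freeze window as short as a single API call limits how long applications writing to the volume are blocked.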

How EBS Helps in Creating Fault Tolerant Systems

AWS automatically stores EBS volume data redundantly within the volume's Availability Zone.
Further, snapshots are stored in S3, which is highly fault tolerant.

A backup strategy is needed that regularly backs up data by creating snapshots. The interval could range from hourly to daily to weekly.

A retention policy is also needed to dictate how long the backups or snapshots are retained.
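A retention policy can be as simple as deleting snapshots older than a fixed number of days. The sketch below assumes Python and a boto3 EC2 client passed in as `ec2`; `expired_snapshots` and `prune` are hypothetical names, and always keeping the newest snapshot is an added safety choice, not part of any AWS policy. Run on a schedule that matches the backup interval (hourly, daily, or weekly), it would sit alongside the job that creates the snapshots.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(snapshots, retention_days, now, keep_latest=1):
    """Return IDs of snapshots older than the retention window, newest first.

    `snapshots` uses the shape returned by describe_snapshots: dicts with
    "SnapshotId" and a timezone-aware "StartTime". The newest `keep_latest`
    snapshots are never returned, so a volume always keeps a backup even if
    the backup job has stopped running.
    """
    cutoff = now - timedelta(days=retention_days)
    newest_first = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return [s["SnapshotId"] for s in newest_first[keep_latest:] if s["StartTime"] < cutoff]


def prune(ec2, volume_id, retention_days, now=None):
    """Delete expired snapshots of one volume and return their IDs.

    A single describe_snapshots call is used for brevity; accounts with many
    snapshots would use ec2.get_paginator("describe_snapshots") instead.
    """
    now = now or datetime.now(timezone.utc)
    snapshots = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )["Snapshots"]
    expired = expired_snapshots(snapshots, retention_days, now)
    for snapshot_id in expired:
        ec2.delete_snapshot(SnapshotId=snapshot_id)
    return expired
```

Amazon Data Lifecycle Manager can also automate snapshot creation and retention without custom code.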







