Posts

Showing posts from February, 2020

How to Manage Fault Tolerance with Availability Zones and Regions in AWS

What are Regions and Availability Zones in AWS?

AWS infrastructure is available across various geographical regions. Usually regions map to continents or countries, but large countries like the US can have multiple regions. Each region is divided into two or more Availability Zones (AZs), which are simply collections of data centers connected to each other through high-speed network links. These data centers are separate units located away from each other, so that if a calamity strikes around one of them, the others can take over the load. They are housed in highly secure buildings with backup power and multiple network connections.

How Regions and AZs Help in Achieving Fault Tolerance

Availability Zones contain data centers that are insulated from failures in other data centers. If a particular data center goes down, the application does not go down with it, as the other data centers are insulated....
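To make the benefit of spreading a workload across AZs concrete, here is a minimal Python sketch. It is a toy model, not an AWS API call: it places instances round-robin across three AZs and counts how many survive if one AZ is lost. The AZ names follow the usual `us-east-1a` pattern, and the helpers `spread_across_azs` and `capacity_after_az_failure` are hypothetical.

```python
def spread_across_azs(instance_count, azs):
    """Assign instances to AZs round-robin so no single AZ holds all of them."""
    return [azs[i % len(azs)] for i in range(instance_count)]


def capacity_after_az_failure(placement, failed_az):
    """Count instances still running if every instance in failed_az is lost."""
    return sum(1 for az in placement if az != failed_az)


# Hypothetical example AZs in one region.
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]

spread = spread_across_azs(6, azs)   # two instances per AZ
single_az = ["us-east-1a"] * 6       # everything placed in one AZ

print(capacity_after_az_failure(spread, "us-east-1a"))     # 4
print(capacity_after_az_failure(single_az, "us-east-1a"))  # 0
```

Losing one of three AZs leaves two thirds of the capacity in the spread layout, while the single-AZ layout loses everything.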

How Auto Scaling and Elastic Load Balancing Can Help in Designing Fault-Tolerant Systems

What is Auto Scaling?

The Auto Scaling feature lets you automatically increase or decrease the number of EC2 instances depending on the load or other factors. Auto scaling is usually driven by Amazon CloudWatch: when a metric such as CPU utilization breaches a predefined threshold, a CloudWatch alarm is triggered. This can help terminate a problematic instance and launch a new one. Auto Scaling can also be configured to recognize many symptoms of an impaired application, or to detect failures, and launch new EC2 instances in response.

What is Elastic Load Balancing (ELB)?

The purpose of ELB is to balance the load across different EC2 instances. The EC2 instances managed by an ELB can be in the same Availability Zone (AZ), in the same or different subnets, or spread across different AZs. ELB spreads the load across the EC2 instances evenly or proportionately, reducing the chance of excessive load on any particular instance. Often E...
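Both ideas can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not how AWS implements either service: the 70%/30% CPU thresholds, the group size limits, the `replacement-N` naming and all function names are hypothetical, and round-robin is only one simple way to spread requests (the algorithm a real load balancer uses depends on its type and configuration).

```python
import itertools
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical thresholds, loosely modeled on a CloudWatch alarm on average CPU.
    scale_out_cpu: float = 70.0
    scale_in_cpu: float = 30.0
    min_size: int = 2
    max_size: int = 10


def desired_capacity(current, avg_cpu, policy):
    """Add or remove one instance when average CPU crosses a threshold."""
    if avg_cpu > policy.scale_out_cpu:
        return min(current + 1, policy.max_size)
    if avg_cpu < policy.scale_in_cpu:
        return max(current - 1, policy.min_size)
    return current


def replace_unhealthy(instances, healthy):
    """Replace every instance that failed its health check with a new one."""
    new_ids = itertools.count(1)
    return [
        inst if healthy.get(inst, False) else f"replacement-{next(new_ids)}"
        for inst in instances
    ]


def round_robin(instances):
    """Yield instances in turn so requests are spread evenly."""
    return itertools.cycle(instances)


policy = ScalingPolicy()
print(desired_capacity(2, 85.0, policy))  # 3 (scale out)
print(desired_capacity(2, 10.0, policy))  # 2 (already at min_size)

fleet = replace_unhealthy(["web-1", "web-2", "web-3"],
                          {"web-1": True, "web-2": False, "web-3": True})
print(fleet)  # ['web-1', 'replacement-1', 'web-3']

balancer = round_robin(fleet)
print([next(balancer) for _ in range(4)])
# ['web-1', 'replacement-1', 'web-3', 'web-1']
```

The minimum group size keeps the fleet from shrinking below a safe floor even when load is low, which matters for fault tolerance as much as scaling out does.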

Role of AMIs and EBS in Designing Fault-Tolerant Systems

What is Fault Tolerance?

Simply put, it is the ability of a system to remain operational, or available, even if one or more of its constituent systems fail. So what happens if a system is not fault tolerant? Well, the price of being down depends on the business. In some cases, a few minutes of downtime may cost a few million.

How Can AWS Help Prevent or Reduce Downtime?

AWS has tools and techniques for building fault-tolerant systems that are fully automatic or require minimal human intervention. There are many such tools in AWS, but in this article let us look at the role of AMIs and EBS.

What is an AMI?

AMI stands for Amazon Machine Image, which is basically a prebuilt software configuration used to launch EC2 instances. It can include the operating system, application servers and applications, as well as security patches. As an enterprise, it is important for you to create your own library of AMIs which can then be used to instantiate EC2 ins...
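One way to picture such a library is a catalog keyed by server role, from which the newest patched image is picked whenever a replacement instance has to be launched. The Python sketch below is purely illustrative: `AmiRecord`, `latest_ami_for` and the `ami-web-00N` IDs are hypothetical placeholders, and a real setup would look images up in AWS rather than in an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AmiRecord:
    ami_id: str      # hypothetical placeholder IDs, not real images
    role: str        # server role, e.g. "web" or "app"
    built_on: date
    patched: bool    # True once the latest security patches are baked in


def latest_ami_for(role, library):
    """Pick the most recently built, patched image for a server role."""
    candidates = [a for a in library if a.role == role and a.patched]
    if not candidates:
        raise LookupError(f"no patched AMI for role {role!r}")
    return max(candidates, key=lambda a: a.built_on)


library = [
    AmiRecord("ami-web-001", "web", date(2020, 1, 10), True),
    AmiRecord("ami-web-002", "web", date(2020, 2, 5), True),
    AmiRecord("ami-web-003", "web", date(2020, 2, 20), False),  # not yet patched
    AmiRecord("ami-app-001", "app", date(2020, 1, 15), True),
]

print(latest_ami_for("web", library).ami_id)  # ami-web-002
```

Skipping unpatched images means a replacement instance never comes up with a weaker configuration than the one it replaces.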

Some Important Architectural Considerations

As a cloud solution architect considering a cloud migration, here are some important considerations and questions to ask:

Disaster Recovery

For how long can the application be down? Do you need the application to be up 24/7 no matter what the cost? From which locations is the client willing to operate: only a single state or country, or locations across oceans or continents? For backup sites, do you want an active/active setup, in which case the application can be recovered in a matter of minutes by switching to another site? Or do you want to keep database backup snapshots, in which case recovery may take longer, perhaps a few hours?

Business Continuity

Can the business be offline for a few minutes, hours or days in a week, or do customers use it 24/7?

Cost

How sensitive is the customer to cost? Are they willing to compromise on disaster recovery time, or is it the other way round? Is there a...
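These questions can be turned into a rough decision aid. The Python sketch below is one hypothetical encoding: the `Requirements` fields, the one-hour `HOT_STANDBY_LIMIT_MINUTES` cut-off and the suggested options are assumptions for illustration, and a real decision would also weigh cost, as the last set of questions asks.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    max_downtime_minutes: int  # how long the application may be down
    needs_24x7: bool           # do customers use it around the clock?
    multi_continent: bool      # are users spread across continents?


# Hypothetical cut-off: outages shorter than this are treated as needing a hot standby.
HOT_STANDBY_LIMIT_MINUTES = 60


def suggest_backup_site(req):
    """Return 'active/active' or 'database snapshots' for the backup site."""
    if req.needs_24x7 or req.max_downtime_minutes < HOT_STANDBY_LIMIT_MINUTES:
        return "active/active"
    return "database snapshots"


def suggest_footprint(req):
    """Users on several continents suggest several regions; otherwise one region."""
    return "multiple regions" if req.multi_continent else "single region, multiple AZs"


req = Requirements(max_downtime_minutes=240, needs_24x7=False, multi_continent=True)
print(suggest_backup_site(req))  # database snapshots
print(suggest_footprint(req))    # multiple regions
```

A tolerance of a few hours points toward restoring from snapshots, while minutes of tolerated downtime, or round-the-clock use, points toward an active/active site.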