Disaster Recovery

  • Disaster: any event that has a negative impact on a company’s business continuity or finances
  • Disaster Recovery (DR) is about preparing for and recovering from a disaster
  • Disaster recovery solutions:
    • On-premise to on-premise: traditional DR, expensive
    • On-premise to cloud: hybrid recovery
    • AWS Cloud Region A to AWS Cloud Region B

RPO and RTO

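  • RPO - Recovery Point Objective: the maximum acceptable data loss, measured in time. It determines how often backups must be taken; the time between the last recovery point and the disaster is the data loss
  • RTO - Recovery Time Objective: the maximum acceptable time to restore service after a disaster. The time between the disaster and the end of recovery is the downtime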
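
The arithmetic behind these two objectives can be worked through with a small Python sketch. The timestamps, the targets and the function names (recovery_metrics, meets_objectives) are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, disaster, service_restored):
    """Return (data_loss, downtime) as timedeltas for one incident."""
    data_loss = disaster - last_backup        # work done since the last recovery point is lost
    downtime = service_restored - disaster    # time the service was unavailable
    return data_loss, downtime


def meets_objectives(data_loss, downtime, rpo, rto):
    """Return (rpo_ok, rto_ok): whether the incident stayed within each objective."""
    return data_loss <= rpo, downtime <= rto


# Hypothetical incident timeline.
last_backup = datetime(2024, 1, 1, 0, 0)
disaster = datetime(2024, 1, 1, 5, 30)
service_restored = datetime(2024, 1, 1, 7, 0)

data_loss, downtime = recovery_metrics(last_backup, disaster, service_restored)
rpo_ok, rto_ok = meets_objectives(
    data_loss, downtime, rpo=timedelta(hours=6), rto=timedelta(hours=1)
)
print(data_loss, downtime, rpo_ok, rto_ok)  # 5:30:00 1:30:00 True False
```

In this example the 5.5 hours of lost data fit within a 6-hour RPO, but the 1.5 hours of downtime exceed a 1-hour RTO, so the recovery process would need to be faster.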

Disaster Recovery Strategies

  • Backup and Restore: high RPO and high RTO, but cheap and easy to manage
  • Pilot Light:
    • A small version of the app is always running in the cloud
    • Only the critical core of the system is kept running (the "pilot light")
    • Similar to the backup and restore strategy
    • Faster than backup and restore, as the critical systems are already running
  • Warm Standby:
    • Full system is up and running but at a minimal size
    • Upon disaster we can scale to production load
  • Hot Site / Multi Site Approach:
    • Very low RTO, but very expensive
    • Full production scale is running in the cloud
  • All AWS Multi Region: the same strategies can be applied entirely within AWS, across regions
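
The trade-off between cost and recovery time in the list above can be sketched as a simple lookup. The cost ordering follows the list; the RTO figures are placeholder assumptions for illustration only, not AWS numbers, and the names Strategy and cheapest_strategy are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    cost_rank: int            # 1 = cheapest; ordering follows the list above
    assumed_rto_hours: float  # placeholder figure, NOT an AWS-published number


# Ordered from cheapest/slowest to most expensive/fastest.
STRATEGIES = [
    Strategy("Backup and Restore", 1, 24.0),
    Strategy("Pilot Light", 2, 4.0),
    Strategy("Warm Standby", 3, 1.0),
    Strategy("Hot Site / Multi Site", 4, 0.1),
]


def cheapest_strategy(required_rto_hours: float) -> Optional[Strategy]:
    """Pick the cheapest strategy whose assumed RTO meets the requirement."""
    candidates = [s for s in STRATEGIES if s.assumed_rto_hours <= required_rto_hours]
    return min(candidates, key=lambda s: s.cost_rank) if candidates else None


choice = cheapest_strategy(2.0)
print(choice.name if choice else "no strategy meets the target")  # Warm Standby
```

In practice the RTO of each strategy depends on the workload and should be measured in recovery drills rather than assumed.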

Disaster Recovery Tips

  • Backups:
    • EBS snapshots, RDS automated backups and snapshots, etc.
    • Regular pushes to S3 / S3 IA / Glacier, with lifecycle policies and cross-region replication
    • From on-premise: Snowball or Storage Gateway
  • High Availability:
    • Use Route 53 to fail DNS over from one region to another
    • RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
    • Site-to-Site VPN as a backup connection in case Direct Connect fails
  • Replication:
    • RDS replication (cross-region), Amazon Aurora Global Database
    • Database replication from on-premise to RDS
    • Storage Gateway
  • Automation:
    • CloudFormation/Elastic Beanstalk to recreate a whole new environment
    • Recover/reboot EC2 instances with CloudWatch alarm actions when an alarm enters the ALARM state
    • AWS Lambda for customized automation
  • Chaos:
    • Netflix has a “Simian Army” that randomly terminates EC2 instances to verify that systems survive failures
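
Two of the tips above (tiering backups with an S3 lifecycle policy, and recovering an EC2 instance from a CloudWatch alarm) can be sketched as request payloads for boto3. This is a minimal sketch, not a complete setup: the prefix, rule ID, day counts, instance ID, region, bucket name and helper function names are all assumptions. The payloads are plain dictionaries, so the code runs without AWS credentials.

```python
def backup_lifecycle_configuration(prefix="backups/", ia_after_days=30, glacier_after_days=90):
    """Build a LifecycleConfiguration dict that tiers objects to S3 IA, then Glacier."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the S3 IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-backups",            # hypothetical rule name
                "Filter": {"Prefix": prefix},    # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def recover_alarm_request(instance_id, region):
    """Build put_metric_alarm arguments that recover an instance on a failed system check."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                 # seconds per evaluation period
        "EvaluationPeriods": 2,       # two consecutive failing minutes
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action that runs when the alarm enters the ALARM state.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


config = backup_lifecycle_configuration()
alarm = recover_alarm_request("i-0123456789abcdef0", "us-east-1")  # hypothetical instance

# With boto3 installed and credentials configured, the payloads would be sent as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-dr-bucket", LifecycleConfiguration=config)  # hypothetical bucket
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Keeping the payloads as data makes them easy to review and to reuse from CloudFormation-style automation or a Lambda function.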

On-Premise Strategy with AWS

  • Ability to download the Amazon Linux 2 AMI as a VM image (.iso format)
  • VM Import/Export:
    • Ability to migrate existing applications into EC2
    • Ability to create a DR repository for on-premise VMs
    • Ability to export the VMs back from EC2 to on-premise
  • AWS Application Discovery Service:
    • Gather information about on-premise servers to plan a migration
    • Provides information about server utilization and dependency mappings
    • Track all migrations with AWS Migration Hub
  • AWS Database Migration Service (DMS)
  • AWS Server Migration Service (SMS):
    • Incremental replication of on-premise live servers to AWS
    • Since superseded by AWS Application Migration Service