Published On: 15 March 2024

2023

Company Overview

An online food ordering company enables customers to browse menus, place orders, and have their favorite meals delivered to their desired locations. As the company expanded, it faced the risk that a natural disaster or major system failure could disrupt operations. Such incidents can cause extended downtime and loss of critical data, and can hinder the company’s ability to recover and restore services quickly. Establishing an effective disaster recovery plan therefore became crucial to ensure business continuity and minimize the impact of unforeseen events.

Problem statement

The company relies heavily on its disaster recovery process to protect critical data and ensure business continuity in the event of data loss or system failure. However, traditional backup and restore methods often suffer from weaknesses that reduce the resilience of the backup system and compromise the organization’s ability to recover and restore data quickly. A key issue is the lack of backup resilience, meaning the ability of the backup system to withstand and recover from disruptions or failures. Backup resilience covers the reliability, availability, and performance of the backup infrastructure.

Proposed solution & architecture

The architecture design offered to the customer is based on AWS best practices for web applications.

  • The project will use the AWS Jakarta Region as the Production Region and the AWS Singapore Region as the Disaster Recovery Region.
  • The customer will have 1 AWS account, with access limited to the internal IT team.
  • Production in this account will be deployed in 2 VPCs across 2 AWS Availability Zones (AZs), and each AZ will have 3 subnets (public, private, and database).
  • There will be a NAT Gateway in each Availability Zone, so instances in the private subnets can initiate outbound connections to the internet.
  • AWS Backup will serve as the backup medium for the servers; a server can be restored from backup in either of the 2 AZs in the VPC.
  • AWS Backup will also serve as the disaster recovery medium: a server can be restored from backup in the AWS Singapore Region, with an RTO of 6 hours and an RPO of 24 hours.
  • To make the environment more secure, access will be limited:
    • Only a limited number of AWS IAM users will be created (in the production account, only for the main administrator).
    • The internal IT team must use a bastion host from the office or a branch to access the environment.
    • AWS KMS will be used for encryption of data at rest.
    • Amazon GuardDuty will be used for threat detection; it continuously monitors AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
    • AWS CloudTrail will be used to monitor governance, compliance, and risk in the customer’s AWS account.
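
The subnet layout described in the list above (3 subnets in each of 2 AZs) can be sketched in a few lines of Python. This is only an illustration for a single VPC: the CIDR range, the /24 subnet size, and the AZ names are assumptions, since the case study does not state them.

```python
import ipaddress

# Hypothetical values: the case study does not give CIDR ranges or AZ names.
VPC_CIDR = "10.0.0.0/16"
AZS = ["ap-southeast-3a", "ap-southeast-3b"]  # two AZs in the Jakarta Region
TIERS = ["public", "private", "database"]     # one subnet per tier in each AZ


def plan_subnets(vpc_cidr, azs, tiers, new_prefix=24):
    """Assign one non-overlapping subnet to every (AZ, tier) pair."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    layout = {}
    for az in azs:
        for tier in tiers:
            layout[(az, tier)] = str(next(blocks))
    return layout


layout = plan_subnets(VPC_CIDR, AZS, TIERS)
for (az, tier), cidr in layout.items():
    print(f"{az:16} {tier:9} {cidr}")
```

Carving the subnets out of one address block in a fixed order keeps the ranges non-overlapping and makes the plan easy to review before any resources are created.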

By implementing this solution, the company can enhance the backup resilience of its critical data and significantly improve its ability to recover and restore data in the face of disruptions or failures.
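
As a rough illustration of the cross-region backup described above, the sketch below builds a backup plan definition with a daily schedule and a copy action to the Singapore Region (`ap-southeast-1`). The dictionary follows the shape of the `BackupPlan` argument accepted by the AWS Backup `CreateBackupPlan` API. The account ID, vault names, schedule time, and retention period are placeholders, not values from the project.

```python
import json

# Assumptions, not taken from the case study: account ID, vault names,
# schedule time, and retention period are placeholders.
ACCOUNT_ID = "123456789012"
DR_REGION = "ap-southeast-1"  # Singapore; production (Jakarta) is ap-southeast-3


def build_backup_plan(retention_days=35):
    """Return a dict shaped like the BackupPlan argument of CreateBackupPlan."""
    dr_vault_arn = (
        f"arn:aws:backup:{DR_REGION}:{ACCOUNT_ID}:backup-vault:dr-vault"
    )
    return {
        "BackupPlanName": "daily-cross-region",
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "prod-vault",
                # One backup per day, in line with a 24-hour RPO.
                "ScheduleExpression": "cron(0 17 * * ? *)",
                "StartWindowMinutes": 60,
                "Lifecycle": {"DeleteAfterDays": retention_days},
                # Copy every recovery point to the vault in the DR Region.
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": dr_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retention_days},
                    }
                ],
            }
        ],
    }


plan = build_backup_plan()
print(json.dumps(plan, indent=2))
```

With boto3 installed and credentials configured, a definition like this could be submitted with `boto3.client("backup", region_name="ap-southeast-3").create_backup_plan(BackupPlan=plan)`. The sketch only prints the definition, so it runs without an AWS account.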

Metrics for success

  • The primary metric for success is ensuring that the maximum acceptable downtime of 6 hours, as defined by the desired RTO, is consistently met.
  • Another critical metric is adherence to the defined RPO of 24 hours. It measures the ability to restore data to a point in time that is no more than 24 hours old.
  • Backup System Redundancy: The successful implementation of redundant backup systems can be measured by verifying that backups are replicated across regions and can be restored in the recovery region.
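
The RTO and RPO metrics above reduce to two time comparisons, which can be checked after each recovery drill. The sketch below is a minimal example; the function names and timestamps are hypothetical and not taken from the project.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=24)  # maximum acceptable age of the newest recovery point
RTO = timedelta(hours=6)   # maximum acceptable downtime


def rpo_met(latest_recovery_point, now, rpo=RPO):
    """True if the newest recovery point is no older than the RPO."""
    return now - latest_recovery_point <= rpo


def rto_met(outage_start, service_restored, rto=RTO):
    """True if service was restored within the RTO."""
    return service_restored - outage_start <= rto


# Hypothetical timestamps for a recovery drill.
now = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
last_copy = datetime(2024, 3, 14, 17, 30, tzinfo=timezone.utc)

print(rpo_met(last_copy, now))                             # recovery point is 18.5 hours old
print(rto_met(now, now + timedelta(hours=5, minutes=20)))  # restored after 5 h 20 min
```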

Lessons learned

By considering factors such as RTO and RPO requirements, redundancy, continuous improvement, and proactive maintenance, organizations can strengthen their backup systems and enhance their ability to recover and restore critical data effectively.
