Published On: 5 March 2024

Date: 2023

Company Overview

This company is a leading dental clinic company based in Indonesia, dedicated to providing high-quality dental solutions to its customers. With a strong focus on customer satisfaction and innovative dental technology, it has established itself as a trusted brand in the industry. As a rapidly growing company, it recognized the importance of implementing a robust disaster recovery strategy to safeguard its critical systems, data, and operations from potential disruptions and ensure uninterrupted service to its customers.

Problem Statement

The company faced a critical challenge in implementing a resilient disaster recovery strategy. Their objective was to establish a comprehensive system with a Recovery Time Objective (RTO) of 8 hours and a Recovery Point Objective (RPO) of 24 hours. However, the customer encountered several hurdles in keeping the disaster recovery solution adaptable while maintaining efficient and cost-effective backup and recovery processes.

The primary challenge for this customer was to design and implement a disaster recovery strategy capable of achieving an RTO of 8 hours. The RTO is the maximum acceptable downtime for critical systems and data after a disaster. The customer acknowledged the significant financial losses and reputational damage that extended interruptions could cause. In addition, they aimed to achieve an RPO of 24 hours. The RPO is the maximum acceptable age of the most recent recovery point at the time of failure, so an RPO of 24 hours means that at most the last 24 hours of data may be lost. This objective aimed to limit data loss and enable them to restore their systems and data to a point relatively close to the failure event.
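As an illustration of how the two objectives are measured, the following minimal sketch checks a single incident against them. The 8-hour and 24-hour targets come from the case study; the function name and the incident timestamps are hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta

RTO = timedelta(hours=8)   # maximum acceptable downtime (from the case study)
RPO = timedelta(hours=24)  # maximum acceptable data-loss window (from the case study)


def meets_objectives(last_backup, failure, restored):
    """Return (rpo_ok, rto_ok) for one hypothetical incident."""
    # Everything written after the last recovery point is lost.
    data_loss_window = failure - last_backup
    # Downtime runs from the failure until service is restored.
    downtime = restored - failure
    return data_loss_window <= RPO, downtime <= RTO


# Hypothetical incident: backup at 01:00, failure at 15:30, restored at 21:00.
last_backup = datetime(2023, 6, 1, 1, 0)
failure = datetime(2023, 6, 1, 15, 30)
restored = datetime(2023, 6, 1, 21, 0)
print(meets_objectives(last_backup, failure, restored))  # (True, True)
```

Here the data-loss window is 14.5 hours and the downtime is 5.5 hours, so both targets are met. Had the restore finished at midnight instead, the downtime of 8.5 hours would have breached the RTO.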

Furthermore, this customer faced the task of ensuring the adaptability of their disaster recovery solution to accommodate future growth. As a rapidly expanding company, they needed a scalable solution capable of handling increased data volumes and evolving business needs. They sought to avoid potential bottlenecks or limitations in their backup and recovery processes that could hinder their ability to quickly restore critical data and resume operations. Additionally, this customer recognized the importance of maintaining efficient and cost-effective backup and recovery processes to optimize their solution. They aimed to eliminate human errors, reduce downtime during a disaster, and ensure compliance with their defined RTO and RPO targets.

Therefore, this customer required a comprehensive disaster recovery strategy that could address these challenges by providing a flexible, adaptable, and efficient solution with an RTO of 8 hours and an RPO of 24 hours. They acknowledged the need for regular testing of backup and recovery processes to validate their effectiveness, along with simulated disaster scenarios to ensure the recovery procedures met the required objectives. Continuous monitoring was also crucial to identify and address potential issues or bottlenecks, allowing for ongoing optimization and alignment with their evolving business needs.

Proposed Solution & Architecture

The proposed solution involved leveraging AWS Backup to achieve a disaster recovery strategy with a Recovery Time Objective (RTO) of 8 hours and a Recovery Point Objective (RPO) of 24 hours. The architecture included several key elements to ensure a resilient and efficient solution.

The architecture design above was prepared for and offered to the customer, based on best practices for web applications on AWS.

  • This project will be conducted in the AWS Singapore Region as the Production Region and the AWS Tokyo Region as the Disaster Recovery Region.
  • In this solution, the customer will have 1 AWS account, with access limited to the internal IT team.
  • Deployment in this account will use 2 VPCs: one spanning 3 AWS Availability Zones (AZs) for Production and one in 1 AZ for Development. Each AZ will have 4 subnets (public, frontend, backend, and database).
  • There will be a NAT Gateway and a bastion in each Availability Zone, so that instances can reach the internet through the NAT Gateway and be accessed through the bastion.
  • AWS Backup will be used as the backup medium, from which the backup server can be launched in any of the 3 AZs in the VPC.
  • AWS Backup will also be used as the disaster recovery medium, from which the backup server can be launched in the AWS Tokyo Region.
  • The Amazon CloudFront origin failover feature will be used to support data resiliency needs.
  • Amazon Route 53 will be used for DNS management.
  • To make the environment more secure, access will be limited.
    • AWS IAM users will be kept to a minimum (in the production account, only for the main administrator).
    • The internal IT team must use the bastion from the office or a branch to access the environment.
    • AWS KMS will be used for encryption of data at rest.
    • Amazon GuardDuty will be used for threat detection. It continuously monitors AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
    • AWS CloudTrail will be used to monitor governance, compliance, and risk in the customer's AWS account.
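The backup items in the list above could be expressed as an AWS Backup plan along the following lines. This is a sketch under stated assumptions, not the customer's actual configuration: the plan name, vault names, account ID, schedule time, window sizes, and retention periods are all hypothetical. Only the Regions (Singapore for production, Tokyo for disaster recovery) and the 24-hour RPO come from the case study.

```python
import json

PRODUCTION_REGION = "ap-southeast-1"  # AWS Singapore Region
DR_REGION = "ap-northeast-1"          # AWS Tokyo Region

# Hypothetical destination vault; the account ID and vault name are placeholders.
DR_VAULT_ARN = f"arn:aws:backup:{DR_REGION}:111122223333:backup-vault:dr-vault"

backup_plan = {
    "BackupPlanName": "daily-dr-plan",  # hypothetical name
    "Rules": [
        {
            "RuleName": "daily-with-cross-region-copy",
            "TargetBackupVaultName": "production-vault",  # hypothetical vault
            # Once a day at 17:00 UTC, so recovery points are roughly 24 hours
            # apart. A tighter schedule would leave more margin under the RPO.
            "ScheduleExpression": "cron(0 17 ? * * *)",
            "StartWindowMinutes": 60,        # assumed value
            "CompletionWindowMinutes": 180,  # assumed value
            "Lifecycle": {"DeleteAfterDays": 35},  # assumed retention
            "CopyActions": [
                {
                    # Copy each recovery point to the Disaster Recovery Region.
                    "DestinationBackupVaultArn": DR_VAULT_ARN,
                    "Lifecycle": {"DeleteAfterDays": 35},  # assumed retention
                }
            ],
        }
    ],
}

print(json.dumps(backup_plan, indent=2))
```

With boto3, a dictionary of this shape can be passed as the `BackupPlan` argument of the Backup client's `create_backup_plan` call, using a client created for the production Region. The sketch only builds and prints the document, so it runs without AWS credentials.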

Metrics for Success

  • Reduced Recovery Time Objective (RTO): The proposed solution aimed to achieve an RTO of less than 8 hours. This metric signifies the time taken to restore critical systems and data, ensuring operations can resume swiftly after a disaster.
  • Recovery Point Objective (RPO) Compliance: This customer aimed to achieve an RPO of less than 24 hours. This metric represents the maximum period of recent data that may be lost, so that systems can be restored to a point close to the time of failure.
  • Effective Monitoring: Continuous monitoring was key to ensuring that the backup and recovery processes met the required RTO and RPO objectives. Any potential issues or bottlenecks in the recovery workflow were identified and addressed proactively.
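One way such monitoring might be approached is to scan the completion times of successful backup jobs and flag any interval longer than the RPO. The sketch below assumes the timestamps have already been collected. The function name and the job history are hypothetical, and it is not the monitoring the customer actually deployed.

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=24)  # target from the case study


def find_rpo_gaps(completion_times, now, rpo=RPO):
    """Return (start, end) pairs during which no recovery point was created for longer than rpo."""
    # Append the current time so a stale latest backup is also detected.
    points = sorted(completion_times) + [now]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > rpo]


# Hypothetical job history: the backup expected on 3 June is missing.
history = [
    datetime(2023, 6, 1, 18, 0),
    datetime(2023, 6, 2, 18, 0),
    datetime(2023, 6, 4, 18, 0),
]
for start, end in find_rpo_gaps(history, now=datetime(2023, 6, 5, 9, 0)):
    print(f"RPO exceeded between {start} and {end}")
```

For this history, only the 48-hour interval between 2 June and 4 June is reported. An interval of exactly 24 hours is treated as compliant.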

Lessons Learned

By leveraging AWS Backup and automation, this customer achieved cost savings by reducing manual intervention and minimizing downtime during a disaster. They learned the importance of optimizing costs while maintaining an efficient disaster recovery strategy.
