How RDS Cross Region Replication Delivers Unbreakable Hospital Pilot Light Disaster Recovery


Company Overview

The customer is a hospital company that has operated in Indonesia since 1989. Over more than 30 years, it has become one of Indonesia’s premier private hospital operators, with healthy profitability. Its network is widely recognized for friendly community hospitals that provide full-fledged healthcare services, mostly in two of Indonesia’s most urbanized areas, Greater Jakarta and the city of Surabaya, as well as in selected other cities.

Problem statement

In the healthcare industry, large hospitals often handle vast amounts of critical data and require efficient management systems to ensure smooth operations. One such system widely used is SAP (Systems, Applications, and Products in Data Processing), which provides integrated software solutions for various organizational functions.

However, implementing and maintaining SAP can be complex and resource-intensive. Hospitals must also prioritize data security and business continuity to ensure uninterrupted services and patient care. To address these challenges, many hospitals host their SAP applications on Amazon Web Services (AWS) and adopt the pilot light disaster recovery (DR) architecture.

The company needs a disaster recovery strategy with a recovery time objective (RTO) of less than 120 minutes and a recovery point objective (RPO) of less than 60 minutes. The solution must also be adaptable: it has to accommodate future growth while keeping this business-critical system running efficiently.

The recovery processes must be tested regularly to validate their effectiveness. Simulated disaster scenarios confirm that the recovery procedures meet the required RTO and RPO, and they help identify and address issues or bottlenecks in the recovery workflow.
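As an illustration of how a simulated disaster can be scored against these targets, the following is a minimal sketch. The `DrillResult` record and its field names are hypothetical, and the timestamps are made-up sample values, not results from the customer's drills.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Targets from the problem statement: RTO < 120 minutes, RPO < 60 minutes.
RTO_TARGET = timedelta(minutes=120)
RPO_TARGET = timedelta(minutes=60)


@dataclass
class DrillResult:
    """Timestamps recorded during one simulated disaster (hypothetical record)."""
    last_replicated_at: datetime   # newest data present in the DR Region
    outage_started_at: datetime    # moment the simulated disaster began
    service_restored_at: datetime  # moment the system was usable again in DR

    @property
    def achieved_rto(self) -> timedelta:
        # Time from the start of the outage until service is restored.
        return self.service_restored_at - self.outage_started_at

    @property
    def achieved_rpo(self) -> timedelta:
        # Window of data that would be lost: outage start minus last replicated data.
        return self.outage_started_at - self.last_replicated_at

    def meets_objectives(self) -> bool:
        return self.achieved_rto < RTO_TARGET and self.achieved_rpo < RPO_TARGET


# Sample values for illustration only.
drill = DrillResult(
    last_replicated_at=datetime(2024, 1, 10, 8, 55),
    outage_started_at=datetime(2024, 1, 10, 9, 0),
    service_restored_at=datetime(2024, 1, 10, 10, 30),
)
print(drill.achieved_rto, drill.achieved_rpo, drill.meets_objectives())
```

Recording the same three timestamps in every drill makes results comparable over time and shows whether recovery is drifting toward the limits.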

We also continuously monitor the backup and recovery processes using Amazon CloudWatch and other monitoring tools, and we regularly review and optimize the solution to identify areas for improvement and keep it aligned with evolving business needs.
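One way such monitoring could be set up is an alarm on the `ReplicaLag` metric that Amazon RDS publishes for read replicas. The sketch below only builds the alarm parameters; the instance identifier `sap-db-replica`, the alarm name, and the choice to alarm at half of the RPO budget are assumptions for illustration, not the customer's actual configuration.

```python
def replica_lag_alarm_params(db_instance_id: str, rpo_minutes: int = 60) -> dict:
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    # Assumption: alarm at half the RPO budget so operators have time to react.
    threshold_seconds = rpo_minutes * 60 / 2
    return {
        "AlarmName": f"{db_instance_id}-replica-lag",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",  # reported in seconds by the replica
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Maximum",
        "Period": 300,            # evaluate in 5-minute windows
        "EvaluationPeriods": 3,   # alarm after 3 consecutive breaching windows
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "breaching",  # missing data may mean replication stopped
    }


def create_alarm(params: dict, region: str = "ap-southeast-1") -> None:
    """Create the alarm in the DR Region; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the parameter builder runs without it
    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)


params = replica_lag_alarm_params("sap-db-replica")
print(params["Threshold"])
```

Keeping the parameters in a plain dictionary lets them be reviewed and versioned before any call is made against a live account.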

Proposed solution & architecture

The architecture design above was prepared and proposed to the customer based on their requirements for SAP on AWS.

  • The project uses the AWS Jakarta Region as the Production Region and the AWS Singapore Region as the Disaster Recovery Region.
  • The customer will have one AWS account for each Region (Production Region and DR Region). Access to these accounts will be limited to the internal IT team.
  • Production is deployed in one VPC spanning two Availability Zones (AZs) in the Jakarta Region, and each AZ has two subnets (one public and one private).
  • The Disaster Recovery site is deployed in one VPC spanning two Availability Zones (AZs) in the Singapore Region, and each AZ has two subnets (one public and one private).
  • Each Availability Zone has a NAT gateway, so instances in the private subnets can reach the internet for outbound traffic.
  • Amazon RDS cross-Region replication will be used to fail over the database from the Production Region to the Disaster Recovery Region.
  • AWS Backup will be used as the backup service for non-database servers.
  • To make the environment more secure, access will be limited.
    • AWS IAM users will be kept to a minimum (in the production account, only for the main administrator).
    • The internal IT team must use a bastion host from the office or a branch to access the environment.
    • AWS KMS will be used for encryption of data at rest.
    • Amazon GuardDuty will be used for threat detection. It continuously monitors the AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
    • AWS CloudTrail will be used to monitor governance, compliance, and risk in the customer's AWS accounts.
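The network layout described in the list above (one VPC per Region, two Availability Zones, one public and one private subnet per AZ) can be sketched as a simple address plan. The CIDR ranges, the /24 subnet size, and the AZ names below are assumptions chosen for illustration; the source does not state the actual values.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list[str]) -> dict[str, dict[str, str]]:
    """Assign one public and one private /24 subnet to each Availability Zone."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=24)
    plan = {}
    for az in azs:
        # Consecutive /24 blocks: first public, then private, for each AZ.
        plan[az] = {"public": str(next(blocks)), "private": str(next(blocks))}
    return plan


# Hypothetical, non-overlapping ranges for the two Regions.
production = plan_subnets("10.0.0.0/16", ["ap-southeast-3a", "ap-southeast-3b"])
dr_site = plan_subnets("10.1.0.0/16", ["ap-southeast-1a", "ap-southeast-1b"])

for az, subnets in {**production, **dr_site}.items():
    print(az, subnets)
```

Using non-overlapping VPC ranges in the two Regions keeps the option open to connect them later without readdressing.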

This recovery option requires a change in deployment approach. Core infrastructure changes must be made in each Region, and workload changes must be deployed to every Region simultaneously. This can be simplified by automating deployments and using infrastructure as code (IaC) to deploy infrastructure across multiple accounts and Regions: a full infrastructure deployment to the primary Region and a scaled-down or switched-off deployment to the DR Region.
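The idea of deploying the same definition to both Regions, with only the capacity differing, can be sketched as follows. The `RegionDeployment` type, its fields, and the server count are hypothetical; in practice these values would feed the parameters of whichever IaC tool is used.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegionDeployment:
    """Parameters for one Region's deployment (hypothetical shape)."""
    region: str
    role: str
    app_server_count: int   # running application servers
    database_mode: str      # "primary" or "read-replica"
    network_deployed: bool = True  # VPC, subnets, NAT always exist


def pilot_light_copy(primary: RegionDeployment, dr_region: str) -> RegionDeployment:
    """Derive the DR deployment: same definition, compute switched off."""
    return replace(
        primary,
        region=dr_region,
        role="disaster-recovery",
        app_server_count=0,            # pilot light: no app servers until failover
        database_mode="read-replica",  # data keeps replicating
    )


def scale_up_for_failover(dr: RegionDeployment, primary: RegionDeployment) -> RegionDeployment:
    """On failover, bring DR to production capacity and promote the database."""
    return replace(dr, app_server_count=primary.app_server_count, database_mode="primary")


primary = RegionDeployment("ap-southeast-3", "production", 4, "primary")
dr = pilot_light_copy(primary, "ap-southeast-1")
print(dr)
print(scale_up_for_failover(dr, primary))
```

Deriving the DR parameters from the primary definition, instead of maintaining them separately, helps keep the two Regions from drifting apart.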

With this approach, we also mitigate against a data disaster. Continuous data replication protects against some types of disaster, but it may not protect against data corruption or destruction unless the strategy also includes versioning of stored data or options for point-in-time recovery. We can back up the replicated data in the disaster Region to create point-in-time backups in that same Region. This approach minimizes downtime and enables faster recovery, while also offering the flexibility to perform thorough testing and validation of the disaster recovery strategy.
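To see why point-in-time backups matter alongside replication, consider corruption that is replicated to the DR Region before anyone notices: the replica is then also corrupt, and recovery has to start from a backup taken before the corruption. A minimal sketch of that selection, with a hypothetical helper name and made-up timestamps:

```python
from datetime import datetime
from typing import Optional


def latest_clean_backup(backups: list[datetime], corrupted_at: datetime) -> Optional[datetime]:
    """Return the newest backup taken strictly before the corruption, if any."""
    candidates = [taken_at for taken_at in backups if taken_at < corrupted_at]
    return max(candidates, default=None)


# Sample hourly backups in the DR Region (illustrative values only).
backups = [
    datetime(2024, 1, 10, 6, 0),
    datetime(2024, 1, 10, 7, 0),
    datetime(2024, 1, 10, 8, 0),
    datetime(2024, 1, 10, 9, 0),
]
corrupted_at = datetime(2024, 1, 10, 8, 20)

restore_point = latest_clean_backup(backups, corrupted_at)
print(restore_point, corrupted_at - restore_point)
```

The gap between the chosen restore point and the corruption time is the data that would be lost in this scenario, so the backup interval has to be chosen with the RPO in mind.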

Metrics for success

The metrics for success include:

  • Reduced Recovery Time Objective (RTO): The proposed solution aims to achieve an RTO of less than 2 hours. This metric is the time taken to restore critical systems and data, so that operations can resume swiftly after a disaster.
  • Recovery Point Objective (RPO) Compliance: The customer aims to achieve an RPO of less than 1 hour. This metric is the maximum acceptable window of data loss, measured back from the time of failure; data must be restorable to a point within that window.
  • Effective Monitoring: Continuous monitoring is key to ensuring that the backup and recovery processes meet the required RTO and RPO. Potential issues or bottlenecks in the recovery workflow are identified and addressed proactively.

Lessons learned

Adopting the AWS pilot light DR architecture for SAP applications in a large hospital provides lessons in several areas: prioritizing business continuity, minimizing downtime, enhancing data resilience and protection, mitigating risks, streamlining recovery processes, testing and optimizing continuously, and collaborating with cloud service providers. These insights can guide other healthcare organizations in implementing robust disaster recovery strategies for their critical systems and applications.