Data availability is one of the biggest concerns in the IT industry. After moving most of my services to the AWS cloud, I started thinking about how to ensure data availability and accuracy if an AWS data center fails or my EC2 EBS volume gets corrupted.
A case study
I have an Oracle database running on an EC2 instance.
- I need to be able to restore data from backup on user request, after a data center failure, or after an instance failure
- At the same time, it must not unexpectedly increase my monthly AWS charges
- I will only run the service during business hours
Possible solutions
- Use AWS RDS for Oracle. The service takes care of everything, including backups and patching, and it is a very reliable service. But meeting my last requirement would be a lot of work: at the time of writing, an RDS instance can’t be stopped, only terminated (yes, you can take a snapshot before terminating)
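- Use an EC2 instance and take snapshot backups of the EBS volume. But my EBS volume is 120 GB, much bigger than the SQL database backup itself, so storing multiple snapshots in S3 would cost me more (up to 120 GB x 7 days in the worst case, although EBS snapshots are incremental, so the real figure depends on how much data changes each day).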
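To make the storage comparison concrete, here is a rough sketch of the arithmetic. The 120 GB volume and the 7 daily copies come from the case above; the backup file size and the price per GB-month are placeholders I made up for illustration, and the sketch treats every snapshot as a full copy and prices both options the same, which is a simplification. Plug in your own numbers.

```python
def monthly_storage_cost(size_gb, copies, price_per_gb_month):
    """Cost of keeping `copies` full copies of `size_gb` for one month."""
    return size_gb * copies * price_per_gb_month


RETENTION_DAYS = 7   # from the post: one copy per day, kept for 7 days
VOLUME_GB = 120      # from the post: size of the EBS volume
BACKUP_GB = 20       # ASSUMPTION: hypothetical size of one .bak file
PRICE = 0.05         # ASSUMPTION: placeholder USD per GB-month, not a real AWS price

# Worst case for snapshots: every snapshot counted as a full 120 GB copy.
snapshot_gb = VOLUME_GB * RETENTION_DAYS
backup_gb = BACKUP_GB * RETENTION_DAYS

snapshot_cost = monthly_storage_cost(VOLUME_GB, RETENTION_DAYS, PRICE)
backup_cost = monthly_storage_cost(BACKUP_GB, RETENTION_DAYS, PRICE)

print(f"Snapshots (worst case): {snapshot_gb} GB, ${snapshot_cost:.2f}/month")
print(f"Database backups only:  {backup_gb} GB, ${backup_cost:.2f}/month")
```

Whatever the real prices are, the ratio between the two options is driven by the ratio of volume size to backup size, which is why shipping only the backup files is attractive here.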
The solution I am using
- Created a maintenance plan in SQL Server to take a daily database backup
- Created an AWS CLI script to sync data from the SQL Server backup location to an S3 bucket
- aws s3 sync \\SERVER_NAME\backup$ s3://BUCKETNAME --exclude "*" --include "*.bak"
- Created a batch job to move the local SQL Server backup files to another folder, ready for clean-up of old data
- move \\SERVER_NAME\backup$\*.* \\SERVER_NAME\backup$\movedS3
- Created a maintenance plan in SQL Server to delete older files from the movedS3 folder. This helps me control unwanted data growth
- Created a lifecycle policy to delete older files from my S3 bucket
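The lifecycle policy in the last step can be sketched as a small JSON document. The snippet below builds one that expires every object in the bucket after a fixed number of days. The 30-day retention and the rule ID are my own placeholders, not values from the setup above, so treat this as an illustration and not the exact policy I use.

```python
import json


def build_lifecycle_config(retention_days, prefix=""):
    """Return an S3 lifecycle configuration that expires objects after `retention_days`."""
    return {
        "Rules": [
            {
                # Hypothetical rule name, chosen only for readability.
                "ID": f"expire-backups-after-{retention_days}-days",
                # An empty prefix makes the rule apply to every object in the bucket.
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": retention_days},
            }
        ]
    }


# ASSUMPTION: 30 days is an illustrative retention period.
config = build_lifecycle_config(retention_days=30)
print(json.dumps(config, indent=2))
```

If you save the printed JSON as lifecycle.json, it should be possible to apply it with `aws s3api put-bucket-lifecycle-configuration --bucket BUCKETNAME --lifecycle-configuration file://lifecycle.json`, or you can set up the same rule from the S3 console.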
What this solution will ensure
- First of all, I can sleep tight at night. I don’t need to worry about my backup data.😉
- S3 provides 99.999999999% data durability, and it stores data redundantly across multiple Availability Zones. That means I will still be able to access my S3 data even if a single Availability Zone fails.
- S3 is one of the cheapest cloud storage options. That’s why Dropbox dares to give you so much storage space for free😉
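To get a feel for what eleven nines of durability means, here is a back-of-the-envelope calculation. It treats the durability figure as the chance that a single object survives one year, which is a simplification, and the object count is just an illustrative number.

```python
from fractions import Fraction

# 99.999999999% written as an exact fraction to avoid floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(object_count):
    """Expected number of objects lost per year under the simplified model."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(object_count)


objects = 10_000_000  # ASSUMPTION: illustrative number of stored objects
loss = expected_annual_loss(objects)

print(f"Expected objects lost per year: {float(loss)}")
print(f"Years per lost object: {years_per_lost_object(objects)}")
```

With ten million objects that works out to one lost object every 10,000 years on average, so a handful of daily .bak files is about as safe as storage gets.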