Backing up a Linux box to online storage for cheap!

In my previous post, dear Theophilus, I described how to set up Amazon S3 storage and how to install s3tools in order to sync files to online storage. In this post, I will describe the mechanism I ended up with to back up various Linux boxes, mostly web servers, to S3 storage.

1) Set up S3 etc.

Already covered in the previous post.

2) Install rsync

sudo apt-get install rsync

3) Create a script file to run the backup

Using nano or your favourite text editor, create a script file that looks like the following:

#!/bin/bash
BACKUPDIR="/home/luke/backupdir"    #temp directory for backup files
TAROUTPUT="/home/luke/backup.tar"   #backup output temp file

echo "Starting server backup..."    # Useful for emails
date +%H%M # Echoes the time to the console

# Create temp dir if it does not exist (-p makes this a no-op when it already exists)
mkdir -p "$BACKUPDIR"

#Apache configs
cp /etc/apache2/sites-available/default "$BACKUPDIR"
# other configs

#ssl certs
cp /etc/ssl/certs/mycert.crt "$BACKUPDIR"
# other certs

#sites
rsync -az /home/luke/sites/bugtracker "$BACKUPDIR"

#postfix
rsync -az /etc/postfix "$BACKUPDIR"

#run mysql dump
# Note: a password given on the command line is visible to other users in the process list
mysqldump -u root --password='mypassword' --all-databases > "$BACKUPDIR/mysqldump.sql"

#Create tar archive from the temporary directory contents, delete existing first
rm -f "$TAROUTPUT"

tar -czf "$TAROUTPUT" "$BACKUPDIR" > /dev/null

#upload and encrypt to s3 storage
s3cmd put -e $TAROUTPUT s3://myserverbackups/backup-myservername-$(date +%Y%m%d-%H%M)$

date +%H%M

echo "-----"
echo "Server backup complete"
exit 0
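
The script above always ends with exit 0, so cron has no way to tell a failed run from a good one. Here is a minimal sketch of one way to track failures, assuming a hypothetical run_step helper that is not part of the original script (the true and false commands simply stand in for real backup steps):

```shell
#!/bin/bash
# Hypothetical helper: run one backup step and remember whether any step failed.
FAILED=0

run_step() {
    local label="$1"
    shift
    if "$@"; then
        echo "OK:     $label"
    else
        echo "FAILED: $label" >&2
        FAILED=1
    fi
}

# Placeholder commands standing in for the real cp/rsync/tar/s3cmd lines
run_step "copy configs" true
run_step "upload to s3" false

# Report the overall result; FAILED is 1 if any step returned non-zero
if [ "$FAILED" -ne 0 ]; then
    echo "Server backup finished with errors" >&2
fi
```

In the real script, the cp, rsync, tar and s3cmd lines could be wrapped this way and the final exit 0 replaced with exit $FAILED. Steps that redirect their output, such as the mysqldump line, would need their own check, because the redirect would also capture the helper's messages.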

4) Things to note

a) Naturally, you can copy whatever you want to the backup directory before it is tar-ed. Be careful about directories or files that might have the same name. Copying a file onto a directory of the same name causes an error, but copying a file onto a file of the same name silently overwrites it.
b) Use rsync for copying any large directories, since it only transfers files that have changed, meaning subsequent runs are usually very fast.
c) I don't use the tar --update function since rsync appears to touch all files when checking them, which means tar thinks all the files have changed even if they haven't, and the archive grows each time.
d) s3cmd doesn't support encryption when using its sync function. This means that if you need encryption (which I do), you have to put the whole tar archive each time. If you don't want or need encryption, it would probably be quicker to skip tar and use "s3cmd sync" between the backup dir and your S3 storage, since only changed files are uploaded.
e) In the script above, I use my server name and the current date and time in the uploaded backup file name to distinguish it from other backups that end up in the same place. Replace the filename with whatever you want. Note also that the bucket name after the s3:// should be your bucket name from your S3 account.
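
One way to avoid the name clashes mentioned in note a) is to give each kind of file its own subdirectory inside the backup directory. This is only a sketch: the stash helper and the category names (apache, certs) are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# stash CATEGORY FILE...: copy files into "$BACKUPDIR/CATEGORY/" so that
# same-named files from different sources cannot overwrite each other.
# The helper name and categories are illustrative assumptions.
BACKUPDIR="${BACKUPDIR:-/home/luke/backupdir}"

stash() {
    local category="$1"
    shift
    # -p on mkdir creates missing parents; -p on cp keeps mode and timestamps
    mkdir -p "$BACKUPDIR/$category" && cp -p "$@" "$BACKUPDIR/$category/"
}

# Usage inside the backup script, for example:
#   stash apache /etc/apache2/sites-available/default
#   stash certs  /etc/ssl/certs/mycert.crt
```

Because tar archives the whole backup directory, the subdirectories end up in the archive without any other change to the script.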

5) Make the script runnable

chmod +x ./myscript

6) Test it

If you are planning to run this as your own user, then simply type ./myscript and see that it all runs as expected. If you are planning to run it from /etc/cron.d (a directory of job files that the cron daemon picks up automatically), then it will run as root. This means two things: firstly, you need to configure s3cmd for root, and secondly, you need to test the script by running it as root. The easiest way is:

sudo -i
s3cmd --configure
(do whatever you need to configure s3cmd and make sure it tests OK)
/path/to/myscript

All things being well, this should run to completion and prove that the script is OK.
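
Since root may not see the same commands on its PATH as your own user does, a quick pre-flight check can save a failed overnight run. A minimal sketch, assuming a hypothetical missing_tools helper (the list of commands is simply those used by the script above):

```shell
#!/bin/bash
# Print the name of every listed command that is not on the current PATH.
# Hypothetical helper, not part of the original script.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Check the commands the backup script relies on
missing="$(missing_tools rsync mysqldump tar s3cmd)"
if [ -n "$missing" ]; then
    echo "Missing commands for this user:" $missing >&2
fi
```

Running this inside the sudo -i session shows what root can and cannot find before the backup script itself is tried.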

7) Set up cron to run the script

The cron daemon is designed to run scripts at regular intervals. Your system will already run a whole host of things this way that rotate logs, check for updates etc. It is very easy to set this up.

sudo nano /etc/cron.d/myscript

to create a new file in the cron.d directory. This uses the system crontab format, which has a user field between the schedule and the command, and will be set up to call your script from wherever it is. Remember, this will run as the root user! You can copy in the crontab comments from another file if you want, but the two lines you need in the file are:

MAILTO=emailaddresstosendresultsto@mycompany.com
# Note by default, the script will email root@localhost, which may or may not resolve to anything useful
0 23 * * * root /path/to/myscript

The first line is self-explanatory and the comment says why. The second line calls the script, but the start of the line defines the schedule used to run it. The example above has minutes = 0 and hours = 23 followed by 3 asterisks, which means it will run at 23:00 every day. The 3 asterisks are wildcards for the Day of month (only run on the x of the month), Month (only run during a certain month) and Day of week (only run on a specific day of the week); replace one with a value to restrict the schedule. After the schedule comes the user to run as (root here) and then the command. There is plenty of online help for crontab which explains these in more detail if you care.
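
To make the schedule fields concrete, here is a small sketch that prints the contents of a cron.d file for a given schedule. Files in /etc/cron.d use the system crontab layout, where a user name sits between the five schedule fields and the command. The cron_entry helper and the admin@example.com address are placeholders invented for this example.

```shell
#!/bin/bash
# cron_entry MAILTO SCHEDULE USER COMMAND
# Print the two lines of a cron.d file. All names here are illustrative.
cron_entry() {
    printf 'MAILTO=%s\n' "$1"
    # The schedule is passed quoted so the shell does not expand the asterisks
    printf '%s %s %s\n' "$2" "$3" "$4"
}

# Every day at 23:00 (the schedule used in this post)
cron_entry admin@example.com '0 23 * * *' root /path/to/myscript

# Sundays only at 23:00 (day of week 0 = Sunday)
cron_entry admin@example.com '0 23 * * 0' root /path/to/myscript
```

The output of a single call can be piped to sudo tee /etc/cron.d/myscript instead of typing the lines into nano.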

8) Sit back and wait

All things working correctly, you should get an email after the script runs with the output of your script, any errors and any information you have echoed to the console.