Making remote backups with rsync
David Cox
I recently had to create a backup system for a live web server. I didn’t have a lot of time to test out different backup solutions, but rsync came highly recommended, so I thought I’d give it a whirl.
It’s not too clear on the web site, but it seems like rsync is put out by the people who developed (and continue to develop) Samba. I’m not sure what that actually means, but it helps me feel comfortable using it (that, and the fact that it was recommended by respected sources).
My Situation
My web server is housed at a remote data center, and I wanted to back it up to an external hard drive on a local development server. There is quite a bit of data on the live server, so I knew I couldn’t afford to copy the entire file structure over the internet every day (let alone several times a day). I liked that rsync only copies files that have changed since the last backup; this would keep the live server acceptably usable while backups were taking place.
Making it Work
To help me execute the backup properly, I drew from the work of Mike Rubel.
Using the information found on that site, I drafted this rsync backup command which was executed on my local development server (the backup server):
# rsync -vah --dry-run --delete -e "ssh -v" username@example.com:/home/ /backup/directory/home/daily.0
This essentially told the system to run rsync and transfer the files from the /home directory on example.com to the /backup/directory/home/daily.0 directory on the local machine (this directory would be located on the mounted external hard drive). The transfer would take place over ssh and would be run under the “username” user. After entering the command I was prompted to enter the ssh password for “username” and then things proceeded as normal.
This was all well and good, but if I’m prompted to enter a password, I have to be there to run the backup. I had hoped to run this as a cron job, which required that I find another way to run the command.
Taking Myself Out of the Picture
I did some more searching and came across this information published by Troy Johnson. He walked me through the process of creating a key which could be used to authorize a user over SSH without the need for a password.
Using the command below, I created a public and private key pair. When prompted for a passphrase, I left it blank. A key protected by a passphrase would still prompt for that passphrase whenever the key is used, which defeats the purpose for an unattended job.
$ ssh-keygen -t dsa -b 1024 -f /directory/to-store/key/name-of-key
- The /directory/to-store/key was located in my user’s home directory.
- The /name-of-key was any name of my choice that would distinguish this key from any other keys I might make.
- Note: Troy’s tutorial had 2048 in place of 1024. My system would not generate a key with 2048, most likely because ssh-keygen requires DSA keys to be exactly 1024 bits. The 1024-bit key has worked fine for me.
Once the keys were generated, I copied the public key (/name-of-key.pub) to the remote server. The key had to be stored in a specific location within the appropriate user’s home folder. Troy provides a good command to accomplish this via scp.
If the rsync command connects to the remote server as the user “backupper”, the public key needs to be stored in /home/backupper/.ssh/ and the .ssh/ directory needs to have its permissions set to 700.
In the .ssh/ directory, you might see a file named authorized_keys. If so, copy the content of name-of-key.pub to the end of authorized_keys. If authorized_keys does not exist, create the file and fill it with the content of name-of-key.pub. The authorized_keys file needs to have permission set to 600.
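The directory and file steps above can be sketched as a small shell function. This is only an illustration under my own assumptions: the function name install_pubkey and its two arguments are hypothetical, and it is meant to be run on the remote server as the user who owns the home directory.

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE HOME_DIR
# Hypothetical helper: appends a public key to HOME_DIR/.ssh/authorized_keys
# and applies the 700/600 permissions described above.
install_pubkey() {
    pubkey_file=$1
    home_dir=$2
    ssh_dir="$home_dir/.ssh"
    auth_file="$ssh_dir/authorized_keys"

    # Create .ssh if needed and restrict it to the owner (700).
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1

    # Create authorized_keys if it does not exist yet.
    touch "$auth_file" || return 1

    # Append the key only if that exact line is not already present.
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
        # Make sure an existing file ends with a newline before appending.
        if [ -s "$auth_file" ] && [ -n "$(tail -c 1 "$auth_file")" ]; then
            echo >> "$auth_file"
        fi
        cat "$pubkey_file" >> "$auth_file" || return 1
    fi

    # Restrict the file to the owner (600).
    chmod 600 "$auth_file"
}
```

With the example names used here, the call would look like install_pubkey name-of-key.pub /home/backupper. Running it a second time with the same key leaves authorized_keys unchanged.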
Once the public key is in place, the rsync command needs to be modified a bit to use the keys for authentication.
# rsync -vah --dry-run --delete -e "ssh -v -i /directory/to-store/key/name-of-key" username@example.com:/home/ /backup/directory/home/daily.0
This tells ssh to authenticate with the private key instead of a password (the private key itself never leaves the backup server). No password was asked for and rsync did its thing. The --dry-run flag prevented any files from actually being transferred. I used it to test the command and then removed it so the real backup could take place.
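Since the only difference between the test run and the real run is the --dry-run flag, the command can be wrapped in a function that toggles it from one place. This is a rough sketch, not the script I actually ran: the function name run_backup and the RSYNC override variable are my own inventions for illustration, and the paths and host are the same placeholders used in the commands above.

```shell
#!/bin/sh
# Placeholder values copied from the commands in this post.
REMOTE="username@example.com:/home/"
DEST="/backup/directory/home/daily.0"
KEY="/directory/to-store/key/name-of-key"

# run_backup [--dry-run]
# Runs the backup; pass --dry-run to only report what would be transferred.
# RSYNC is a hypothetical override so another program can stand in for rsync.
run_backup() {
    if [ "$1" = "--dry-run" ]; then
        set -- --dry-run
    else
        set --
    fi
    ${RSYNC:-rsync} -vah "$@" --delete -e "ssh -v -i $KEY" "$REMOTE" "$DEST"
}

# Usage (not executed here):
#   run_backup --dry-run   # test run, nothing is copied
#   run_backup             # real backup
```

Setting RSYNC=echo before calling run_backup prints the arguments instead of running rsync, which is a cheap way to double-check the command line before pointing it at a live server.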
Making it Automated
Mike Rubel has some excellent backup scripts on his page. I just copied the whole script and modified the actual rsync command to reflect my changes.
At first, I set up this script to run as a cron job every 4 hours. Because the initial backup took hours, the second (and third) cron job started overlapping with the first. I don’t know what problems this might have caused, but I didn’t want to take any chances. I stopped the syncs, deleted the files that had been transferred, and started the sync from the command line.
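One way to keep cron runs from overlapping is a simple lock, so that a new run exits immediately if the previous one is still going. I did not use this at the time; it is a minimal sketch, and the lock path and the function name run_exclusive are assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical lock location; any path writable by the cron user works.
LOCK_DIR="${LOCK_DIR:-/tmp/rsync-backup.lock}"

# run_exclusive COMMAND [ARGS...]
# Runs COMMAND only if no other run holds the lock.
# Returns 75 without running anything when the lock is already taken.
run_exclusive() {
    # mkdir is atomic: only one process can create the directory.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous backup still running, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    # Release the lock and pass on the command's exit status.
    rmdir "$LOCK_DIR"
    return "$status"
}
```

The backup script would then be started through run_exclusive instead of directly. Note that if the machine crashes or the job is killed mid-run, the lock directory is left behind and has to be removed by hand. On Linux systems that ship the flock utility from util-linux, flock -n with a lock file is an alternative that avoids stale locks.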
When the first sync was done, I started the cron job. Each sync now takes about 20 minutes and the scripts keep things organized quite well.
Hopefully I never have to use these backups, but I’m glad to know they’re being made.