Backing up your DSpace repository is essential so that you can recover from disasters such as hardware failure or data corruption. This guide assumes a DSpace 6.x or older setup. Follow the steps below to back up the repository.
Step 1: Create a Backup Script
We will use a script to automate the backups so that they do not have to be run by hand each time. The script can live anywhere on the server's filesystem; this guide keeps it at /home/dspace/backups/backup_script.sh, the path used by the cron entry in Step 2. Create the file from your terminal with a text editor of your choice.
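As a minimal sketch of that step, the commands below create an empty script file and mark it executable, which the cron entry needs in order to run it directly. The location is an assumption: it presumes you are logged in as a dspace user whose home directory is /home/dspace, so that the file ends up at the path used in Step 2.

```shell
# Assumption: $HOME is /home/dspace, matching the path in the Step 2 cron entry.
SCRIPT="$HOME/backups/backup_script.sh"

# Create the parent directory and an empty file to paste the script into.
mkdir -p "$(dirname "$SCRIPT")"
touch "$SCRIPT"

# Make the file executable; without this, cron cannot run it directly.
chmod +x "$SCRIPT"
```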
Then paste in the following code, replacing values where necessary according to the descriptions that follow.
#!/bin/bash

# Stop Tomcat to prevent write operations on the database
sudo service tomcat stop

# Set variables
BACKUP_DIR="/home/dspace/backups"
DS_HOME="/opt/dspace-6"
DATE=$(date +"%Y%m%d%H%M%S")

# Create the backup directory if it does not exist
mkdir -p $BACKUP_DIR

# Back up the database
pg_dump -U dbUser -Fc dbName | gzip -9 > $BACKUP_DIR/dspace-db-$DATE.tar.gz

# Back up the assetstore
tar -cvzf $BACKUP_DIR/dspace-assetstore-$DATE.tar.gz -C $DS_HOME assetstore

# Back up the configuration files
tar -cvzf $BACKUP_DIR/dspace-config-$DATE.tar.gz -C $DS_HOME config

# Start the Tomcat service again
sudo service tomcat start
Before the backup starts, the Tomcat service must be stopped so that nothing is written to the database while the backup is being taken. The script does this with the following command:
sudo service tomcat stop
Next, the script sets the following variables:
BACKUP_DIR is the directory where the backup files are stored. Every file written there carries the date-time of the run in its name.
DS_HOME is the DSpace deployment directory.
DATE holds the date-time at which the script is executed.
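To see how these variables combine, the short sketch below builds the name of one backup file without writing anything to disk. The values are the placeholders from the script; the FILE variable is introduced here only for illustration.

```shell
BACKUP_DIR="/home/dspace/backups"
DATE=$(date +"%Y%m%d%H%M%S")   # 14 digits: year, month, day, hour, minute, second

# Hypothetical helper variable: the path the database backup would be written to.
FILE="$BACKUP_DIR/dspace-db-$DATE.tar.gz"
echo "$FILE"
```

A run at 2:00:00 AM on 5 March 2024 would give DATE the value 20240305020000.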
The next task in the script is to back up the database with the command shown below:
pg_dump -U dbUser -Fc dbName | gzip -9 > $BACKUP_DIR/dspace-db-$DATE.tar.gz
pg_dump creates a dump of the database. Remember to replace dbUser and dbName with the actual database user and database name. The output of pg_dump is then compressed with the gzip command; -9 selects the maximum compression level so that the file is as small as possible. Because -Fc already produces pg_dump's own compressed custom format, the result is a gzipped dump file rather than a tar archive, despite the .tar.gz extension. The file is saved in BACKUP_DIR with the current date-time included in its name.
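One caveat with this pipeline is that the shell reports only the exit status of gzip, so a failed pg_dump can go unnoticed and leave behind a small but useless file. The sketch below shows one way to guard against that with bash's pipefail option. The dump_database function name and its arguments are hypothetical and not part of the original script.

```shell
#!/bin/bash
# With pipefail, a pipeline fails if any command in it fails, not just the last.
set -o pipefail

# Hypothetical helper: dump_database DB_USER DB_NAME OUTPUT_FILE
dump_database() {
    local user="$1" db="$2" out="$3"
    if ! pg_dump -U "$user" -Fc "$db" | gzip -9 > "$out"; then
        echo "Database backup failed for $db" >&2
        rm -f "$out"    # do not keep a truncated or empty dump
        return 1
    fi
}
```

In the backup script this could be called as dump_database dbUser dbName $BACKUP_DIR/dspace-db-$DATE.tar.gz.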
The script also backs up the assetstore directory. Remember that DSpace does not store bitstream files in the database; in the default configuration they are stored in the assetstore directory inside the deployment directory.
tar -cvzf $BACKUP_DIR/dspace-assetstore-$DATE.tar.gz -C $DS_HOME assetstore
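The command above backs up the assetstore directory. The -C option makes tar change into DS_HOME first, so the paths inside the archive are relative to it and start with assetstore/.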
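The effect of -C is easiest to see on a throwaway directory. The sketch below wraps the same tar invocation (without -v, to keep the output quiet) in a hypothetical archive_subdir function and then lists the archive to confirm it is readable. The function name and argument order are assumptions made for this example.

```shell
# Hypothetical helper: archive_subdir PARENT_DIR SUBDIR_NAME OUTPUT_FILE
archive_subdir() {
    parent="$1"
    name="$2"
    out="$3"
    # -C switches to PARENT_DIR first, so archive paths start with SUBDIR_NAME/.
    tar -czf "$out" -C "$parent" "$name" || return 1
    # Listing the archive reads it end to end and fails if it is damaged.
    tar -tzf "$out" > /dev/null
}
```

For the assetstore this would be called as archive_subdir $DS_HOME assetstore $BACKUP_DIR/dspace-assetstore-$DATE.tar.gz.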
It is also important to back up the DSpace configuration so that all customizations are preserved. This is done with the command below:
tar -cvzf $BACKUP_DIR/dspace-config-$DATE.tar.gz -C $DS_HOME config
Finally, when all the files have been backed up, the script restarts the Tomcat service with the following command:
sudo service tomcat start
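The stop and start commands bracket everything else in the script. A minimal sketch of that pattern as a reusable function is shown below: it does not start the backup if Tomcat could not be stopped, always starts Tomcat again afterwards, and returns the exit status of the backup step so that a failure is not lost. The function name run_with_tomcat_stopped is hypothetical.

```shell
# Hypothetical helper: run_with_tomcat_stopped COMMAND [ARGS...]
run_with_tomcat_stopped() {
    # Do not start the backup if Tomcat could not be stopped.
    sudo service tomcat stop || return 1

    # Run the backup command passed as arguments and remember its result.
    "$@"
    status=$?

    # Restart Tomcat whether or not the backup succeeded.
    sudo service tomcat start
    return "$status"
}
```

For example, run_with_tomcat_stopped tar -czf $BACKUP_DIR/dspace-config-$DATE.tar.gz -C $DS_HOME config would archive the configuration while Tomcat is stopped.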
Step 2: Add a Cronjob
Now we need a mechanism that runs the backup script periodically. This is done with a cron job. It is preferable to run the script when the system is least in use, which will most likely be at night, so we will run it at 2:00 AM. To create the cron entry, run the following command in your terminal:
sudo crontab -e
This opens the superuser's crontab. At the bottom of the file, add the following line, which runs the script at minute 0 of hour 2 every day and discards its output:
0 2 * * * /home/dspace/backups/backup_script.sh >/dev/null 2>&1
Step 3: Verify the Backups
Shortly after 2:00 AM, the backup files should be in BACKUP_DIR if all went well. To verify that the backup was successful, test the restoration process on a test server.
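A full restore test takes time, so a quick integrity check in between can be useful. The sketch below runs gzip -t over every backup file in a directory and reports any that fail. This only confirms that the files are readable compressed data, not that a restore would succeed. The verify_backups function name is hypothetical, and the dspace-*.tar.gz pattern assumes the file names produced by the script in Step 1.

```shell
# Hypothetical helper: verify_backups BACKUP_DIR
verify_backups() {
    dir="$1"
    checked=0
    failed=0
    for f in "$dir"/dspace-*.tar.gz; do
        [ -e "$f" ] || continue          # the pattern matched nothing
        checked=$((checked + 1))
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if at least one file was checked and none failed.
    [ "$checked" -gt 0 ] && [ "$failed" -eq 0 ]
}
```

Calling verify_backups /home/dspace/backups after the nightly run prints one line per backup file and returns a non-zero status if any file is damaged or none were found.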
Backing up your DSpace repository is crucial for recovering from any disaster that might occur. By following the steps above, you get a backup script that compresses the backup files at the highest compression level and includes the date-time in their names. Schedule regular backups with cron and periodically verify them to ensure their integrity.