Making backups of your DSpace repository is essential to ensure that you can recover from disasters such as hardware failure or data corruption. This guide assumes a DSpace 6.x or older setup. Here are the steps you can follow to back up the repository.
Step 1: Create a Backup Script
We are going to use this script to automate our backups so that we don't have to run the backup manually each time. The script can be created anywhere on your server's filesystem. In your terminal, run the following command to create the script:
nano /home/dspace/backup_script.sh
Then paste in the following code, replacing values where necessary as described below.
#!/bin/bash
# Stop Tomcat so that DSpace cannot change the database or assetstore during the backup
sudo service tomcat stop
# Set variables
BACKUP_DIR="/home/dspace/backups"
DS_HOME="/opt/dspace-6"
DATE=$(date +"%Y%m%d%H%M%S")
# Create the dated backup directory if it does not exist
mkdir -p "$BACKUP_DIR/$DATE"
# Back up the database (custom format, then gzip at maximum compression)
pg_dump -U dbUser -Fc dbName | gzip -9 > "$BACKUP_DIR/$DATE/dspace-db-$DATE.dump.gz"
# Back up the assetstore
tar -cvzf "$BACKUP_DIR/$DATE/dspace-assetstore-$DATE.tar.gz" -C "$DS_HOME" assetstore
# Back up the configuration files
tar -cvzf "$BACKUP_DIR/$DATE/dspace-config-$DATE.tar.gz" -C "$DS_HOME" config
# Start Tomcat again
sudo service tomcat start
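Before starting the backup process, it is essential to stop the Tomcat service so that DSpace cannot modify the database or the assetstore while they are being copied, which keeps the two consistent with each other. Depending on your installation, the service may be called tomcat7, tomcat8 or similar; adjust the name in the script accordingly.
The script stops Tomcat with the following command: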
sudo service tomcat stop
Next, the script sets the following variables:
BACKUP_DIR: the directory where the backup files are stored. Inside it, each run creates a subdirectory named after the date-time at which the backup was generated.
DS_HOME: the DSpace deployment directory.
DATE: the date-time at which the script is executed, in the format YYYYMMDDHHMMSS.
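To see what DATE expands to, the variable assignments can be run on their own. This is a small sketch, not part of the backup script; RUN_DIR is a hypothetical name introduced here for the per-run subdirectory. The last line shows why the YYYYMMDDHHMMSS format is convenient: sorting the names as plain strings also sorts them chronologically.

```shell
#!/bin/bash
# Print the values the backup script derives when it runs.
BACKUP_DIR="/home/dspace/backups"
DATE=$(date +"%Y%m%d%H%M%S")   # YYYYMMDDHHMMSS, e.g. 20240115020000 for 2:00 AM on 15 Jan 2024
RUN_DIR="$BACKUP_DIR/$DATE"    # hypothetical name for the per-run subdirectory

echo "DATE=$DATE"
echo "RUN_DIR=$RUN_DIR"

# Fields run from most to least significant, so a plain string sort
# is also a chronological sort: this prints 20240116020000.
printf '%s\n' 20240116020000 20231231020000 20240115020000 | sort | tail -n 1
```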
The next task the script performs is to back up the database using the command below:
pg_dump -U dbUser -Fc dbName | gzip -9 > "$BACKUP_DIR/$DATE/dspace-db-$DATE.dump.gz"
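pg_dump creates a dump of the database, and -Fc selects PostgreSQL's custom archive format, which can later be restored with pg_restore. Remember to replace dbUser and dbName with the actual database user and database name. The output of pg_dump is then compressed with the gzip command, where -9 requests the maximum compression level so that we end up with the smallest file possible. (The custom format is already compressed by pg_dump, so this extra gzip pass usually saves only a little more space.) The file is then saved under BACKUP_DIR with the current date-time included in its name.
Because cron cannot answer a password prompt, if the database user needs a password, store it in a ~/.pgpass file (with permissions 0600) belonging to the user that runs the job, which is root when the crontab from Step 2 is used.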
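Before relying on a dump, a cheap sanity check is to confirm that it really is a PostgreSQL custom-format archive. The sketch below assumes the gzipped dump produced by the script and relies on the fact that pg_dump -Fc archives begin with the bytes PGDMP; check_db_dump is a hypothetical helper name. It only catches missing or empty files and error text saved in place of a dump, not every kind of damage.

```shell
#!/bin/bash
# Sanity-check a gzipped pg_dump custom-format archive.
# Custom-format archives start with the magic bytes "PGDMP"; an empty file
# or saved error text will not.
check_db_dump() {
    local dump="$1"
    local magic
    if [ ! -s "$dump" ]; then
        echo "MISSING OR EMPTY: $dump"
        return 1
    fi
    # Read just the first five decompressed bytes.
    magic=$(gzip -dc "$dump" 2>/dev/null | head -c 5)
    if [ "$magic" = "PGDMP" ]; then
        echo "OK: $dump"
        return 0
    fi
    echo "NOT A CUSTOM-FORMAT DUMP: $dump"
    return 1
}
```

For example, check_db_dump "$BACKUP_DIR/$DATE/dspace-db-$DATE.dump.gz" could be called at the end of the backup script. For a fuller check, piping gunzip -c of the dump into pg_restore --list should print the archive's table of contents.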
Another task performed by the script is to back up the assetstore directory. Remember, DSpace does not store bitstream files in the database; in the default configuration, they are stored in the assetstore directory inside the deployment directory. If your assetstore is located elsewhere, adjust the path in the script.
tar -cvzf "$BACKUP_DIR/$DATE/dspace-assetstore-$DATE.tar.gz" -C "$DS_HOME" assetstore
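This command archives the assetstore directory: -c creates a new archive, -z compresses it with gzip, -f names the output file and -v lists each file as it is added. The -C option makes tar change into DS_HOME first, so the archive stores relative paths such as assetstore/... rather than absolute ones.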
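As a rough completeness check, the number of regular files in the assetstore can be compared with the number of file entries in the archive. This is a sketch under a few assumptions: compare_assetstore_archive is a hypothetical helper, it expects the archive to contain only the assetstore directory (as produced by the command above), and it should run while Tomcat is still stopped, since new uploads would make the counts differ.

```shell
#!/bin/bash
# Compare regular files in the assetstore with file entries in its archive.
# tar lists directory entries with a trailing "/", so those are skipped.
compare_assetstore_archive() {
    local base_dir="$1"   # directory that contains assetstore (DS_HOME in the script)
    local archive="$2"    # a dspace-assetstore-*.tar.gz file
    local on_disk in_archive
    on_disk=$(find "$base_dir/assetstore" -type f | wc -l)
    in_archive=$(tar -tzf "$archive" | grep -vc '/$')
    echo "files on disk: $on_disk, files in archive: $in_archive"
    [ "$on_disk" -eq "$in_archive" ]
}
```

Usage, for example: compare_assetstore_archive "$DS_HOME" "$BACKUP_DIR/$DATE/dspace-assetstore-$DATE.tar.gz". A non-zero return status means the counts differ.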
It is also important to back up the DSpace configuration so that all customizations are preserved. This is done with the command below:
tar -cvzf "$BACKUP_DIR/$DATE/dspace-config-$DATE.tar.gz" -C "$DS_HOME" config
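Having the configuration in its own archive makes it easy to recover a single file, for example after a bad edit. The sketch below (restore_config_file is a hypothetical helper) extracts one member into a scratch directory rather than over the live configuration, so the two copies can be compared first. Member names start with config/ because the archive is created with -C "$DS_HOME"; tar -tzf on the archive lists them.

```shell
#!/bin/bash
# Extract a single member of the config archive into a scratch directory,
# leaving the live configuration untouched.
restore_config_file() {
    local archive="$1"   # a dspace-config-*.tar.gz file
    local member="$2"    # path inside the archive, e.g. config/dspace.cfg
    local dest="$3"      # scratch directory for the extracted copy
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" "$member"
}
```

For example, restore_config_file "$BACKUP_DIR/$DATE/dspace-config-$DATE.tar.gz" config/dspace.cfg /tmp/config-restore, followed by diff /tmp/config-restore/config/dspace.cfg "$DS_HOME/config/dspace.cfg" to review the differences before copying anything back.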
Finally, when all the files have been backed up, the script starts Tomcat again with the following command:
sudo service tomcat start
Step 2: Add a Cronjob
Now we need a mechanism to execute the backup script periodically, which we will do with a cron job. It is preferable to run the script when the system is least in use, most likely at night, so we will schedule it for 2:00 AM. To create the cron job entry, run the following command in your terminal:
sudo crontab -e
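This opens the root user's crontab. Cron runs the script directly, so it must be executable (run chmod +x /home/dspace/backup_script.sh if it is not). At the bottom of the file, add the following line: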
0 2 * * * /home/dspace/backup_script.sh >/dev/null 2>&1
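The five fields before the command are minute, hour, day of month, month and day of week, so 0 2 * * * means "at minute 0 of hour 2, every day". The redirection >/dev/null 2>&1 discards both standard output and standard error, so cron has nothing to mail, which also means errors from the script go unreported. The sketch below simply splits the entry into those fields to make the mapping explicit; it does not touch the crontab.

```shell
#!/bin/bash
# Split the cron entry into its schedule fields and command.
ENTRY='0 2 * * * /home/dspace/backup_script.sh >/dev/null 2>&1'
# read assigns words in order; the last variable receives the rest of the line.
read -r minute hour dom month dow command <<EOF
$ENTRY
EOF
echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
echo "command=$command"
```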
Step 3: Verify the Backups
Some minutes after 2:00 AM, if all went well, there should be new backup files in BACKUP_DIR. To verify that your DSpace repository backup was successful, test the restoration process on a test server: extract the tar archives, decompress the database dump with gunzip and restore it with the pg_restore command.
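Part of that check can be automated. The sketch below assumes that each run writes into its own YYYYMMDDHHMMSS subdirectory of BACKUP_DIR, as described for BACKUP_DIR in Step 1; verify_latest_backup is a hypothetical helper. It runs gzip -t on every .gz file in the newest run, which detects truncated or corrupted compressed files but does not prove that the data can be restored, so the test restoration is still needed.

```shell
#!/bin/bash
# Run gzip's integrity test on every .gz file in the newest backup run.
# Assumes runs are subdirectories of the backup directory named YYYYMMDDHHMMSS.
verify_latest_backup() {
    local backup_dir="$1"
    local latest f found=0 status=0
    # The lexically last 14-digit name is the most recent run.
    latest=$(ls -1 "$backup_dir" | grep -E '^[0-9]{14}$' | sort | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no backup runs found in $backup_dir"
        return 1
    fi
    for f in "$backup_dir/$latest"/*.gz; do
        [ -e "$f" ] || continue          # glob matched nothing
        found=$((found + 1))
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no .gz files in $backup_dir/$latest"
        return 1
    fi
    return "$status"
}
```

For example, verify_latest_backup /home/dspace/backups reports each file as OK or CORRUPT and returns a non-zero status if anything failed.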
Conclusion
Backing up your DSpace repository is crucial to ensure that you can recover from any disaster that might occur. By following the steps outlined above, you can create a backup script that compresses the backup files with the highest level of compression and includes the date-time in their names. Schedule regular backups with cron and periodically verify the backups to ensure their integrity.