DSpace 6.x Backup Guide with an Automation Script

Published by: Systems Librarian

Backing up your DSpace repository is essential so that you can recover from a disaster such as hardware failure or data corruption. This guide assumes a DSpace 6.x or older setup. Follow the steps below to back up the repository.

Step 1: Create a Backup Script

We will use a script to automate the backups so that we do not have to run each backup by hand. The script can live anywhere on your server's filesystem. In your terminal, run the following command to create it:

nano /home/dspace/backup_script.sh

Then paste in the following code, replacing values where necessary according to the description that follows.

#!/bin/bash

# Stop Tomcat to prevent write operations on the database
sudo service tomcat stop

# Set variables
BACKUP_DIR="/home/dspace/backups"
DS_HOME="/opt/dspace-6"
DATE=$(date +"%Y%m%d%H%M%S")

# Create the backup directory if it does not exist
mkdir -p "$BACKUP_DIR"

# Back up the database (replace dbUser and dbName with your own values)
pg_dump -U dbUser -Fc dbName | gzip -9 > "$BACKUP_DIR/dspace-db-$DATE.dump.gz"

# Back up the assetstore
tar -cvzf "$BACKUP_DIR/dspace-assetstore-$DATE.tar.gz" -C "$DS_HOME" assetstore

# Back up the configuration files
tar -cvzf "$BACKUP_DIR/dspace-config-$DATE.tar.gz" -C "$DS_HOME" config

# Start the Tomcat service again
sudo service tomcat start

Before the backup starts, the script stops the Tomcat service so that no write operations can reach the database while the backup is being taken. It does so with the following command:

sudo service tomcat stop

Next, the script sets up the following variables:

BACKUP_DIR is the directory where the backup files are stored. Each file name includes the date-time at which the backup was generated.
DS_HOME is the DSpace deployment directory.
DATE holds the date-time at which the script is executed, in the format YYYYMMDDHHMMSS.
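
To make the role of these variables concrete, here is a small sketch of how the timestamp and a backup file name are put together. The ASSETSTORE_FILE variable and the echo lines are only for illustration and are not part of the guide's script.

```shell
#!/bin/bash
BACKUP_DIR="/home/dspace/backups"

# %Y%m%d%H%M%S yields a 14-digit stamp: year, month, day, hour, minute, second
DATE=$(date +"%Y%m%d%H%M%S")

# Because the stamp runs from year down to second, sorting the files
# by name also sorts them chronologically.
ASSETSTORE_FILE="$BACKUP_DIR/dspace-assetstore-$DATE.tar.gz"

echo "Timestamp: $DATE"
echo "Archive:   $ASSETSTORE_FILE"
```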

The next task of the script is to back up the database using the command shown below:

pg_dump -U dbUser -Fc dbName | gzip -9 > "$BACKUP_DIR/dspace-db-$DATE.dump.gz"

pg_dump creates a dump of the database. Remember to replace dbUser and dbName with the actual database user and database name. The -Fc option selects PostgreSQL's custom archive format, which is the format that pg_restore reads. The output of pg_dump is then compressed with gzip, where -9 selects the maximum compression level. Because the custom format is already compressed by default, the extra saving is usually small. The file is saved in BACKUP_DIR with the current date-time included in its name.

Another task performed by the script is to back up the assetstore directory. Remember that DSpace does not store bitstream files in the database. In the default configuration, they are stored in the assetstore directory inside the deployment directory. The script archives that directory with the following command:

tar -cvzf "$BACKUP_DIR/dspace-assetstore-$DATE.tar.gz" -C "$DS_HOME" assetstore

It is also important to back up the DSpace configuration so that all customizations are preserved. This is achieved using the command below:

tar -cvzf "$BACKUP_DIR/dspace-config-$DATE.tar.gz" -C "$DS_HOME" config

Finally, when all the files have been backed up, the script restarts the Tomcat service with the following command:

sudo service tomcat start
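
The script restarts Tomcat whether or not the backup steps succeeded, but it does not report a failure to whoever runs it. One possible way to keep the restart unconditional while preserving the exit status is sketched below. The function name with_tomcat_stopped and the SERVICE_CMD override are assumptions made for this illustration and are not part of DSpace or Tomcat.

```shell
#!/bin/bash
# SERVICE_CMD is a hypothetical override so the flow can be dry-run
# (for example SERVICE_CMD=echo) without touching a real service.
SERVICE_CMD="${SERVICE_CMD:-sudo service}"

# with_tomcat_stopped CMD [ARGS...]
# Stops Tomcat, runs CMD, always starts Tomcat again,
# and returns the exit status of CMD.
with_tomcat_stopped() {
    # SERVICE_CMD is left unquoted on purpose so "sudo service" splits into two words
    $SERVICE_CMD tomcat stop || return 1

    "$@"
    local status=$?

    # Restart Tomcat even when the backup command failed
    $SERVICE_CMD tomcat start
    return "$status"
}
```

For example, if the pg_dump and tar steps were moved into their own script, they could be run as with_tomcat_stopped /path/to/that/script, and a non-zero exit status would then signal that the backup needs attention.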

Step 2: Add a Cronjob

Now we need a mechanism to run the backup script periodically. We will use a cron job for this. Because the script stops Tomcat, the repository is unavailable while it runs, so it is preferable to run it when the system is least in use. This will most likely be at night, so we will run the script at 2:00 AM. To create the cron job entry, run the following command in your terminal:

sudo crontab -e

This opens the superuser's crontab. At the bottom of the file, add the following line:

0 2 * * * /home/dspace/backup_script.sh >/dev/null 2>&1
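
cron can only run the script directly if the file is executable, which the commands above do not set up. The following is a minimal sketch of a pre-flight check you could run once before relying on the schedule. check_cron_script is a hypothetical helper name, not a standard tool.

```shell
#!/bin/bash
# check_cron_script PATH
# Confirms that the script is executable and parses cleanly.
check_cron_script() {
    local script="$1"

    if [ ! -x "$script" ]; then
        echo "Not executable, run: chmod +x $script"
        return 1
    fi

    # bash -n parses the script without executing any of it
    if ! bash -n "$script"; then
        echo "Syntax error in $script"
        return 1
    fi

    echo "OK: $script"
}
```

With the path used in this guide, the call would be check_cron_script /home/dspace/backup_script.sh.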

Step 3: Verify the backups

A few minutes after 2:00 AM, there should be backup files in BACKUP_DIR if all went well. To verify that the backup is actually usable, test the restoration process on a test server: decompress the database dump with gunzip and restore it with the pg_restore command.
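
Before attempting a full restore, a quick integrity check can catch truncated or corrupt files. The sketch below is one possible approach, assuming the file naming used in this guide (dspace-db-, dspace-assetstore- and dspace-config- prefixes). verify_backups is a hypothetical helper name. It checks only that the archives are readable, not that the database can be restored.

```shell
#!/bin/bash
# verify_backups DIR
# Runs basic integrity checks on the backup files found in DIR.
verify_backups() {
    local dir="$1" f status=0 found=0

    for f in "$dir"/dspace-*.gz; do
        [ -e "$f" ] || continue
        found=1

        # gzip -t tests the compressed stream without extracting it
        if ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT: $f"
            status=1
            continue
        fi

        case "$f" in
            */dspace-assetstore-*|*/dspace-config-*)
                # Listing the tar archive fails if it is damaged or truncated
                if ! tar -tzf "$f" >/dev/null 2>&1; then
                    echo "BAD TAR: $f"
                    status=1
                fi
                ;;
        esac
    done

    if [ "$found" -eq 0 ]; then
        echo "No backup files found in $dir"
        status=1
    fi
    return "$status"
}
```

Calling verify_backups /home/dspace/backups returns 0 only when at least one backup file exists and every file passes its checks.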

Conclusion

Backing up your DSpace repository is crucial so that you can recover from any disaster that might occur. By following the steps outlined above, you can create a backup script that compresses the backup files and includes the date-time in their names. Schedule regular backups with cron and periodically verify the backups to ensure their integrity.
