How to use MergerFS to create a USB based JBOD Backup Pool
Instead of using a single BTRFS JBOD pool, which a single failed drive can corrupt, I am now using MergerFS to create a unified filesystem that operates as a middleman, merging separate independent filesystems into one for simplified backup copying.
The benefit of MergerFS is that if a drive fails, only the data on that drive is lost; you can still easily access the remaining drives through the MergerFS unified filesystem, and add new or replacement drives as needed.
This is similar to the BTRFS pool, but in this case the pool is virtual, so it cannot become corrupted by a failed drive. Additionally, MergerFS has options controlling where new files are created: by using the epff option I can force each drive to fill up, one by one, before moving on to the next, so that data is not split up and spread across multiple drives, maximizing recoverability in case of a failed backup drive.
These two features, virtual pooling and one-by-one drive filling, address the two issues I outlined above that make BTRFS JBOD pools (and likely most other pools) dangerous.
Step 1:
For initial setup only, format each drive, creating a BTRFS partition. The filesystem was a toss-up, but I chose BTRFS over XFS mainly out of personal preference. Rename each partition as desired; I'm naming mine FS1 through FS6, which will help me identify the various drives in my Frankenstore backup pool. This comes into play later when using MergerFS to fuse them together, and the shorter partition names will help keep the MergerFS command shorter.
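For reference, a command-line sketch of this step. It only prints the format commands so they can be reviewed first; the device names are assumptions based on the df output later in this post, so verify yours with lsblk before running anything (mkfs destroys existing data):

```shell
# Sketch only: prints the mkfs commands instead of running them.
# Device names (sdx..sdac) are assumptions; -L sets the BTRFS label (FS1-FS6).
format_plan() {
    local i=1 dev
    for dev in sdx sdy sdz sdaa sdab sdac; do
        printf 'mkfs.btrfs -f -L FS%d /dev/%s1\n' "$i" "$dev"
        i=$((i + 1))
    done
}
format_plan        # review the output, then: format_plan | sh
```

I rename the partitions through the GUI, but setting the label at format time with -L accomplishes the same thing.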
Step 2:
Mount the drives. Unlike before, where I only had to mount the first drive of the BTRFS pool, I have to mount each drive individually. Also, partition sharing must be turned on for all drives so that MergerFS has access to the mount points for fusing.
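I do this through the Unraid GUI, but a manual equivalent can be sketched; this version only prints the commands, and mounting by label (assuming the FS1-FS6 labels from Step 1) avoids depending on device names that can shuffle between boots:

```shell
# Sketch only: prints mount commands using filesystem labels rather than
# /dev/sdX names, so drive enumeration order between boots doesn't matter.
mount_plan() {
    local fs
    for fs in FS1 FS2 FS3 FS4 FS5 FS6; do
        printf 'mkdir -p /mnt/disks/%s && mount -L %s /mnt/disks/%s\n' "$fs" "$fs" "$fs"
    done
}
mount_plan         # review the output, then: mount_plan | sh
```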
Step 3:
Create all main paths on all drives. For me, these are the shares I'm backing up: "DVDs", "Blu-Rays", "4K", and "TV_Series". The reason I'm creating these directories now, and on all drives, is that the "epff" MergerFS option (described below) will only create a new path if an existing path doesn't exist, and otherwise will keep filling a drive until its free space is gone, potentially causing errors. I want to make sure these paths already exist to hopefully prevent MergerFS from refusing to split a huge share across multiple drives. I'm expecting this to keep each individual movie whole on a single drive, regardless of which drive it ultimately resides on.
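This step can be scripted as a small loop (the helper name and the /mnt/disks root are just how I'd sketch it; adjust for your mount points):

```shell
# Create every share directory on every branch up front, so an "existing path"
# already exists everywhere before the first copy.
seed_share_dirs() {
    local root="$1" fs share
    for fs in FS1 FS2 FS3 FS4 FS5 FS6; do
        for share in DVDs Blu-Rays 4K TV_Series; do
            mkdir -p "$root/$fs/$share"
        done
    done
}
# usage: seed_share_dirs /mnt/disks
```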
Step 4:
Install MergerFS plugin via URL:
https://raw.githubusercontent.com/deser ... gerfsp.plg
NOTE: This requires at least Unraid 6.10
Step 5:
Run MergerFS command to mount new merged filesystem:
mkdir /mnt/disks/Frankenstore
chown nobody:users /mnt/disks/Frankenstore
mergerfs -o cache.files=off,dropcacheonclose=true,category.create=epmfs,minfreespace=4G,moveonenospc=true,fsname=Frankenstore /mnt/disks/FS1:/mnt/disks/FS2:/mnt/disks/FS3:/mnt/disks/FS4:/mnt/disks/FS5:/mnt/disks/FS6 /mnt/disks/Frankenstore
Explanation of Chosen MergerFS Options:
cache.files=off : Since I will be using this merged filesystem just for backup purposes, I want to disable caching as much as possible to ensure file integrity
dropcacheonclose=true : There's still some caching going on, so this drops a file's cached data when the file is closed
category.create=epmfs : This determines where new files are written. The default is epmfs, "existing path, most free space", which is similar to the BTRFS behavior that is problematic for my backup goals. epff is "existing path, first found", and first found follows the order of the branches at creation time (i.e. on the command line). The expectation here is that it will write to sdx1 first, then sdy1, then sdz1, etc... and finally sdac1, since it is the last drive in the mergerfs command line. The minfreespace parameter should control when it rolls over to the next drive, but only for new paths. The "ep" part means existing paths are preserved, so writes will go to the existing path and new paths will only be created according to the "ff" first-found logic, which helps keep content together for future updates.
UPDATE: I was originally using epff here, but after modifying the mirror.sh script to handle directory creation, I discovered that epff caused more problems than it solved. Occasionally I have a movie collection folder with a deeper subdirectory structure than normal (which breaks my mirror.sh script's logic) and a ton of movies in it (the James Bond collection, with over 25 movies, springs to mind). My script isn't smart enough to evaluate the source directory's size to make sure it will fit; I'm just counting on dumb luck that 99% of my directories will only have 1 or 2 discs at most. These exception directories break my script's logic, so directory creation auto-reverts to MergerFS control for them. In this case, epff was forcing these directories to be created on the first couple of disks, FS1 and FS2, which were already full (below 90G free), but since they were still above the 4G minimum free space, MergerFS chose them.
Then once FS1 or FS2 filled up (pretty quickly, since they only had about 60-70GB free), the error-handling logic in MergerFS would start writing to FS6, since it had the most free space.
Since these exception directories by their very nature typically hold a lot more data, the right strategy is to put them on the disk with the most free space, so I've reverted back to epmfs. And since I'm filling up my drives sequentially from FS1 to FS6, and FS6 is my only larger 20TB drive, FS6 will always be my disk with the most free space - at least for the next few years, until I expand my backup array again with FS7.
minfreespace=4G : In order to write to a drive, it must have at least this much space available. The default is 4G, and here I am explicitly setting the same 4G value, mainly as a placeholder should anyone want to override it.
UPDATE: Originally, I was using 90G here, but I found it problematic that this one value controls both where directories are created and where files are written: a directory could be created on a nearly full branch, and after a few files were copied into it, space would fill up and the directory would split, being recreated on the next branch with its files spanning two drives.
I'm now using a customized mirror.sh script that creates directories based upon free space, and I have that set to a 100GB minimum to create a disc title's parent directory, which leaves at least 96GB of writing space for the files copied into that directory before the 4GB MergerFS minfreespace limit is hit. This works around a major limitation of MergerFS: I'm no longer forced to use the same minimum-free-space limit for both creating directories and creating files.
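The idea behind that free-space-based directory placement can be sketched as a small helper (hypothetical; the function name is made up and the branch list and 100GB threshold are my values from above): pick the first branch with enough room and create the title directory there directly, so the existing-path policies then keep its files on that drive.

```shell
# Hypothetical helper: print the first branch with at least min_free_kb free.
# usage: pick_branch MIN_FREE_KB BRANCH...
pick_branch() {
    local min_free_kb="$1" b avail
    shift
    for b in "$@"; do
        # df -Pk column 4 is available space in 1K blocks (POSIX output format)
        avail=$(df -Pk "$b" 2>/dev/null | awk 'NR==2 {print $4}')
        if [ "${avail:-0}" -ge "$min_free_kb" ]; then
            printf '%s\n' "$b"
            return 0
        fi
    done
    return 1   # no branch has enough room
}
# Example: create a title directory on the first branch with >= 100GB free:
# branch=$(pick_branch $((100 * 1024 * 1024)) /mnt/disks/FS{1..6}) &&
#     mkdir -p "$branch/4K/Some Movie Title"
```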
moveonenospc=true : When enabled if a write fails with ENOSPC (no space left on device) or EDQUOT (disk quota exceeded) the policy selected will run to find a new location for the file. An attempt to move the file to that branch will occur (keeping all metadata possible) and if successful the original is unlinked and the write retried.
fsname=Frankenstore : If I don't use this, then the label in the system is an ugly hybrid of the partitions, i.e. "x1:y1:z1:aa1:ab1:ac1" when using something like "df -h" to list filesystem and space.
NOTE: For mounting the source partitions, I specified each full path separated by a colon, i.e. /mnt/disks/FS1:/mnt/disks/FS2:/mnt/disks/FS3 etc. An alternative would have been to use a wildcard to select them all, i.e. /mnt/disks/FS*, but I was afraid to use this because I wanted to make sure that the 6 drives were created in a specific order, so they would fill up in order from 1 to 6.
Step 6:
Check for successful mounting with "df -BM -H". I also piped through grep to filter on /mnt/disks/F so I could see just the 6 drives and MergerFS fused Frankenstore paths:
root@Tower:~# df -BM -H | grep /mnt/disks/F
/dev/sdx1 17T 4.0M 16T 1% /mnt/disks/FS1
/dev/sdy1 17T 4.0M 16T 1% /mnt/disks/FS2
/dev/sdz1 17T 4.0M 16T 1% /mnt/disks/FS3
/dev/sdab1 17T 4.0M 16T 1% /mnt/disks/FS5
/dev/sdac1 21T 4.0M 20T 1% /mnt/disks/FS6
/dev/sdaa1 17T 4.0M 16T 1% /mnt/disks/FS4
Frankenstore 101T 24M 100T 1% /mnt/disks/Frankenstore
Here I can see the new 100TB Frankenstore fused partition mounted at /mnt/disks/Frankenstore. Looking inside, I see my previously created target backup directories for each of my shares:
root@Tower:~# ls -l /mnt/disks/Frankenstore
total 0
drwxrwxrwx 1 root root 0 Nov 16 13:08 4K
drwxrwxrwx 1 root root 0 Nov 16 13:08 Blu-Rays
drwxrwxrwx 1 root root 0 Nov 16 13:07 DVDs
drwxrwxrwx 1 root root 0 Nov 16 13:09 TV_Series
Step 7:
Run modified backup job. I created a script last year (on the first page of this thread) but it doesn't work correctly with this backup strategy.
ISSUE: When running rsync, it creates all directories in a folder before processing the files/folders in each subdirectory. For example, when backing up the 4K share, it first created all 56 movie folders on drive FS1 before copying a single file. At some point the drive fills up with data and may be left with empty directories whose contents were never copied to it; worse, this can cause errors, because the epff option tells MergerFS to copy files to existing paths first.
The solution is a custom wrapper around the RSYNC process that manually creates each directory just before copying the files into it.
For full credit, the bash script below came from:
https://github.com/ashishpandey/scaffol ... /mirror.sh
However, I did make one change: adding the "--archive" parameter to the rsync command line, since I wanted that option to preserve metadata as much as possible in my backup. Save the code below into a new file named "mirror.sh" in the same directory as your backup job script:
#!/bin/bash
set -e
# ensure unicode filenames are supported
export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export G_FILENAME_ENCODING="@locale"
export G_BROKEN_FILENAMES="1"
function usage() {
echo "Usage: mirror.sh [OPTIONS]"
echo " OPTIONS includes:"
echo " -x | --dry-run - do not copy files, only echo what would be done. default mode"
echo " -m | --mirror - copy files from src to dest. overrides dry run"
echo " -s | --src - source directory"
echo " -d | --dest - destination directory"
echo " -h | --help - displays this message"
}
run_type="dry-run"
while [ "$1" != "" ]
do
case $1 in
-x | --dry-run )
run_type="dry-run"
;;
-m | --mirror )
run_type="mirror"
;;
-s | --src )
shift
if [ -d "$1" ]; then
src="${1}"
else
echo "$0: $1 is not a valid directory" >&2
exit 1
fi
;;
-d | --dest )
shift
dest="${1%/}" # dest without trailing slash
;;
-h | --help )
usage
exit 0
;;
* )
echo "Invalid option: $1"
usage
exit 1
;;
esac
shift
done
function ensure_vars() {
for v in "$@"
do
if [ -z "${!v}" ]; then
echo "ERROR: $v is not specified"
usage
exit 1
fi
done
}
ensure_vars "run_type" "src" "dest"
function log() {
echo "$(date +'%Y-%m-%d %T'): $1"
}
function debug_log() {
if [ "x$debug" == "xtrue" ]; then
log "$1"
fi
}
progress_inc=500
progress_idx=0
function progress() {
((progress_idx+=1))
if [ "x$debug" == "xtrue" ]; then
log "$1"
else
if ! ((progress_idx % progress_inc)); then
log "done $progress_idx"
fi
fi
}
function exec_cmd() {
if [ "${run_type}" == "dry-run" ]; then
echo "dry-run: $@"
elif [ "${run_type}" == "mirror" ]; then
"$@"
else
echo "warning: unknown run type ${run_type}"
exit 2
fi
}
log "run mode: $run_type"
log "sync $src => $dest"
log "using extra excludes => $EXTRA_EXCLUDES"
log "-----------------------------------------------"
rsync --dry-run --archive --recursive --itemize-changes --delete --delete-excluded --iconv=utf-8 \
--exclude '@eaDir' --exclude 'Thumbs.db' --exclude '*.socket' --exclude 'socket' $EXTRA_EXCLUDES \
"$src" "$dest" | while read -r line ; do
progress "$line"
read -r op file <<< "$line"
debug_log "from $file"
if [ "x$op" == "x*deleting" ]; then
log "removing $dest/$file"
exec_cmd rm -rf "$dest/$file"
else
op1=$(echo $op | cut -b 1-2)
sizeTsState=$(echo $op | cut -b 4-5)
case "$src" in
*/)
src_file="${src}${file}" # src end in slash, $file starts under it
;;
*)
src_file="$(dirname "$src")/$file" # $file contains the src itself as root of path
;;
esac
if [ "x$op1" == "xcd" ]; then
debug_log "not eagerly creating $dest/$file"
elif [ "x$op1" == "x>f" ]; then
dest_file="$dest/$file"
dest_dir=$(dirname "${dest_file}")
if [ "x$sizeTsState" == "x.T" ]; then
log "update ${dest_file} timestamp only"
exec_cmd touch -r "${src_file}" "${dest_file}"
elif [ "x$sizeTsState" != "x.." ]; then
if [ ! -d "${dest_dir}" ]; then
exec_cmd sudo -u nobody mkdir -v -m 777 -p "${dest_dir}"
fi
exec_cmd install -o nobody -g users -m 666 -p -D -v "${src_file}" "${dest_file}"
fi
fi
fi
done
Then modify your backup job script to call the bash mirror.sh wrapper instead of rsync directly:
#!/bin/bash
LogFile=/var/log/array_backup.log
BackupDir=/mnt/disks/Frankenstore
Notify=/usr/local/emhttp/webGui/scripts/notify
echo `date` "Starting Array backup to " $BackupDir >> $LogFile
#Backup 4K via rsync
sleep 2
$Notify -i normal -s "Beginning 4K Backup" -d " 4K Backup started at `date`"
sleep 2
bash mirror.sh -m -s /mnt/user/4K -d $BackupDir >> $LogFile
#rsync -avrtH --delete /mnt/user/4K $BackupDir >> $LogFile
sleep 2
$Notify -i normal -s "Finished 4K Backup" -d " 4K Backup completed at `date`"
sleep 2
$Notify -i normal -s "Beginning Blu-Rays Backup" -d " Blu-Rays Backup started at `date`"
sleep 2
bash mirror.sh -m -s /mnt/user/Blu-Rays -d $BackupDir >> $LogFile
#rsync -avrtH --delete /mnt/user/Blu-Rays $BackupDir >> $LogFile
sleep 2
$Notify -i normal -s "Finished Blu-Rays Backup" -d " Blu-Rays Backup completed at `date`"
sleep 2
$Notify -i normal -s "Beginning DVDs Backup" -d " DVDs Backup started at `date`"
sleep 2
bash mirror.sh -m -s /mnt/user/DVDs -d $BackupDir >> $LogFile
#rsync -avrtH --delete /mnt/user/DVDs $BackupDir >> $LogFile
sleep 2
$Notify -i normal -s "Finished DVDs Backup" -d " DVDs Backup completed at `date`"
sleep 2
$Notify -i normal -s "Beginning TV_Series Backup" -d " TV_Series Backup started at `date`"
sleep 2
bash mirror.sh -m -s /mnt/user/TV_Series -d $BackupDir >> $LogFile
#rsync -avrtH --delete /mnt/user/TV_Series $BackupDir >> $LogFile
sleep 2
$Notify -i normal -s "Finished TV_Series Backup" -d " TV_Series Backup completed at `date`"
## RESTORE
## /usr/bin/rsync -avrtH --delete $BackupDir /mnt/cache/
echo `date` "backup Completed " $BackupDir >> $LogFile
# send notification
sleep 2
$Notify -i normal -s "Array Backup Completed" -d " Array Backup completed at `date`"
Step 8:
Unmount the MergerFS filesystem when you're done by running the umount command against the previously used MergerFS mount point.
Step 9:
Unmount the individual drives and put the backup drive carrier back into offline storage.
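Both unmount steps can be sketched together; order matters, since the fused pool must come down before the branches underneath it. This version only prints the commands for review (a mount will refuse to unmount while anything still has files open in it):

```shell
# Sketch only: prints the teardown commands in the required order
# (the merged pool first, then each underlying branch).
teardown_plan() {
    printf 'umount /mnt/disks/Frankenstore\n'
    local fs
    for fs in FS1 FS2 FS3 FS4 FS5 FS6; do
        printf 'umount /mnt/disks/%s\n' "$fs"
    done
}
teardown_plan      # review the output, then: teardown_plan | sh
```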