Preliminary remarks:
- This feature is now available in both Bacula Community (9.0.8 or later) and Bacula Enterprise.
- Do not enable Bacula software compression with the Aligned format, since it results in poor deduplication performance.
- You will need a small SSD area to store the dedup index engine.
- With this method, Bacula creates separate volumes: one to hold the metadata of the backed-up files and another for the file data itself.
Data deduplication is a dictionary-based data reduction approach that can reduce the size of backup or archive datasets by a factor of 4-40X. It is becoming an essential backup system component because it reduces storage space requirements, and also a critical one, since the performance of every backup operation depends on storage throughput.
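To make the dictionary idea concrete, here is a minimal shell sketch of fixed-block deduplication accounting, assuming GNU coreutils (split, sha256sum, sort, wc). It cuts a file into equal-size blocks, fingerprints each block with SHA-256 and counts how many fingerprints are unique. The helper count_blocks is hypothetical and only illustrates the principle; it is not how ZFS or ddumbfs implement their indexes.

```shell
#!/usr/bin/env bash
# Illustration only: fixed-block dedup accounting with coreutils.
# count_blocks is a hypothetical helper, not part of Bacula, ZFS or ddumbfs.

# usage: count_blocks FILE BLOCK_BYTES  -> prints "<total_blocks> <unique_blocks>"
count_blocks() {
  local file=$1 bs=$2 dir total unique
  dir=$(mktemp -d)
  # Cut the file into BLOCK_BYTES-sized pieces (the last one may be shorter).
  split -b "$bs" -a 6 "$file" "$dir/blk."
  total=$(( $(find "$dir" -type f -name 'blk.*' | wc -l) ))
  unique=0
  if [ "$total" -gt 0 ]; then
    # Identical blocks share one fingerprint, i.e. one dictionary entry.
    unique=$(( $(sha256sum "$dir"/blk.* | cut -d ' ' -f 1 | sort -u | wc -l) ))
  fi
  rm -rf "$dir"
  echo "$total $unique"
}

# Demo: three copies of one random 128 KiB block followed by one zero block.
demo=$(mktemp -d)
head -c 131072 /dev/urandom > "$demo/block"
cat "$demo/block" "$demo/block" "$demo/block" > "$demo/sample"
head -c 131072 /dev/zero >> "$demo/sample"
count_blocks "$demo/sample" 131072   # prints: 4 2
rm -rf "$demo"
```

Only two of the four blocks need to be stored; a dedup filesystem keeps the remaining two as references to existing dictionary entries.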
As Figure 1 shows, the new Aligned Format is an effective storage-cost-reducing feature of Bacula Community, and it is much more efficient than ZBackup (an alternative tar dedup tool) in terms of backup and restore speed. Backup and restore durations increase slightly, but this is an acceptable trade-off.
Figure 1 – Old Community version without Aligned Volumes versus the new Aligned format (picture by Heitor Faria).
More than ever, disk backups are becoming a feasible replacement for tape libraries, since deduplication cannot currently be deployed efficiently on sequential magnetic tape. Only disks have this advantage.
1. ZFS FileSystem
There are currently several deduplicating file systems, such as lessfs, opendedup, ZFS and others. Hardware with deduplication capabilities can also be used with the new Bacula Aligned Format. Here, we deploy ZFS first, and then ddumbfs as an alternative.
a) RedHat/CentOS Install (https://github.com/zfsonlinux/zfs/wiki/RHEL-and-CentOS):
yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm
# Single quotes keep $basearch literal so that yum, not the shell, expands it
echo '[zfs-kmod]
name=ZFS on Linux for EL7 - kmod
baseurl=http://download.zfsonlinux.org/epel/7.5/kmod/$basearch/
enabled=1
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux' > /etc/yum.repos.d/zfs.repo
yum install zfs
modprobe zfs
b) Debian/Ubuntu Install:
sudo -i
apt-get -y install zfsutils-linux
Initializing ZFS
ZFS initialization requires one or more physical disks. In the example below, /zfs/mnt is the path to configure in the Archive Device directive of bacula-sd.conf. ZFS compression can also be enabled.
sudo zpool create -f zfs /dev/sdb
sudo zfs create zfs/mnt
zpool status zfs
df -h
sudo zfs set dedup=on zfs/mnt
sudo zfs set compression=on zfs/mnt
sudo chown bacula /zfs/mnt
Reference:
- RedHat/CentOS: https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS
- Debian/Ubuntu: https://wiki.ubuntu.com/Kernel/Reference/ZFS
2. The Dedup FileSystem (ALTERNATIVE)
Ddumbfs was chosen for this lab because it is open source and focused on fast operation thanks to its very simple index design, which is very important for shorter backup windows.
2.1 Installing ddumbfs Dependencies
To compile ddumbfs you need, as usual, make and gcc, the headers for the fuse and mhash libraries, and pkg-config.
Here are the corresponding packages for RedHat and Debian based distributions (some of them may need to be built from source):
- RedHat/CentOS: fuse fuse-libs mhash fuse-devel mhash-devel pkgconfig gcc make
- Debian/Ubuntu: libfuse2 libmhash2 libfuse-dev libmhash-dev pkg-config fuse-utils build-essential
a) RedHat/CentOS Packages:
sudo -i
yum -y install epel-release.noarch
yum -y install fuse fuse-libs mhash fuse-devel mhash-devel pkgconfig gcc make automake
b) Debian/Ubuntu Packages:
sudo -i
apt-get -y install fuse libfuse2 libmhash2 libfuse-dev libmhash-dev pkg-config build-essential autotools-dev
2.2 Building Ddumbfs from source
wget -qO- http://www.magiksys.net/download/ddumbfs/ddumbfs-1.1.tar.gz | tar -xzvf - -C /usr/src
cd /usr/src/ddumbfs-*
./configure
make
make install
2.3 Initializing Ddumbfs
Create two directories. The first should be an SSD mount point to host the ddumbfs index engine. The second should be the mount point where your Bacula storage volumes will be written, typically a large disk array.
mkdir /mnt/ddumbfs.data
mkdir /mnt/ddumbfs.mnt
Initialize the deduplication engine and mount it. This example creates a 999G volume, so change the size to fit your disk:
mkddumbfs -B 128k -s 999G /mnt/ddumbfs.data
ddumbfs /mnt/ddumbfs.mnt -o parent=/mnt/ddumbfs.data
Add a line like this to /etc/fstab so that ddumbfs is mounted again after a reboot:
-oparent=/mnt/ddumbfs.data /mnt/ddumbfs.mnt fuse.ddumbfs defaults 0 0
Restart the machine to make sure ddumbfs is mounted at boot time.
3. Bacula Aligned Volumes Configuration
You need to install the Aligned driver package, available through bacula.org’s personal package repository (Bacula Binary Package Download, registration required).
yum install bacula-aligned.x86_64
Restart the Storage Daemon to apply the changes.
This is an example of a new bacula-sd.conf Device. Device Type must be Aligned, Maximum Concurrent Jobs should always be 1, and the block size values can vary according to the deduplication filesystem in use:
Device {
  Name = Aligned-Disk
  Device Type = Aligned          # Must be Aligned
  Media Type = File1
  Archive Device = /zfs/mnt      # Or /mnt/ddumbfs.mnt if using the ddumbfs mount point
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
  Maximum Concurrent Jobs = 1    # Always 1 for Aligned
  Minimum Block Size = 0K
  Maximum Block Size = 128K
  File Alignment = 128K
  Padding Size = 512
  Minimum Aligned Size = 4096
}
Detailed information:
For the filesystems ZFS, lessfs, and ddumbfs, the following values produce excellent results:
Block Size=128K
File Alignment=128K
Padding Size=512
Minimum Aligned Size=4096
For NetApp filesystems, the following are preferable:
Block Size=64K
File Alignment=4K
Padding Size=4K
Minimum Aligned Size=4K
Each value follows the equals sign, and K means multiply by 1024 bytes.
Block Size is the size of blocks to be written into the Aligned Volume.
File Alignment is the alignment of the first block of each original file stored in the Aligned Volume.
Padding Size is the alignment to which the last block of an original file is filled with zeros if it is not full.
Minimum Aligned Size is the file size below which the file will be placed in the Metadata Volume rather than the Aligned Volume. [Ref.: Sibbald, Kern – https://www.google.com/patents/US20160055169]
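The shell sketch below applies these four definitions to a list of file sizes, using the ZFS/lessfs/ddumbfs values above (K = 1024 bytes). It is a simplified model written only to illustrate the arithmetic and does not reproduce Bacula's actual on-disk format; round_up and layout are hypothetical names. Because each file starts on a 128K boundary, matching the 128k block size given to mkddumbfs and ZFS's default 128K recordsize, identical files end up in identical filesystem blocks that the dedup layer can match.

```shell
#!/usr/bin/env bash
# Simplified model of the Aligned Volume parameters (illustration only,
# not Bacula's on-disk format). Values: ZFS/lessfs/ddumbfs recommendations.
BLOCK=$((128 * 1024))   # Block Size
ALIGN=$((128 * 1024))   # File Alignment
PAD=512                 # Padding Size
MIN_ALIGNED=4096        # Minimum Aligned Size

# Round $1 up to the next multiple of $2.
round_up() { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# usage: layout SIZE...  -> one line per file, either
#   "<size> metadata"  or  "<size> aligned start=<offset> blocks=<n> stored=<bytes>"
layout() {
  local offset=0 size start full rest stored blocks
  for size in "$@"; do
    if [ "$size" -lt "$MIN_ALIGNED" ]; then
      echo "$size metadata"          # small files go to the Metadata Volume
      continue
    fi
    start=$(round_up "$offset" "$ALIGN")   # first block starts on an ALIGN boundary
    full=$(( size / BLOCK ))               # number of complete blocks
    rest=$(( size % BLOCK ))               # bytes left for the last block
    stored=$(( full * BLOCK ))
    blocks=$full
    if [ "$rest" -gt 0 ]; then
      # Zero-fill the last block up to a multiple of the Padding Size.
      stored=$(( stored + $(round_up "$rest" "$PAD") ))
      blocks=$(( blocks + 1 ))
    fi
    echo "$size aligned start=$start blocks=$blocks stored=$stored"
    offset=$(( start + stored ))
  done
}

layout 1000 300000 131072
# prints:
# 1000 metadata
# 300000 aligned start=0 blocks=3 stored=300032
# 131072 aligned start=393216 blocks=1 stored=131072
```

Note how the third file starts at 393216 (3 x 128K) instead of right after the padded second file at 300032, so its single block lines up with a dedup block boundary.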
Finally, attach the new bacula-sd Device to your Director. Edit your bacula-dir.conf:
Storage {
  Name = Disk-Backup
  Address = hfaria-desk-i5
  SDPort = 9103
  Password = "5PWzqJzEokv3z9U_NwBd6bJ30ib1x4TMW"
  Device = Aligned-Disk          # Device Name from bacula-sd.conf
  Media Type = File1             # Must match the Device's Media Type
}
Run a few Full backup jobs. After the first Full job, the following ones should barely increase the deduplicated storage size. This command displays the space actually occupied:
df -h
The list jobs command in bconsole displays the size the backup jobs would occupy without deduplication. On ZFS, the DEDUP column of zpool list also shows the pool-wide deduplication ratio.
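Dividing the size the jobs were supposed to occupy by the space actually used gives the effective reduction factor, comparable to the 4-40X range mentioned at the beginning. A minimal sketch, assuming both numbers are entered by hand in bytes (for example the sum of the JobBytes column from list jobs, and the used space of the dedup storage); reduction_factor is a hypothetical helper and the example figures are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: effective data reduction factor.
# LOGICAL_BYTES: sum of JobBytes reported by "list jobs" in bconsole.
# STORED_BYTES:  space actually used on the dedup storage (e.g. from df).

# usage: reduction_factor LOGICAL_BYTES STORED_BYTES -> prints e.g. "7.69"
reduction_factor() {
  local logical=$1 stored=$2 hundredths
  if [ "$stored" -le 0 ]; then
    echo "reduction_factor: stored bytes must be > 0" >&2
    return 1
  fi
  # Work in hundredths with shell integer arithmetic (truncated, no bc needed).
  hundredths=$(( logical * 100 / stored ))
  printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# Made-up example: 400 GB of job data kept in 52 GB of deduplicated space.
reduction_factor 400000000000 52000000000   # prints: 7.69
```

Integer arithmetic keeps the sketch dependency-free; bash uses 64-bit integers, so byte counts up to the petabyte range still fit after the multiplication by 100.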
Enjoy!