Bacula bpipe Backup Script for Hadoop HDFS

The universal bpipe plugin lets Bacula take any data stream written to standard output and store it as a backup. That includes files from a Hadoop HDFS cluster, which are streamed straight from HDFS to Bacula without an intermediate local copy.
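As a rough sketch of what Bacula receives for each file, a bpipe plugin line has four colon-separated fields: the plugin name, a pseudo path under which the stream is cataloged, the command whose standard output is backed up, and the command whose standard input receives the data on restore. The helper name `bpipe_line` and the sample path `/data/new.csv` are assumptions made for illustration; the `hdfs` location and the `/var` prefix follow the script further down.

```shell
#!/bin/bash
# Sketch only: build one bpipe plugin line for a single HDFS file.
# bpipe_line is a hypothetical helper, not part of the original script.
hdfs="/etc/hadoop/bin/hdfs"

bpipe_line() {
    local path="$1"
    # Fields: bpipe:<pseudo path>:<backup reader command>:<restore writer command>
    printf 'bpipe:/var%s:%s dfs -cat %s:%s dfs -put -f /dev/stdin %s\n' \
        "$path" "$hdfs" "$path" "$hdfs" "$path"
}

# Example call with a made-up HDFS path.
bpipe_line /data/new.csv
```

Because the colon is the field separator, an HDFS path that itself contains a colon would probably need special handling.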

#!/bin/bash
#
# This script feeds HDFS files to the Bacula bpipe plugin (FIFO), using one hdfs cat command per file for backup and one put command per file for restore.
# Subsequent backups copy only the files changed in HDFS since the last recorded backup time (/etc/last_backup).
# 
# Remark: hdfs /tmp and .tmp. folders are excluded by the grep -v.
#
# By Heitor Faria (http://bacula.us | http://www.bacula.com.br);
# Marco Reis; 
# Julio Neves (http://livrate.com.br) and
# Rodrigo Hagstrom
#
# Tested with Hadoop 2.7.1; August, 2017.
#
# It must be called from the FileSet Include sub-resource used by the job that
# backs up a Hadoop node running a Bacula Client, for example:
#
# Plugin = "\|/etc/script_hadoop.sh"

hdfs="/etc/hadoop/bin/hdfs"

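# Create the state file on the first run (-f tests for a regular file; -p tests for a named pipe).
if [[ ! -f /etc/last_backup ]]; then
    echo "0000-00-00;00:00" > /etc/last_backup
fi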

# Timestamp of the last backup, stored as "date;time".
Date=$(cut -f 1 -d ";" /etc/last_backup)
Hour=$(cut -f 2 -d ";" /etc/last_backup)

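# List regular files (the replication column is "-" for directories) modified at or after
# the last backup. Date and time are compared as one string, so a file changed on a later
# day but at an earlier hour is not skipped. grep -F matches the exclusions literally.
for filename in $($hdfs dfs -ls -R / | awk -v last="$Date $Hour" '$2!="-" && ($6 " " $7)>=last {print $8}' | grep -vF -e /tmp/ -e .tmp.)
do
    echo "bpipe:/var$filename:$hdfs dfs -cat $filename:$hdfs dfs -put -f /dev/stdin $filename"
done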

date '+%Y-%m-%d;%H:%M' > /etc/last_backup
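The incremental selection can be tried without a cluster by feeding a canned listing to the same kind of filter. The sketch below assumes the usual eight-column layout of `hdfs dfs -ls -R` (permissions, replication, owner, group, size, date, time, path). The helper names `changed_since` and `sample_listing` and all sample paths are made up for illustration. It compares date and time as a single string, which is one possible way to select files changed after a given moment.

```shell
#!/bin/bash
# Sketch only: select files changed at or after a given "YYYY-MM-DD HH:MM" timestamp.
# changed_since and sample_listing are hypothetical helpers, not part of the original script.

changed_since() {
    local last="$1"
    # Skip directories (replication column "-"), keep entries whose "date time"
    # string sorts at or after the last backup, then drop /tmp/ and .tmp. paths.
    awk -v last="$last" '$2 != "-" && ($6 " " $7) >= last {print $8}' |
        grep -vF -e /tmp/ -e .tmp.
}

# Made-up listing in the style of `hdfs dfs -ls -R /`.
sample_listing() {
    cat <<'EOF'
drwxr-xr-x   - hdfs supergroup          0 2017-08-02 09:00 /data
-rw-r--r--   3 hdfs supergroup       1024 2017-08-01 23:50 /data/old.csv
-rw-r--r--   3 hdfs supergroup       2048 2017-08-02 08:15 /data/new.csv
-rw-r--r--   3 hdfs supergroup       4096 2017-08-03 01:00 /data/late.csv
-rw-r--r--   3 hdfs supergroup        512 2017-08-03 01:10 /tmp/scratch.bin
-rw-r--r--   3 hdfs supergroup        512 2017-08-03 01:20 /data/.tmp.part
EOF
}

# Prints /data/new.csv and /data/late.csv, one per line.
sample_listing | changed_since "2017-08-02 08:00"
```

Note that `/data/late.csv` is selected even though its hour (01:00) is earlier than the hour of the last backup (08:00), because the date is part of the comparison.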

 
