Backups that run over degraded networks (e.g., with packet loss) or across the Internet, where the connection passes through various NATs, firewalls, and routers, tend to suffer significant losses in TCP performance and resilience. Errors like the following can occur in Bacula.
2023-04-20 21:03:38 ocspbacprdap02-sd JobId 11052: Fatal error: append.c:175 Error reading data header from FD. n=-2 msglen=20 ERR=I/O Error
2023-04-20 21:03:38 ocspbacprdap02-sd JobId 11052: Error: bsock.c:395 Wrote 23 bytes to client:10.16.152.200:9103, but only 0 accepted.

# or

02-Aug 09:13 backupserver-dir JobId 110334: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
Using the BBR congestion control on the Linux machines running Bacula's Director, Storage Daemon, and File Daemons significantly improves resilience to these errors. Response time and network performance also improve, since disconnections and packet loss have much less impact on transfer rates.
What is BBR?
BBR stands for "Bottleneck Bandwidth and Round-Trip Time". The BBR congestion control calculates the sending rate based on the delivery rate estimated from ACKs. BBR was contributed by Google to Linux kernel version 4.9 in 2016.
BBR significantly increased throughput and reduced latency for connections in Google’s internal networks, as well as for google.com and YouTube web servers.
BBR requires only changes on the sender’s side, with no need for changes in the network or on the receiver’s side. Therefore, it can be deployed incrementally on the current Internet or in data centers.
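Before enabling it, you can check which congestion control algorithms the running kernel already offers. The sysctl keys below are standard Linux ones; "bbr" only shows up in the list after the tcp_bbr module has been loaded.

# Algorithms the kernel can use right now ("bbr" appears only after modprobe tcp_bbr).
sysctl net.ipv4.tcp_available_congestion_control

# Algorithm currently in use (usually "cubic" by default).
sysctl net.ipv4.tcp_congestion_control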
How to Enable BBR
The following shell commands enable BBR:
modprobe tcp_bbr
echo "tcp_bbr" > /etc/modules-load.d/bbr.conf
echo "net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq" >> /etc/sysctl.conf
sudo sysctl -p
sysctl net.ipv4.tcp_congestion_control
The last command should display the BBR protocol, as follows:
root@hfaria-P65:~# sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr
If another protocol is displayed, restart the server.
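As an alternative to appending to /etc/sysctl.conf, the same two settings can go into a drop-in file; the file name 90-bbr.conf below is just an assumed convention, not a requirement.

# As root: create a drop-in file with the same settings.
cat > /etc/sysctl.d/90-bbr.conf <<'EOF'
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
EOF

# Reload settings from /etc/sysctl.conf and /etc/sysctl.d/:
sysctl --system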
How to Test Network Performance?
iperf3 is a utility for conducting network throughput tests.
$ sudo apt-get install -y iperf3
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libiperf0 libsctp1
Suggested packages:
  lksctp-tools
The following NEW packages will be installed:
  iperf3 libiperf0 libsctp1
...
iperf3 can use the -C (or --congestion) option to choose the congestion control algorithm. In our tests, we can specify BBR as follows:
-C, --congestion algo
      Set the congestion control algorithm (Linux and FreeBSD only).
      An older --linux-congestion synonym for this flag is accepted
      but is deprecated.

iperf3 -C bbr -c example.com   # replace example.com with your test target
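For example, a simple comparison of BBR against the default CUBIC between two hosts could look like the sketch below; 192.0.2.10 is a placeholder address for the machine running the iperf3 server.

# On the receiving host (e.g., the Storage Daemon side):
iperf3 -s

# On the sending host, run the same 30-second test with each algorithm:
iperf3 -c 192.0.2.10 -t 30 -C cubic
iperf3 -c 192.0.2.10 -t 30 -C bbr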
Note:
BBR acts only on the sender's side, so you don't need to worry about whether the receiver supports BBR. Note that BBR is much more effective when used with FQ (fair queuing) to pace packets to no more than 90% of the line rate.
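The sysctl settings above already make fq the default qdisc. If you also want to cap pacing at roughly 90% of the line rate as suggested, one option is the fq maxrate parameter; the interface name eth0 and the 1 Gbps link speed below are assumptions, adjust them to your environment.

# Replace the root qdisc on eth0 with fq and cap each flow's pacing rate
# at ~90% of an assumed 1 Gbps link.
tc qdisc replace dev eth0 root fq maxrate 900mbit

# Verify the qdisc in use:
tc qdisc show dev eth0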
How Can I Monitor BBR TCP Connections on Linux?
You can use the ss utility (another tool for investigating sockets) to monitor BBR’s state variables, including pacing rate, cwnd, bandwidth estimate, min_rtt estimate, and more.
Example output of ss -tin:
$ ss -tin
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port   Process
ESTAB  0       36      10.0.0.55:22        123.23.12.98:61030
     bbr wscale:6,7 rto:292 rtt:91.891/20.196 ato:40 mss:1448 pmtu:9000 rcvmss:1448 advmss:8948 cwnd:48 bytes_sent:95301 bytes_retrans:136 bytes_acked:95129 bytes_received:20641 segs_out:813 segs_in:1091 data_segs_out:792 data_segs_in:481 bbr:(bw:1911880bps,mrtt:73.825,pacing_gain:2.88672,cwnd_gain:2.88672) send 6050995bps lastsnd:4 lastrcv:8 lastack:8 pacing_rate 5463880bps delivery_rate 1911928bps delivered:791 app_limited busy:44124ms unacked:1 retrans:0/2 dsack_dups:1 rcv_space:56576 rcv_ssthresh:56576 minrtt:73.825
The following fields may appear:
ts                    show string "ts" if the timestamp option is set
sack                  show string "sack" if the sack option is set
ecn                   show string "ecn" if the explicit congestion notification option is set
ecnseen               show string "ecnseen" if the saw ecn flag is found in received packets
fastopen              show string "fastopen" if the fastopen option is set
cong_alg              the congestion algorithm name, the default congestion algorithm is "cubic"
wscale:<snd_wscale>:<rcv_wscale>
                      if window scale option is used, this field shows the send scale factor and receive scale factor
rto:<icsk_rto>        tcp re-transmission timeout value, the unit is millisecond
backoff:<icsk_backoff>
                      used for exponential backoff re-transmission, the actual re-transmission timeout value is icsk_rto << icsk_backoff
rtt:<rtt>/<rttvar>    rtt is the average round trip time, rttvar is the mean deviation of rtt, their units are millisecond
ato:<ato>             ack timeout, unit is millisecond, used for delay ack mode
mss:<mss>             max segment size
cwnd:<cwnd>           congestion window size
pmtu:<pmtu>           path MTU value
ssthresh:<ssthresh>   tcp congestion window slow start threshold
bytes_acked:<bytes_acked>
                      bytes acked
bytes_received:<bytes_received>
                      bytes received
segs_out:<segs_out>   segments sent out
segs_in:<segs_in>     segments received
send <send_bps>bps    egress bps
lastsnd:<lastsnd>     how long time since the last packet sent, the unit is millisecond
lastrcv:<lastrcv>     how long time since the last packet received, the unit is millisecond
lastack:<lastack>     how long time since the last ack received, the unit is millisecond
pacing_rate <pacing_rate>bps/<max_pacing_rate>bps
                      the pacing rate and max pacing rate
rcv_space:<rcv_space> a helper variable for TCP internal auto tuning socket receive buffer
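To keep an eye on a specific transfer, ss filters can narrow the output. For example, a hypothetical check on Bacula's Storage Daemon port 9103 (the port seen in the error messages above) might look like this:

# Refresh every 2 seconds, showing only established connections involving port 9103.
watch -n 2 "ss -tin state established '( sport = :9103 or dport = :9103 )'"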
Examples of TCP Throughput Improvement
From Google
Google Research and YouTube implemented BBR and achieved improvements in TCP performance.
Here are performance result examples to illustrate the difference between BBR and CUBIC:
- Resilience to random loss (e.g., due to shallow buffers): Consider a netperf TCP_STREAM test lasting 30 seconds on a path emulated with a 10 Gbps bottleneck, 100 ms RTT, and 1% packet loss rate. CUBIC achieves 3.27 Mbps, while BBR reaches 9150 Mbps (2798 times higher).
- Low latency with common inflated buffers on last-mile links today: Consider a netperf TCP_STREAM test lasting 120 seconds on a path emulated with a 10 Mbps bottleneck, 40 ms RTT, and a buffer of 1000 packets. Both fully utilize the bottleneck bandwidth, but BBR can do so with an average RTT 25 times lower (43 ms instead of 1.09 seconds).
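A lossy, high-latency path similar to the ones in these tests can be emulated locally with tc netem if you want to reproduce the comparison with iperf3. The interface eth0, the 100 ms of added delay, and the 1% loss rate below are assumptions taken from the first example.

# Add 100 ms of one-way delay (~100 ms extra RTT) and 1% random loss on eth0's egress.
tc qdisc add dev eth0 root netem delay 100ms loss 1%

# Remove the emulation when the test is done:
tc qdisc del dev eth0 root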
From AWS CloudFront
During March and April 2019, AWS CloudFront deployed BBR. According to the AWS blog post "BBR TCP Congestion Control with Amazon CloudFront":
BBR usage on CloudFront has been favorable globally, with performance gains of up to 22% in aggregate throughput across various networks and regions.
From Shadowsocks
I have a Shadowsocks server running on a Raspberry Pi. Without BBR, the client’s download speed is about 450 KB/s. With BBR, the client’s download speed improves to 3.6 MB/s, which is 8 times faster than the default.
BBR v2
There is ongoing work on BBR v2, which is still in the alpha phase.
Troubleshooting
sysctl: setting key "net.core.default_qdisc": No such file or directory
The reason is that the tcp_bbr kernel module has not been loaded yet. To load tcp_bbr, execute the following command:
sudo modprobe tcp_bbr
To check whether tcp_bbr is loaded, use lsmod. In the output of the following command, you should see a tcp_bbr line:
$ lsmod | grep tcp_bbr
tcp_bbr                20480  3
If the sudo modprobe tcp_bbr command doesn't work, restart the system.
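Once the module is loaded (or after the restart), re-apply the sysctl settings and confirm that both values took effect:

sudo sysctl -p
sysctl net.core.default_qdisc
sysctl net.ipv4.tcp_congestion_control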