Or, as I like to call it, how to get a good night's sleep.
Keeping a service running 24/7 is never an easy task; there are always problems. Some are service-breaking and must be handled immediately, others are less important and can be left for later. Most problems are reported by users; a few are caused by them. The few I'm referring to are caused by users who, for whatever reason (curiosity or ignorance), try to ruin it for everyone else.
Over the past two months we've had whole DevBox hosts downed by a single user's attempt at UDP-flooding someone. It simply ate up the host's bandwidth! Other users were unable to connect to their DevBoxes because there wasn't any bandwidth left for them. As luck would have it, this would usually happen between midnight and 7:00am (ugh…). What follows is a very simple fix for this.
Limiting DevBox Bandwidth
Our DevBoxes are currently built on OpenVZ, and what is the recommended way to limit container bandwidth? TC (Traffic Control), of course!
Check out the following links for helpful info: openvz.org, lartc.org, linux-ip.net
After banging my head against a wall trying to get the HTB example from openvz.org to work properly, I finally ended up with the following script.
#!/bin/bash
# Name of the traffic control command.
TC=/sbin/tc

start() {
    # Hierarchical Token Bucket (HTB) to shape bandwidth
    $TC qdisc add dev eth0 root handle 1:0 htb default 99
    $TC class add dev eth0 parent 1:0 classid 1:1 htb rate 2000Mbps ceil 2000Mbps
    $TC class add dev eth0 parent 1:1 classid 1:11 htb rate 1500Mbps ceil 1500Mbps prio 2
    $TC qdisc add dev eth0 parent 1:11 handle 10: sfq perturb 10

    $TC qdisc add dev venet0 root handle 2:0 htb default 99
    $TC class add dev venet0 parent 2:0 classid 2:1 htb rate 2000Mbps ceil 2000Mbps
    $TC class add dev venet0 parent 2:1 classid 2:11 htb rate 1500Mbps ceil 1500Mbps prio 2
    $TC qdisc add dev venet0 parent 2:11 handle 20: sfq perturb 10
}

stop() {
    # Stop the bandwidth shaping.
    $TC qdisc del dev eth0 root
    $TC qdisc del dev venet0 root
}

restart() {
    # Self-explanatory.
    stop
    sleep 1
    start
}

show() {
    # Display traffic control status.
    $TC -s qdisc ls dev eth0
    $TC -s qdisc ls dev venet0
}

case "$1" in
    start)
        echo -n "Starting bandwidth shaping: "
        start
        echo "done"
        ;;
    stop)
        echo -n "Stopping bandwidth shaping: "
        stop
        echo "done"
        ;;
    restart)
        echo -n "Restarting bandwidth shaping: "
        restart
        echo "done"
        ;;
    show)
        echo "Bandwidth shaping status:"
        show
        echo ""
        ;;
    *)
        echo "Usage: tc.sh {start|stop|restart|show}"
        ;;
esac

exit 0
Run the script as root or with sudo:
sh /Path to script/tc.sh start
Add the script to rc.local so it runs on startup:
sudo vim /etc/rc.d/rc.local
sudo sh /Path to script/tc.sh start
Filter Setup
Now that the limit is set up, let's set up the filter (i.e., who gets limited). We want to limit the DevBoxes, not the actual host (which needs bursts of bandwidth for updating, backups, archiving and so on), so we'll set up iptables like so:
Limit uploads:
sudo iptables -t mangle -A POSTROUTING -o eth0 -p tcp -s 172.16.0.0/12 -j CLASSIFY --set-class 1:11
Limit downloads:
sudo iptables -t mangle -A POSTROUTING -o venet0 -p tcp -d 172.16.0.0/12 -j CLASSIFY --set-class 2:11
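To confirm the traffic is actually landing in the shaped classes, one quick sanity check (these are standard tc and iptables status commands, nothing specific to our setup) is to look at the class and rule counters:

# Packet/byte counters for the shaped classes (1:11 on eth0, 2:11 on venet0)
sudo tc -s class show dev eth0
sudo tc -s class show dev venet0

# Hit counters for the CLASSIFY rules in the mangle table
sudo iptables -t mangle -L POSTROUTING -n -v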
And we’re done.
Well, maybe not...
UDP Issue
Try as I might, TC and iptables wouldn’t limit bandwidth for UDP, doh!
Wait, why do we need UDP again? DNS aaaand nothing else. OK, let’s play the dropping game.
Create a chain that limits UDP to 50 packets per second (in case someone has a legitimate need for an alternate DNS server):
sudo iptables -N udp-flood
sudo iptables -A udp-flood -p udp -m limit --limit 50/s -j RETURN
sudo iptables -A udp-flood -j LOG --log-level 4 --log-prefix 'UDP-flood attempt: '
sudo iptables -A udp-flood -j DROP
Now let's NOT limit normal DNS requests toward Google, send other UDP traffic on port 53 through the above-mentioned chain, and just drop everything else:
sudo iptables -A FORWARD -p udp -s 8.8.8.8 --sport 53 -j ACCEPT
sudo iptables -A FORWARD -p udp -d 8.8.8.8 --dport 53 -j ACCEPT
sudo iptables -A FORWARD -p udp -s 8.8.4.4 --sport 53 -j ACCEPT
sudo iptables -A FORWARD -p udp -d 8.8.4.4 --dport 53 -j ACCEPT
# Rate-limit the remaining UDP port 53 traffic through the udp-flood chain
sudo iptables -A FORWARD -p udp --dport 53 -j udp-flood
sudo iptables -t filter -A FORWARD -p udp -j DROP
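If you want to see whether the udp-flood chain is catching anything, the LOG rule above tags dropped bursts with the 'UDP-flood attempt: ' prefix, so you can check the chain's counters and grep the kernel log. The /var/log/messages path is an assumption; where kernel messages end up depends on your syslog configuration:

# Rule hit counters for the udp-flood chain
sudo iptables -L udp-flood -n -v

# Kernel log entries written by the LOG rule (log file path may differ per distro)
sudo grep 'UDP-flood attempt:' /var/log/messages
dmesg | grep 'UDP-flood attempt:'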
As you can see, we’ve limited the DevBox subnet as a whole to 75% of the available bandwidth. We could have limited each DevBox individually, but with the massive fluctuation in DevBoxes, we’d need a MUCH more complicated and dynamic script.
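For the curious, a per-DevBox limit would look roughly like the sketch below: one HTB class plus one CLASSIFY rule per container, generated for every box instead of the single subnet-wide rule above. The IP address, rate, and class numbers here are made up for illustration, and the rules would have to be regenerated every time a DevBox is created or destroyed, which is exactly the complexity we wanted to avoid:

# Hypothetical example: cap a single DevBox at 172.16.0.10 to 100Mbps upload
tc class add dev eth0 parent 1:1 classid 1:110 htb rate 100Mbps ceil 100Mbps prio 2
tc qdisc add dev eth0 parent 1:110 handle 110: sfq perturb 10
iptables -t mangle -A POSTROUTING -o eth0 -p tcp -s 172.16.0.10 -j CLASSIFY --set-class 1:110

# ...repeat with a unique classid for every DevBox, and again on venet0 for downloads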
Most people recommend limiting all UDP (including DNS over port 53), but we noticed that caused intermittent DNS issues. Intermittent issues mean hard-to-diagnose support tickets (grepping through logs, lots of back-and-forth with users), so I just asked two critical questions:
Is the problem fixed? Yes.
Does the fix cause new problems? No.
Needless to say, we stopped limiting legitimate DNS traffic. KISS and quit while you're ahead.
That’s one less alert to wake me in the middle of the night :)
Log in now and test out our OpenVZ-based DevBoxes in Codeanywhere!