Val:~$ whoami

I am Val Glinskiy, network engineer specializing in data center networks. TIME magazine selected me as Person of the Year in 2006.

Wednesday, January 06, 2010

Cacti and 95th percentile

I use Cacti to collect traffic data on my routers and I need to know what the 95th percentile is. There are quite a few ways to get a 95th percentile line on a Cacti graph. The problem with all of them is that if the time frame of the graph does not coincide with the ISP billing period, the 95th percentile value on the graph is useless. But all the necessary data is collected by Cacti into an RRD file. All we have to do is extract it.
First, I need to figure out where the RRD file is. In Cacti, go to Console -> Data Sources, select your edge router and click on the ISP-facing interface. In the "Data Source Path" field you'll see the name of the RRD file in Cacti's rra directory where data for this interface is stored.

Second, we need to know what to extract from this file, i.e. the names of the RRD data sources:
rrdtool info border_router_1_traffic_in_14839.rrd
where border_router_1_traffic_in_14839.rrd is the file name from the previous step.

filename = "border_router_1_traffic_in_14839.rrd"
rrd_version = "0003"
step = 300
last_update = 1262806506
ds[traffic_in].type = "COUNTER"
ds[traffic_in].minimal_heartbeat = 600
ds[traffic_in].min = 0.0000000000e+00
ds[traffic_in].max = NaN
ds[traffic_in].last_ds = "437961211333"
ds[traffic_in].value = 4.9711447176e+05
ds[traffic_in].unknown_sec = 0
ds[traffic_out].type = "COUNTER"
ds[traffic_out].minimal_heartbeat = 600
ds[traffic_out].min = 0.0000000000e+00
ds[traffic_out].max = NaN
ds[traffic_out].last_ds = "138465493978"
ds[traffic_out].value = 1.9428099668e+04
ds[traffic_out].unknown_sec = 0
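
If you only want the data source names, the output above can be filtered. This is a minimal sketch that assumes each data source has a line of the form ds[NAME].type = "..." as shown above; list_ds_names is a hypothetical helper name, not part of rrdtool.

```shell
# Print the data source names found in `rrdtool info` output read from stdin.
# Assumption: every data source has a line like  ds[NAME].type = "COUNTER"
list_ds_names() {
    sed -n 's/^ds\[\([^]]*\)]\.type = .*/\1/p'
}

# Usage (needs rrdtool and the RRD file from the previous step):
# rrdtool info border_router_1_traffic_in_14839.rrd | list_ds_names
```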


The data source names are traffic_in and traffic_out, and this is what we are going to extract. Before we proceed, remember that the size of an RRD database is fixed and determined at creation time. When the limit is reached, the oldest data is overwritten. To avoid losing any data, I am going to extract the last hour's traffic numbers every hour and put inbound and outbound data in separate files.
Incoming traffic:
rrdtool xport -s now-1h -e now DEF:xx=border_router_1_traffic_in_14839.rrd:traffic_in:AVERAGE 'CDEF:bb=xx,8,*' XPORT:bb:"in bits" | grep '<row>' | grep -v NaN | awk -F'<v>' '{print $2}' | sed -e 's/<\/v><\/row>//' | sed -e 's/e+0/\t/' >> incoming.txt

Outgoing traffic:
rrdtool xport -s now-1h -e now DEF:xx=border_router_1_traffic_in_14839.rrd:traffic_out:AVERAGE 'CDEF:bb=xx,8,*' XPORT:bb:"out bits" | grep '<row>' | grep -v NaN | awk -F'<v>' '{print $2}' | sed -e 's/<\/v><\/row>//' | sed -e 's/e+0/\t/' >> outgoing.txt

Each command must be entered on a single line. The commands convert bytes/sec to bits/sec and strip the XML formatting. Put these 2 lines into a shell script and run it from cron at 2 minutes past every hour, so that Cacti has time to finish its top-of-the-hour poll. You'll get 2 files, incoming.txt and outgoing.txt, that look like this (the two columns are separated by a tab):

6.9655133612 7
7.0568998690 7
6.9008144000 7
7.0245826541 7
7.2076520540 7
6.7448901179 7
6.7471832197 7
6.7365174531 7
6.9710477122 7
7.1586411237 7
7.0991637699 7
7.0189321194 7

These are the measurements taken every 5 minutes by Cacti. "6.9655133612 7" means 6.9655133612 * 10^7 bits/sec, or 69655133.612 bits/sec.
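
The hourly export described above can be wrapped in a small script. This is only a sketch under a few assumptions: xport prints rows of the form <row><t>...</t><v>...</v></row> (the format implied by the sed expression in the commands above), and the helper names rows_to_columns and export_ds are made up for illustration. It uses awk rather than sed for the tab substitution so it does not depend on GNU sed's \t.

```shell
#!/bin/sh
# Sketch of the hourly export script. Helper names are hypothetical.
RRD=border_router_1_traffic_in_14839.rrd   # RRD file found in the first step

# Turn xport rows such as
#   <row><t>1262806200</t><v>6.9655133612e+07</v></row>
# into "6.9655133612<TAB>7", dropping rows that hold no data (NaN).
rows_to_columns() {
    grep '<row>' | grep -v NaN \
        | sed -e 's/.*<v>//' -e 's/<\/v><\/row>.*//' \
        | awk '{ sub(/e\+0/, "\t"); print }'
}

# Export the last hour of one data source ($1), converted to bits/sec.
export_ds() {
    rrdtool xport -s now-1h -e now \
        "DEF:xx=$RRD:$1:AVERAGE" 'CDEF:bb=xx,8,*' "XPORT:bb:$1 bits" \
        | rows_to_columns
}

# In the real script these two lines would be uncommented:
# export_ds traffic_in  >> incoming.txt
# export_ds traffic_out >> outgoing.txt
```

A crontab entry such as "2 * * * * /path/to/export_traffic.sh" (the path and script name are placeholders) would run it at 2 minutes past every hour.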

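Now all you have to do on the 1st of the month, right after midnight, is convert the data to get rid of the second column, sort it and remove the top 5%. A 30-day month has 30 * 24 * 12 = 8640 five-minute samples, and 5% of that is 432, so the 95th percentile is the 433rd highest value. For a 30-day month: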

cat incoming.txt | perl -e 'while (<>) { $input = $_; chomp($input); ($traffic, $power) = split(/\t/, $input); $traffic = $traffic * 10**$power; print "$traffic\n"; }' | egrep -v '^0$' | sort -n -r | head -433 | tail -1
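
The 433 above is specific to a 30-day month. Here is a rough sketch of how the line number could be computed for other month lengths, assuming one sample every 5 minutes and no gaps in the data. The function names p95_rank and p95 are hypothetical, and the conversion is done in awk instead of Perl, so treat it as an alternative to the one-liner rather than the same command.

```shell
# Position (in descending order) of the 95th percentile sample for a month
# with $1 days, at one sample every 5 minutes (288 samples per day).
p95_rank() {
    samples=$(( $1 * 24 * 12 ))
    echo $(( samples * 5 / 100 + 1 ))   # skip the top 5%, take the next value
}

# Print the 95th percentile in bits/sec.
# $1 = file of "mantissa<TAB>exponent" lines, $2 = days in the billing month
p95() {
    awk -F'\t' '{ v = $1 * 10 ^ $2; if (v != 0) printf "%.3f\n", v }' "$1" \
        | sort -n -r | head -n "$(p95_rank "$2")" | tail -n 1
}

# Usage:
# p95 incoming.txt 30
# p95 outgoing.txt 30
```

If polls were missed during the month, the file holds fewer samples than assumed here and the rank would need to be derived from the actual line count instead.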