Monday, June 21, 2010

Monitoring trunk status via SNMP

If you have not guessed yet, SNMP and monitoring are my favorites.

So, you have configured many trunks on your switch and now need to make sure all of them are actually in trunking mode. Here are two SNMP OIDs that can help:

vlanTrunkPortDynamicState (1.3.6.1.4.1.9.9.46.1.6.1.1.13) - reports the administrative state. From the Cisco SNMP Object Navigator:
1 : on
2 : off
3 : desirable
4 : auto
5 : onNoNegotiate


vlanTrunkPortDynamicStatus (1.3.6.1.4.1.9.9.46.1.6.1.1.14) - reports operational state.
1 : trunking
2 : notTrunking


To get data for a specific interface, append its ifIndex to the end of the OID. For example, for the interface with ifIndex=10147:
snmpwalk -v2c -Ov -Oq -c public myswitch  1.3.6.1.4.1.9.9.46.1.6.1.1.13.10147
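
The same addressing can be sketched in Python: append the ifIndex to the column OID, then translate the numeric value back into the names listed above. This is only an illustration; the helper names instance_oid and decode are my own, while the OIDs and value tables are the ones from this post.

```python
# Column OIDs from CISCO-VTP-MIB, as listed above.
TRUNK_ADMIN_OID = "1.3.6.1.4.1.9.9.46.1.6.1.1.13"  # vlanTrunkPortDynamicState
TRUNK_OPER_OID = "1.3.6.1.4.1.9.9.46.1.6.1.1.14"   # vlanTrunkPortDynamicStatus

ADMIN_STATE = {1: "on", 2: "off", 3: "desirable", 4: "auto", 5: "onNoNegotiate"}
OPER_STATUS = {1: "trunking", 2: "notTrunking"}


def instance_oid(column_oid, if_index):
    """Append the ifIndex to a column OID to address one interface."""
    return f"{column_oid}.{int(if_index)}"


def decode(value, table):
    """Translate a numeric SNMP value into its name, or flag it as unknown."""
    return table.get(int(value), f"unknown({value})")


print(instance_oid(TRUNK_ADMIN_OID, 10147))
print(decode("5", ADMIN_STATE))
```
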
To get the ifIndex, you can either run the "show snmp mib ifmib ifIndex" command in exec mode or query the ifName OID with snmpwalk. Here is a quick script:
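  for int in ifIndex1 ifIndex2 ifIndexN
        do
        # -Oe prints enumerated values as plain integers, even when the MIB is loaded
        trunkoperstatus=$(snmpwalk -v2c -Ov -Oq -Oe -c public myswitch 1.3.6.1.4.1.9.9.46.1.6.1.1.14.$int)
                # 2 = notTrunking
                if [ "$trunkoperstatus" -eq 2 ]
                then
                        trunkadminstatus=$(snmpwalk -v2c -Ov -Oq -Oe -c public myswitch 1.3.6.1.4.1.9.9.46.1.6.1.1.13.$int)
                        # 1 = on, i.e. the port is configured as a trunk but is not trunking
                        if [ "$trunkadminstatus" -eq 1 ]
                        then
                                echo myswitch $int NotTrunking
                        fi
                fi
        done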
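
If you would rather walk both columns once and compare them offline, the check can be sketched in Python as follows. The input format is an assumption: lines of "numeric OID, space, integer value", which is what I would expect from snmpwalk with numeric OID and enum output. The sample values and the function names are made up for illustration.

```python
def parse_walk(text):
    """Map ifIndex -> integer value from 'OID value' lines."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        oid, value = parts
        # The last OID component is the ifIndex of the port.
        result[int(oid.rsplit(".", 1)[1])] = int(value)
    return result


def broken_trunks(admin, oper):
    """Return ifIndexes set to trunk 'on' (1) but reported as notTrunking (2)."""
    return sorted(i for i, state in admin.items() if state == 1 and oper.get(i) == 2)


# Hypothetical walk output for three ports.
admin_text = """\
.1.3.6.1.4.1.9.9.46.1.6.1.1.13.10101 1
.1.3.6.1.4.1.9.9.46.1.6.1.1.13.10102 1
.1.3.6.1.4.1.9.9.46.1.6.1.1.13.10103 4
"""
oper_text = """\
.1.3.6.1.4.1.9.9.46.1.6.1.1.14.10101 1
.1.3.6.1.4.1.9.9.46.1.6.1.1.14.10102 2
.1.3.6.1.4.1.9.9.46.1.6.1.1.14.10103 2
"""
print(broken_trunks(parse_walk(admin_text), parse_walk(oper_text)))
```

Port 10103 is not reported because its administrative state is "auto", which matches the logic of the shell script above.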

Friday, May 28, 2010

Cisco 6500/7600 ACL side effect

When you apply an ACL to an interface on a Cisco 6500 or 7600, the router compiles it and programs it into TCAM. The way the 6500/7600 does this can have unintended consequences that leave you open to a DDoS attack. Consider the following example:
We want to allow any server in the 172.16.100.0/24 network to initiate any TCP connection and to query any DNS server directly. Here is our ACL:
ip access-list extended Test1
 permit tcp any any established
 permit udp any eq domain any
 deny  ip any any
We apply it to the internet-facing interface of the Cisco 7600 router with "ip access-group Test1 in". Now let's look at what actually ended up in TCAM:

Cisco7600#show tcam int gi 1/1 acl in ip

* Global Defaults shared


Entries from Bank 0


Entries from Bank 1

    permit       tcp any any fragments
    permit       udp any any fragments
    permit       tcp any any established match-any
    permit       udp any eq domain any
Our router automatically added "permit  udp any any fragments", i.e. it allowed UDP fragments. Now let's see whether fragments really get through. First, take a look at the compiled ACL again:

Cisco7600#show tcam int gi 1/1 acl in ip

* Global Defaults shared


Entries from Bank 0


Entries from Bank 1

    permit       tcp any any fragments
    permit       udp any any fragments (41 matches)
    permit       tcp any any established match-any (220 matches)
    permit       udp any eq domain any
Note the counter: 41 matches. Next, on the "attacker" we generate fragmented UDP traffic targeting a server in the 172.16.100.0/24 network:

hping2 -2 -d 1500 -c 1 -s 10000 -p 90 -m 500 -f 172.16.100.10

In the command above, we send one 1500-byte UDP packet from port 10000 on the local host to port 90 on 172.16.100.10, and we tell hping2 to fragment it as if the MTU were 500 bytes. On the target host we run tcpdump:

10:47:41.942010 IP (tos 0x0, ttl  63, id 130, offset 496, flags [+], length: 520) 172.16.0.101 > 172.16.100.10: udp

10:47:41.942027 IP (tos 0x0, ttl  63, id 130, offset 1000, flags [+], length: 520) 172.16.0.101 > 172.16.100.10: udp

10:47:41.942034 IP (tos 0x0, ttl  63, id 130, offset 1496, flags [none], length: 28) 172.16.0.101 > 172.16.100.10: udp

Now, the first fragment, which contains the IP and UDP headers, was dropped by our ACL, since we do not allow UDP packets coming from port 10000, but the 3 other fragments got through. Let's check the counter again:

Cisco7600#show tcam int gi 1/1 acl in ip

* Global Defaults shared


Entries from Bank 0


Entries from Bank 1

    permit       tcp any any fragments
    permit       udp any any fragments (44 matches)
    permit       tcp any any established match-any (224 matches)
    permit       udp any eq domain any
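
The behaviour above (first fragment dropped, later fragments permitted) follows from how IOS is documented to treat non-initial fragments: they carry no Layer 4 header, so a permit entry with Layer 4 criteria matches them on Layer 3 information alone, while a deny entry with Layer 4 criteria is skipped. Below is a simplified Python model of that rule, not the actual TCAM logic. Address matching is left out because every entry in Test1 uses "any", and the class and function names are my own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    action: str                     # "permit" or "deny"
    proto: str                      # "tcp", "udp" or "ip" (any protocol)
    src_port: Optional[int] = None  # Layer 4 criterion, if any
    established: bool = False       # Layer 4 criterion on TCP ACK/RST


@dataclass
class Packet:
    proto: str
    src_port: Optional[int] = None  # not visible in non-initial fragments
    ack_or_rst: bool = False
    non_initial_fragment: bool = False


def evaluate(acl, pkt):
    """Return 'permit' or 'deny' for a packet, first matching entry wins."""
    for e in acl:
        if e.proto not in ("ip", pkt.proto):
            continue                 # Layer 3 protocol does not match
        has_l4 = e.src_port is not None or e.established
        if pkt.non_initial_fragment and has_l4:
            if e.action == "permit":
                return "permit"      # Layer 3 matched, Layer 4 cannot be checked
            continue                 # deny entry with Layer 4 criteria is skipped
        if e.src_port is not None and e.src_port != pkt.src_port:
            continue
        if e.established and not pkt.ack_or_rst:
            continue
        return e.action
    return "deny"                    # implicit deny at the end of every ACL


TEST1 = [
    Entry("permit", "tcp", established=True),
    Entry("permit", "udp", src_port=53),
    Entry("deny", "ip"),
]

print(evaluate(TEST1, Packet("udp", src_port=10000)))            # initial fragment
print(evaluate(TEST1, Packet("udp", non_initial_fragment=True)))  # later fragments
```
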
An attacker can flood your web or email server with UDP fragments, slowing it down while it is busy discarding incomplete packets. We cannot block fragments completely, since legitimate DNS replies can be quite big and require fragmentation. The solution is to allow outbound UDP traffic, and hence incoming replies, only for the specific hosts that need it, such as your caching DNS server, and to put a good firewall in front of them.

Monday, May 24, 2010

Cisco 7600: Netflow and high CPU utilization

Cisco documentation states that:
 If NetFlow is configured for version 7, the flow is performed by the Routing Processor, which could cause high CPU utilization.
For troubleshooting high CPU utilization due to Netflow version 7, configure mls nde sender version 5, as the Netflow export is performed by the SP, which is the default for version 5 or version 9.


It turns out that the combination of NetFlow version 9 and NDE sender version 7 also creates high CPU load in certain situations. Here is the setup:

Both routers are Cisco 7604. Other than different IP addresses, the only difference between R1 and R2 was this:

on R1:  mls nde sender
on R2:  mls nde sender version 5
The default sender version is 7. Both routers are configured with ip flow-export version 9.
Whenever R2's eBGP session was interrupted, R1's CPU utilization skyrocketed to 100% and stayed there for 10-15 minutes, rendering the router unusable. "process cpu threshold" reported that "IP Input" was responsible for the CPU load, not "BGP Router" as I expected, given that these CPU spikes only happened when the eBGP session went down. After changing the NDE sender version to 5 on R1, the problem went away.

Tuesday, May 11, 2010

Catching high CPU usage

Suddenly your router stops responding and forwarding traffic: you cannot telnet into it, and the response on the console is very slow. A few minutes later everything is back to normal, and only "show process cpu history" shows that the CPU was at 100% for some time, but what caused it remains a mystery. To catch the process(es) that might have contributed to the problem, add the following command in global configuration mode:

process cpu threshold type process rising 70 interval 5 falling 30 interval 5

It generates a syslog message every time CPU usage stays above 70% for 5 or more seconds, and another one when it falls back below 30%. For example:

May 10 23:50:23.146 EDT: %SYS-1-CPURISINGTHRESHOLD: Threshold: Process CPU Utilization(Total/Intr): 74%/26%, Top 3 processes(Pid/Util): 192/46%, 7/1%, 2/0%

Process id 192 contributed 46%. Let's see:

Router#sho proc cpu sor | i ^_192
 192    90494788 1922327784         47  0.00%  0.18%  0.19%   0 IP Input

It was "IP Input", the process responsible for process-switching IP packets. Now we have something to work with and can start troubleshooting.
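
If you collect these messages on a syslog server, pulling the numbers out of them is easy to automate. Here is a minimal Python sketch that parses the example message above. The function name and the returned dictionary layout are my own choices, and the regular expressions assume the message keeps exactly the layout shown in this post.

```python
import re

LINE = ("May 10 23:50:23.146 EDT: %SYS-1-CPURISINGTHRESHOLD: Threshold: "
        "Process CPU Utilization(Total/Intr): 74%/26%, "
        "Top 3 processes(Pid/Util): 192/46%, 7/1%, 2/0%")


def parse_cpu_alert(line):
    """Extract total/interrupt utilization and the top (pid, percent) pairs."""
    totals = re.search(r"\(Total/Intr\): (\d+)%/(\d+)%", line)
    if totals is None:
        return None  # not a CPU threshold message
    # Everything after the "(Pid/Util):" label is the list of top processes.
    _, _, top = line.partition("(Pid/Util):")
    return {
        "total": int(totals.group(1)),
        "interrupt": int(totals.group(2)),
        "top": [(int(pid), int(util)) for pid, util in re.findall(r"(\d+)/(\d+)%", top)],
    }


print(parse_cpu_alert(LINE))
```

The first pair in "top" is the PID to look up with "show process cpu", 192 in this case.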

Friday, March 26, 2010

Monitoring logs with SEC

Splunk seems to have become the de facto standard tool for log management. But the free version lacks the feature that lets you configure and send alerts whenever certain events occur. You need to pay for the enterprise version, which starts at $5000 in the US and Canada.

So, I use Simple Event Correlator (SEC) to notify me of interesting events in the life of my router friends. Here, for example, is a SEC template that emails me the syslog line when somebody enters configuration mode or executes certain commands:
type=Single
ptype=RegExp
pattern=.*cmd=(configure|clear|ip|no|interface|switchport|router|spanning-tree)
desc=$0
action=pipe '$0' /usr/bin/mail -s "router/switch config change is happening right now" noc@example.com

You need to put this template into the SEC configuration file and tell SEC where to look for these messages:

sec -detach -conf=/etc/sec-tacacs.conf -input=/var/log/tac-plus/account

In this case it's the TACACS+ log file, so you need to configure the router to report such activity:

aaa new-model
aaa authentication login default group tacacs+ none
aaa authentication enable default group tacacs+ none
aaa authorization exec default group tacacs+ none
aaa authorization commands 15 default group tacacs+ none
aaa accounting commands 15 default start-stop group tacacs+
tacacs-server host <server ip>
tacacs-server key <key>
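
Before relying on the SEC template above, it is worth checking its pattern against a few accounting records. A rough way to do that in Python is sketched below. The two sample records are made up, and the real fields written by tac_plus may differ. Python's re.search is used because SEC applies the pattern unanchored, and this particular pattern uses no Perl-specific syntax.

```python
import re

# Pattern copied from the SEC template above.
PATTERN = re.compile(
    r".*cmd=(configure|clear|ip|no|interface|switchport|router|spanning-tree)"
)

# Hypothetical accounting records; the real tac_plus layout may differ.
samples = [
    "Mar 26 10:15:02\t10.9.20.1\tadmin\ttty2\t192.0.2.10\tstop\tcmd=configure terminal <cr>",
    "Mar 26 10:15:09\t10.9.20.1\tadmin\ttty2\t192.0.2.10\tstop\tcmd=show running-config <cr>",
]


def would_alert(line):
    """True if the SEC template would send an email for this log line."""
    return PATTERN.search(line) is not None


for line in samples:
    print(would_alert(line), line.split("\t")[-1])
```
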

Here is another template to report all syslog messages coming from devices with loopback interface IP address in the range 10.9.20.0/24 or 10.9.25.0/24. Why loopback? See my previous post.

type=Single
ptype=RegExp
pattern=(.*)10\.9\.2[05]\.(.*)%[A-Z]*
desc=$0
action=pipe '$0' /usr/bin/mail -s " router syslog message" noc@example.com
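
The address part of such a pattern is easy to get subtly wrong, so here is a quick Python check of which lines would trigger the alert. The character class is written as [05] in this sketch, since a class lists its characters without a separator. The sample syslog lines are invented for illustration.

```python
import re

# Third octet must be 20 or 25, and the line must contain a %FACILITY tag.
PATTERN = re.compile(r"(.*)10\.9\.2[05]\.(.*)%[A-Z]*")

# Hypothetical syslog lines as they might be written on the log server.
samples = {
    "Mar 26 10:20:11 10.9.20.7 45: %LINK-3-UPDOWN: Interface Gi1/1, changed state to down": True,
    "Mar 26 10:20:15 10.9.25.3 12: %SYS-5-CONFIG_I: Configured from console": True,
    "Mar 26 10:20:19 10.9.21.7 77: %LINK-3-UPDOWN: Interface Gi1/2, changed state to up": False,
}

for line, expected in samples.items():
    matched = PATTERN.search(line) is not None
    print(matched, expected)
```
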

Thursday, March 25, 2010

Best practices. Sort of.

I tend to agree that there are no "best practices", only practices that fit best. Here is one of the things that I always configure on a router.

There are many advantages to configuring a Loopback interface when you use dynamic routing, but I also find the loopback helpful for syslog reporting and for authentication and authorization queries. So, I always configure:

ip tacacs source-interface Loopback0
logging source-interface Loopback0

The next step is to add the loopback addresses of your routers either to DNS or to the /etc/hosts file on the TACACS+ and syslog servers.
The names are no good if you cannot use them. I prefer syslog-ng for logging, so, in order to record names instead of IP addresses, you need to configure use_dns(yes) in the "options" section of syslog-ng.conf. For TACACS+, run tac_plus with the "-L" option.

Making same change on many routers

Suppose you need to make the same change on many routers, but do not have fancy software like CiscoWorks to help you. No worries. Perl is the best friend of any network and system administrator. Here is a quick script that logs in to a router, enters the command "logging source-interface loopback 0", saves the configuration and exits. It can be adapted to run any command.
Place the IP addresses of the routers, one per line, in the file routers.txt. This file must be in the same directory as the script. Remember that your username, password and enable password sit in the script in clear text, so do not forget to restrict its permissions with chmod 700.

#!/usr/bin/perl
use strict;
use warnings;
use Net::Telnet::Cisco;
my $myfile="./routers.txt";
open (FH, $myfile) || die "Can not open $myfile\n";

# Read one router address per line
while (<FH>) {
chomp;
my $switchname=$_;
print "$switchname\n";

# Everything received from the router is also logged to <router>.log
my $session = Net::Telnet::Cisco->new(Host => $switchname,Input_log => "$switchname.log");
# Replace username and password below with real username and password
$session->login('username', 'password');

# Enable mode
if ($session->enable("enable password") ) { # insert your enable password
    my @output = $session->cmd('configure terminal');
    @output = $session->cmd('logging source-interface loopback 0');
    print @output;
    @output = $session->cmd('exit');
    # The extra newlines answer the confirmation prompt
    @output = $session->cmd("copy run startup-config\n\n");
    print @output;
    } else {
         warn "Can't enable: " . $session->errmsg;
        }
$session->close;
}
close (FH);
 
Use at your own risk. 
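
One thing the script above does not handle is a blank line in routers.txt, which would make it try to connect to an empty hostname. The sketch below shows, in Python, one way to read the list more defensively. Treating "#" as a comment marker is my own convention and not something the Perl script understands.

```python
import io


def load_hosts(handle):
    """Return router addresses, skipping blank lines and '#' comments."""
    hosts = []
    for line in handle:
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


# Stand-in for open("routers.txt"), with made-up addresses.
sample = io.StringIO("10.9.20.1\n\n# core\n10.9.25.1  # edge\n")
print(load_hosts(sample))
```
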

Monday, March 22, 2010

Debian 5.0.4 on Dell 1950

Normally, installing Debian on Dell servers is a piece of cake. This particular 1950 came with Broadcom NICs and a PERC5 controller. Debian 5.0.4 does not include the driver for Broadcom NICs due to some copyright restrictions. However, the driver is available as a deb package. Download it and copy it to a FAT or FAT32 formatted USB drive. When prompted for the NIC driver during installation, insert the USB drive. As soon as the server loads the driver and moves to the next installation screen, remove the drive. If you do not remove the USB drive before the installer gets to partitioning, your drive sequence will be out of whack, and you'll have to boot from CD and edit /etc/fstab.
Since this server has a hardware RAID controller, I wanted to use it instead of configuring software RAID in Linux. The question is how to monitor the RAID state from Debian. There is no deb package or source code, but LSI provides an RPM. I downloaded "MegaCLI - Linux" from the "Miscellaneous" section, unpacked it, installed "alien" on Debian (sudo apt-get install alien) and then ran "sudo alien -i  MegaCli-1.01-0.i386.rpm". It installs MegaCli under /opt/MegaRAID/MegaCli. Moritz Mertinkat has a great emergency cheat sheet for MegaCli usage.