Val:~$ whoami

I am Val Glinskiy, network engineer specializing in data center networks. TIME magazine selected me as Person of the Year in 2006.


Friday, May 28, 2010

Cisco 6500/7600 ACL side effect

When you apply an ACL to an interface on a Cisco 6500 or 7600, the router compiles it and programs it into TCAM. The way the 6500/7600 does this can have unintended consequences that leave you open to a DDoS attack. Let's consider the following example:
We want to allow any server in the 172.16.100.0/24 network to initiate any TCP connection and to query any DNS server directly. Here is our ACL:
ip access-list extended Test1
 permit tcp any any established
 permit udp any eq domain any
 deny  ip any any
We apply it to the internet-facing interface of the Cisco 7600 router: "ip access-group Test1 in". Now let's look at what actually happened in TCAM:

Cisco7600#show tcam int gi 1/1 acl in ip

* Global Defaults shared


Entries from Bank 0


Entries from Bank 1

    permit       tcp any any fragments
    permit       udp any any fragments
    permit       tcp any any established match-any
    permit       udp any eq domain any
Our router automatically added "permit udp any any fragments", i.e. it allowed all UDP fragments. It has little choice: non-initial fragments carry no UDP header, so the port-qualified "permit udp any eq domain any" entry cannot be matched against them, and the hardware compensates with a blanket permit for fragments. Now let's see what this means in practice. First, take a look at the compiled ACL again:

Cisco7600#show tcam int gi 1/1 acl in ip

* Global Defaults shared


Entries from Bank 0


Entries from Bank 1

    permit       tcp any any fragments
    permit       udp any any fragments (41 matches)
    permit       tcp any any established match-any (220 matches)
    permit       udp any eq domain any
Note the counter: 41 matches. Next, from the "attacker" host we generate fragmented UDP traffic targeting a server in the 172.16.100.0/24 network:

hping2 -2 -d 1500 -c 1 -s 10000 -p 90 -m 500 -f 172.16.100.10

In the command above, we send one 1500-byte UDP packet from port 10000 on the local host to port 90 on 172.16.100.10, telling hping2 that the MTU is 500 bytes so that the packet is fragmented. On the target host we run tcpdump:

10:47:41.942010 IP (tos 0x0, ttl  63, id 130, offset 496, flags [+], length: 520) 172.16.0.101 > 172.16.100.10: udp

10:47:41.942027 IP (tos 0x0, ttl  63, id 130, offset 1000, flags [+], length: 520) 172.16.0.101 > 172.16.100.10: udp

10:47:41.942034 IP (tos 0x0, ttl  63, id 130, offset 1496, flags [none], length: 28) 172.16.0.101 > 172.16.100.10: udp
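The offsets above line up with how IPv4 fragmentation works: only the fragment at offset 0 carries the UDP header, so only it can be matched against port numbers. A quick sketch (illustrative only; hping2 uses its own fragmentation logic, so the exact offsets it produces differ) shows how our 1508-byte UDP datagram (8-byte UDP header plus the 1500 data bytes) splits at a 500-byte MTU:

```python
def fragment_offsets(l4_len, mtu, ip_hdr=20):
    """Split an IP payload of l4_len bytes (UDP header + data) into
    fragments that fit the MTU. Every fragment except the last must
    carry a multiple of 8 data bytes; offsets are stored in the IP
    header divided by 8, but are returned here in plain bytes."""
    max_data = (mtu - ip_hdr) // 8 * 8   # largest 8-byte-aligned chunk per fragment
    frags, offset = [], 0
    while offset < l4_len:
        size = min(max_data, l4_len - offset)
        frags.append((offset, size))
        offset += size
    return frags

# 8-byte UDP header + 1500 bytes of data, MTU 500
print(fragment_offsets(1508, 500))
# -> [(0, 480), (480, 480), (960, 480), (1440, 68)]
```

Only the (0, 480) fragment contains the UDP ports; to the ACL, the rest are portless "fragments", and the blanket permit lets them through.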

Now, the first fragment, which contains the IP and UDP headers, was dropped by our ACL, since we do not allow UDP packets coming from port 10000, but the three other fragments got through. Let's check the counter again:

Cisco7600#show tcam int gi 1/1 acl in ip

* Global Defaults shared


Entries from Bank 0


Entries from Bank 1

    permit       tcp any any fragments
    permit       udp any any fragments (44 matches)
    permit       tcp any any established match-any (224 matches)
    permit       udp any eq domain any
An attacker can flood your web or mail server with UDP fragments, slowing it down while it is busy discarding incomplete packets. We cannot block fragments completely, since legitimate DNS replies can be quite big and require fragmentation. The solution is to allow outbound UDP traffic, and hence the incoming replies, only to the specific hosts that need it, such as your caching DNS server, and to put a good firewall in front of those hosts.
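For example, assuming a single caching DNS server at 172.16.100.53 (a hypothetical address for illustration), the ACL could be tightened to something like this sketch:

```
ip access-list extended Test2
 permit tcp any any established
 permit udp any eq domain host 172.16.100.53
 deny   ip any any
```

The router should still program a fragments entry for the UDP line, but it is now scoped to the one host you have put a firewall in front of, rather than to every server behind the router.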

Monday, May 24, 2010

Cisco 7600: Netflow and high CPU utilization

Cisco documentation states that:
 If NetFlow is configured for version 7, the flow is performed by the Routing Processor, which could cause high CPU utilization.
For troubleshooting high CPU utilization due to Netflow version 7, configure mls nde sender version 5, as the Netflow export is performed by the SP, which is the default for version 5 or version 9.


It turns out that the combination of NetFlow version 9 and NDE sender version 7 also creates high CPU load in certain situations. Here is the setup:

Both routers are Cisco 7604. Other than different IP addresses, the only difference between R1 and R2 was this:

on R1:  mls nde sender
on R2:  mls nde sender version 5
The default NDE sender version is 7. Both routers were configured with "ip flow-export version 9".
Whenever R2's eBGP session was interrupted, R1's CPU utilization skyrocketed to 100% and stayed there for 10-15 minutes, rendering the router unusable. "process cpu threshold" reported that "IP Input" was responsible for the CPU load, not "BGP Router" as I expected, since these CPU spikes happened only when the eBGP session went down. After changing the NDE sender version to 5 on R1, the problem went away.
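For reference, the combination that stopped the spikes on R1 looks like this (a sketch of the relevant lines only):

```
ip flow-export version 9
mls nde sender version 5
```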

Tuesday, May 11, 2010

Catching high CPU usage

Suddenly your router stops responding and forwarding traffic, you cannot telnet into it, and the response on the console is very slow. A few minutes later everything is back to normal, and only "show process cpu history" shows that the CPU was at 100% for some time, but what caused it remains a mystery. To catch the process(es) that might have contributed to the problem, add the following command in global configuration mode:

process cpu threshold type process rising 70 interval 5 falling 30 interval 5

It will generate a syslog message every time CPU usage stays above 70% for 5 or more seconds, and another when it falls back below 30%. For example:

May 10 23:50:23.146 EDT: %SYS-1-CPURISINGTHRESHOLD: Threshold: Process CPU Utilization(Total/Intr): 74%/26%, Top 3 processes(Pid/Util): 192/46%, 7/1%, 2/0%

Process id 192 contributed 46%. Let's see:

Router#sho proc cpu sor | i ^_192
 192    90494788  1922327784         47  0.00%  0.18%  0.19%   0 IP Input

It was "IP Input", the process responsible for process-switching IP packets. Now we have something to work with and can start troubleshooting.
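If these CPURISINGTHRESHOLD messages land on a syslog collector, pulling out the top offenders is easy to automate. A minimal sketch in Python, assuming the message format shown above:

```python
import re

# The sample message from the router above
SAMPLE = ("May 10 23:50:23.146 EDT: %SYS-1-CPURISINGTHRESHOLD: "
          "Threshold: Process CPU Utilization(Total/Intr): 74%/26%, "
          "Top 3 processes(Pid/Util): 192/46%, 7/1%, 2/0%")

def top_processes(msg):
    """Extract (pid, utilization%) pairs from a CPURISINGTHRESHOLD message."""
    m = re.search(r"Top \d+ processes\(Pid/Util\):\s*(.*)$", msg)
    if not m:
        return []
    # "192/46%, 7/1%, 2/0%" -> [(192, 46), (7, 1), (2, 0)]
    return [(int(pid), int(util.rstrip("%")))
            for pid, util in (p.split("/") for p in m.group(1).split(", "))]

print(top_processes(SAMPLE))  # -> [(192, 46), (7, 1), (2, 0)]
```

From here a collector-side script can alert when the same PID keeps topping the list, instead of you digging through "show process cpu" after the fact.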