Val:~$ whoami

I am Val Glinskiy, network engineer specializing in data center networks. TIME magazine selected me as Person of the Year in 2006.

Wednesday, October 14, 2009

How to generate lots of BGP routes

I needed to test in the lab whether my Cisco router can handle more than 300K routes, the size of the current full BGP table. A Cisco router accepts only 200 network statements under the router bgp configuration, so I would need 1500 routers. Even if I had that many routers at my disposal, it would have taken days to configure all of them. As always, open source software can help. Quagga lets you run OSPF, BGP, RIP and RIPng on Linux and Solaris. If you go with all the default options, it is very easy to install. Download and unpack it, go to the quagga directory (in my case it was quagga-0.98.6) and type
./configure
make
sudo make install
That's it. By default, it went into /usr/local/. I have Debian 4 (Etch) with 2.6.8 kernel and a lot of development packages installed. The only thing I had to do was to add /usr/local/lib to /etc/ld.so.conf file and run /sbin/ldconfig.
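The library path step can also be scripted. The following is a minimal sketch and not part of the original install: add_lib_path is a hypothetical helper name, and it assumes a POSIX shell and write access to the file it is given.

```shell
# Hypothetical helper: append a directory to a linker config file
# only if that exact line is not already present.
add_lib_path() {
    conf="$1"    # linker config file, e.g. /etc/ld.so.conf
    dir="$2"     # library directory, e.g. /usr/local/lib
    grep -qxF "$dir" "$conf" 2>/dev/null || echo "$dir" >> "$conf"
}
```

As root, this would be called as add_lib_path /etc/ld.so.conf /usr/local/lib, followed by /sbin/ldconfig to rebuild the linker cache. Checking for the line first keeps the file clean if the step is repeated.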
Here is the setup


Now I need a valid configuration file for Quagga BGP. Adding 300000 network statements manually is not something system administrators do on Linux. Hence, here is the script
#!/usr/bin/perl
use strict;
use warnings;

my $host        = "quagga-host";   # Quagga router name
my $logpass     = "zebra";         # login password
my $enable      = "zebra";         # enable password
my $myasn       = "65099";         # local AS number
my $router_id   = "172.31.2.2";    # BGP router-id
my $remote_as   = "65001";         # remote AS number
my $remote_ip   = "172.31.2.1";    # BGP neighbor IP address
my $route_count = 0;
my $max_routes  = 300000;          # max number of routes to generate
my %seen;                          # first octets already used, to avoid duplicate prefixes

open(my $conf, '>', 'bgpd.conf') or die "Cannot open bgpd.conf for writing: $!";
print $conf "hostname $host\npassword $logpass\nenable password $enable\nline vty\n";
print $conf "router bgp $myasn\n  bgp router-id $router_id\n  neighbor $remote_ip remote-as $remote_as\n";
MAXR: while ($route_count < $max_routes) {
    # pick the 1st octet randomly in the 1-223 range; 224 and up is multicast and class E
    my $octet1 = int(rand(223)) + 1;
    next if $octet1 == 127;        # make sure that 127.X.X.0/24 is excluded
    next if $seen{$octet1}++;      # skip a first octet that was already generated
    for my $octet2 (0 .. 255) {
        for my $octet3 (0 .. 255) {
            print $conf "  network $octet1.$octet2.$octet3.0/24\n";
            $route_count++;
            last MAXR if $route_count == $max_routes;
        }
    }
}
close $conf;
This script will generate bgpd.conf for Quagga. Since it is a lab environment not connected to any real network, I do not really care about the zebra configuration or about restricting access to the Quagga BGP console. Copy the bgpd.conf file into /usr/local/etc and run
/usr/local/sbin/bgpd -d -f /usr/local/etc/bgpd.conf -u root -g root
Again, this is not a production environment. Do not run Quagga as root in production. Here is the relevant configuration from the Cisco router:
interface GigabitEthernet0/0
 ip address 172.31.2.1 255.255.255.0
 media-type rj45
 negotiation auto
!
router bgp 65001
 no synchronization
 bgp log-neighbor-changes
 network 172.31.2.0 mask 255.255.255.0
 neighbor 172.31.2.2 remote-as 65099
 no auto-summary
!
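Before bringing the session up, it may be worth checking the generated bgpd.conf on the Linux host. The sketch below counts the network statements and looks for repeated prefixes, since the first octet is picked at random and a repeat would lower the number of unique routes. The function names count_networks and count_duplicates are hypothetical, and the sketch assumes a POSIX shell with grep, sort and uniq.

```shell
# Count the network statements in a Quagga bgpd.conf (path is the first argument).
count_networks() {
    grep -c '^ *network ' "$1"
}

# Count prefixes that appear more than once in the file.
count_duplicates() {
    grep '^ *network ' "$1" | sort | uniq -d | wc -l | tr -d ' '
}
```

Running count_networks bgpd.conf should print the value of $max_routes, and count_duplicates bgpd.conf should print 0 if every generated prefix is unique.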
Let's see if it works. On Linux host:
sh-2.05b$ telnet localhost 2605
Trying 127.0.0.1...
Connected to localhost.localdomain.
Escape character is '^]'.

Hello, this is Quagga (version 0.98.6).
Copyright 1996-2005 Kunihiro Ishiguro, et al.


User Access Verification

Password:
quagga-host> sho ip bgp summ
BGP router identifier 172.31.2.2, local AS number 65099
2 BGP AS-PATH entries
0 BGP community entries

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.31.2.1      4 65001     283     606        0    0    0 03:41:55        1

Total number of neighbors 1
quagga-host>
On Cisco:
R1#sho ip bgp sum
BGP router identifier 192.0.2.2, local AS number 65001
BGP table version is 330002, main routing table version 330002
310001 network entries using 40920132 bytes of memory
310001 path entries using 16120052 bytes of memory
3/2 BGP path/bestpath attribute entries using 444 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 2) using 32 bytes of memory
BGP using 57040684 total bytes of memory
BGP activity 310001/0 prefixes, 320001/10000 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.31.2.2      4 65099     602     322   330002    0    0 03:38:59   310000
Yep. It works.

Friday, October 09, 2009

BGP and BFD (Bidirectional Forwarding Detection)

If you have an Ethernet uplink to your ISP, chances are high that it looks like this:



The Layer2 device could belong to an "on the wire" provider or be part of an Ethernet over MPLS service. The problem arises when, for example, the connection between the Layer2 device and the ISP router goes down:



The BGP session with R1 on the ISP router will reset immediately, unless you configured "no bgp fast-external-fallover". But R1's own interface stays up, so R1 has to rely on BGP keepalive messages to detect whether the neighbor is still alive. With the default 180-second hold time, it might take R1 up to 3 minutes to detect that the ISP is not available, and for these 3 minutes R1 will be sending traffic into a black hole instead of re-converging and sending traffic to your backup link. You have a backup link, don't you?
Here are the syslog messages from the R1 and ISP routers. To imitate a link failure, I shut down the interface on the ISP router.

router ISP:
Oct 9 17:55:18 UTC: %LINK-5-CHANGED: Interface GigabitEthernet2/5, changed state to administratively down
Oct 9 17:55:18 UTC: %BGP-5-ADJCHANGE: neighbor 172.31.255.1 Down Interface flap


router R1:
Oct 9 17:57:24 UTC: %BGP-5-ADJCHANGE: neighbor 172.31.255.2 Down BGP Notification sent
Oct 9 17:57:24 UTC: %BGP-3-NOTIFICATION: sent to neighbor 172.31.255.2 4/0 (hold time expired) 0 bytes

Note the timestamps of the first and last messages: R1 noticed the failure more than two minutes after the ISP router did. That's not good, especially if every minute of downtime costs you a bundle. You can adjust the BGP timers, but the lowest you can go is 1 second and it could be hard on the CPU.
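To put a number on the gap, the two log timestamps can be subtracted. This is a small shell sketch, not something from the routers themselves; to_seconds is a hypothetical helper name, and it assumes both timestamps fall on the same day.

```shell
# Hypothetical helper: convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1              # split the timestamp into hours, minutes, seconds
    IFS=$old_ifs
    # strip one leading zero so values like 08 are not read as octal
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

isp_down=$(to_seconds 17:55:18)   # ISP router logs the interface going down
r1_down=$(to_seconds 17:57:24)    # R1 logs the hold time expiring
echo $(( r1_down - isp_down ))    # prints 126
```

That is 2 minutes and 6 seconds during which R1 kept forwarding traffic to a neighbor that was no longer there.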
The BFD protocol allows you to go down to the millisecond level. It is very lightweight and easy to configure.
On the interfaces facing the Layer2 device, apply the command:

bfd interval 100 min_rx 100 multiplier 3
To check if BFD is configured properly:
#sho bfd neighbor

OurAddr        NeighAddr      LD/RD  RH/RS  Holddown(mult)  State  Int
172.31.255.2   172.31.255.1   3/7    Up     0    (3 )       Up     Gi2/5

Under the "router bgp" configuration:

neighbor [neighbor IP] fall-over bfd

Now let's imitate the link failure again. Shut down the interface on router ISP:

Oct 9 18:14:27.408 UTC: %LINK-5-CHANGED: Interface GigabitEthernet2/5, changed state to administratively down
Oct 9 18:14:27.408 UTC: %BGP-5-ADJCHANGE: neighbor 172.31.255.1 Down Interface flap


On R1

Oct 9 18:14:27.673 UTC: %BGP-5-ADJCHANGE: neighbor 172.31.255.2 Down BFD adjacency down


The difference is now in milliseconds. The hard part is to convince your ISP to configure BFD on their side.
At this moment Cisco supports BFD on Ethernet interfaces only and only for directly connected BGP peers, i.e. no multi-hop BGP.
BFD requires UDP ports 3784 and 3785 to be open in case you have ACL applied to your uplink interface.
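The failure detection seen in the logs can be checked against the configured BFD timers. The sketch below assumes both peers use the same interval and multiplier, in which case the worst-case detection time is roughly the interval times the multiplier. The function names bfd_detect_ms and elapsed_ms are hypothetical, and the sketch assumes a POSIX shell with awk.

```shell
# Worst-case BFD detection time in ms: interval (ms) times multiplier.
bfd_detect_ms() {
    echo $(( $1 * $2 ))
}

# Milliseconds between two timestamps given as seconds with a fractional part.
elapsed_ms() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%d\n", (b - a) * 1000 + 0.5 }'
}

bfd_detect_ms 100 3        # bfd interval 100 min_rx 100 multiplier 3 -> prints 300
elapsed_ms 27.408 27.673   # seconds part of the ISP and R1 log timestamps -> prints 265
```

The observed 265 ms falls within the 300 ms budget, compared with 126 seconds when R1 had to wait for the BGP hold timer.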

Tuesday, October 06, 2009

CCIE R&S v4.0

I just attended a webcast about the new CCIE R&S written and lab exams. It looks like the MPLS portion of the exam is going to be much easier than the 642-611 exam, which is required for CCIP certification.