Val(config)#

NXOS sandbox

2018-05-03T10:11:00.001-04:00

In my previous post I promised to describe how I set up my own NXOS sandbox to experiment with OpenConfig on NXOS. Cisco's DevNet lab is very useful, but they cut you off after an hour or so and you have to start over. One hour is not nearly enough, so having your own sandbox is a big help. You'll need Vagrant and VirtualBox. I used Vagrant 2.0.1 and VirtualBox 5.2.10 on Ubuntu 16.04 LTS, but it should work on OS X too.
First step: download nxosv-final.7.0.3.I7.1.box from Cisco's website and add it to the list of your Vagrant boxes:

vagrant box add --name nxos-sdbx nxosv-final.7.0.3.I7.1.box

Clone or download the Vagrantfile and startup-config from my repo at GitHub.
Create an ISO file with the startup-config (you can find more details here). In the directory where you put Vagrantfile and nxos_config.txt, run:

mkisofs -o nxos-sdbx.iso --iso-level 2 nxos_config.txt

Vagrant will configure VirtualBox to forward the sandbox's SSH port to port 5522, the HTTP (NX-API) port to 8980 and the NETCONF port to 8830.
Now you are ready to power up the switch:

vagrant up

It will take some time to boot. You can monitor boot progress by looking at the output of

ncat -U /tmp/nxos-sdbx

Once the switch is up (you'll see a login prompt), you can log in by running either of these commands:

vagrant ssh

ssh -p 5522 admin@localhost

In the first case you'll be dropped into a bash shell; type "su - admin" to get the familiar CLI prompt. The password is "admin" (no quotes). In the second case, just type "admin" when prompted.
If you want to test OpenConfig models on NXOS, you also need to download RPM files from Cisco's devhub. The file names are almost self-explanatory and include the model's name. For example, mtx-openconfig-if-ip-1.0.0-7.0.3.I7.1.lib32_n9000.rpm contains the OpenConfig model for IP interface configuration. Space in bootflash is limited, so if you want to try all available models, install one RPM at a time and delete the file once it is installed.
Copy the file to the sandbox:

scp -P 5522 mtx-openconfig-if-ip-1.0.0-7.0.3.I7.1.lib32_n9000.rpm admin@localhost:

Now log into the switch using one of the two ways above and run the following commands:

run bash

sudo su -

cd /bootflash

yum install -y mtx-openconfig-if-ip-1.0.0-7.0.3.I7.1.lib32_n9000.rpm

Now start NETCONF:

netconfctl start

Now you can use any NETCONF client to push OpenConfig-compliant XML to the sandbox. I wrote my own simple script, which you can find here. Standard disclaimer: use at your own risk, no warranty, yadda, yadda, yadda.

YAML, YANG, ETC

2018-04-08T17:39:00.000-04:00

I started reading about YANG models and OpenConfig about 2 years ago. Around that time I also wrote a script to provision a Clos IP fabric. The data structure that script uses to populate Jinja2 templates was totally made up. So, I decided to convert that data structure into an OpenConfig-compliant one, with a distant goal of feeding it to the switches directly via NETCONF, bypassing the templates. Since I have little desire to create XML files manually, the plan was to write data in YAML, convert it into JSON with a simple Python script and then use pyang to validate the data structure and generate model-compliant XML.
Before doing all that I completed parts 1, 2 and 3 of Cisco's DevNet "NETCONF/YANG on Nexus" lab. Part 3 of the lab uses OpenConfig YANG models, so you can skip part 2 entirely. The XML in Cisco's script looked simple enough, but it took me a few short days of reading RFCs and data models, trials, errors and reading data models again to come up with a YAML file that pyang finally validated and converted into XML. Here is what I got:

---
"openconfig-interfaces:interfaces":
  interface:   
   - name: eth1/1
     config: 
        name: eth1/1
        type: ethernetCsmacd
     subinterfaces:
         subinterface:
            - index: 0
              openconfig-if-ip:ipv4:
                addresses:
                    address:
                       - ip: 172.16.1.0
                         config:
                            ip: 172.16.1.0
                            prefix-length: 31
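
For the longer-term goal of generating this structure instead of typing it, here is a minimal sketch that builds the same data as a Python dict and dumps it as JSON for pyang. The function names and parameters are made up for illustration; only the keys come from the YAML above.

```python
import json


def openconfig_interface(name, ip, prefix_length, if_type="ethernetCsmacd"):
    """Return one entry of the OpenConfig 'interface' list as a plain dict."""
    return {
        "name": name,
        "config": {"name": name, "type": if_type},
        "subinterfaces": {
            "subinterface": [
                {
                    "index": 0,
                    "openconfig-if-ip:ipv4": {
                        "addresses": {
                            "address": [
                                {
                                    "ip": ip,
                                    "config": {
                                        "ip": ip,
                                        "prefix-length": prefix_length,
                                    },
                                }
                            ]
                        }
                    },
                }
            ]
        },
    }


def openconfig_interfaces(entries):
    """Wrap interface entries in the top-level container used in the YAML."""
    return {"openconfig-interfaces:interfaces": {"interface": list(entries)}}


doc = openconfig_interfaces([openconfig_interface("eth1/1", "172.16.1.0", 31)])
print(json.dumps(doc, indent=2))
```

The printed JSON mirrors the YAML key for key, so the same pyang validation step should apply to it.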

You can see the XML file that pyang generated here. Pyang writes everything in one line, so I edited it for readability. I used another script to push the resulting XML to a virtual Nexus switch. And it did not work. The error message said that the namespace was empty. (Since Cisco gives access to the NX-OS sandbox for only one hour, I had to set up my own sandbox. I'll write another post about it.) After another few short hours of manual XML editing (nobody should be editing XML manually) I came up with something that my virtual Nexus accepted (github).

<config>
    <interfaces xmlns="http://openconfig.net/yang/interfaces">
        <interface>
            <name>eth1/1</name>
            <config>
                <name>eth1/1</name>
                <description>OpenConfig</description>
                <type>ianaift:ethernetCsmacd</type>
            </config>
            <subinterfaces>
                <subinterface>
                    <index>0</index>
                    <ipv4>
                        <addresses>
                            <address>
                                <ip>172.16.1.0</ip>
                                <config>
                                    <ip>172.16.1.0</ip>
                                    <prefix-length>31</prefix-length>
                                </config>
                            </address>
                        </addresses>
                    </ipv4>
                </subinterface>
            </subinterfaces>
        </interface>
    </interfaces>
</config>
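
Since nobody should be editing XML manually, here is a rough sketch of producing the same payload with Python's standard xml.etree.ElementTree. It reproduces the shape of the XML above, including the namespace on "interfaces" that the switch insisted on. The helper names are invented for this example, and whether other NX-OS releases accept exactly this shape is an assumption.

```python
import xml.etree.ElementTree as ET

OC_IF_NS = "http://openconfig.net/yang/interfaces"  # namespace from the XML above


def add_leaf(parent, tag, text):
    """Append a child element holding a text value."""
    child = ET.SubElement(parent, tag)
    child.text = str(text)
    return child


def interface_config_xml(name, ip, prefix_length, description="OpenConfig"):
    config = ET.Element("config")
    # A plain "xmlns" attribute is written out as a default-namespace
    # declaration, which is what the hand-edited XML carries.
    interfaces = ET.SubElement(config, "interfaces", {"xmlns": OC_IF_NS})
    interface = ET.SubElement(interfaces, "interface")
    add_leaf(interface, "name", name)
    if_config = ET.SubElement(interface, "config")
    add_leaf(if_config, "name", name)
    add_leaf(if_config, "description", description)
    add_leaf(if_config, "type", "ianaift:ethernetCsmacd")
    subifs = ET.SubElement(interface, "subinterfaces")
    subif = ET.SubElement(subifs, "subinterface")
    add_leaf(subif, "index", 0)
    ipv4 = ET.SubElement(subif, "ipv4")
    addresses = ET.SubElement(ipv4, "addresses")
    address = ET.SubElement(addresses, "address")
    add_leaf(address, "ip", ip)
    addr_config = ET.SubElement(address, "config")
    add_leaf(addr_config, "ip", ip)
    add_leaf(addr_config, "prefix-length", prefix_length)
    return ET.tostring(config, encoding="unicode")


payload = interface_config_xml("eth1/1", "172.16.1.0", 31)
print(payload)
```

The output is a single line; it carries the same elements as the formatted XML above.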

But I had to add the "no switchport" command to Eth1/1 manually first. I could not find anything in the OpenConfig or IETF models to make the switch do it.
Now it is time to save my hard-won changes to the startup config: see the commented line 29 in the script. It's commented out because the virtual switch complained that "startup" is an incorrect datastore, while RFC 6241 says otherwise. It became clear why the switch and the RFC disagreed after I looked at the switch's NETCONF capabilities:

urn:ietf:params:netconf:capability:writable-running:1.0
urn:ietf:params:netconf:capability:rollback-on-error:1.0
urn:ietf:params:netconf:capability:confirmed-commit:1.1
urn:ietf:params:netconf:capability:validate:1.1
http://cisco.com/ns/yang/cisco-nx-os-device?revision=2017-08-31&module=Cisco-NX-OS-device&deviations=Cisco-NX-OS-device-deviations
urn:ietf:params:netconf:base:1.0
urn:ietf:params:netconf:base:1.1
urn:ietf:params:netconf:capability:candidate:1.0
http://openconfig.net/yang/bgp?revision=2016-06-06&module=openconfig-bgp&deviations=openconfig-bgp-deviations
http://openconfig.net/yang/interfaces?revision=2016-05-26&module=openconfig-interfaces&deviations=openconfig-interfaces-deviations
http://openconfig.net/yang/interfaces/ip?revision=2016-05-26&module=openconfig-if-ip&deviations=openconfig-if-ip-deviations

The ":startup" capability is missing. So, there is no way to save the configuration via NETCONF, I guess.
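
A script can make the same check before it tries to touch the startup datastore. This is a minimal sketch over a plain list of capability URIs; the function name is made up, and the list is shortened to a few entries from the output above. The URN prefix follows the standard capability naming from RFC 6241.

```python
def has_capability(capabilities, name):
    """True if a standard NETCONF capability such as 'startup' is advertised."""
    prefix = "urn:ietf:params:netconf:capability:%s:" % name
    return any(cap.startswith(prefix) for cap in capabilities)


# A few of the capabilities advertised by the virtual Nexus (see above).
advertised = [
    "urn:ietf:params:netconf:capability:writable-running:1.0",
    "urn:ietf:params:netconf:capability:rollback-on-error:1.0",
    "urn:ietf:params:netconf:capability:confirmed-commit:1.1",
    "urn:ietf:params:netconf:capability:validate:1.1",
    "urn:ietf:params:netconf:base:1.0",
    "urn:ietf:params:netconf:base:1.1",
    "urn:ietf:params:netconf:capability:candidate:1.0",
]

for wanted in ("writable-running", "candidate", "startup"):
    print(wanted, has_capability(advertised, wanted))
```

With this list, "startup" is the only one of the three that comes back False.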
I am going to check the extent of OpenConfig support on Juniper and Arista devices and, for now, implement the first step of my plan: converting the totally bogus data structure into an OpenConfig-compliant one.

DIY routing to the host

2018-03-13T12:51:00.000-04:00

Cumulus Networks promotes routing to the host via its Host Pack software package as a way to provide host network redundancy without using proprietary MLAG or the mostly incompatible EVPN ESI multihoming solutions from switch vendors. While Host Pack seems to be geared towards hosts running Linux containers, it got me thinking about how I could do routing to a bare-metal host. The routing protocol of choice is BGP. Now I need an IP address on an interface that never goes down, and I need to make sure that my server and client applications use that IP. That same IP will be advertised via BGP from the host. A loopback interface is the obvious choice for this kind of interface.

srv1 and srv2 are Vagrant minimal/xenial64 boxes. srv1, tor1, tor2 and tor3 run BGP; srv2 is connected to the 172.16.99.0/24 network hosted on tor3. Let's configure the "always-up" IP address on srv1:

sudo ip addr add 100.100.100.100/32 dev lo:100

While binding a server application like Apache to a specific IP address or interface is a pretty straightforward task, selecting the source address for outgoing connections is a bit more complicated. Here is how Linux selects the source IP address:

The application can request a particular IP, the kernel will use the src hint from the chosen route path, or, lacking this hint, the kernel will choose the first address configured on the interface which falls in the same network as the destination address or the nexthop router.

I want this to be transparent to the applications, and left on its own, Linux will most likely select the IP address of one of the physical interfaces. The only option left is to make sure that the route to 172.16.99.0/24 on srv1 is programmed with src 100.100.100.100.
In this lab I am using BIRD 1.6 to run BGP on srv1, but Free Range Routing will work too.

router id 100.100.100.100;

filter my_vip
{
        if net = 100.100.100.100/32 then accept;
        reject;
}
filter remote_site
{
        if net ~ [ 172.16.99.0/24 ] then
        {
           krt_prefsrc = 100.100.100.100; #set src
           accept;
        }
        reject;
}

protocol kernel {
        scan time 60;
        import none;
        export filter remote_site;
        persist;     # routes stay even if bird is down
        merge paths on; # ECMP
}

protocol device {
        scan time 60;
}

protocol direct {
        interface "enp0s[8|9]", "lo*";
}
protocol bgp host_2rtr1 {
        local as 65499;
        neighbor 192.168.11.3 as 64900;
        export filter my_vip;
        import filter remote_site;
}
protocol bgp host_2rtr2 {
        local as 65499;
        neighbor 192.168.22.3 as 64920;
        export filter my_vip;
        import filter remote_site;
}

Let's see BGP routes we get from tor1 and tor2:
bird> show route 172.16.99.0/24
172.16.99.0/24     via 192.168.11.3 on enp0s8 [host_2rtr1 16:15:49] * (100) [AS65000i]
                   via 192.168.22.3 on enp0s9 [host_2rtr2 16:15:49] (100) [AS65000i]

Only one route is marked as primary. I could not find an equivalent of "bestpath as-path multipath-relax" in BIRD; it would be required because tor1 and tor2 have different AS numbers. But no worries, "merge paths on" under "protocol kernel" will take care of this. Indeed:
vagrant@srv1:~$ ip route show 172.16.99.0/24
172.16.99.0/24 proto bird src 100.100.100.100
        nexthop via 192.168.11.3 dev enp0s8 weight 1
        nexthop via 192.168.22.3 dev enp0s9 weight 1
Both routes are installed and claim to use 100.100.100.100 as the source IP address to reach the srv2 network.
Let's verify. I start pinging srv2 from srv1 and run tcpdump on the srv2 side.

vagrant@srv1:~$ ping 172.16.99.200
Here is tcpdump output on srv2:
02:22:09.753864 IP 100.100.100.100 > 172.16.99.200: ICMP echo request, id 5024, seq 1, length 64
02:22:09.753906 IP 172.16.99.200 > 100.100.100.100: ICMP echo reply, id 5024, seq 1, length 64
02:22:10.750884 IP 100.100.100.100 > 172.16.99.200: ICMP echo request, id 5024, seq 2, length 64
02:22:10.750920 IP 172.16.99.200 > 100.100.100.100: ICMP echo reply, id 5024, seq 2, length 64

As you can see, packets are coming from 100.100.100.100, even though I did not specify source IP address for the ping.
A similar test with ssh:
vagrant@srv1:~$ ssh 172.16.99.200

02:31:37.652419 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [S], seq 3479345726, win 29200, options [mss 1460,sackOK,TS val 2387063 ecr 0,nop,wscale 7], length 0
02:31:37.652473 IP 172.16.99.200.ssh > 100.100.100.100.54634: Flags [S.], seq 2929355359, ack 3479345727, win 28960, options [mss 1460,sackOK,TS val 2404414 ecr 2387063,nop,wscale 7], length 0
02:31:37.665081 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [.], ack 1, win 229, options [nop,nop,TS val 2387066 ecr 2404414], length 0
02:31:37.666605 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [P.], seq 1:42, ack 1, win 229, options [nop,nop,TS val 2387066 ecr 2404414], length 41
02:31:37.666621 IP 172.16.99.200.ssh > 100.100.100.100.54634: Flags [.], ack 42, win 227, options [nop,nop,TS val 2404418 ecr 2387066], length 0

And the last test: failover. Since my lab setup is entirely virtual, the goal was to test whether failover works at all, not how fast it is. You need real hardware to check the speed of failover.

vagrant@srv1:~$ iperf -s -B 100.100.100.100

vagrant@srv2:~$ iperf -M 1000 -b 80K -i 1 -c 100.100.100.100 -t 120

In my case traffic took the srv2 -> tor3 -> tor1 -> srv1 path. While iperf was running, I shut down the BGP session between tor1 and srv1. Here are the results from iperf:
[ 3] 46.0-47.0 sec   384 KBytes 3.15 Mbits/sec
[ 3] 47.0-48.0 sec   512 KBytes 4.19 Mbits/sec
[ 3] 48.0-49.0 sec   384 KBytes 3.15 Mbits/sec
[ 3] 49.0-50.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 50.0-51.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 51.0-52.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 52.0-53.0 sec   256 KBytes 2.10 Mbits/sec
[ 3] 53.0-54.0 sec   512 KBytes 4.19 Mbits/sec

That 3-second interval of 0.00 bits/sec is the failover time. Again, since it's a virtual environment, your mileage may vary.
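
If you repeat the test a few times, counting the zero-throughput intervals by eye gets old. Here is a small sketch that adds them up from iperf's per-interval report. The regular expression is written against the output format shown above and is an assumption for any other iperf version. The result is only as precise as the reporting interval ("-i 1" here).

```python
import re

# Matches lines like "[ 3] 49.0-50.0 sec 0.00 Bytes 0.00 bits/sec".
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+([\d.]+)\s+\S*Bytes"
)


def outage_seconds(report):
    """Sum the length of all intervals in which no data was transferred."""
    total = 0.0
    for line in report.splitlines():
        match = INTERVAL_RE.search(line)
        if match and float(match.group(3)) == 0.0:
            total += float(match.group(2)) - float(match.group(1))
    return total


sample = """\
[ 3] 48.0-49.0 sec   384 KBytes 3.15 Mbits/sec
[ 3] 49.0-50.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 50.0-51.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 51.0-52.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 52.0-53.0 sec   256 KBytes 2.10 Mbits/sec
"""
print(outage_seconds(sample))
```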

Interface uptime time format

2018-03-05T10:47:00.000-05:00

I was looking into why my NAPALM-based script could not validate the state of an SVI interface on a Cisco Nexus and decided to dig into the NAPALM source code. I found something amusing at line 230 of the nxos.py module:

def _compute_timestamp(stupid_cisco_output):

The code that follows tries to convert Cisco's way of reporting uptime into epoch time. I totally understand the frustration. Let's say you want to find out when an interface last flapped. Here are a few examples:

Last link flapped 5d02h
Last link flapped 16week(s) 5day(s)
Last link flapped never
Last link flapped 23:39:41
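
To give an idea of what a parser has to cope with, here is a minimal sketch that turns the four samples above into seconds. It is not NAPALM's implementation, and it only covers the formats listed here; anything else the switch may print (months, years) is not handled.

```python
import re

# Seconds per unit, keyed by the first letter of the unit name.
UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}
CLOCK_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})$")
UNIT_RE = re.compile(r"(\d+)\s*(week|day|hour|minute|second|w|d|h|m|s)")


def flap_age_seconds(line):
    """Return seconds since the last flap, or None if the link never flapped."""
    value = line.replace("Last link flapped", "").strip()
    if value == "never":
        return None
    clock = CLOCK_RE.match(value)
    if clock:  # HH:MM:SS form
        hours, minutes, seconds = (int(part) for part in clock.groups())
        return hours * 3600 + minutes * 60 + seconds
    parts = UNIT_RE.findall(value)  # "5d02h" or "16week(s) 5day(s)" forms
    if not parts:
        raise ValueError("unrecognized uptime format: %r" % value)
    return sum(int(count) * UNIT_SECONDS[unit[0]] for count, unit in parts)


for sample in (
    "Last link flapped 5d02h",
    "Last link flapped 16week(s) 5day(s)",
    "Last link flapped never",
    "Last link flapped 23:39:41",
):
    print(sample, "->", flap_age_seconds(sample))
```

Three different notations plus a keyword, for one field.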

I understand that "show" command output is intended for human consumption and is easy to read. Unfortunately, Cisco provides the same kind of time format in XML output, which is supposed to be consumed by some kind of automation. Good luck parsing it. While Arista and Juniper also display interface uptime in a similar fashion in plain-text output, they do a much better job in structured output. Here is JUNOS output in JSON:

"interface-flapped" : [
{
"data" : "2017-09-13 14:39:29 EDT (24w0d 20:45 ago)",
"attributes" : {"junos:seconds" : "14589956"}
}
],

or XML:

<interface-flapped junos:seconds="14590210" > 2017-09-13 14:39:29 EDT (24w0d 20:50 ago)</interface-flapped>

Arista's JSON:

"Ethernet5/1": {
"lastStatusChangeTimestamp": 1519771449.121221,

IP fabric over unnumbered interfaces

2018-02-24T14:08:00.000-05:00

So, you read the industry websites, know that IP fabric is the next best thing in data center networking and decided to take the plunge and build your own. Nothing big to start with: 2 spine and 8 leaf switches. Now you realize that your IPAM system does not have an API and you have to manually assign addresses to 16 transit links (two per link, 32 in total) and 10 IPs to loopback interfaces. While not an insurmountable task, it's tedious. Fortunately, Cisco's NXOS and Juniper's JUNOS let you configure IP unnumbered on Ethernet interfaces, and now you need only the 10 IPs for loopbacks.
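
The arithmetic is easy to sketch. Every leaf connects to every spine, each numbered point-to-point link needs an address at both ends, and every switch needs one loopback. The function name and the 10.0.0.0/27 block below are made up for illustration.

```python
import ipaddress


def fabric_address_plan(spines, leafs):
    """Count what has to be assigned in a two-tier spine/leaf fabric."""
    links = spines * leafs  # full mesh between the two tiers
    return {
        "links": links,
        "numbered_transit_ips": links * 2,  # one address per link end
        "loopbacks": spines + leafs,
    }


plan = fabric_address_plan(2, 8)
print(plan)

# With numbered links, a hypothetical /27 is used up completely by /31s.
p2p_subnets = list(ipaddress.ip_network("10.0.0.0/27").subnets(new_prefix=31))
print(len(p2p_subnets), p2p_subnets[0], p2p_subnets[-1])
```

With unnumbered interfaces, only the "loopbacks" figure remains.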
I created a small virtual lab of 4 NXOSv switches (2 spines and 2 leafs) to test the concept. I could not make BGP work directly over unnumbered interfaces, so I configured OSPF to advertise the loopbacks and BGP peering between the loopbacks. To simplify the configuration even more, I configured dynamic BGP peering on the spines.
Why would you need BGP if you already have OSPF? You might want to run another next best thing: VXLAN with an EVPN control plane.
To run the lab you'll need Vagrant, VirtualBox, a Linux machine with 32GB of memory and the Vagrant package of NXOS from Cisco. You need to have a CCO account and maybe a maintenance contract to download the image. Do not ask me to provide the image. I used NXOSv image nxosv-final.7.0.3.I7.1.box, Vagrant 2.0.1, VirtualBox 5.2.6 and Ubuntu Linux 16.04 LTS. The lab also worked with earlier versions of Vagrant and VirtualBox and should run on any Linux distro.

Clone or download the configuration files from git
Run the "create_iso.sh" script to build ISO files with the configuration for each NXOSv switch
Run "vagrant up". Depending on the resources, it might take up to 10 minutes for all 4 switches to come up fully.

You can see boot progress by connecting to the consoles: "ncat -U /tmp/<switch>", where <switch> is leaf1, leaf2, spine1 or spine2. After the switches are up, you can log in by running "vagrant ssh <switch>". You'll be presented with a bash shell; to get to the NXOS prompt, type "su - admin". The password is admin.

Disclaimer: this is in no way shape or form production-ready configuration and was not tested for any side effects. Use it at your own risk.

Happy Labbing!

Cattle vs Pets rant

2018-01-27T18:56:00.000-05:00

OK, by now pretty much everybody in IT knows the cattle and pets meme. Your IT infrastructure should be disposable cattle and not precious pets. While I agree that IT infrastructure should be disposable, I have a problem with this specific choice of words. Only people who have never set foot on a cattle ranch can say that. Any farm-roaming bovine actually brings in money, so treating one dead cow like it's nothing to worry about is going to cost you. One sick cow must get as much attention as a pet cat, if not more, or your entire operation can be in jeopardy.
Let's come up with a different analogy.

It does not take that long

2015-07-13T18:56:00.001-04:00

When some overly enthusiastic SDN neophyte or sysadmin-turned-devops-engineer tells me how long it used to take to create a VLAN and how quickly it can be done with the new magic SDN controller or automation tool of the day, I have to assume that this person does not really know what she or he is talking about.
What takes a long time is the change control procedure, and no amount of automation or SDN is going to change that.
Since we are talking about VLANs here: in a modern data center network you do not need to create a VLAN on more than one switch.

Manage network devices with Ansible

2014-12-15T22:00:00.003-05:00

Ansible is one of the best technologies we took from the buggers after the war. I loved the "Ender's Game" book, not the movie.

Inspired by excellent posts by Kirk Byers, I decided to try Ansible not only to generate configuration for network switches, but also to make configuration changes. I have a virtual Arista switch running in VirtualBox, so this is where I ran my tests, but it's easy to replicate with Juniper or Cisco Nexus switches. I used user "root", although any user with privilege level 15 will do.

First, enable root user on Arista switch:

Arista-5#(conf) aaa root secret SecretPassword

The next step is to go to the management server and generate an SSH key without a passphrase. The resulting public key should be added to the /root/.ssh/authorized_keys file on the Arista switch.

Now, to Ansible.

My ansible.cfg:

[defaults]

host_key_checking=False

hostfile=/home/user1/ansible/hosts

log_path=~/ansible.log

Let's do a very simple task: copy a new OS image file and update the boot variable. Here is my very simple playbook, upgrade.yml:

---
- hosts: arista
  remote_user: root
  tasks:
    - name: Push image
      copy: src=/home/user1/Documents/ansible/vEOS-1.swi dest=/mnt/flash/vEOS-1.swi
    - name: Change boot variable
      command: FastCli -p15 -c "install source vEOS-1.swi now"

Really simple inventory file:

[arista]

arista-5

Let's run it:

It worked, boot variable now points to vEOS-1.swi file.

What happens if you use RADIUS for authentication and have to enter a password to log into your switch? In this case Ansible uses sshpass, which stores your password in memory. From the sshpass man page:

It is close to impossible to securely store the password, and users of sshpass should consider whether ssh's public key authentication provides the same end-user experience, while involving less hassle and being more secure.

If you are willing to take this risk, insert the "ask_pass=True" line into your ansible.cfg to be prompted for the password, or run the ansible-playbook command with the -k option.

Before you finish your next automation script

2014-11-22T23:03:00.000-05:00

An excellent article from the Wall Street Journal about the pitfalls of automation:

This philosophy traps people in a vicious cycle of de-skilling. By isolating them from hard work, it dulls their skills and increases the odds that they will make mistakes. When those mistakes happen, designers respond by seeking to further restrict people’s responsibilities—spurring a new round of de-skilling.

Something to consider when you hand over that new automation script to your network operations center.

Al-too-na! Al-too-na!

2014-11-19T21:35:00.000-05:00

There is a lot of press these days about the new Facebook datacenter and how it (almost) means the death of Cisco-Juniper-Arista. Here are the questions:
Did somebody try to run multicast on a white-box switch?
Where do you place the Rendezvous Point in this network?
There is only a handful of companies with a Facebook- or Google-size network, but hundreds of companies requiring multicast support in their data center. Until white-box switches learn to run IGMP/PIM/MSDP, Cisco and Juniper will have plenty of customers with deep pockets.

Auto Provisioning

2014-06-18T15:54:00.001-04:00

Network automation is a hot topic lately and has been my favorite pastime for quite a while. While vendors provided tools to make changes on switches and routers (CiscoWorks comes to mind), getting the initial configuration onto the switch was still a manual process. Network operators would come up with some kind of template in Excel or a text file and then replace the host name and management IP address. To get the initial configuration onto the switch, one would use copy/paste, a process prone to errors, or upload a text file to the switch and copy it to startup-config or the candidate configuration. The latter process required a configured management IP address and default gateway. In addition to that, operators had to upgrade or downgrade the operating system.
Fast forward to today's era of Chef/Puppet/Ansible/Other appropriated household word. Cisco came up with (and Arista duly replicated) Power-on Auto Provisioning for its Nexus switches, POAP for short. In DHCP option 67 you need to provide the name of a Python script that the switch will download and execute. Arista also supports Bash scripts. Cisco even provides a script that can upgrade NX-OS and download the corresponding configuration file based on one of the following parameters: switch name, management interface MAC address, serial number or CDP neighbors. You can write your own script to generate configuration on the fly.
Juniper came up with a somewhat catchier name: Zero Touch Provisioning. It does not allow you to run scripts. Juniper touted Junos as the first network operating system that allowed scripts, long before Cisco came up with EEM. Also, you have to use DHCP vendor options to encode the configuration file name and the Junos image file name. So, in your DHCP server you have to put something like this: 0x00306a696e7374616c6c2d7166782d332d31332e325835302d4431352e332d646f6d65737469632d7369676e65642e74677a0111646e6a722d6c61622d716678312e636667. Not very informative. Juniper promised to implement SLAX script support in the near future and abandon the DHCP vendor option in favor of option 67. Python support might come later.
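
The string becomes readable once it is treated as a sequence of sub-options: one byte of code, one byte of length, then that many bytes of ASCII. Here is a minimal decoding sketch; the function name is made up, and the code/length/value layout is inferred from the bytes themselves.

```python
def decode_suboptions(hex_string):
    """Split a hex-encoded vendor option into (code, text) sub-options."""
    if hex_string.lower().startswith("0x"):
        hex_string = hex_string[2:]
    data = bytes.fromhex(hex_string)
    result = []
    pos = 0
    while pos + 2 <= len(data):
        code, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        result.append((code, value.decode("ascii")))
        pos += 2 + length
    return result


# The value quoted in the text above.
ENCODED = (
    "0x00306a696e7374616c6c2d7166782d332d31332e325835302d4431352e332d646f6d65737469"
    "632d7369676e65642e74677a0111646e6a722d6c61622d716678312e636667"
)

for code, text in decode_suboptions(ENCODED):
    print(code, text)
```

Sub-option 0 turns out to be the image file name (jinstall-qfx-3-13.2X50-D15.3-domestic-signed.tgz) and sub-option 1 the configuration file name (dnjr-lab-qfx1.cfg).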

Python on Nexus

2014-04-23T12:02:00.002-04:00

If you are relying on the handy "cisco" Python module on your Nexus 5500 or 6K switches, stop. Cisco made considerable changes in version 7.0(1)N1(1).
Here is what it looked like in 6.0(2)N2(3):
switch# python
Python 2.7.2 (default, Nov 27 2012, 17:50:33)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Loaded cisco NxOS lib!
>>> import cisco
>>> dir(cisco)
['BGPSession', 'BufferDepthMonitor', 'CLI', 'CheckPortDiscards', 'CiscoSecret', 'CiscoSocket', 'Feature', 'History', 'IPv4ACL', 'IPv6ACL', 'Interface', 'Key', 'LineParser', 'MacAddressTable', 'OSPFSession', 'Routes', 'SectionParser', 'VRF', 'Vlan', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'acl', 'bfd', 'bgp', 'buffer_depth_monitor', 'check_port_discards', 'cisco_secret', 'cisco_socket', 'cli', 'dhcp', 'eigrp', 'feature', 'get_global_vrf', 'get_valid_port', 'history', 'hsrp', 'interface', 'interface-vlan', 'key', 'lacp', 'line_parser', 'mac_address_table', 'md5sum', 'msdp', 'ospf', 'ospfv3', 'pim', 'private-vlan', 'ptp', 'rip', 'routes', 'scheduler', 'section_parser', 'set_global_vrf', 'show_queues', 'show_run', 'ssh', 'tacacs', 'telnet', 'transfer', 'udld', 'vlan', 'vpc', 'vrf', 'vrrp', 'vtp']
>>>

In 7.0(1)N1(1):
>>> import cisco
>>> dir(cisco)
['__doc__', '__name__', '__package__', 'cli', 'cli_execution_error', 'cli_syntax_error', 'clid', 'clip']

According to a Cisco representative, "Starting 7.0(1)N1(1), python interpreter on N6K has been modified to look like python interpreter on N7K."
WHY DIDN'T THEY DO IT THE OTHER WAY AROUND: MODIFY PYTHON ON N7K TO MATCH N6K?
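
If a script has to survive on both releases, one defensive option is to look at what the module exports before calling anything. The sketch below only classifies the two dir() listings shown above. The function name and the "6.0-style" and "7.0-style" labels are made up for this example, and it says nothing about how the individual functions behave.

```python
def detect_cisco_api(exported_names):
    """Guess the flavor of the 'cisco' module from its exported names."""
    names = set(exported_names)
    if "CLI" in names:
        return "6.0-style"  # class-based API, as listed for 6.0(2)N2(3)
    if {"cli", "clid", "clip"} <= names:
        return "7.0-style"  # function-based API, as listed for 7.0(1)N1(1)
    return "unknown"


# Shortened versions of the two listings above.
old_listing = ["BGPSession", "CLI", "Interface", "Vlan", "cli", "vrf"]
new_listing = ["__doc__", "__name__", "__package__", "cli",
               "cli_execution_error", "cli_syntax_error", "clid", "clip"]

print(detect_cisco_api(old_listing))
print(detect_cisco_api(new_listing))
```

On a switch, the argument would be dir(cisco) after "import cisco".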

Stop using "ships in the night"

2013-09-10T15:19:00.001-04:00

It seems that every month brings news of yet another overlay network: STT, OTV, VXLAN, NVGRE and maybe many more I've not heard of. People often use the words "ships in the night" in the same sentence, referring to the fact that these overlay networks know nothing about the underlying physical network infrastructure. However, nowadays even the smallest seagoing or oceangoing vessel not only knows about any ship nearby, but is also acutely aware of a few satellites, thanks to radar and GPS systems.
So, to keep up with the times and advances in technology (after all, we are in the tech business), I propose a new analogy for overlay and physical networks: polar bear and penguin. Those 2 definitely do not meet in real life. I am not considering corner cases like a zoo.

Rant

2013-09-04T10:02:00.000-04:00

Any vendor requiring registration in order to access online documentation should be condemned to untangle network cables for the rest of their product life.

2013-08-07T16:33:00.000-04:00

A very interesting and popular explanation of the Shannon limit: Part 1 and Part 2.

LACP timer and what it means

2013-07-19T12:31:00.000-04:00

LACP (IEEE 802.3ad) is a protocol used to bundle several physical interfaces into a single logical channel. It has a timer which defines how often devices connected via this bundle exchange LACP PDUs, or control messages. Currently, this timer can be set to either "rate fast" (1 second) or "rate normal" (30 seconds). What is not always clear is that when you configure "lacp rate <fast|normal>" on Cisco or "set interfaces ae1 aggregated-ether-options lacp periodic fast" on Juniper, you do not configure how often this switch will send LACP PDUs. The command means that the switch where it is applied expects to receive LACP PDUs at this frequency from the partner on the other side of the logical channel.
Here is a quick test. I have a Nexus5500 connected to a Cat6500. Let's configure a port-channel between them with one physical member interface.

Cat6500#show run interface TenGigabitEthernet 1/5
!
interface TenGigabitEthernet1/5
switchport
switchport trunk encapsulation dot1q
switchport mode trunk
lacp rate fast
channel-group 5 mode active

Nexus5500# show running-config interface Ethernet 1/1
!
interface Ethernet1/1
switchport mode trunk
channel-group 1 mode active

"lacp rate normal" is default setting on Nexus, so this command does not show up in the output, but we can confirm:

Nexus5500# show running-config interface Ethernet 1/1 all | include lacp
lacp port-priority 32768
lacp rate normal

The Cat6500 is configured with rate fast and the Nexus5500 with rate normal. Let's see what's going on behind the scenes.
On Catalyst:
Cat6500#show lacp internal
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 3
                            LACP port     Admin     Oper    Port        Port
Port      Flags   State     Priority      Key       Key     Number      State
Te1/5     FA      bndl      32768         0x3       0x3     0x106       0x3F

The F flag says that the Cat6500 is requesting fast LACP PDUs from its partner.

On Nexus it's a little bit backwards: the "show" command tells you the partner's status, not its own.

Nexus5500# show lacp neighbor interface port-channel 1
Flags:  S - Device is sending Slow LACPDUs F - Device is sending Fast LACPDUs
        A - Device is in Active mode P - Device is in Passive mode
port-channel1 neighbors
Partner's information
            Partner                Partner                     Partner
Port        System ID              Port Number     Age         Flags
Eth1/1      20,0-13-5f-20-63-80    0x106           910         SA

            LACP Partner           Partner                     Partner
            Port Priority          Oper Key                    Port State
            32768                  0x3                         0x3f

The Nexus5500 says that its partner, the Cat6500, is sending LACP PDUs every 30 seconds.
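
The "Port State" values in both outputs (0x3F and 0x3f) are the LACP state byte that the Cat6500 advertises about itself. Here is a small sketch that decodes it using the bit order defined by the standard; the function and table names are made up. Bit 1 is the timeout bit: "short" means the device wants to receive LACP PDUs at the fast rate.

```python
# Bit 0 is the least significant bit of the state byte.
LACP_STATE_BITS = (
    "activity",         # 1 = active, 0 = passive
    "timeout",          # 1 = short timeout (fast), 0 = long timeout (slow)
    "aggregation",
    "synchronization",
    "collecting",
    "distributing",
    "defaulted",
    "expired",
)


def decode_port_state(state):
    """Map each LACP state bit name to True or False."""
    return {name: bool((state >> bit) & 1)
            for bit, name in enumerate(LACP_STATE_BITS)}


cat6500_state = decode_port_state(0x3F)
for name in LACP_STATE_BITS:
    print("%-16s %s" % (name, cat6500_state[name]))
```

For 0x3F the activity and timeout bits are both set, which matches the "FA" flags on the Catalyst side: active mode, requesting fast LACP PDUs.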

Find MAC address for IPv4 Multicast group

2013-07-15T09:11:00.000-04:00

When troubleshooting a multicast problem, I often find myself checking whether IGMP snooping works as intended. "show mac address-table multicast" on Cisco switches shows the MAC addresses of multicast groups. Tired of converting multicast IPs into MACs with pencil and paper, I wrote my first ever Python script, which does just that. It takes the IP address of a multicast group as a parameter. Although I did some testing, there might be bugs, so beware.
See Cisco's white paper for an explanation of how the conversion is done.
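
For reference, the conversion itself fits in a few lines: the MAC address is the fixed prefix 01:00:5e followed by the low 23 bits of the group address. This is a minimal sketch with a made-up function name, not the script linked above.

```python
import ipaddress


def multicast_mac(group):
    """Return the Ethernet MAC address for an IPv4 multicast group."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError("%s is not an IPv4 multicast address" % group)
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits are copied
    octets = (0x01, 0x00, 0x5E,
              low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)
    return ":".join("%02x" % octet for octet in octets)


for group in ("224.0.0.1", "224.128.0.1", "239.255.255.250"):
    print(group, multicast_mac(group))
```

Since 5 bits of the group address are dropped, 32 different groups share each MAC address; 224.0.0.1 and 224.128.0.1 in the loop above are one such pair.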

The wait is over

2013-07-13T14:06:00.002-04:00

Finally, in NX-OS 6.0(2) for the Nexus 5000 platform, Cisco implemented the "default interface" command, which lets you return an interface to its factory default configuration. It is a very-very-very useful feature in a lab environment, where one has to do a lot of re-configuration and something does not work as expected simply because of left-over configuration from the previous test.

This command has been available in IOS since 11.1 and in NX-OS for Nexus 7K since 5.1(1).

Juniper haiku

2013-05-29T10:49:00.000-04:00

No wonder Juniper software is so bloated :).
admin@switch> show version and haiku
Hostname: switch
Model: ex4200-48t
JUNOS Base OS boot [10.4R12.1]
JUNOS Base OS Software Suite [10.4R12.1]
JUNOS Kernel Software Suite [10.4R12.1]
JUNOS Crypto Software Suite [10.4R12.1]
JUNOS Online Documentation [10.4R12.1]
JUNOS Enterprise Software Suite [10.4R12.1]
JUNOS Packet Forwarding Engine Enterprise Software Suite [10.4R12.1]
JUNOS Routing Software Suite [10.4R12.1]
JUNOS Web Management [10.4R12.1]

        Now that Zion's safe,
        can they find a better place
        to get some sweaters?

admin@switch> show version and haiku

        Juniper babies
        The next generation starts
        Gotta get more sleep

admin@switch> show version and haiku

        3am; darkness;
        Maintenance window closing.
        Safety net: rollback.

admin@switch> show version and haiku

        Shiny leather pants
        Why don't they squeak when they kick?
        Is _that_ the secret?

PACL and MAC address learning

2013-04-20T13:25:00.000-04:00

I have unenviable task to drag legacy application to 21st century. I am talking about 80-the legacy and some of its functions do not even use IP protocols. One of the proposed solution included 2 servers with the same

IP and MAC addresses (I know, but splitting network in 2 separate VLANs was not an option) connected to different switches, but in the same VLAN. ClientA and ClientB should be able to talk to each other and server connected to the same switch as client. ServerA and ServerB should not even know about each other's existence, so they won't complain about duplicate IP address. The switches are Cisco 6500s. One of the obvious solutions is to put Port Access Control List on either side of the inter-switch link. PACL successfully blocked the traffic between servers, but switches still learned MAC address of the blocked traffic source and placed it MAC address table. This would cause MAC address flapping on the switch every time clients send ARP query for server's MAC or when both servers need to send traffic to their clients. Why would switch need to keep MAC address of the discarded traffic? Oh, well. Another network mystery.

"private-vlan synchronize" pitfall

2013-04-12T17:44:00.003-04:00

Let's say you need to configure a Private VLAN on a Cisco Nexus switch running MSTP. You create a new VLAN, make it isolated and map it to the primary VLAN. Your newly created VLAN is automatically mapped to MST0, and if your primary happens to be in any other MST instance, there is a possibility that your primary and secondary VLANs end up with different L2 paths. As a bonus, you get the annoying message "These secondary vlans are not mapped to the same instance as their primary" every time you run "show spanning-tree configuration". Not a big deal, but things like these irritate me. Not to worry, "private-vlan synchronize" under "spanning-tree mst configuration" will automatically map all secondary VLANs to the same MST instance as the primary VLAN. The moment it's done, your MST digest changes and boom, you have a brand new MST region and STP convergence on top of it. So, either pick an existing VLAN already mapped to the same instance and convert it to a secondary community or isolated VLAN, or have all your VLANs mapped to MST0 and use other methods to load-share traffic between transit links.
I know what you are thinking. No, I did it in the lab, so you won't have to.
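
Why does one remapped VLAN create a new region? Switches are in the same MST region only if their region name, revision and configuration digest all match, and the digest is an HMAC-MD5 over the whole VLAN-to-instance table. Below is a minimal Python sketch of that calculation as I understand it from IEEE 802.1Q. The signature key is quoted from memory and the VLAN numbers (100 and 101) are hypothetical, so treat this as an illustration and not as a reference implementation.

```python
import hashlib
import hmac

# Configuration digest signature key as given in IEEE 802.1Q (quoted from
# memory here; verify against the standard before relying on it).
MST_DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def mst_digest(vlan_to_instance: dict) -> str:
    """Return the HMAC-MD5 digest of a VLAN-to-MST-instance mapping."""
    table = bytearray(4096 * 2)              # one 2-byte entry per VID 0..4095
    for vlan, instance in vlan_to_instance.items():
        if not 1 <= vlan <= 4094:
            raise ValueError(f"invalid VLAN {vlan}")
        table[2 * vlan:2 * vlan + 2] = instance.to_bytes(2, "big")
    # VLANs that are not listed stay at 0, i.e. mapped to MST0.
    return hmac.new(MST_DIGEST_KEY, bytes(table), hashlib.md5).hexdigest()


# Hypothetical mapping: primary VLAN 100 in MST1, new isolated VLAN 101 still in MST0.
before = mst_digest({100: 1})
# After the synchronize command moves VLAN 101 into MST1 alongside its primary.
after = mst_digest({100: 1, 101: 1})

print(before)
print(after)
print("same region" if before == after else "digest changed: new MST region")
```

Any switch that has not received the same mapping change keeps the old digest and therefore treats its neighbor as a different region.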

How complex systems fail

2013-03-05T09:09:00.000-05:00

A must-read paper for every system architect. There is also a recording of the talk Richard Cook gave at the Velocity conference. The main takeaway for me: don't build a reliable system, build a resilient one.

Port-security side effect

2013-02-11T12:33:00.001-05:00

I discovered an interesting side effect of configuring port-security that can prevent its deployment in certain circumstances. Let's have a look. Here is the relevant part of the interface configuration:

switchport port-security maximum 20
switchport port-security
switchport port-security aging time 1440
switchport port-security violation restrict
switchport port-security aging type inactivity

You can find explanations of what each command does here.

switch# show port-security interface gi 0/45
Port Security              : Enabled
Port Status                : Secure-up
Violation Mode             : Restrict
Aging Time                 : 1440 mins
Aging Type                 : Inactivity
SecureStatic Address Aging : Disabled
Maximum MAC Addresses      : 20
Total MAC Addresses        : 1
Configured MAC Addresses   : 0
Sticky MAC Addresses       : 0
Last Source Address:Vlan   : abcd.ef12.3456:1234
Security Violation Count   : 0

In case you were wondering, the MAC address above is completely made up, but it is associated with the "floating" IP assigned to the active device in a cluster. Let's see what happens when the active IP has to move to another device in the cluster due to fail-over:

%PORT_SECURITY-2-PSECURE_VIOLATION: Security violation occurred, caused by MAC address abcd.ef12.3456 on port GigabitEthernet0/46.

Oops. Even though the port-security configuration allows up to 20 MAC addresses to be learned and there are no MAC addresses on Gi0/46, we got a port-security violation. Why? Let's look at the output of debug port-security:

PSECURE: psecure_add_addr_check: Found duplicate mac-address abcd.ef12.3456, It is already secured on Gi0/45
%PORT_SECURITY-2-PSECURE_VIOLATION: Security violation occurred, caused by MAC address abcd.ef12.3456 on port GigabitEthernet0/46.
PSECURE: Security violation, TrapCount:346
PSECURE: Read:2830, Write:2831
PSECURE: swidb = GigabitEthernet0/46 mac_addr = abcd.ef12.3456 vlanid = 1234
PSECURE: Adding abcd.ef12.3456 as dynamic on port Gi0/46 for vlan 1234
PSECURE: Violation/duplicate detected upon receiving abcd.ef12.3456 on vlan 1234: port_num_addrs 1 port_max_addrs 20 vlan_addr_ct 1: vlan_addr_max 20 total_addrs 4: max_total_addrs 6144

The port-security violation happened because the MAC address had not yet been deleted from the original port, hence the "duplicate mac-address" message. To mitigate the problem, though not eliminate it, we can reduce the aging timer to its 1-minute minimum. That still means that after a fail-over the floating IP address will be unreachable for up to another minute, which could be 1 minute too long.
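
The logic in the debug output can be sketched in a few lines of Python. This is a toy model under my own assumptions: the ToyPortSecurity class and its method names are made up, and keying the check on the MAC and VLAN pair is a guess from the debug lines, not documented switch internals.

```python
class ToyPortSecurity:
    """Toy model: a MAC secured on one port is a violation on any other port."""

    def __init__(self, max_per_port: int = 20):
        self.max_per_port = max_per_port
        self.secured = {}                      # (mac, vlan) -> port

    def learn(self, port: str, mac: str, vlan: int) -> str:
        owner = self.secured.get((mac, vlan))
        if owner == port:
            return "already secured"
        if owner is not None:
            # The duplicate check fires before the per-port maximum matters.
            return f"violation: duplicate, secured on {owner}"
        count = sum(1 for p in self.secured.values() if p == port)
        if count >= self.max_per_port:
            return "violation: maximum reached"
        self.secured[(mac, vlan)] = port
        return "secured"

    def age_out(self, mac: str, vlan: int) -> None:
        """Simulate the inactivity timer expiring for one address."""
        self.secured.pop((mac, vlan), None)


ps = ToyPortSecurity()
events = [
    ps.learn("Gi0/45", "abcd.ef12.3456", 1234),   # active cluster member
    ps.learn("Gi0/46", "abcd.ef12.3456", 1234),   # fail-over before aging
]
ps.age_out("abcd.ef12.3456", 1234)                # aging timer expires
events.append(ps.learn("Gi0/46", "abcd.ef12.3456", 1234))
print(events)
```

In this model the second port can only secure the address after the first entry has aged out, which is why the aging timer sets the length of the outage.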

Nexus: peer-switch and STP Bridge ID

2012-09-28T10:04:00.001-04:00

With the release of NX-OS 5.2, Cisco started supporting the peer-switch feature on the Nexus 5K. When peer-switch is enabled, both the vPC primary and secondary switches originate STP BPDUs on vPC ports and use the same designated bridge ID on those ports. This got me wondering what bridge ID the vPC primary switch uses when peer-switch is not enabled. I set up a vPC switch pair with a downstream switch connected via a vPC port-channel. The switches are running MST. Here is a partial BPDU captured on the downstream Nexus switch with the command:
ethanalyzer local interface inbound-hi display-filter "stp" limit-captured-frames 20

Spanning Tree Protocol
    Protocol Identifier: Spanning Tree Protocol (0x0000)
    Protocol Version Identifier: Multiple Spanning Tree (3)
    BPDU Type: Rapid/Multiple Spanning Tree (0x02)
    BPDU flags: 0x7c (Agreement, Forwarding, Learning, Port Role: Designated)
    Root Identifier: 8192 / 0 / 54:7f:ee:01:15:81
    Root Path Cost: 0
    Bridge Identifier: 8192 / 0 / 54:7f:ee:01:15:81
    Port identifier: 0x9063
    Message Age: 0
    Max Age: 20
    Hello Time: 2
    Forward Delay: 15
    Version 1 Length: 0
    Version 3 Length: 96
    MST Extension
        MST Config ID format selector: 0
        MST Config name: blp-mst-Region-1
        MST Config revision: 2
        MST Config digest: d7e7e4984e26acd301b955c5289031ad
        CIST Internal Root Path Cost: 0
        CIST Bridge Identifier: 8192 / 0 / 00:23:04:ee:be:01
            CIST Bridge Priority: 8192
            CIST Bridge Identifier System ID Extension: 0
            CIST Bridge Identifier System ID: 00:23:04:ee:be:01
        CIST Remaining hops: 20
        MSTID 1, Regional Root Identifier 8192 / 54:7f:ee:01:15:81
        MSTID 2, Regional Root Identifier 8192 / 54:7f:ee:01:15:81

Note the "Bridge Identifier" and the "CIST Bridge Identifier". They are different. The former is the "vPC local system-mac" and the latter is the "vPC system-mac". Both can be found in the "show vpc role" output:

nexus-primary# show vpc role

vPC Role status
----------------------------------------------------
vPC role                        : primary
Dual Active Detection Status    : 0
vPC system-mac                  : 00:23:04:ee:be:01
vPC system-priority             : 32667
vPC local system-mac            : 54:7f:ee:01:15:81
vPC local role-priority         : 8192

Here we can see that, without peer-switch enabled, the Nexus switch uses 2 different bridge IDs in the same BPDU. Why does it do that? I reached out to Cisco and will update this post when I hear anything.
When peer-switch is enabled, both the vPC primary and secondary switches originate BPDUs on vPC ports, and the "Bridge Identifier" and "CIST Bridge Identifier" are the same and equal to the "vPC system-mac".
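
If you want to repeat the comparison on your own switches, the check is easy to script. Here is a minimal Python sketch that pulls the MAC addresses out of the decoded BPDU text and the "show vpc role" output (both trimmed to the relevant lines from this post). The find_mac helper is my own, and the regular expression assumes the text layout shown above.

```python
import re

BPDU_TEXT = """\
    Bridge Identifier: 8192 / 0 / 54:7f:ee:01:15:81
        CIST Bridge Identifier: 8192 / 0 / 00:23:04:ee:be:01
"""

VPC_ROLE_TEXT = """\
vPC system-mac                  : 00:23:04:ee:be:01
vPC local system-mac            : 54:7f:ee:01:15:81
"""

MAC = r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"


def find_mac(label: str, text: str) -> str:
    """Return the MAC at the end of the line whose field name is exactly `label`."""
    pattern = rf"^\s*{re.escape(label)}\s*:.*?{MAC}\s*$"
    match = re.search(pattern, text, re.M | re.I)
    if match is None:
        raise ValueError(f"{label!r} not found")
    return match.group(1).lower()


bridge = find_mac("Bridge Identifier", BPDU_TEXT)
cist_bridge = find_mac("CIST Bridge Identifier", BPDU_TEXT)
system_mac = find_mac("vPC system-mac", VPC_ROLE_TEXT)
local_mac = find_mac("vPC local system-mac", VPC_ROLE_TEXT)

print("Bridge Identifier is vPC local system-mac:", bridge == local_mac)
print("CIST Bridge Identifier is vPC system-mac:", cist_bridge == system_mac)
print("Both bridge IDs identical:", bridge == cist_bridge)
```

With the capture from this post the last line prints False; with peer-switch enabled I would expect it to print True.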

IPExpert, it's 2012.

2012-02-21T22:54:00.002-05:00

After reading the excellent sample chapter from IPExpert's "IPv4/IPv6 Multicast Operation and Troubleshooting" book, I decided to pre-order it. Today I got the PDF file and was very disappointed. The content is still great, but it can only be read on a PC or Mac: no iPad, Kindle or smartphone. Now, my commute is relatively long and I do most of my reading on the bus. Not being able to read documentation on a mobile device is a major problem for me. If greedy Hollywood studios found a way to provide their content on mobile platforms, so should IPExpert. Especially given the fact that INE provides PDF files DRM-free. This was my first and last purchase of an IPExpert product.
Update: FileOpen released an iPad/iPhone app, so now I can read that PDF on my iPad.