Val:~$ whoami

I am Val Glinskiy, network engineer specializing in data center networks. TIME magazine selected me as Person of the Year in 2006.

Sunday, April 08, 2018

YAML, YANG, ETC

    I started reading about YANG models and OpenConfig about 2 years ago. Around that time I also wrote a script to provision a Clos IP fabric. The data structure that the script uses to populate Jinja2 templates was totally made up. So I decided to convert that data structure into an OpenConfig-compliant one, with the distant goal of feeding it to the switches directly via NETCONF, bypassing the templates. Since I have little desire to create XML files manually, the plan was to write the data in YAML, convert it into JSON with a simple Python script and then use pyang to validate the data structure and generate model-compliant XML.
   Before doing all that I completed Cisco's DevNet "NETCONF/YANG on Nexus" lab, parts 1, 2 and 3. Part 3 of the lab uses OpenConfig YANG models, so you can skip part 2 entirely. The XML in Cisco's script looked simple enough, but it took me a couple of short days of reading RFCs and data models, trials, errors and reading data models again to come up with a YAML file that pyang finally validated and converted into XML. Here is what I got:
---
"openconfig-interfaces:interfaces":
  interface:
    - name: eth1/1
      config:
        name: eth1/1
        type: ethernetCsmacd
      subinterfaces:
        subinterface:
          - index: 0
            openconfig-if-ip:ipv4:
              addresses:
                address:
                  - ip: 172.16.1.0
                    config:
                      ip: 172.16.1.0
                      prefix-length: 31
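
For illustration only, here is a minimal sketch of the JSON step. The conversion script itself is not shown in this post, so this is an assumption about its shape: the dictionary below is the plain Python object a YAML loader (for example PyYAML's yaml.safe_load) would return for the document above, and the standard json module serializes it. The function name to_json is made up for this example.

```python
import json

# The same data as the YAML document above, written as the plain Python
# object that a YAML loader would return for it.
interfaces = {
    "openconfig-interfaces:interfaces": {
        "interface": [
            {
                "name": "eth1/1",
                "config": {"name": "eth1/1", "type": "ethernetCsmacd"},
                "subinterfaces": {
                    "subinterface": [
                        {
                            "index": 0,
                            "openconfig-if-ip:ipv4": {
                                "addresses": {
                                    "address": [
                                        {
                                            "ip": "172.16.1.0",
                                            "config": {
                                                "ip": "172.16.1.0",
                                                "prefix-length": 31,
                                            },
                                        }
                                    ]
                                }
                            },
                        }
                    ]
                },
            }
        ]
    }
}


def to_json(data):
    """Serialize the structure to indented JSON text (hypothetical helper)."""
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(to_json(interfaces))
```

Lists in YAML become JSON arrays and mappings become JSON objects, so the nesting carries over unchanged.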
You can see the XML file that pyang generated here. Pyang writes everything on one line, so I edited it for readability. I used another script to push the resulting XML to a virtual Nexus switch. And it did not work. The error message said that the namespace was empty. (Since Cisco gives access to the NX-OS sandbox for only one hour, I had to set up my own sandbox. I'll write another post about it.) After another few short hours and manual XML editing (nobody should be editing XML manually), I came up with something that my virtual Nexus accepted (github).
<config>
    <interfaces xmlns="http://openconfig.net/yang/interfaces">
        <interface>
            <name>eth1/1</name>
            <config>
                <name>eth1/1</name>
                <description>OpenConfig</description>
                <type>ianaift:ethernetCsmacd</type>
            </config>
            <subinterfaces>
                <subinterface>
                    <index>0</index>
                    <ipv4>
                        <addresses>
                            <address>
                                <ip>172.16.1.0</ip>
                                <config>
                                    <ip>172.16.1.0</ip>
                                    <prefix-length>31</prefix-length>
                                </config>
                            </address>
                        </addresses>
                    </ipv4>
                </subinterface>
            </subinterfaces>
        </interface>
    </interfaces>
</config>

But I had to add the "no switchport" command to Eth1/1 manually first. I could not find anything in the OpenConfig or IETF models to make the switch do it.
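
If you would rather not hand-edit the namespace into the XML, one option is to build the payload in code. The following is only a sketch using Python's standard xml.etree.ElementTree, not the script from this post; the function name build_interface_config is made up, and it covers just the name and description leaves. It sets "xmlns" as a plain attribute on the interfaces element, which ElementTree writes out as a default namespace declaration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the XML above.
OC_IF_NS = "http://openconfig.net/yang/interfaces"


def build_interface_config(name, description):
    """Return a <config> payload with the namespace set on <interfaces>."""
    config = ET.Element("config")
    # Writing "xmlns" as an ordinary attribute makes the serializer emit
    # xmlns="..." on this element, as in the hand-edited XML.
    interfaces = ET.SubElement(config, "interfaces", {"xmlns": OC_IF_NS})
    interface = ET.SubElement(interfaces, "interface")
    ET.SubElement(interface, "name").text = name
    if_config = ET.SubElement(interface, "config")
    ET.SubElement(if_config, "name").text = name
    ET.SubElement(if_config, "description").text = description
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    print(build_interface_config("eth1/1", "OpenConfig"))
```

Parsing the result back with any namespace-aware XML parser should show the interfaces element and its children in the OpenConfig interfaces namespace.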
Now it is time to save my hard-won changes to the startup config; see the commented-out line 29 in the script. It's commented out because the virtual switch complained that "startup" is an incorrect datastore, while RFC 6241 says otherwise. It became clear why the switch and the RFC disagreed after I looked at the switch's NETCONF capabilities:
urn:ietf:params:netconf:capability:writable-running:1.0
urn:ietf:params:netconf:capability:rollback-on-error:1.0
urn:ietf:params:netconf:capability:confirmed-commit:1.1
urn:ietf:params:netconf:capability:validate:1.1
http://cisco.com/ns/yang/cisco-nx-os-device?revision=2017-08-31&module=Cisco-NX-OS-device&deviations=Cisco-NX-OS-device-deviations
urn:ietf:params:netconf:base:1.0
urn:ietf:params:netconf:base:1.1
urn:ietf:params:netconf:capability:candidate:1.0
http://openconfig.net/yang/bgp?revision=2016-06-06&module=openconfig-bgp&deviations=openconfig-bgp-deviations
http://openconfig.net/yang/interfaces?revision=2016-05-26&module=openconfig-interfaces&deviations=openconfig-interfaces-deviations
http://openconfig.net/yang/interfaces/ip?revision=2016-05-26&module=openconfig-if-ip&deviations=openconfig-if-ip-deviations

Capability ":startup" is missing. So, no way to save configuration via NETCONF, I guess.
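
This check is easy to script. Here is a minimal sketch that looks for a standard capability in a list of capability URIs like the one above; the function name has_capability is made up for the example, and the list is a subset of the switch output.

```python
# A subset of the capabilities advertised by the virtual Nexus (see above).
NEXUS_CAPABILITIES = [
    "urn:ietf:params:netconf:capability:writable-running:1.0",
    "urn:ietf:params:netconf:capability:rollback-on-error:1.0",
    "urn:ietf:params:netconf:capability:confirmed-commit:1.1",
    "urn:ietf:params:netconf:capability:validate:1.1",
    "urn:ietf:params:netconf:base:1.0",
    "urn:ietf:params:netconf:base:1.1",
    "urn:ietf:params:netconf:capability:candidate:1.0",
]


def has_capability(capabilities, name):
    """Return True if a standard NETCONF capability (any version) is listed."""
    prefix = "urn:ietf:params:netconf:capability:%s:" % name
    return any(cap.startswith(prefix) for cap in capabilities)


if __name__ == "__main__":
    for name in ("candidate", "writable-running", "startup"):
        print(name, has_capability(NEXUS_CAPABILITIES, name))
```

Running the check before choosing a target datastore avoids sending an edit or copy request that the device is bound to reject.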
I am going to check the extent of OpenConfig support on Juniper and Arista devices and, for now, implement the first step of my plan: convert the totally bogus data structure into an OpenConfig-compliant one.

Tuesday, March 13, 2018

DIY routing to the host

   Cumulus Networks promotes routing to the host via its Host Pack software package as a way to provide host network redundancy without using proprietary MLAG or the mostly incompatible EVPN ESI multihoming solutions from switch vendors. While Host Pack seems to be geared towards hosts running Linux containers, it got me thinking about how I could do routing to a bare-metal host. The routing protocol of choice is BGP. Now I need an IP address on an interface that never goes down, and I need to make sure that my server and client applications use that IP. That same IP will be advertised via BGP from the host. The loopback interface is the obvious choice for this kind of interface.
   srv1 and srv2 are Vagrant minimal/xenial64 boxes. srv1, tor1, tor2 and tor3 run BGP; srv2 is connected to the 172.16.99.0/24 network hosted on tor3. Let's configure an "always-up" IP address on srv1:
sudo ip addr add 100.100.100.100/32 dev lo:100
    While binding a server application like Apache to a specific IP address or interface is a pretty straightforward task, selecting the source address for an outgoing connection is a bit more complicated. Here is how Linux selects the source IP address:
The application can request a particular IP [20], the kernel will use the src hint from the chosen route path [21], or, lacking this hint, the kernel will choose the first address configured on the interface which falls in the same network as the destination address or the nexthop router.
I want this to be transparent to the applications, and left on its own, Linux will most likely select the IP address of one of the physical interfaces. The only option left is to make sure that the route to 172.16.99.0/24 on srv1 is programmed with src 100.100.100.100.
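
A quick way to see which source address the kernel would pick for a given destination, without sending any traffic, is to connect a UDP socket and read back its local address. This is a generic sketch and not part of the lab scripts; the function name source_ip_for and the default port are arbitrary choices for the example.

```python
import socket


def source_ip_for(destination, port=9):
    """Return the source IP the kernel selects for this destination."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only makes the
        # kernel do a route lookup and choose a local address.
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Loopback is used here so the demo runs anywhere.
    print(source_ip_for("127.0.0.1"))
```

On srv1, calling source_ip_for("172.16.99.200") before and after the route is programmed with the src hint should show the selected address change accordingly.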
In this lab I am using BIRD 1.6 to run BGP on srv1, but Free Range Routing will work too.

router id 100.100.100.100;

filter my_vip
{
        if net = 100.100.100.100/32 then accept;
        reject;
}
filter remote_site
{
        if net ~ [ 172.16.99.0/24 ] then
        {
           krt_prefsrc = 100.100.100.100; #set src
           accept;
        }
        reject;
}

protocol kernel {
        scan time 60;
        import none;
        export filter remote_site;
        persist;     # routes stay even if bird is down
        merge paths on;  # ECMP
}

protocol device {
        scan time 60;
}

protocol direct {
        interface "enp0s[8|9]", "lo*";
}
protocol bgp host_2rtr1 {
        local as 65499;
        neighbor 192.168.11.3 as 64900;
        export filter my_vip;
        import filter remote_site;
}
protocol bgp host_2rtr2 {
        local as 65499;
        neighbor 192.168.22.3 as 64920;
        export filter my_vip;
        import filter remote_site;
}

Let's see the BGP routes we get from tor1 and tor2:
bird> show route 172.16.99.0/24
172.16.99.0/24     via 192.168.11.3 on enp0s8 [host_2rtr1 16:15:49] * (100) [AS65000i]
                   via 192.168.22.3 on enp0s9 [host_2rtr2 16:15:49] (100) [AS65000i]

Only one route is marked as primary. I could not find an equivalent of "bestpath as-path multipath-relax" in BIRD, which is required because tor1 and tor2 have different AS numbers. But no worries, "merge paths on" under "protocol kernel" will take care of this. Indeed:
vagrant@srv1:~$ ip route show 172.16.99.0/24         
172.16.99.0/24  proto bird  src 100.100.100.100       
        nexthop via 192.168.11.3  dev enp0s8 weight 1 
        nexthop via 192.168.22.3  dev enp0s9 weight 1
Both routes are installed and claim to use 100.100.100.100 as the source IP address to reach the srv2 network.
Let's verify. I start pinging srv2 from srv1 and run tcpdump on the srv2 side.



vagrant@srv1:~$ ping 172.16.99.200
Here is the tcpdump output on srv2:
02:22:09.753864 IP 100.100.100.100 > 172.16.99.200: ICMP echo request, id 5024, seq 1, length 64
02:22:09.753906 IP 172.16.99.200 > 100.100.100.100: ICMP echo reply, id 5024, seq 1, length 64
02:22:10.750884 IP 100.100.100.100 > 172.16.99.200: ICMP echo request, id 5024, seq 2, length 64
02:22:10.750920 IP 172.16.99.200 > 100.100.100.100: ICMP echo reply, id 5024, seq 2, length 64


As you can see, packets are coming from 100.100.100.100, even though I did not specify a source IP address for the ping.
A similar test with ssh:
vagrant@srv1:~$ ssh 172.16.99.200

02:31:37.652419 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [S], seq 3479345726, win 29200, options [mss 1460,sackOK,TS val 2387063 ecr 0,nop,wscale 7], length 0
02:31:37.652473 IP 172.16.99.200.ssh > 100.100.100.100.54634: Flags [S.], seq 2929355359, ack 3479345727, win 28960, options [mss 1460,sackOK,TS val 2404414 ecr 2387063,nop,wscale 7], length 0
02:31:37.665081 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [.], ack 1, win 229, options [nop,nop,TS val 2387066 ecr 2404414], length 0
02:31:37.666605 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [P.], seq 1:42, ack 1, win 229, options [nop,nop,TS val 2387066 ecr 2404414], length 41
02:31:37.666621 IP 172.16.99.200.ssh > 100.100.100.100.54634: Flags [.], ack 42, win 227, options [nop,nop,TS val 2404418 ecr 2387066], length 0




And the last test: failover. Since my lab setup is entirely virtual, the goal was to test whether failover works at all, not how fast it is. You need real hardware to check the speed of failover.

vagrant@srv1:~$ iperf -s -B 100.100.100.100

vagrant@srv2:~$ iperf  -M 1000 -b 80K -i 1 -c 100.100.100.100 -t 120

In my case traffic took the srv2 -> tor3 -> tor1 -> srv1 path. While iperf was running, I shut down the BGP session between tor1 and srv1. Here are the results from iperf:
[  3] 46.0-47.0 sec   384 KBytes  3.15 Mbits/sec
[  3] 47.0-48.0 sec   512 KBytes  4.19 Mbits/sec
[  3] 48.0-49.0 sec   384 KBytes  3.15 Mbits/sec
[  3] 49.0-50.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 50.0-51.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 51.0-52.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 52.0-53.0 sec   256 KBytes  2.10 Mbits/sec
[  3] 53.0-54.0 sec   512 KBytes  4.19 Mbits/sec


That 3-second interval of 0.00 bits/sec is the failover time. Again, since it's a virtual environment, your mileage may vary.
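
If you repeat the test many times, reading the outage off the report by eye gets old. Here is a rough sketch that adds up the intervals with zero bytes transferred. It assumes the interval report format shown above; the regular expression and the function name outage_seconds are mine, not part of iperf.

```python
import re

# Matches interval lines such as "[  3] 49.0-50.0 sec  0.00 Bytes  0.00 bits/sec".
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+([\d.]+)\s+\S*Bytes"
)

SAMPLE = """\
[  3] 46.0-47.0 sec   384 KBytes  3.15 Mbits/sec
[  3] 47.0-48.0 sec   512 KBytes  4.19 Mbits/sec
[  3] 48.0-49.0 sec   384 KBytes  3.15 Mbits/sec
[  3] 49.0-50.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 50.0-51.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 51.0-52.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 52.0-53.0 sec   256 KBytes  2.10 Mbits/sec
[  3] 53.0-54.0 sec   512 KBytes  4.19 Mbits/sec
"""


def outage_seconds(report):
    """Sum the length of all intervals in which no bytes were transferred."""
    total = 0.0
    for line in report.splitlines():
        match = INTERVAL_RE.search(line)
        if match and float(match.group(3)) == 0.0:
            total += float(match.group(2)) - float(match.group(1))
    return total


if __name__ == "__main__":
    print(outage_seconds(SAMPLE))
```

With 1-second reporting intervals the result is only accurate to about a second, which is good enough for a virtual lab.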

Monday, March 05, 2018

Interface uptime time format

I was looking into why my NAPALM-based script could not validate the state of an SVI interface on Cisco Nexus and decided to dig into the NAPALM source code. I found something amusing in the nxos.py module, line 230:
def _compute_timestamp(stupid_cisco_output):
The code that follows tries to convert Cisco's way of reporting uptime into epoch time. I totally understand the frustration. Let's say you want to find out when an interface last flapped. Here are a few examples:
Last link flapped 5d02h           
Last link flapped 16week(s) 5day(s)
Last link flapped never
Last link flapped 23:39:41
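
To give an idea of what a parser has to deal with, here is a minimal sketch that converts the value part of the four strings above into seconds. It is not NAPALM's implementation. It handles only the formats shown here, and the short unit letters other than "d" and "h" are my guess at what else might appear.

```python
import re

# Seconds per unit. "week(s)", "day(s)", "d" and "h" appear in the examples
# above; "w", "m" and "s" are assumptions.
UNIT_SECONDS = {
    "week(s)": 604800, "day(s)": 86400,
    "w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1,
}
TOKEN_RE = re.compile(r"(\d+)\s*(week\(s\)|day\(s\)|[wdhms])")
CLOCK_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})$")


def flap_age_seconds(text):
    """Convert e.g. '5d02h' or '23:39:41' to seconds; 'never' gives None."""
    text = text.strip()
    if text == "never":
        return None
    clock = CLOCK_RE.match(text)
    if clock:
        hours, minutes, seconds = (int(part) for part in clock.groups())
        return hours * 3600 + minutes * 60 + seconds
    tokens = TOKEN_RE.findall(text)
    if not tokens:
        raise ValueError("unrecognized uptime format: %r" % text)
    return sum(int(count) * UNIT_SECONDS[unit] for count, unit in tokens)


if __name__ == "__main__":
    for sample in ("5d02h", "16week(s) 5day(s)", "never", "23:39:41"):
        print(sample, "->", flap_age_seconds(sample))
```

Even with all four formats handled, the result is an age with coarse resolution rather than a timestamp, so the flap time can only be estimated.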
I understand that "show" command output is intended for human consumption, and it is easy to read. Unfortunately, Cisco provides the same kind of time format in XML output, which is supposed to be consumed by some kind of automation. Good luck parsing it. While Arista and Juniper also display interface uptime in a similar fashion in plain-text output, they do a much better job in structured output. Here is JUNOS output in JSON:

"interface-flapped" : [                                 
{                                                       
    "data" : "2017-09-13 14:39:29 EDT (24w0d 20:45 ago)",
    "attributes" : {"junos:seconds" : "14589956"}       
}                                                       
],   
or XML:

<interface-flapped junos:seconds="14590210" > 2017-09-13 14:39:29 EDT (24w0d 20:50 ago)</interface-flapped>
Arista's JSON:

"Ethernet5/1": {                                 
    "lastStatusChangeTimestamp": 1519771449.121221,

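For comparison, consuming the structured values takes a couple of lines each. This is only a sketch: the JUNOS fragment above is wrapped in braces so that it parses on its own, and the function names are made up for the example.

```python
import json
from datetime import datetime, timezone

# The JUNOS fragment from above, wrapped in braces to make it a JSON document.
JUNOS_FRAGMENT = """
{"interface-flapped": [
    {"data": "2017-09-13 14:39:29 EDT (24w0d 20:45 ago)",
     "attributes": {"junos:seconds": "14589956"}}
]}
"""


def junos_flap_age(document):
    """Seconds since the last flap, from the junos:seconds attribute."""
    return int(document["interface-flapped"][0]["attributes"]["junos:seconds"])


def arista_flap_time(timestamp):
    """Convert lastStatusChangeTimestamp (epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


if __name__ == "__main__":
    print(junos_flap_age(json.loads(JUNOS_FRAGMENT)))
    print(arista_flap_time(1519771449.121221).isoformat())
```

No regular expressions and no guessing about units are needed in either case.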

Saturday, February 24, 2018

IP fabric over unnumbered interfaces

  So, you read the industry websites, know that IP fabric is the next best thing in data center networking and decided to take the plunge and build your own. Nothing big to start with: 2 spine and 8 leaf switches. Now you realize that your IPAM system does not have an API and you have to manually assign IP addresses to 16 transit links and 10 loopback interfaces. While not an insurmountable task, it's tedious. Fortunately, Cisco's NXOS and Juniper's JUNOS let you configure Ethernet interfaces as IP unnumbered, and now you need only 10 IPs for the loopbacks.
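
To make the bookkeeping concrete, here is a small sketch of what the numbered design requires: one point-to-point subnet per spine-leaf link. It uses Python's standard ipaddress module. The 10.0.0.0/27 pool, the switch names and the function name plan_transit_links are made up for the example, and /31 subnets are my assumption for the transit links.

```python
import ipaddress
from itertools import product


def plan_transit_links(pool, spines, leaves):
    """Map each (spine, leaf) link to the two addresses of its own /31."""
    links = list(product(spines, leaves))
    subnets = list(ipaddress.ip_network(pool).subnets(new_prefix=31))
    if len(subnets) < len(links):
        raise ValueError("pool %s is too small for %d links" % (pool, len(links)))
    # Iterating a /31 network yields exactly its two addresses.
    return {link: tuple(net) for link, net in zip(links, subnets)}


if __name__ == "__main__":
    spines = ["spine%d" % n for n in range(1, 3)]
    leaves = ["leaf%d" % n for n in range(1, 9)]
    plan = plan_transit_links("10.0.0.0/27", spines, leaves)
    for (spine, leaf), (spine_ip, leaf_ip) in plan.items():
        print(spine, spine_ip, "<->", leaf, leaf_ip)
```

With unnumbered interfaces this whole table goes away and only the loopback assignments remain.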
  I created a small virtual lab of 4 NXOSv switches (2 spines and 2 leaves) to test the concept. I could not make BGP work directly over unnumbered interfaces, so I configured OSPF to advertise the loopbacks and BGP peering between the loopbacks. To simplify the configuration even more, I configured dynamic BGP peering on the spines.
   Why would you need BGP if you already have OSPF? You might want to run another next best thing: VXLAN with an EVPN control plane.
  To run the lab you'll need Vagrant, VirtualBox, a Linux machine with 32GB of memory and the Vagrant package of NXOS from Cisco. You need a CCO account and maybe a maintenance contract to download the image. Do not ask me to provide the image. I used NXOSv image nxosv-final.7.0.3.I7.1.box, Vagrant 2.0.1, VirtualBox 5.2.6 and Ubuntu Linux 16.04 LTS, although the lab worked with earlier versions of Vagrant and VirtualBox and should run on any Linux distro.

  • Clone or download the configuration files from git
  • Run the "create_iso.sh" script to build ISO files with the configuration for each NXOSv switch
  • Run "vagrant up". Depending on the resources, it might take up to 10 minutes for all 4 switches to come up fully.
You can see boot progress by connecting to the consoles: "ncat -U /tmp/", where the switch names are leaf1, leaf2, spine1 or spine2. After the switches are up, you can log in by running "vagrant ssh ". You'll be presented with a bash shell; to get to the NXOS prompt, type "su - admin". The password is admin.

Disclaimer: this is in no way, shape or form a production-ready configuration and it was not tested for any side effects. Use it at your own risk.

Happy Labbing!

Saturday, January 27, 2018

Cattle vs Pets rant

Ok, by now pretty much everybody in IT knows about the cattle and pets meme. Your IT infrastructure should be disposable cattle and not precious pets. While I agree that IT infrastructure should be disposable, I have a problem with this specific choice of words. Only people who have never set foot on a cattle ranch can say that. Any farm-roaming bovine actually brings in money, so treating one dead cow like it's nothing to worry about is going to cost you. One sick cow must get as much attention as a pet cat, if not more, or your entire operation can be in jeopardy.
Let's come up with a different analogy.

Monday, July 13, 2015

It does not take that long

When some overly enthusiastic SDN neophyte or sysadmin-turned-devops-engineer tells me how long it used to take to create a VLAN and how quickly it can be done with the new magic SDN controller or automation tool of the day, I have to assume that this person does not really know what she or he is talking about.
What takes a long time is the change control procedure, and no amount of automation or SDN is going to change that.
Since we are talking about VLANs here, in a modern data center network you do not need to create a VLAN on more than one switch.

Monday, December 15, 2014

Manage network devices with Ansible

Ansible is one of the best technologies we took from the buggers after the war. I loved the "Ender's Game" book, not the movie.
Inspired by excellent posts by Kirk Byers, I decided to try Ansible not only to generate configuration for network switches, but also to make configuration changes. I have a virtual Arista switch running in VirtualBox, so this is where I ran my tests, but it's easy to replicate with Juniper or Cisco Nexus switches. I used the user "root", although any user with privilege level 15 will do.
First, enable the root user on the Arista switch:
Arista-5#(conf) aaa root secret SecretPassword

The next step is to go to the management server and generate an ssh key without a password. The resulting public key should be added to the /root/authorized_keys file on the Arista switch.

Now, to Ansible.

My ansible.cfg:
[defaults]
host_key_checking=False
hostfile=/home/user1/ansible/hosts
log_path=~/ansible.log

Let's do a very simple task: copy a new OS image file and update the boot variable. Here is my very simple playbook upgrade.yml:
---
- hosts: arista
  remote_user: root
  tasks:
  - name: Push image
    copy: src=/home/user1/Documents/ansible/vEOS-1.swi dest=/mnt/flash/vEOS-1.swi
  - name: Change boot variable
    command: FastCli -p15 -c "install source vEOS-1.swi now"


Really simple inventory file:
[arista]
arista-5

Let's run it:

It worked: the boot variable now points to the vEOS-1.swi file.



What happens if you use RADIUS for authentication and have to enter a password to log into your switch? In this case Ansible uses sshpass, which stores your password in memory. From the sshpass man page:

It is close to impossible to securely store the password, and users of sshpass should consider whether ssh's public key authentication provides the same end-user experience, while involving less hassle and being more secure.


If you are willing to take this risk, insert the "ask_pass=True" line into your ansible.cfg to be prompted for the password, or run the ansible-playbook command with the -k option.