srv1 and srv2 are Vagrant minimal/xenial64 boxes. srv1, tor1, tor2 and tor3 run BGP; srv2 is connected to the 172.16.99.0/24 network hosted on tor3. Let's configure an "always-up" IP address on srv1:
sudo ip addr add 100.100.100.100/32 dev lo label lo:100
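A quick check that the address is in place (the lo:100 label is purely cosmetic; any /32 on the loopback works):
ip addr show dev lo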
While binding a server application like Apache to a specific IP address or interface is a pretty straightforward task, selecting the source address for an outgoing connection is a bit more complicated. Here is how Linux selects the source IP address:

1. the application can request a particular IP [20] (see the example after this list),
2. the kernel will use the src hint from the chosen route path [21],
3. or, lacking this hint, the kernel will choose the first address configured on the interface which falls in the same network as the destination address or the nexthop router.
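As an example of the first option, an application can explicitly bind to the VIP; with ping that is the -I flag (a sketch using this lab's addresses):
ping -I 100.100.100.100 172.16.99.200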
I want this to be transparent to the applications, and left on its own, Linux will most likely select the IP address of one of the physical interfaces. The only option left is to make sure that the route to 172.16.99.0/24 on srv1 is programmed with src 100.100.100.100.
In this lab I am using BIRD 1.6 to run BGP on srv1, but Free Range Routing will work too.
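To make the goal concrete, this is the manual equivalent of the route BIRD should end up programming (a sketch; 192.168.11.3 is tor1 in this lab):
sudo ip route replace 172.16.99.0/24 via 192.168.11.3 src 100.100.100.100
And here is the BIRD configuration on srv1: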
router id 100.100.100.100;

filter my_vip
{
    if net = 100.100.100.100/32 then accept;
    reject;
}

filter remote_site
{
    if net ~ [ 172.16.99.0/24 ] then
    {
        krt_prefsrc = 100.100.100.100; # set src
        accept;
    }
    reject;
}

protocol kernel {
    scan time 60;
    import none;
    export filter remote_site;
    persist;        # routes stay even if bird is down
    merge paths on; # ECMP
}

protocol device {
    scan time 60;
}

protocol direct {
    interface "enp0s[8|9]", "lo*";
}

protocol bgp host_2rtr1 {
    local as 65499;
    neighbor 192.168.11.3 as 64900;
    export filter my_vip;
    import filter remote_site;
}

protocol bgp host_2rtr2 {
    local as 65499;
    neighbor 192.168.22.3 as 64920;
    export filter my_vip;
    import filter remote_site;
}
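Once BIRD loads this config, a quick sanity check that both sessions are up (birdc is BIRD's CLI):
sudo birdc show protocols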
Let's see the BGP routes we get from tor1 and tor2:
bird> show route 172.16.99.0/24
172.16.99.0/24 via 192.168.11.3 on enp0s8 [host_2rtr1 16:15:49] * (100) [AS65000i]
via 192.168.22.3 on enp0s9 [host_2rtr2 16:15:49] (100) [AS65000i]
Only one route is marked as primary; I could not find a "bestpath as-path multipath-relax" equivalent in BIRD. It would be needed here because tor1 and tor2 have different AS numbers, so the AS paths of the two routes differ. But no worries, "merge paths on" under "protocol kernel" will take care of this. Indeed:
vagrant@srv1:~$ ip route show 172.16.99.0/24
172.16.99.0/24 proto bird src 100.100.100.100
nexthop via 192.168.11.3 dev enp0s8 weight 1
nexthop via 192.168.22.3 dev enp0s9 weight 1
Both routes are installed and claim to use 100.100.100.100 as the source IP address to reach the srv2 network.
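You can also ask the kernel directly which source address it would pick for this destination:
ip route get 172.16.99.200
The reply should include "src 100.100.100.100" along with one of the two nexthops.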
Let's verify. I start pinging srv2 from srv1 and run tcpdump on the srv2 side.
vagrant@srv1:~$ ping 172.16.99.200
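On srv2 I capture the ICMP traffic with something like this (the interface name is an assumption for this lab):
sudo tcpdump -ni enp0s8 icmp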
Here is the tcpdump output on srv2:
02:22:09.753864 IP 100.100.100.100 > 172.16.99.200: ICMP echo request, id 5024, seq 1, length 64
02:22:09.753906 IP 172.16.99.200 > 100.100.100.100: ICMP echo reply, id 5024, seq 1, length 64
02:22:10.750884 IP 100.100.100.100 > 172.16.99.200: ICMP echo request, id 5024, seq 2, length 64
02:22:10.750920 IP 172.16.99.200 > 100.100.100.100: ICMP echo reply, id 5024, seq 2, length 64
As you can see, packets are coming from 100.100.100.100, even though I did not specify a source IP address for the ping.
A similar test with ssh:
vagrant@srv1:~$ ssh 172.16.99.200
02:31:37.652419 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [S], seq 3479345726, win 29200, options [mss 1460,sackOK,TS val 2387063 ecr 0,nop,wscale 7], length 0
02:31:37.652473 IP 172.16.99.200.ssh > 100.100.100.100.54634: Flags [S.], seq 2929355359, ack 3479345727, win 28960, options [mss 1460,sackOK,TS val 2404414 ecr 2387063,nop,wscale 7], length 0
02:31:37.665081 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [.], ack 1, win 229, options [nop,nop,TS val 2387066 ecr 2404414], length 0
02:31:37.666605 IP 100.100.100.100.54634 > 172.16.99.200.ssh: Flags [P.], seq 1:42, ack 1, win 229, options [nop,nop,TS val 2387066 ecr 2404414], length 41
02:31:37.666621 IP 172.16.99.200.ssh > 100.100.100.100.54634: Flags [.], ack 42, win 227, options [nop,nop,TS val 2404418 ecr 2387066], length 0
And the last test: failover. Since my lab setup is entirely virtual, the goal was to test whether failover works at all, not how fast it is. You need real hardware to check the speed of failover.
vagrant@srv1:~$ iperf -s -B 100.100.100.100
vagrant@srv2:~$ iperf -M 1000 -b 80K -i 1 -c 100.100.100.100 -t 120
In my case, traffic took the srv2 -> tor3 -> tor1 -> srv1 path. While iperf was running, I shut down the BGP session between tor1 and srv1.
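On tor1 this is an administrative shutdown of the peering; with Free Range Routing it would look roughly like this (srv1's address on the 192.168.11.0/24 link is an assumption):
sudo vtysh -c 'conf t' -c 'router bgp 64900' -c 'neighbor 192.168.11.2 shutdown'
Here are the results from iperf: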
[ 3] 46.0-47.0 sec 384 KBytes 3.15 Mbits/sec
[ 3] 47.0-48.0 sec 512 KBytes 4.19 Mbits/sec
[ 3] 48.0-49.0 sec 384 KBytes 3.15 Mbits/sec
[ 3] 49.0-50.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 50.0-51.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 51.0-52.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 52.0-53.0 sec 256 KBytes 2.10 Mbits/sec
[ 3] 53.0-54.0 sec 512 KBytes 4.19 Mbits/sec
That 3-second interval of 0.00 bits/sec is the failover time. Again, since it's a virtual environment, your mileage may vary.
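Once the failed session is withdrawn, the kernel route should fall back to a single nexthop; the same check as before confirms it:
ip route show 172.16.99.0/24
Only the tor2 nexthop (192.168.22.3) should remain, still with src 100.100.100.100.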