In this article I want to show you a topology I recently worked on while studying VxLAN. As seen below, there are two VxLAN networks, completely separated from each other by using VRFs. The goal of this scenario is to extend L2 reachability between members of a single VxLAN network. For example, PC2 & PC4, which are connected to Nexus 1 and Nexus 3 respectively, need to exchange L2 traffic, but they are completely separated from each other by an L3 network. Extending an L2 network over an L3 infrastructure requires a special procedure, and nowadays a few options are available, including Cisco OTV, VxLAN, etc.
A VxLAN tunnel runs over an existing network, which is called the “underlay network”; the VxLAN network itself is then called the “overlay network”. In this topology, I’m running OSPF process 1 between Nexus 1, 2 and 3 and have enabled it on a loopback interface on each Nexus with the following IP addresses (a minimal sketch of the underlay configuration follows the list):
- Loopback 0 (Nexus 1): 1.1.1.1/32
- Loopback 0 (Nexus 2): 2.2.2.2/32
- Loopback 0 (Nexus 3): 3.3.3.3/32
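The underlay addressing between the switches is not shown in the original outputs, so the point-to-point address below is only a placeholder; on N1, a minimal underlay sketch could look like this (Ethernet1/3 is the uplink that later gets PIM enabled, while the 10.0.12.0/30 link address is an assumption):

feature ospf
!
router ospf 1
  router-id 1.1.1.1
!
interface loopback0
  ip address 1.1.1.1/32
  ip router ospf 1 area 0.0.0.0
!
interface Ethernet1/3
  no switchport
  ! placeholder point-to-point address towards N2
  ip address 10.0.12.1/30
  ip router ospf 1 area 0.0.0.0
  no shutdown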
After verifying basic reachability between the loopback interfaces of the Nexus devices, we need to create two VRFs, one for each VxLAN network:
vrf context A
  vni 4
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context B
  vni 5
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
!
interface Vlan2
  description customer A
  no shutdown
  vrf member A
  ip address 99.1.1.1/29
!
interface Vlan3
  description customer B
  no shutdown
  vrf member B
  ip address 99.1.1.20/29
!
interface Ethernet1/2
  switchport access vlan 2
!
interface Ethernet1/4
  switchport access vlan 3
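As a side note, NX-OS only accepts the VRF/VNI, VLAN-to-VNI, EVPN and NVE commands used in this article once the related features have been enabled. The exact set varies by platform and software release, so treat the following as a hedged checklist for the VTEPs rather than an exact configuration:

feature ospf
feature bgp
feature pim
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
nv overlay evpn
! needed later for the anycast gateway configuration
feature fabric forwarding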
To verify reachability between the PC devices and the VLAN interfaces on the VTEP devices (N1 & N3), use ping from the PCs. For instance, the ping from PC2 towards 99.1.1.20 (configured on each VTEP) should be successful, but PC2 cannot ping PC4 yet.
The next step is configuring basic VxLAN. For this we need to map each local VLAN on the VTEP devices to a VxLAN network. In other words, we need to create a “VLAN ID to VxLAN VNI” table on Nexus 1 & 3. Because reachability between the left and right portions of the topology is handled by VxLAN tunnels and VxLAN IDs, the local VLANs on N1 and N3 have only local significance. For example, PC2 could be a member of VLAN 3 on N1 while PC4 is a member of VLAN 100 on N3, and both PCs could still be members of a single VxLAN, i.e. a single “L2 network”. But to reduce complexity, I used the same numbers for VLANs and VxLANs (VLAN 2/VNI 2 and VLAN 3/VNI 3).
The mapping between VLAN ID and VxLAN ID (VNI) is done as shown below:
vlan 2
  name customer_A_vlan
  vn-segment 2
vlan 3
  name customer_B_vlan
  vn-segment 3
vlan 4
  name customer_A_L3_routing_vxlan_vlan
  vn-segment 4
vlan 5
  name customer_B_L3_routing_vxlan_vlan
  vn-segment 5
You can see two other VLANs created and mapped; you can ignore them for now, as they are only needed when configuring inter-VxLAN connectivity. Because we have created two different VRFs for the two VxLAN networks, we need to associate each VRF with a VxLAN network by specifying the VxLAN ID (VNI) inside each VRF, which we have already done (take a look at the previous outputs). The RD and RT values were set to auto to let the switch generate them automatically for each VRF. These values, along with some others, are advertised to the other VTEPs as BGP extended communities, so we need to configure BGP on top of the underlay to convey this additional information about the networks.
As you probably know, BGP is capable of transporting several different kinds of reachability information (NLRI) via address families, which is why it is called MP-BGP. There is a dedicated address family for VxLAN networks when BGP is used as the control-plane protocol, called the “L2VPN EVPN” address family. The IPv4 unicast address family is also required for the BGP sessions themselves, but enabling the EVPN address family on the VTEPs is enough to carry the overlay information.
The shared BGP configuration on VTEP switches looks like this:
router bgp 2
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 2.2.2.2
    remote-as 2
    update-source loopback0
    address-family ipv4 unicast
      send-community
      send-community extended
    address-family l2vpn evpn
      send-community
      send-community extended
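Once the sessions come up, the EVPN peering can be checked with the standard BGP show commands; on N1, something like the following should list 2.2.2.2 as an established neighbor under the L2VPN EVPN address family (I’m omitting the output, as its format varies by release):

n1# show bgp l2vpn evpn summary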
In this example I configured N2 as a BGP route reflector, which is the recommended solution in a large production network. You could also set different BGP AS numbers in the underlay and hence use eBGP between the switches, but in this example I’ve used the same BGP AS number on all switches.
n2# sh run | sec bgp

feature bgp

router bgp 2
  address-family l2vpn evpn
    retain route-target all
  neighbor 1.1.1.1
    remote-as 2
    update-source loopback0
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 3.3.3.3
    remote-as 2
    update-source loopback0
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
The “retain route-target all” command is normally needed when eBGP is used between the switches, so that a device with no local VRFs (such as N2) does not drop EVPN routes whose route targets it does not import. It is not strictly required in this iBGP setup, but I didn’t remove it, just to point out its purpose.
At this point we have the basic underlay configuration and need to proceed to the overlay part. First we need to configure EVPN in global configuration mode on the VTEP switches (N1 & N3). The N2 switch belongs only to the underlay network, so it has no information about the details of the VxLAN networks.
evpn
  vni 2 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 3 l2
    rd auto
    route-target import auto
    route-target export auto
!
router bgp 2
  vrf A
    address-family ipv4 unicast
  vrf B
    address-family ipv4 unicast
Like other tunneling technologies, VxLAN uses a special interface, the “NVE” (Network Virtualization Edge) interface, to establish its tunnels. You can create only one NVE interface on each Nexus switch, even if you have multiple VxLAN networks with different VNIs; all of the tunnels will be established between these single interfaces of the VTEP switches.
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 2
    mcast-group 224.1.1.1
  member vni 3
    mcast-group 224.1.1.3
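After the NVE interface is up on both VTEPs, the peers and the per-VNI state can be verified. The following NX-OS commands should show the remote VTEP (e.g. 3.3.3.3 from N1’s perspective) and VNIs 2 and 3 with their multicast groups; I’m only listing the commands, since the output layout differs between releases:

n1# show nve peers
n1# show nve vni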
Speaking of multicast, don’t forget to configure it, as it is necessary in this VxLAN design to handle BUM traffic (Broadcast, Unknown unicast, Multicast) between sites. So it’s better to configure it now if you haven’t yet. To do this, enable PIM on every underlay interface (including the loopback interfaces):
On N1:
interface Ethernet1/3
  no switchport
  ip pim sparse-mode
!
interface loopback0
  ip pim sparse-mode
!
ip pim rp-address 2.2.2.2
On N2:
interface Ethernet1/1
  no switchport
  ip pim sparse-mode
!
interface Ethernet1/2
  no switchport
  ip pim sparse-mode
!
interface loopback0
  ip pim sparse-mode
!
ip pim rp-address 2.2.2.2
On N3:
interface Ethernet1/3
  no switchport
  ip pim sparse-mode
!
interface loopback0
  ip pim sparse-mode
!
ip pim rp-address 2.2.2.2
I used a very simple method of defining the multicast RP (a static RP address), but you can use more elaborate, dynamic/automatic methods instead. If you don’t have a working multicast network in place as part of the underlay infrastructure, only unicast (L2 and L3) traffic will pass between sites; BUM-dependent traffic (e.g. ARP, DHCP, etc.) will not, so you will have problems.
You can use a single multicast infrastructure (a single multicast group) to handle all BUM traffic for all of the VxLANs, and that’s probably a good idea. But for the sake of simplicity, I used multicast group 224.1.1.1 for VNI 2 and group 224.1.1.3 for VNI 3.
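To sanity-check the multicast underlay before troubleshooting VxLAN itself, the usual PIM show commands can be run on any of the three switches (only the commands are shown here, as the output depends on the platform and on which VNIs have already joined their groups):

n2# show ip pim neighbor
n2# show ip mroute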
One other thing remains. As you may have noticed, the gateway for the VxLAN network in VRF A should be 99.1.1.1, and for the one in VRF B it should be 99.1.1.20. For this we have created two VLAN interfaces on each VTEP with these IP addresses. I’m not going to dive deep into the background, but there is a VxLAN feature named “distributed anycast gateway”, which plays a role similar to a first-hop redundancy protocol (such as HSRP) in traditional enterprise networks. With this feature, the VTEP devices assume a shared virtual IP and a shared virtual MAC besides their unique addresses. Clients at both sites use that VIP and VMAC as their default gateway, so moving between sites on the fly will not cause them any trouble. For example, when vMotion moves a running VM from the left site to the right site, the VM keeps using the same VIP & VMAC before and after the relocation.
For this, we need to add a few additional lines of configuration on the VTEPs:
fabric forwarding anycast-gateway-mac 1234.5678.90ab
!
interface Vlan2
  description customer A
  fabric forwarding mode anycast-gateway
!
interface Vlan3
  description customer B
  fabric forwarding mode anycast-gateway
With this configuration, PC1 & PC3 will use 99.1.1.1 and PC2 & PC4 will use 99.1.1.20 as their default gateways. The MAC address behind both of these IP addresses is the same value, 1234.5678.90ab. This does not create a problem, because a MAC address only has significance inside its own L2 segment, and the two gateways sit in different segments and VRFs.
Up to this point we should have reachability inside a single VxLAN network; that is, PC1 & PC3 inside the customer A network (VNI 2) and PC2 & PC4 inside the customer B network (VNI 3) should be able to ping each other.
On PC2 (I’m using a CSR 1000v router as a simple host in this example):
csr-1#ping 99.1.1.17
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 99.1.1.17, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 5/6/7 ms
csr-1#
!
!
csr-1#show arp
Protocol  Address      Age (min)  Hardware Addr   Type  Interface
Internet  99.1.1.17           15  0000.0000.9413  ARPA  GigabitEthernet6
On PC4 (I’m using another Nexus device as a simple host in this example):
n4# ping 99.1.1.18
PING 99.1.1.18 (99.1.1.18): 56 data bytes
64 bytes from 99.1.1.18: icmp_seq=0 ttl=254 time=6.985 ms
64 bytes from 99.1.1.18: icmp_seq=1 ttl=254 time=5.308 ms
64 bytes from 99.1.1.18: icmp_seq=2 ttl=254 time=5.731 ms
64 bytes from 99.1.1.18: icmp_seq=3 ttl=254 time=6.321 ms
64 bytes from 99.1.1.18: icmp_seq=4 ttl=254 time=6.11 ms

--- 99.1.1.18 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 5.308/6.091/6.985 ms
!
!
n4# sh inter eth1/3
Ethernet1/3 is up
admin state is up, Dedicated Interface
  Hardware: 100/1000/10000 Ethernet, address: 0000.0000.9413 (bia 0050.5600.000e)
  Internet Address is 99.1.1.17/29
!
!
n4# show ip arp
IP ARP Table for context default
Total number of entries: 2
Address         Age       MAC Address     Interface    Flags
99.1.1.18       00:00:17  0050.568f.4975  Ethernet1/3
99.1.1.20       00:07:42  1234.5678.90ab  Ethernet1/3
If you compare the MAC and IP addresses on PC2 and PC4, you will see that both devices behave as if they were inside a single L2 and L3 network and use ARP to resolve each other’s IP address to the respective MAC, even though they are physically separated from each other, each sitting behind a different VTEP across an L3 network.
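The same result can be confirmed from the control-plane side on the VTEPs: the remote hosts’ MAC addresses are learned through BGP EVPN rather than through data-plane flooding alone. Two NX-OS commands that show this (output omitted here):

n1# show l2route evpn mac all
n1# show bgp l2vpn evpn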
In part 2 of this topic I will discuss inter-VxLAN and external L3 reachability in VxLAN networks.