In this article I want to show you a topology I recently built while studying VxLAN. As seen below, there are two VxLAN networks completely separated from each other by VRFs. The goal of this scenario is to extend L2 reachability between members of a single VxLAN network. For example, PC2 and PC4, which are connected to Nexus 1 and Nexus 3 respectively, need to exchange L2 traffic but are completely separated from each other by an L3 network. Extending an L2 network over an L3 infrastructure requires a special mechanism, and nowadays a few of them are available, including Cisco OTV, VxLAN, etc.

Cisco VxLAN Topology

A VxLAN tunnel runs over an existing network, which is called the “underlay network”; the VxLAN network itself is called the “overlay network”. In this topology, I’m running OSPF process 1 between Nexus 1, 2 and 3 and have enabled it on a loopback interface on each Nexus with the following IP addresses:

    Loopback 0 (Nexus 1): 1.1.1.1/32
    Loopback 0 (Nexus 2): 2.2.2.2/32
    Loopback 0 (Nexus 3): 3.3.3.3/32
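
The underlay OSPF configuration itself is not shown in this article. As a rough sketch, the relevant part on N1 could look like the following (the point-to-point addressing on Ethernet1/3 is my assumption, not taken from the lab; the interface number matches the PIM configuration shown later):

feature ospf
!
router ospf 1
  router-id 1.1.1.1
!
interface loopback0
  ip address 1.1.1.1/32
  ip router ospf 1 area 0.0.0.0
!
interface Ethernet1/3
  no switchport
  ip address 10.1.2.1/30
  ip router ospf 1 area 0.0.0.0

N2 and N3 would carry the equivalent configuration with their own router IDs and link addresses.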

After verifying basic reachability between the loopback interfaces of the Nexus devices, we need to create two VRFs, one for each VxLAN network:

vrf context A
  vni 4
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context B
  vni 5
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
!
interface Vlan2
  description customer A 
  no shutdown
  vrf member A
  ip address 99.1.1.1/29
!
interface Vlan3
  description customer B 
  no shutdown
  vrf member B
  ip address 99.1.1.20/29
!
interface Ethernet1/2
  switchport access vlan 2
!
interface Ethernet1/4
  switchport access vlan 3

To verify reachability between the PC devices and the VLAN interfaces on the VTEP devices (N1 & N3), use ping on the PCs. For instance, a ping from PC2 towards 99.1.1.20 on each VTEP should succeed, but PC2 cannot ping PC4 yet.
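
You can also check the per-VRF view directly on a VTEP. As a sketch, on N1 (the PC address 99.1.1.18 is the one that appears in the ping tests at the end of this article):

n1# show ip interface brief vrf A
n1# show ip interface brief vrf B
n1# ping 99.1.1.18 vrf B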
The next step is configuring basic VxLAN. For this, we need to map each local VLAN on the VTEP devices to a VxLAN network. In other words, we need to create a “VLAN ID to VxLAN ID” table on Nexus 1 & 3. Because reachability between the left and right portions of the topology is handled by VxLAN tunnels and VxLAN IDs, the local VLANs on N1 and N3 have only local significance. For example, PC2 could be a member of VLAN 3 on N1 while PC4 is a member of VLAN 100 on N3, and both PCs could still belong to a single VxLAN, i.e. one “L2 network”. But to reduce complexity, I used the same numbers for VLANs and VxLANs (vlan2/vxlan2 and vlan3/vxlan3).
The mapping between VLAN ID and VxLAN ID is done as shown below:

vlan 2
  name customer_A_vlan
  vn-segment 2
vlan 3
  name customer_B_vlan
  vn-segment 3
vlan 4
  name customer_A_L3_routing_vxlan_vlan
  vn-segment 4
vlan 5
  name customer_B_L3_routing_vxlan_vlan
  vn-segment 5

You can see two additional VLANs (4 and 5) created and mapped, but you can ignore them for now; they are needed when configuring inter-VxLAN connectivity. Because we created two different VRFs for the two VxLAN networks, we need to associate each VRF with a VxLAN network by specifying the VxLAN ID (VNI) inside each VRF, which we have already done (take a look at the previous outputs). The RD and RT values were set to auto to let the switch generate them automatically for each VRF. These values, along with some others, are sent to other VTEPs as BGP extended communities. So we need to configure BGP between the switches to convey this additional information about the overlay networks.
As you probably know, BGP is capable of transporting many different kinds of reachability information (NLRI) via address families, which is why it is called MP-BGP. There is a special address family for VxLAN networks when BGP is used as the control-plane protocol, called the “L2VPN EVPN” address family. It requires the IPv4 unicast address family underneath it, but enabling it on the VTEPs (and the route reflector, as shown later) is enough.
The shared BGP configuration on VTEP switches looks like this:

router bgp 2
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 2.2.2.2
    remote-as 2
    update-source loopback0
    address-family ipv4 unicast
      send-community
      send-community extended
    address-family l2vpn evpn
      send-community
      send-community extended

In this example I configured N2 as a BGP Route Reflector, which is the recommended solution in a large production network. You could use different BGP AS numbers in the underlay and hence run eBGP between the switches, but in this example I’ve used the same BGP AS number on all switches.

n2# sh run | sec bgp
feature bgp
router bgp 2
  address-family l2vpn evpn
    retain route-target all
  neighbor 1.1.1.1
    remote-as 2
    update-source loopback0
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 3.3.3.3
    remote-as 2
    update-source loopback0
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client

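Once both VTEPs peer with the route reflector, the sessions and the EVPN address family can be verified with the standard BGP show commands, for example:

n1# show bgp ipv4 unicast summary
n1# show bgp l2vpn evpn summary
n2# show bgp l2vpn evpn neighbors 1.1.1.1
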
The “retain route-target all” command is needed when eBGP sessions are used; I didn’t remove it in this example just to show its application.
At this point we have the basic underlay configuration and can proceed to the overlay part. First we need to configure EVPN in global configuration mode on the VTEP switches (N1 & N3). The N2 switch belongs only to the underlay network, so it has no information about the details of the VxLAN networks.

evpn
  vni 2 L2
    rd auto
    route-target import auto
    route-target export auto
  vni 3 L2
    rd auto
    route-target import auto
    route-target export auto
!
router bgp 2
  vrf A
    address-family ipv4 unicast
  vrf B
    address-family ipv4 unicast

Like any other tunneling technology, VxLAN tunnels use a special interface type, “NVE”, to establish tunnels. You can create only one NVE interface on each Nexus switch, even if you have multiple VxLAN networks with different VNIs; all of the tunnels will be established between these single interfaces of the VTEP switches.

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 2
    mcast-group 224.1.1.1
  member vni 3
    mcast-group 224.1.1.3

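After the NVE interface comes up on both VTEPs, the tunnel peers and the VNI-to-VLAN mappings can be checked with:

n1# show nve peers
n1# show nve vni
n1# show vxlan
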
Speaking of multicast: don’t forget to configure it, as it is necessary in VxLAN networks to handle BUM traffic (Broadcast, Unknown unicast, Multicast) between sites. So it’s better to configure it now if you haven’t yet. To do this, enable PIM on every underlay link (including the loopback interfaces):

On N1:

interface Ethernet1/3
  no switchport
  ip pim sparse-mode
!
interface loopback0
  ip pim sparse-mode
!
ip pim rp-address 2.2.2.2 

On N2:

interface Ethernet1/1
  no switchport
  ip pim sparse-mode
!
interface Ethernet1/2
  no switchport
  ip pim sparse-mode
!
interface loopback0
  ip pim sparse-mode
!
ip pim rp-address 2.2.2.2 

On N3:

interface Ethernet1/3
  no switchport
  ip pim sparse-mode
!
interface loopback0
  ip pim sparse-mode
!
ip pim rp-address 2.2.2.2 

I used a very simple method of defining the multicast RP, but you can make it more sophisticated by using dynamic/automatic RP mechanisms. If you don’t have a working multicast network in place as part of the underlay infrastructure, only unicast (L2 and L3) traffic will pass between sites; other types (e.g. ARP, DHCP, etc.) will not, so you will have problems.
You can use a single multicast group to handle all BUM traffic for all VxLANs, and that’s probably a good idea. But for the sake of clarity, I used multicast group 224.1.1.1 for VNI 2 and group 224.1.1.3 for VNI 3.
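A quick way to confirm that the multicast underlay is working is to look at the PIM neighbors and the mroute entries for the BUM groups, for example on N2 (the RP):

n2# show ip pim neighbor
n2# show ip mroute 224.1.1.1
n2# show ip mroute 224.1.1.3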
One other thing remains. As you may have noticed, the gateway for VxLAN network A (VRF A) should be 99.1.1.1 and for VxLAN network B (VRF B) it should be 99.1.1.20. For this we have created two VLAN interfaces on each VTEP with these IP addresses. I’m not going to dive deep into the background, but there is a feature in VxLAN named “distributed anycast gateway”, comparable to first-hop redundancy protocols such as HSRP in enterprise networks. With this feature, the VTEP devices share a virtual IP and virtual MAC address in addition to their unique addresses. Clients at both sites use that VIP and VMAC as their default gateway, so moving between sites on the fly will not cause them problems. For example, when vMotion is used to move a running VM from the left site to the right one, the VM keeps using the same VIP & VMAC before and after relocation.

For this, we need to add a few additional lines of configuration on VTEPs:

fabric forwarding anycast-gateway-mac 1234.5678.90ab
!
interface Vlan2
  description customer A 
  fabric forwarding mode anycast-gateway
!
interface Vlan3
  description customer B 
  fabric forwarding mode anycast-gateway

With this configuration, PC1 & PC3 will use 99.1.1.1 and PC2 & PC4 will use 99.1.1.20 as their default gateways. The MAC address behind both of these addresses is the same value, 1234.5678.90ab. This does not create a problem, because MAC addresses are only locally significant within a single L2 segment.

Up to this point we should have reachability inside each VxLAN network; that is, PC1 & PC3 inside VxLAN 2 (customer A) and PC2 & PC4 inside VxLAN 3 (customer B) should be able to ping each other.

On PC2 (I’m using a CSR 1000v router as a simple host in this example):

csr-1#ping 99.1.1.17
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 99.1.1.17, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 5/6/7 ms
csr-1#
!
!
csr-1#show arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  99.1.1.17              15   0000.0000.9413  ARPA   GigabitEthernet6

On PC4 (I’m using another Nexus device as a simple host in this example):

n4# ping 99.1.1.18
PING 99.1.1.18 (99.1.1.18): 56 data bytes
64 bytes from 99.1.1.18: icmp_seq=0 ttl=254 time=6.985 ms
64 bytes from 99.1.1.18: icmp_seq=1 ttl=254 time=5.308 ms
64 bytes from 99.1.1.18: icmp_seq=2 ttl=254 time=5.731 ms
64 bytes from 99.1.1.18: icmp_seq=3 ttl=254 time=6.321 ms
64 bytes from 99.1.1.18: icmp_seq=4 ttl=254 time=6.11 ms

--- 99.1.1.18 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 5.308/6.091/6.985 ms
!
!
n4# sh inter eth1/3
Ethernet1/3 is up
admin state is up, Dedicated Interface
  Hardware: 100/1000/10000 Ethernet, address: 0000.0000.9413 (bia 0050.5600.000e)
  Internet Address is 99.1.1.17/29
!
!
n4# show ip arp
IP ARP Table for context default
Total number of entries: 2
Address         Age       MAC Address     Interface       Flags
99.1.1.18       00:00:17  0050.568f.4975  Ethernet1/3     
99.1.1.20       00:07:42  1234.5678.90ab  Ethernet1/3   

If you compare the MAC and IP addresses on PC2 and PC4, you will see that both devices are inside a single L2 and L3 network and use ARP to resolve each other’s IP addresses to the respective MACs, even though they are physically separated and each sits behind a different L3 network.
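On the VTEPs you can also inspect the MAC and MAC-IP routes that EVPN has learned and advertised; the remote PC MACs should appear with the peer VTEP as next hop:

n1# show l2route evpn mac all
n1# show l2route evpn mac-ip all
n1# show mac address-table vlan 3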
In part 2 of this topic I will discuss inter-VxLAN and external L3 reachability in VxLAN networks.

Deploying VxLAN with Cisco Nexus 9000v (Part 1)