Configuring L3 eBGP connectivity - Mellanox ONYX OS Switches


Summary

VAST uses the Multi-Active Gateway Protocol (MAGP) and the Border Gateway Protocol (BGP) on the VAST Mellanox Switches to implement L3 external connectivity to the customer network switches.

In this document, we configure L3 connectivity for the example networking architecture described below.

What info is needed for VAST to complete BGP setup?

  • ASNs for the customer and VAST switches (both 16-bit and 32-bit are supported)

  • Port mappings and /31 subnet IPs for the VAST external ports and the customer switch ports

  • One or more VIP Pool subnets

  • 3 IPs in each VIP Pool: Gateway/Router, VAST Switch #1, VAST Switch #2

  • MTU to set on VAST external ports

  • Any additional info - e.g. password encryption 
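A 32-bit ASN is sometimes handed over in dotted X.Y notation, while the switch commands in this document take a single integer. As a rough illustration, the conversion could be wrapped in a small Python helper. The function name `asdot_to_asplain` is our own and is not part of any VAST or ONYX tooling.

```python
def asdot_to_asplain(asn: str) -> int:
    """Return the plain-integer form of an ASN given as 'X.Y' or as a plain number."""
    if "." not in asn:
        value = int(asn)
    else:
        # 'X.Y' means X is the high 16 bits and Y is the low 16 bits.
        high_text, low_text = asn.split(".")
        high, low = int(high_text), int(low_text)
        if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
            raise ValueError(f"each part of {asn!r} must fit in 16 bits")
        value = (high << 16) + low
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{asn!r} does not fit in 32 bits")
    return value


print(asdot_to_asplain("100.100"))  # 6553700
print(asdot_to_asplain("100"))      # 100 (16-bit ASNs pass through unchanged)
```

Malformed input such as "1.2.3" or a part larger than 65535 raises ValueError, so a bad value is caught before it reaches the switch configuration.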

Setup

The setup includes 4 switches:

  • VAST Mellanox ONYX OS Switches - VAST Switch 1 and VAST Switch 2

  • Customer Switches - Customer Switch 1 and Customer Switch 2

BGP Network Design

The eBGP network design is as follows:

  • The Mellanox ONYX OS switches will be configured with one AS number (AS 100), while the Customer switches will be configured with another AS number (AS 101).

  • ECMP will be configured for load balancing.

  • Router interfaces are used as Layer 3 interfaces

  • VAST VIP Pool subnets - 10.101.32.0/24, 10.100.222.0/24

  • Customer Client IP subnet - 10.101.4.0/24  

Setup Architecture

Figure 1: Setup Architecture

Prerequisites

Mellanox switch version 3.8.2204. Use the following command to validate the switch version:

# show version concise 
X86_64 3.8.2204 2019-12-29 16:11:11 x86_64

 

Back up the current configuration:

configuration write to backup_l2 no-switch

Configuration Step 1: MAGP (a.k.a. VRRP on other vendors)

MAGP is a Mellanox proprietary protocol that implements active-active VRRP (Virtual Router Redundancy Protocol). Each VAST Mellanox Switch is configured as a switch router (SR).
MAGP resolves the default gateway problem when VAST C-nodes are connected to a set of SRs by defining a virtual router address that is active on both VAST Mellanox Switches.

Reference: https://docs.nvidia.com/networking/display/onyxv3104006/magp

This step configures:

  • VLAN interfaces (Layer 3 interfaces), each with one or more local router IPs

  • A virtual router (MAGP) with a unique MAC address and one or more IP addresses

Note: The following should be repeated for every VLAN+subnet defined for a VIP Pool.

UNIQUE_IP: A router IP address unique to each switch. Use the primary keyword only for the first IP on the VLAN interface.

MAGP_VIP: A virtual router (MAGP) IP address. This is the gateway for the VIP Pool. Use the secondary keyword for any additional virtual-router address on the same VLAN interface.

MAC_ADDRESS: The virtual router MAC address.

VLAN: The VIP Pool VLAN. Use 1 for untagged traffic.

MTU: The MTU used internally.

 

no cli default prefix-modes enable

##
## MAGP configuration
##
interface vlan $VLAN
interface vlan $VLAN mtu $MTU
interface vlan $VLAN ip address $UNIQUE_IP [primary]

protocol magp
interface vlan $VLAN magp $VLAN
interface vlan $VLAN magp $VLAN ip virtual-router address $MAGP_VIP [secondary]
interface vlan $VLAN magp $VLAN ip virtual-router mac-address $MAC_ADDRESS 

 

 Reference: https://docs.nvidia.com/networking/display/onyxv3104006/bgp

Example:

Per Figure 1 we configure the following:

  • VLAN 1 (Access) Interface with Router IPs for each switch.

    Note: The commands below should be run on each of the VAST Switches
    
    no cli default prefix-modes enable
    
    interface vlan 1 mtu 9216
    
    ### Switch 1 - 252 , Switch 2 - 253
    interface vlan 1 ip address 10.101.32.[252/253]/24 primary
    
    ### for additional VIP pool on vlan 1 (non-tagged)
    ### Switch 1 - 252 , Switch 2 - 253
    interface vlan 1 ip address 10.100.222.[252/253]/24 
    

     

  • A virtual router with a unique MAC address and two IP addresses, one for each VIP Pool. These are the addresses to configure as the Gateway in the VAST WebUI.

    Note: The commands below should be run on each of the VAST Switches

    protocol magp
    interface vlan 1 magp 1
    interface vlan 1 magp 1 ip virtual-router address 10.101.32.254
    interface vlan 1 magp 1 ip virtual-router mac-address [unique-mac-address]
    
    ### for additional VIP pool on vlan 1 (non-tagged)
    interface vlan 1 magp 1 ip virtual-router address 10.100.222.254 secondary
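The primary/secondary rule above is easy to get wrong when a VLAN carries several VIP Pools. The following Python sketch shows one way to render the MAGP commands from a list of pools. The function `magp_commands` and its argument layout are illustrative assumptions, and the MAC address in the usage lines is simply the one that appears in the sample `show magp` output later in this document.

```python
def magp_commands(vlan, mtu, mac_address, pools):
    """Render MAGP commands for one VLAN.

    pools is a list of (unique_ip_with_prefix, magp_vip) tuples;
    the first tuple is treated as the primary VIP Pool.
    """
    lines = [
        f"interface vlan {vlan}",
        f"interface vlan {vlan} mtu {mtu}",
    ]
    for index, (unique_ip, _vip) in enumerate(pools):
        # Only the first router IP on the VLAN interface gets 'primary'.
        suffix = " primary" if index == 0 else ""
        lines.append(f"interface vlan {vlan} ip address {unique_ip}{suffix}")

    magp = f"interface vlan {vlan} magp {vlan}"
    lines += ["protocol magp", magp]
    for index, (_ip, vip) in enumerate(pools):
        # Every virtual-router address after the first gets 'secondary'.
        suffix = "" if index == 0 else " secondary"
        lines.append(f"{magp} ip virtual-router address {vip}{suffix}")
        if index == 0:
            lines.append(f"{magp} ip virtual-router mac-address {mac_address}")
    return lines


# VAST Switch 1 from the example above (Switch 2 would use .253).
for command in magp_commands(
    vlan=1,
    mtu=9216,
    mac_address="38:5E:32:EA:BD:00",
    pools=[("10.101.32.252/24", "10.101.32.254"),
           ("10.100.222.252/24", "10.100.222.254")],
):
    print(command)
```

The output is meant to be reviewed before it is pasted into a switch session; the sketch does not validate that each router IP and VIP belong to the same subnet.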

Configuration Step 2: Routing Interfaces

In this step, we configure Layer 3 router interfaces with /31 subnet IP addresses. Because a /31 subnet contains only two IPs, it pairs a single router interface on a VAST Mellanox Switch with a single interface on a customer switch. The customer should provide a range of IPs to use based on the number of uplinks connected.

Choosing the IPs should be done as follows:

  1. VAST uses /31 subnets. Each /31 provides exactly two IP addresses: one for the local port and one for the remote port.

  2. Start with 0 (x.x.x.0 and x.x.x.1 are a pair, x.x.x.2 and x.x.x.3 are the next pair, and so on).
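To make the pairing rule concrete, the sketch below carves a customer-provided block into consecutive /31 subnets using Python's standard `ipaddress` module. The function name `allocate_p2p_links` is hypothetical, and giving the even address to the customer port and the odd address to the VAST port is an assumption taken from the example in this document, not a requirement.

```python
import ipaddress


def allocate_p2p_links(block, ports):
    """Pair each port with one /31 carved out of block, in order."""
    subnets = list(ipaddress.ip_network(block).subnets(new_prefix=31))
    if len(subnets) < len(ports):
        raise ValueError(f"{block} holds {len(subnets)} /31 pairs, need {len(ports)}")
    links = []
    for port, subnet in zip(ports, subnets):
        links.append({
            "port": port,
            "customer_ip": str(subnet[0]),  # even address of the pair
            "vast_ip": str(subnet[1]),      # odd address of the pair
        })
    return links


# Four uplinks on VAST Switch 1, using the addresses from the example.
for link in allocate_p2p_links("172.18.1.0/29", ["1/13", "1/14", "1/15", "1/16"]):
    print(f"interface ethernet {link['port']} ip address {link['vast_ip']} /31"
          f"   (customer side: {link['customer_ip']})")
```

Calling the same function with "172.18.1.8/29" yields the addresses used for VAST Switch 2 in the example.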

The following should be applied to every port connecting to the customer switches.

PORT: port number.

IP: an IP address used by the switch to pass eBGP traffic.

SUBNET: /31

EXTERNAL_MTU: the MTU, aligned with the MTU of the customer switch interfaces.

interface ethernet $PORT mtu $EXTERNAL_MTU
interface ethernet $PORT no switchport force
interface ethernet $PORT flowcontrol receive off force
interface ethernet $PORT flowcontrol send off force
interface ethernet $PORT dcb priority-flow-control mode off
interface ethernet $PORT traffic-class 0 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
interface ethernet $PORT ip address $IP $SUBNET
interface ethernet $PORT no shut

 

Example

Per Figure 1, on each VAST Mellanox Switch, we configure 4 L3 router interfaces and assign each one an IP in a /31 subnet.

  1. Configure router ports on the VAST Switches:

    • MTU

    • Set as Router (Layer 3) interfaces

    • Disable L2 flow control

    • Disable Priority Flow Control

    • Setup ECN 

      #VAST Switch 1 
      
      interface ethernet 1/13-1/16 mtu 9216
      interface ethernet 1/13-1/16 no switchport force
      interface ethernet 1/13-1/16 flowcontrol receive off force
      interface ethernet 1/13-1/16 flowcontrol send off force
      interface ethernet 1/13-1/16 dcb priority-flow-control mode off
      interface ethernet 1/13-1/16 traffic-class 0 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
      
      #VAST Switch2
      
      interface ethernet 1/13-1/16 mtu 9216
      interface ethernet 1/13-1/16 no switchport force
      interface ethernet 1/13-1/16 flowcontrol receive off force
      interface ethernet 1/13-1/16 flowcontrol send off force
      interface ethernet 1/13-1/16 dcb priority-flow-control mode off
      interface ethernet 1/13-1/16 traffic-class 0 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
  2. Configure the IP addresses on the router ports and bring the ports up

#Switch 1 
interface ethernet 1/13 ip address 172.18.1.1 /31
interface ethernet 1/14 ip address 172.18.1.3 /31
interface ethernet 1/15 ip address 172.18.1.5 /31
interface ethernet 1/16 ip address 172.18.1.7 /31
interface ethernet 1/13-1/16 no shut

#Switch2
interface ethernet 1/13 ip address 172.18.1.9 /31
interface ethernet 1/14 ip address 172.18.1.11 /31
interface ethernet 1/15 ip address 172.18.1.13 /31
interface ethernet 1/16 ip address 172.18.1.15 /31
interface ethernet 1/13-1/16 no shut

Configuration Step 3: eBGP

The following should be applied once per switch:

LOOPBACK_IP: a unique IP address from which the switch derives its router ID, e.g. 1.1.1.1.

ASN: the autonomous system number assigned to our cluster (both spines). Validate that it is unique in the network.

REMOTE_ASN: the autonomous system number assigned to the customer environment.

NEIGHBOR_ADDRESS: the IP address of a neighbor router port. Repeat the neighbor commands for each neighbor address.

IPL_NEIGHBOR: the address of the IPL neighbor (the sibling switch in the MLAG).

PROTOCOL_SUBNET: e.g. 10.10.10.0. The subnet used for customer protocol data such as NFS/S3 (repeat for every VIP Pool subnet).

PROTOCOL_SUBNET_CIDR: e.g. /24.

 

interface loopback 0 ip address $LOOPBACK_IP /32
no spanning-tree # not mandatory
lldp # not mandatory
ip routing
protocol bgp
router bgp $ASN
router bgp $ASN shut
router bgp $ASN maximum-paths 64
router bgp $ASN bestpath as-path multipath-relax
router bgp $ASN bgp fast-external-fallover
router bgp $ASN neighbor customer peer-group
router bgp $ASN neighbor $NEIGHBOR_ADDRESS remote-as $REMOTE_ASN
router bgp $ASN neighbor $NEIGHBOR_ADDRESS peer-group customer
router bgp $ASN neighbor $IPL_NEIGHBOR remote-as $ASN
router bgp $ASN neighbor $IPL_NEIGHBOR next-hop-self
router bgp $ASN network $PROTOCOL_SUBNET $PROTOCOL_SUBNET_CIDR
router bgp $ASN no shut

 

 Reference: https://docs.nvidia.com/networking/display/onyxv3104006/bgp
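Because the neighbor and network lines in the template repeat once per uplink and once per VIP Pool subnet, it can help to generate them from a short list of inputs. The following Python sketch does this; `bgp_commands` and its parameters are illustrative names of our own, and the command strings are copied from the template above rather than from any other source.

```python
import ipaddress


def bgp_commands(asn, remote_asn, neighbors, ipl_neighbor, networks,
                 peer_group="customer"):
    """Render the 'router bgp' lines of the template for one switch."""
    bgp = f"router bgp {asn}"
    lines = [
        bgp,
        f"{bgp} shut",
        f"{bgp} maximum-paths 64",
        f"{bgp} bestpath as-path multipath-relax",
        f"{bgp} bgp fast-external-fallover",
        f"{bgp} neighbor {peer_group} peer-group",
    ]
    for neighbor in neighbors:
        # One remote-as line and one peer-group line per customer-facing neighbor.
        lines.append(f"{bgp} neighbor {neighbor} remote-as {remote_asn}")
        lines.append(f"{bgp} neighbor {neighbor} peer-group {peer_group}")
    # The sibling VAST switch is an internal neighbor in the same AS.
    lines.append(f"{bgp} neighbor {ipl_neighbor} remote-as {asn}")
    lines.append(f"{bgp} neighbor {ipl_neighbor} next-hop-self")
    for network in networks:
        net = ipaddress.ip_network(network)
        lines.append(f"{bgp} network {net.network_address} /{net.prefixlen}")
    lines.append(f"{bgp} no shut")
    return lines


for command in bgp_commands(
    asn=100,
    remote_asn=101,
    neighbors=["172.18.1.0", "172.18.1.2", "172.18.1.4", "172.18.1.6"],
    ipl_neighbor="10.10.10.2",
    networks=["10.101.32.0/24", "10.100.222.0/24"],
):
    print(command)
```

The loopback, `ip routing`, and `protocol bgp` lines are left out on purpose, since they are applied once per switch and do not depend on the lists.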

Example

Note: the interface loopback definition must be unique as it impacts the router ID

VAST Switch 1

  1. Setting interface loopback 

  2. Enable routing and BGP protocols on the switch

  3. Configure the VAST Mellanox Switch with eBGP ASN 100

  4. If the ASN is provided in X.Y format (a 32-bit ASN), shift X left by 16 bits and add Y. Run the following on any Linux machine, and use the result as the ASN on the VAST Mellanox Switches:

    python
    >>> (X << 16) + Y
    
    Example for ASN 100.100:
    python
    >>> (100 << 16) + 100
    6553700    <<< use this as the ASN
  5. Configure eBGP to use ECMP for load balancing and High Availability

    interface loopback 0 ip address 99.99.99.1 /32
    no spanning-tree
    lldp
    ip routing
    protocol bgp
    router bgp 100
    router bgp 100 shut
    router bgp 100 maximum-paths 64
    router bgp 100 bestpath as-path multipath-relax
    router bgp 100 bgp fast-external-fallover 
  6. Configure a peer group named "customer" and add the eBGP neighbor IPs (the addresses configured on the customer switches)

    router bgp 100 neighbor customer peer-group
    router bgp 100 neighbor 172.18.1.0 remote-as 101
    router bgp 100 neighbor 172.18.1.0 peer-group customer
    router bgp 100 neighbor 172.18.1.2 remote-as 101
    router bgp 100 neighbor 172.18.1.2 peer-group customer
    router bgp 100 neighbor 172.18.1.4 remote-as 101
    router bgp 100 neighbor 172.18.1.4 peer-group customer
    router bgp 100 neighbor 172.18.1.6 remote-as 101
    router bgp 100 neighbor 172.18.1.6 peer-group customer
  7. Add the second VAST Mellanox Switch in the MLAG as a neighbor to support the Customer Switch HA scenario, and advertise the VIP Pool subnets

    router bgp 100 neighbor 10.10.10.2 remote-as 100
    router bgp 100 neighbor 10.10.10.2 next-hop-self
    router bgp 100 network 10.101.32.0 /24
    router bgp 100 no shut
    
    ### for additional VIP pool on vlan 1
    router bgp 100 network 10.100.222.0 /24
  8. Repeat the same steps (with the relevant changes) for VAST Mellanox Switch #2

    #VAST Switch 2 
    
    interface loopback 0 ip address 99.99.99.2 /32
    no spanning-tree
    lldp
    ip routing
    protocol bgp
    router bgp 100
    router bgp 100 shut
    router bgp 100 maximum-paths 64
    router bgp 100 bestpath as-path multipath-relax
    router bgp 100 bgp fast-external-fallover
    router bgp 100 neighbor customer peer-group
    router bgp 100 neighbor 172.18.1.8 remote-as 101
    router bgp 100 neighbor 172.18.1.8 peer-group customer
    router bgp 100 neighbor 172.18.1.10 remote-as 101
    router bgp 100 neighbor 172.18.1.10 peer-group customer
    router bgp 100 neighbor 172.18.1.12 remote-as 101
    router bgp 100 neighbor 172.18.1.12 peer-group customer
    router bgp 100 neighbor 172.18.1.14 remote-as 101
    router bgp 100 neighbor 172.18.1.14 peer-group customer
    router bgp 100 neighbor 10.10.10.1 remote-as 100
    router bgp 100 neighbor 10.10.10.1 next-hop-self
    router bgp 100 network 10.101.32.0 /24
    router bgp 100 no shut
    
    ### for additional VIP pool on vlan 1
    router bgp 100 network 10.100.222.0 /24

 

Configuration Step 4: Validation

  1. Make sure to save the configuration

    configuration write 
  2. Validate the MAGP configuration. Based on our configuration, we expect to see two gateway IP addresses on each of our switches.

    (config) # show magp
    MAGP 1:
     Interface vlan: 1
     Admin state : Enabled
     State : Master
     Virtual IP : 10.101.32.254
     Virtual MAC : 38:5E:32:EA:BD:00
    
    Associated IP Addresses:
     10.100.222.254
  3. Validate the BGP connections.
    In our setup, we expect to see 4 neighbors in AS 101 (the customer switches) and one neighbor in AS 100 (the peer VAST Mellanox Switch), all in the ESTABLISHED state.

    Note: make sure the router identifier is different on each of the VAST Mellanox Switches.

    #VAST Switch 1
    
    (config) # show ip bgp summary
    
    VRF name : default
    BGP router identifier : 99.99.99.1
    local AS number : 100
    BGP table version : 7
    Main routing table version: 7
    IPV4 Prefixes : 7
    IPV6 Prefixes : 0
    L2VPN EVPN Prefixes : 0
    
    ---------------------------------------------------------------------
    Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
    ---------------------------------------------------------------------
    10.10.10.2 4 100 14897 14869 7 0 0 8:23:37:53 ESTABLISHED/3
    172.18.1.0 4 101 14891 14877 7 0 0 8:23:38:21 ESTABLISHED/1
    172.18.1.2 4 101 14881 14898 7 0 0 8:23:38:21 ESTABLISHED/1
    172.18.1.4 4 101 14874 14887 7 0 0 8:23:38:21 ESTABLISHED/1
    172.18.1.6 4 101 14889 14883 7 0 0 8:23:38:21 ESTABLISHED/1
    
    #VAST Switch 2
    (config) # show ip bgp summary
    
    VRF name : default
    BGP router identifier : 99.99.99.2
    local AS number : 100
    BGP table version : 11
    Main routing table version: 11
    IPV4 Prefixes : 7
    IPV6 Prefixes : 0
    L2VPN EVPN Prefixes : 0
    
    ---------------------------------------------------------------------
    Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
    ---------------------------------------------------------------------
    10.10.10.1 4 100 15055 15084 11 0 0 8:23:37:34 ESTABLISHED/3
    172.18.1.8 4 101 15064 15062 11 0 0 9:02:17:48 ESTABLISHED/1
    172.18.1.10 4 101 15073 15069 11 0 0 9:02:17:48 ESTABLISHED/1
    172.18.1.12 4 101 15062 15065 11 0 0 9:02:17:47 ESTABLISHED/1
    172.18.1.14 4 101 15071 15062 11 0 0 9:02:17:48 ESTABLISHED/1
  4. Use the following command to validate that the customer networks are advertised through eBGP to the VAST Mellanox Switches.
    In our example, the customer client network (10.101.4.0/24) is learned through 4 ECMP paths and one internal path (from our MLAG peer switch).

    (config) # show ip bgp
    
    BGP table version: 11
    Local router ID : 99.99.99.2
    
    Status codes:
     s: suppressed
     d: damped
     h: history
     *: valid
     >: best
     i: internal
     r: RIB-failure
     S: Stale
     m: multipath
     b: backup-path
     x: best-external
    
    Origin codes:
     i: IGP
     e: EGP
     ?: incomplete
    
    ---------------------------------------------------------------------
    Network         Next Hop    Status   Metric   LocPrf  Weight  Path
    ---------------------------------------------------------------------
    10.100.222.0/24 10.10.10.1  i*       0        100     0       i
    10.100.222.0/24 0.0.0.0     *>       0        100     32768   i
    10.101.4.0/24   10.10.10.1  i*       0        100     0 101   i
    10.101.4.0/24   172.18.1.10 m*       0        100     0 101   i
    10.101.4.0/24   172.18.1.12 m*       0        100     0 101   i
    10.101.4.0/24   172.18.1.14 m*       0        100     0 101   i
    10.101.4.0/24   172.18.1.8  m*>      0        100     0 101   i
    10.101.32.0/24  10.10.10.1  i*       0        100     0       i
    10.101.32.0/24  0.0.0.0     *>       0        100     32768   i
  5. Use the following command to validate the L3 routing toward the customer networks.
    In our example, the customer network (10.101.4.0) is routed through 4 eBGP connections from each of the VAST Mellanox Switches.

    #Switch1
    
    (config) # show ip route
    
    Flags:
     F: Failed to install in H/W
     B: BFD protected (static route)
     i: BFD session initializing (static route)
     x: protecting BFD session failed (static route)
     c: consistent hashing
     p: partial programming in H/W
    
    VRF Name default:
     --------------------------------------------------------------------
     Destination   Mask         Flag   Gateway   Interface   Source AD/M
     --------------------------------------------------------------------
     default      0.0.0.0              10.100.0.254 mgmt0     static 1/1
     10.100.0.0   255.255.0.0          0.0.0.0      mgmt0     direct 0/0
     10.10.10.0   255.255.255.0        0.0.0.0      vlan4000  direct 0/0
     10.100.222.0 255.255.255.0        0.0.0.0      vlan1     direct 0/0
     10.101.4.0   255.255.255.0        172.18.1.0   eth1/13   bgp    20/0
                                       172.18.1.2   eth1/14   bgp    20/0
                                       172.18.1.4   eth1/15   bgp    20/0
                                       172.18.1.6   eth1/16   bgp    20/0
     10.101.32.0  255.255.255.0        0.0.0.0      vlan1     direct 0/0
     99.99.99.1   255.255.255.255      0.0.0.0      loopback0 direct 0/0
     172.18.1.0   255.255.255.254      0.0.0.0      eth1/13   direct 0/0
     172.18.1.2   255.255.255.254      0.0.0.0      eth1/14   direct 0/0
     172.18.1.4   255.255.255.254      0.0.0.0      eth1/15   direct 0/0
     172.18.1.6   255.255.255.254      0.0.0.0      eth1/16   direct 0/0
    
    
    #Switch #2
    (config) # show ip route
    
    Flags:
     F: Failed to install in H/W
     B: BFD protected (static route)
     i: BFD session initializing (static route)
     x: protecting BFD session failed (static route)
     c: consistent hashing
     p: partial programming in H/W
    
    VRF Name default:
     -----------------------------------------------------------------------
     Destination  Mask           Flag    Gateway      Interface Source AD/M
     -----------------------------------------------------------------------
     default      0.0.0.0                10.100.0.254 mgmt0     static 1/1
     10.100.0.0   255.255.0.0            0.0.0.0      mgmt0     direct 0/0
     10.10.10.0   255.255.255.0          0.0.0.0      vlan4000  direct 0/0
     10.100.222.0 255.255.255.0          0.0.0.0      vlan1     direct 0/0
     10.101.4.0   255.255.255.0          172.18.1.8   eth1/13   bgp 20/0
                                         172.18.1.10  eth1/14   bgp 20/0
                                         172.18.1.12  eth1/15   bgp 20/0
                                         172.18.1.14  eth1/16   bgp 20/0
     10.101.32.0  255.255.255.0          0.0.0.0      vlan1     direct 0/0
     99.99.99.2   255.255.255.255        0.0.0.0      loopback0 direct 0/0
     172.18.1.8   255.255.255.254        0.0.0.0      eth1/13   direct 0/0
     172.18.1.10  255.255.255.254        0.0.0.0      eth1/14   direct 0/0
     172.18.1.12  255.255.255.254        0.0.0.0      eth1/15   direct 0/0
     172.18.1.14  255.255.255.254        0.0.0.0      eth1/16   direct 0/0
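The neighbor check in step 3 can also be automated once the output of `show ip bgp summary` has been captured as text. The Python sketch below is one possible approach. It assumes the 10-column neighbor rows shown in the sample output above, and `check_bgp_summary` is a hypothetical helper, not an ONYX or VAST utility. How the text is collected from the switch is left out.

```python
def check_bgp_summary(output, local_as, expected_external):
    """Return a list of problems found in 'show ip bgp summary' text."""
    problems = []
    external_up = 0
    internal_up = 0
    for line in output.splitlines():
        fields = line.split()
        # Neighbor rows have 10 columns and start with an IPv4 address.
        if len(fields) != 10 or fields[0].count(".") != 3 or not fields[2].isdigit():
            continue
        neighbor, remote_as, state = fields[0], int(fields[2]), fields[9]
        if not state.startswith("ESTABLISHED"):
            problems.append(f"{neighbor} (AS {remote_as}) is in state {state}")
        elif remote_as == local_as:
            internal_up += 1
        else:
            external_up += 1
    if external_up != expected_external:
        problems.append(
            f"expected {expected_external} established eBGP sessions, found {external_up}")
    if internal_up < 1:
        problems.append("no established session to the peer VAST switch")
    return problems


sample = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.10.10.2 4 100 14897 14869 7 0 0 8:23:37:53 ESTABLISHED/3
172.18.1.0 4 101 14891 14877 7 0 0 8:23:38:21 ESTABLISHED/1
172.18.1.2 4 101 14881 14898 7 0 0 8:23:38:21 ESTABLISHED/1
172.18.1.4 4 101 14874 14887 7 0 0 8:23:38:21 ESTABLISHED/1
172.18.1.6 4 101 14889 14883 7 0 0 8:23:38:21 ESTABLISHED/1
"""
print(check_bgp_summary(sample, local_as=100, expected_external=4))  # []
```

An empty list means every neighbor row was ESTABLISHED and the expected number of customer sessions was found. The column layout of rows in other states is an assumption here, so the parser should be checked against real output before it is relied on.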